Python算法设计篇(2) Chapter 2: The basics

Tracey: I didn’t know you were out there.
Zoe: Sort of the point. Stealth—you may have heard of it.
Tracey: I don’t think they covered that in basic.
—— From “The Message,” episode 14 of Firefly

1.计算模型

RAM模型(random-access machine)：标准的单核计算机，它大致有下面三个性质

• We don’t have access to any form of concurrent execution; the machine simply executes one instruction after the other.

• Standard, basic operations (such as arithmetic, comparisons, and memory access) all take constant (although possibly different) amounts of time. There are no more complicated basic operations (such as sorting).

• One computer word (the size of a value that we can work with in constant time) is not unlimited but is big enough to address all the memory locations used to represent our problem, plus an extra percentage for our variables.

the notion of running time complexity (as described in the next section) is based on knowing how big a problem instance is, and that size is simply the amount of memory needed to encode it.
[算法的运行时间是基于问题的大小，这个大小是指问题的输入占用的内存空间大小]

2.算法渐近运行时间

$O$ = $\le$，$\Omega$ = $\ge$， $\Theta$ = $=$

image

3.算法性能评估的经验

(1)Tip 1: If possible, don’t worry about it.

(2)Tip 2: For timing things, use timeit.

#timeit模块简单使用实例
timeit.timeit("x = sum(range(10))")


(3)Tip 3: To find bottlenecks, use a profiler.

#cProfile模块简单使用实例
import cProfile
import re
cProfile.run('re.compile("foo|bar")')

#运行结果：
194 function calls (189 primitive calls) in 0.000 seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    0.000    0.000 <string>:1(<module>)
1    0.000    0.000    0.000    0.000 re.py:188(compile)
1    0.000    0.000    0.000    0.000 re.py:226(_compile)
1    0.000    0.000    0.000    0.000 sre_compile.py:178(_compile_charset)
1    0.000    0.000    0.000    0.000 sre_compile.py:207(_optimize_charset)
...


(4)Tip 4: Plot your results.

image

(5)Tip 5: Be careful when drawing conclusions based on timing comparisons.

First, any differences you observe may be because of random variations.

Second, there are issues when comparing averages.

At the very least, you should stick to comparing averages of actual timings. A common practice to get more meaningful numbers when performing timing experiments is to normalize the running time of each program, dividing it by the running time of some standard, simple algorithm. This can indeed be useful but can in some cases make your results less than meaningful. See the paper “How not to lie with statistics: The correct way to summarize benchmark results” by Fleming and Wallace for a few pointers. For some other perspectives, you could read Bast and Weber’s “Don’t compare averages,” or the more recent paper by Citron et al., “The harmonic or geometric mean: does it really matter?”

Third, your conclusions may not generalize.

(6)Tip 6: Be careful when drawing conclusions about asymptotics from experiments.

If you want to say something conclusively about the asymptotic behavior of an algorithm, you need to analyze it, as described earlier in this chapter. Experiments can give you hints, but they are by their nature finite, and asymptotics deal with what happens for arbitrarily large data sizes. On the other hand, unless you’re working in theoretical computer science, the purpose of asymptotic analysis is to say something about the behavior of the algorithm when implemented and run on actual problem instances, meaning that experiments should be relevant.

4.在Python中实现树和图
[Python中的dict和set]
Python中很多地方都使用了hash策略，在前面的数据结构篇中的查找部分已经介绍了hash的内容。Python提供了hash函数，例如hash("Hello, world!")得到-943387004357456228 (结果不一定相同)。Python中的dict和set都使用了hash机制，所以平均情况下它们获取元素都是常数时间的。

(1)图的表示：最常用的两种表示方式是邻接表和邻接矩阵 [假设要表示的图如下]

image

① adjacency lists 表示形式

# A Straightforward Adjacency List Representation
a, b, c, d, e, f, g, h = range(8)
N = [
[b, c, d, e, f],    # a
[c, e],             # b
[d],                # c
[e],                # d
[f],                # e
[c, g, h],          # f
[f, h],             # g
[f, g]              # h
]

b in N[a] # Neighborhood membership -> True
len(N[f]) # Degree -> 3


② adjacency sets 表示形式

# A Straightforward Adjacency Set Representation
a, b, c, d, e, f, g, h = range(8)
N = [
{b, c, d, e, f},    # a
{c, e},             # b
{d},                # c
{e},                # d
{f},                # e
{c, g, h},          # f
{f, h},             # g
{f, g}              # h
]

b in N[a] # Neighborhood membership -> True
len(N[f]) # Degree -> 3


③ adjacency dicts 表示形式，这种情况下如果边是带权值的都没有问题！

# A Straightforward Adjacency Dict Representation
a, b, c, d, e, f, g, h = range(8)
N = [
{b:2, c:1, d:3, e:9, f:4},    # a
{c:4, e:3},                   # b
{d:8},                        # c
{e:7},                        # d
{f:5},                        # e
{c:2, g:2, h:2},              # f
{f:1, h:6},                   # g
{f:9, g:8}                    # h
]

b in N[a] # Neighborhood membership -> True
len(N[f]) # Degree -> 3
N[a][b] # Edge weight for (a, b) -> 2


`python
N = {
'a': set('bcdef'),
'b': set('ce'),
'c': set('d'),
'd': set('e'),
'e': set('f'),
'f': set('cgh'),
'g': set('fh'),
'h': set('fg')