bisect --- Array bisection algorithm

Performance Notes

When writing time-sensitive code using bisect() and insort(), keep these points in mind.

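Bisection is effective for searching ranges of values. For locating specific values, dictionaries are more performant.
The insort() functions are O(n) because the logarithmic search step is dominated by the linear time insertion step.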

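The search functions are stateless and discard key function results after they are used. Consequently, if the search functions are used in a loop, the key function may be called again and again on the same data elements. If the key function isn't fast, consider wrapping it with functools.cache() to avoid duplicate computations. Alternatively, consider searching an array of precomputed keys to locate the insertion point (as demonstrated in the examples section below).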

See also

Sorted Collections is a high-performance module that uses bisect to manage sorted collections of data.

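The SortedCollection recipe uses bisect to build a full-featured collection class with straightforward search methods and support for a key function. The keys are precomputed to save unnecessary calls to the key function during searches.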

Searching Sorted Lists

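The bisect functions above are useful for finding insertion points but can be tricky or awkward to use for common searching tasks. The following five functions show how to transform them into the standard lookups for sorted lists: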

from bisect import bisect_left, bisect_right

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_le(a, x):
    'Find rightmost value less than or equal to x'
    i = bisect_right(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_gt(a, x):
    'Find leftmost value greater than x'
    i = bisect_right(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

def find_ge(a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError
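The choice between bisect_left() and bisect_right() in the lookups above matters only when the list contains values equal to x. The following sketch, using a made-up list with duplicates, shows where each function lands and how the difference between the two gives the number of occurrences.

```python
from bisect import bisect_left, bisect_right

a = [1, 3, 3, 3, 7]              # sorted, with duplicates

left = bisect_left(a, 3)         # index of the first 3
right = bisect_right(a, 3)       # index just past the last 3
occurrences = right - left       # how many entries equal 3

# For a value that is absent, both functions agree on the insertion point.
absent_left = bisect_left(a, 5)
absent_right = bisect_right(a, 5)

print(left, right, occurrences, absent_left, absent_right)
```

This is why index() and find_ge() start from bisect_left(), while find_le() and find_gt() start from bisect_right().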

Examples

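The bisect() function can be useful for numeric table lookups. This example uses bisect() to look up a letter grade for an exam score based on a set of ordered numeric breakpoints: (say) 90 and up is an 'A', 80 to 89 is a 'B', and so on: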

>>> from bisect import bisect
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

The bisect() and insort() functions also work with lists of tuples. The key argument can serve to extract the field used for ordering records in a table:

>>> from collections import namedtuple
>>> from operator import attrgetter
>>> from bisect import bisect, insort
>>> from pprint import pprint
>>> Movie = namedtuple('Movie', ('name', 'released', 'director'))
>>> movies = [
...     Movie('Jaws', 1975, 'Spielberg'),
...     Movie('Titanic', 1997, 'Cameron'),
...     Movie('The Birds', 1963, 'Hitchcock'),
...     Movie('Aliens', 1986, 'Cameron')
... ]
>>> # Find the first movie released after 1960
>>> by_year = attrgetter('released')
>>> movies.sort(key=by_year)
>>> movies[bisect(movies, 1960, key=by_year)]
Movie(name='The Birds', released=1963, director='Hitchcock')
>>> # Insert a movie while maintaining sort order
>>> romance = Movie('Love Story', 1970, 'Hiller')
>>> insort(movies, romance, key=by_year)
>>> pprint(movies)
[Movie(name='The Birds', released=1963, director='Hitchcock'),
 Movie(name='Love Story', released=1970, director='Hiller'),
 Movie(name='Jaws', released=1975, director='Spielberg'),
 Movie(name='Aliens', released=1986, director='Cameron'),
 Movie(name='Titanic', released=1997, director='Cameron')]

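If the key function is expensive, it is possible to avoid repeated function calls by searching a list of precomputed keys to find the index of a record: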

>>> from bisect import bisect_left
>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])       # Or use operator.itemgetter(1).
>>> keys = [r[1] for r in data]         # Precompute a list of keys.
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)
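When records are added later, the precomputed key list has to be kept in step with the data list. One possible approach is sketched below; `insert_record` is a hypothetical helper, not part of the bisect module, and it assumes the key is the second field of each record.

```python
from bisect import bisect_right

# Records already sorted by their second field, plus the matching key list.
data = [('black', 0), ('blue', 1), ('red', 5), ('yellow', 8)]
keys = [r[1] for r in data]

def insert_record(record):
    """Insert record into data and its key into keys at the same index."""
    k = record[1]
    i = bisect_right(keys, k)    # place after any existing equal keys
    keys.insert(i, k)
    data.insert(i, record)

insert_record(('green', 3))
print(data)
print(keys)
```

Both insertions are linear-time list operations, so the cost profile matches that of insort(); the benefit is only that the key is computed once per record.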
