09-查找最小生成树

选择背景色：黄橙洋红淡粉水蓝草绿白色选择字体：宋体黑体微软雅黑楷体选择字体大小：小中大特恢复默认

4.4.2　查找最小生成树

树是一种特殊的图，它在任意两个顶点之间只存在一条路径，这意味着树中没有环路（cycle），有时被称为无环（acyclic）。环路可以被视作循环。如果可以从一个起始顶点开始遍历图，不会重复经过任何边，并返回到起始顶点，则称存在一条环路。任何不是树的图都可以通过修剪边而成为树。图4-6演示了通过修剪边把图转换为树的过程。

图4-6　在左图中，在顶点B、C和D之间存在一个环路，因此它不是树。在右图中，连通C和D的边已被修剪掉了，因此它是一棵树

连通图（connected graph）是指从图的任一顶点都能以某种路径到达其他任何顶点的图。本章中的所有图都是连通图。生成树（spanning tree）是把图所有顶点都连接起来的树。最小生成树（minimum spanning tree）是以最小总权重把加权图的每个顶点都连接起来的树（相对于其他的生成树而言）。对于每张加权图，我们都能高效地找到其最小生成树。

这里出现了一大堆术语！“查找最小生成树”和“以权重最小的方式连接加权图中的所有顶点”的意思相同，这是关键点。对任何设计网络（交通网络、计算机网络等）的人来说，这都是一个重要而实际的问题：如何能以最低的成本连接网络中的所有节点呢？这里的成本可能是电线、轨道、道路或其他任何东西。以电话网络来说，这个问题的另一种提法就是：连通每个电话机所需要的最短电缆长度是多少？

1．重温优先队列

优先队列在第2章中已经介绍过了。Jarník算法将需要用到优先队列。我们可以从第2章的程序包中导入 PriorityQueue 类，要获得详情请参阅紧挨着代码清单4-5之前的注意事项，也可以把该类复制为一个新文件并放入本章的程序包中。为完整起见，在代码清单4-9中，我们将重新创建第2章中的 PriorityQueue ，这里假定 import 语句会被放入单独的文件中。

代码清单4-9　priority_queue.py

from typing import TypeVar, Generic, List
from heapq import heappush, heappop
T = TypeVar('T')
class PriorityQueue(Generic[T]):
    def __init__(self) -> None:
         self._container: List[T] = []
    @property
    def empty(self) -> bool:
        return not self._container  # not is true for empty container
    def push(self, item: T) -> None:
        heappush(self._container, item)  # in by priority
    def pop(self) -> T:
        return heappop(self._container)  # out by priority
    def __repr__(self) -> str:
        return repr(self._container)

2．计算加权路径的总权重

在开发查找最小生成树的方法之前，我们需要开发一个用于检测某个解的总权重的函数。最小生成树问题的解将由组成树的加权边列表构成。首先，我们会将 WeightedPath 定义为 WeightedEdge 的列表，然后会定义一个 total_weight() 函数，该函数以 WeightedPath 的列表为参数并把所有边的权重相加，以便得到总权重。具体代码如代码清单4-10所示。

代码清单4-10　mst.py

from typing import TypeVar, List, Optional
from weighted_graph import WeightedGraph
from weighted_edge import WeightedEdge
from priority_queue import PriorityQueue
V = TypeVar('V') # type of the vertices in the graph
WeightedPath = List[WeightedEdge] # type alias for paths
def total_weight(wp: WeightedPath) -> float:
    return sum([e.weight for e in wp])

3．Jarník算法

查找最小生成树的Jarník算法把图分为两部分：正在生成的最小生成树的顶点和尚未加入最小生成树的顶点。其工作步骤如下所示。

（1）选择要被包含于最小生成树中的任一顶点。

（2）找到连通最小生成树与尚未加入树的顶点的权重最小的边。

（3）将权重最小边末端的顶点添加到最小生成树中。

（4）重复第2步和第3步，直到图中的每个顶点都加入了最小生成树。

注意　Jarník算法常被称为Prim算法。在20世纪20年代末，两位捷克数学家OtakarBorůvka和VojtěchJarník致力于尽量降低铺设电线的成本，提出了解决最小生成树问题的算法。他们提出的算法在几十年后又被其他人“重新发现”^[3]。

为了高效地运行Jarník算法，需要用到优先队列。每次将新的顶点加入最小生成树时，所有连接到树外顶点的出边都会被加入优先队列中。从优先队列中弹出的一定是权重最小的边，算法将持续运行直至优先队列为空为止。这样就确保了权重最小的边一定会优先加入树中。如果被弹出的边与树中的已有顶点相连，则它将被忽略。

代码清单4-11中的 mst() 完整实现了Jarník算法^[4]，它还带了一个用来打印 WeightedPath 的实用函数。

警告　Jarník算法在有向图中不一定能正常工作，它也不适用于非连通图。

代码清单4-11　mst.py（续）

def mst(wg: WeightedGraph[V], start: int = 0) -> Optional[WeightedPath]:
    if start > (wg.vertex_count - 1) or start < 0:
        return None
    result: WeightedPath = [] # holds the final MST
    pq: PriorityQueue[WeightedEdge] = PriorityQueue()
    visited: [bool] = [False] * wg.vertex_count # where we've been
    def visit(index: int):
        visited[index] = True # mark as visited
        for edge in wg.edges_for_index(index): 
            # add all edges coming from here to pq
            if not visited[edge.v]:
                pq.push(edge)
    visit(start) # the first vertex is where everything begins
    while not pq.empty: # keep going while there are edges to process
        edge = pq.pop()
        if visited[edge.v]:
            continue # don't ever revisit
        # this is the current smallest, so add it to solution
        result.append(edge) 
        visit(edge.v) # visit where this connects
    return result
def print_weighted_path(wg: WeightedGraph, wp: WeightedPath) -> None:
    for edge in wp:
        print(f"{wg.vertex_at(edge.u)} {edge.weight}> {wg.vertex_at(edge.v)}")
    print(f"Total Weight: {total_weight(wp)}")

下面逐行过一遍 mst() 。

def mst(wg: WeightedGraph[V], start: int = 0) -> Optional[WeightedPath]:
    if start > (wg.vertex_count - 1) or start < 0:
        return None

本算法将返回某一个代表最小生成树的 WeightedPath 对象。运算本算法的起始位置无关紧要（假定图是连通和无向的），因此默认设为索引为0的顶点。如果 start 无效，则 mst() 返回 None 。

result: WeightedPath = [] # holds the final MST
pq: PriorityQueue[WeightedEdge] = PriorityQueue()
visited: [bool] = [False] * wg.vertex_count # where we've been

result 将是最终存放加权路径的地方，也即包含了最小生成树。随着权重最小的边不断被弹出以及图中新的区域不断被遍历， WeightedEdge 会不断被添加到 result 中。因为Jarník算法总是选择权重最小的边，所以被视为贪婪算法（greedy algorithm）之一。 pq 用于存储新发现的边并弹出次低权重的边。 visited 用于记录已经到过的顶点索引，这用 Set 也可以实现，类似于 bfs() 中的 explored 。

def visit(index: int):
    visited[index] = True # mark as visited
    for edge in wg.edges_for_index(index): 
        # add all edges coming from here
        if not visited[edge.v]:
            pq.push(edge)

visit() 是一个便于内部使用的函数，用于把顶点标记为已访问，并把尚未访问过的顶点所连的边都加入 pq 中。不妨注意一下，使用邻接表模型能够轻松地找到属于某个顶点的边。

visit(start) # the first vertex is where everything begins

除非图是非连通的，否则先访问哪个顶点是无所谓的。如果图是非连通的，是由多个不相连的部分组成的，那么 mst() 返回的树只会涵盖图的某一部分，也就是起始节点所属的那部分图。

while not pq.empty: # keep going while there are edges to process
    edge = pq.pop()
    if visited[edge.v]:
        continue # don't ever revisit
    # this is the current smallest, so add it
    result.append(edge) 
    visit(edge.v) # visit where this connects
return result

只要优先队列中还有边存在，我们就将它们弹出并检查它们是否会引出尚未加入树的顶点。因为优先队列是以升序排列的，所以会先弹出权重最小的边。这就确保了结果确实具有最小总权重。如果弹出的边不会引出未探索过的顶点，那么就会被忽略，否则，因为该条边是目前为止权重最小的边，所以会被添加到结果集中，并且对其引出的新顶点进行探索。如果已没有边可供探索了，则返回结果。

最后再回到用轨道最少的超级高铁网络连接美国15个最大的MSA的问题吧。结果路径就是 city_graph2 的最小生成树。下面尝试对 city_graph2 运行一下 mst() ，具体代码如代码清单4-12所示。

代码清单4-12　mst.py（续）

if __name__ == "__main__":
    city_graph2: WeightedGraph[str] = WeightedGraph(["Seattle", "San Francisco", "Los 
       Angeles", "Riverside", "Phoenix", "Chicago", "Boston", "New York", "Atlanta",
      "Miami", "Dallas", "Houston", "Detroit", "Philadelphia", "Washington"])
    city_graph2.add_edge_by_vertices("Seattle", "Chicago", 1737)
    city_graph2.add_edge_by_vertices("Seattle", "San Francisco", 678)
    city_graph2.add_edge_by_vertices("San Francisco", "Riverside", 386)
    city_graph2.add_edge_by_vertices("San Francisco", "Los Angeles", 348)
    city_graph2.add_edge_by_vertices("Los Angeles", "Riverside", 50)
    city_graph2.add_edge_by_vertices("Los Angeles", "Phoenix", 357)
    city_graph2.add_edge_by_vertices("Riverside", "Phoenix", 307)
    city_graph2.add_edge_by_vertices("Riverside", "Chicago", 1704)
    city_graph2.add_edge_by_vertices("Phoenix", "Dallas", 887)
    city_graph2.add_edge_by_vertices("Phoenix", "Houston", 1015)
    city_graph2.add_edge_by_vertices("Dallas", "Chicago", 805)
    city_graph2.add_edge_by_vertices("Dallas", "Atlanta", 721)
    city_graph2.add_edge_by_vertices("Dallas", "Houston", 225)
    city_graph2.add_edge_by_vertices("Houston", "Atlanta", 702)
    city_graph2.add_edge_by_vertices("Houston", "Miami", 968)
    city_graph2.add_edge_by_vertices("Atlanta", "Chicago", 588)
    city_graph2.add_edge_by_vertices("Atlanta", "Washington", 543)
    city_graph2.add_edge_by_vertices("Atlanta", "Miami", 604)
    city_graph2.add_edge_by_vertices("Miami", "Washington", 923)
    city_graph2.add_edge_by_vertices("Chicago", "Detroit", 238)
    city_graph2.add_edge_by_vertices("Detroit", "Boston", 613)
    city_graph2.add_edge_by_vertices("Detroit", "Washington", 396)
    city_graph2.add_edge_by_vertices("Detroit", "New York", 482)
    city_graph2.add_edge_by_vertices("Boston", "New York", 190)
    city_graph2.add_edge_by_vertices("New York", "Philadelphia", 81)
    city_graph2.add_edge_by_vertices("Philadelphia", "Washington", 123)
    result: Optional[WeightedPath] = mst(city_graph2)
    if result is None:
        print("No solution found!")
    else:
        print_weighted_path(city_graph2, result)

好在有美观打印方法 printWeightedPath() ，最小生成树的可读性很不错。

Seattle 678> San Francisco
San Francisco 348> Los Angeles
Los Angeles 50> Riverside
Riverside 307> Phoenix
Phoenix 887> Dallas
Dallas 225> Houston
Houston 702> Atlanta
Atlanta 543> Washington
Washington 123> Philadelphia
Philadelphia 81> New York
New York 190> Boston
Washington 396> Detroit
Detroit 238> Chicago
Atlanta 604> Miami
Total Weight: 5372

换句话说，这是加权图中连通所有MSA的总边长最短的组合，至少需要轨道8645 km。图4-7呈现了这棵最小生成树。

图4-7　高亮的边代表连通全部15个MSA的最小生成树

Previous 上一篇本节目录 Next 下一篇

09-查找最小生成树

4.4.2 查找最小生成树

1．重温优先队列

2．计算加权路径的总权重

3．Jarník算法

4.4.2　查找最小生成树