05-单词搜索

选择背景色：黄橙洋红淡粉水蓝草绿白色选择字体：宋体黑体微软雅黑楷体选择字体大小：小中大特恢复默认

3.4　单词搜索

单词搜索问题是一个填满了字母的网格，沿着行、列和对角线隐藏着一些单词。单词搜索问题的玩家要通过仔细扫描网格来找到隐藏的单词。找到位置放置这些单词使其正好能填入网格，这就是一种约束满足问题。变量就是单词，值域则是这些单词可能的位置。单词搜索问题如图3-4所示。

图3-4　经典的单词搜索问题可能出现在儿童益智图书中

为方便起见，这里的单词搜索问题将不包含重叠的单词。不妨作为习题对问题进行改进，以允许单词重叠。

这个单词搜索问题的网格与第2章的迷宫有点儿类似。代码清单3-11中有一些数据类型应该看起来很眼熟。

代码清单3-11　word_search.py

from typing import NamedTuple, List, Dict, Optional
from random import choice
from string import ascii_uppercase
from csp import CSP, Constraint
Grid = List[List[str]]  # type alias for grids
class GridLocation(NamedTuple):
    row: int
    column: int

一开始我们将用英文字母（ ascii_uppercase ）填充网格，还需要一个显示网格的函数。具体代码如代码清单3-12所示。

代码清单3-12　word_search.py（续）

def generate_grid(rows: int, columns: int) -> Grid:
    # initialize grid with random letters
    return [[choice(ascii_uppercase) for c in range(columns)] for r in range(rows)]
def display_grid(grid: Grid) -> None:
    for row in grid:
        print("".join(row))

为了将单词在网格中的位置标识出来，需要生成其值域。单词的值域是其全部字母可能放置的位置的列表的列表（ List[List[GridLocation]] ）。但是，单词不能任意放置，它们必须位于网格范围内的行、列或对角线上。换句话说，单词长度不能超过网格的边界。 generate_domain() 的目标就是为每个单词创建这些列表。具体代码如代码清单3-13所示。

代码清单3-13　word_search.py（续）

def generate_domain(word: str, grid: Grid) -> List[List[GridLocation]]:
   domain: List[List[GridLocation]] = []
   height: int = len(grid)
   width: int = len(grid[0])
   length: int = len(word)
   for row in range(height):
       for col in range(width):
           columns: range = range(col, col + length + 1)
           rows: range = range(row, row + length + 1)
           if col + length <= width:
               # left to right
               domain.append([GridLocation(row, c) for c in columns])
               # diagonal towards bottom right
               if row + length <= height:
                   domain.append([GridLocation(r, col + (r - row)) for r in rows])
           if row + length <= height:
               # top to bottom
               domain.append([GridLocation(r, col) for r in rows])
               # diagonal towards bottom left
               if col - length >= 0:
                   domain.append([GridLocation(r, col - (r - row)) for r in rows])
    return domain

对于单词可能的位置范围（沿着行、列或对角线），列表推导式用类的构造函数将范围转换为 GridLocation 的列表。因为 generate_domain() 对每个单词都会循环遍历从左上角到右下角的每一个网格位置，所以它会涉及大量的计算。请问你能想出一种更高效的方法吗？如果在循环中把长度相同的单词一次遍历完，又会怎样？

若要检查可能的解是否有效，必须为单词搜索问题定制约束。 WordSearchConstraint 的 satisfied() 方法只会检查为某个单词推荐的任何位置是否与为其他单词推荐的位置相同，这一点用 set 来实现。将 list 转换为 set 将移除所有重复项。如果从 list 转换而来的 set 中的数据项少于原 list 中的数据项，则表示原 list 中包含一些重复项。为了准备数据以进行此项检查，将用到稍微复杂一些的列表推导式，以便把赋值中每个单词的多个位置子列表组合为一个大的位置列表。具体代码如代码清单3-14所示。

代码清单3-14　word_search.py（续）

class WordSearchConstraint(Constraint[str, List[GridLocation]]):
    def __init__(self, words: List[str]) -> None:
      super().__init__(words)
      self.words: List[str] = words
    def satisfied(self, assignment: Dict[str, List[GridLocation]]) -> bool:
        # if there are any duplicates grid locations, then there is an overlap
        all_locations = [locs for values in assignment.values() for locs in values]
        return len(set(all_locations)) == len(all_locations)

一切就绪，现在可以运行了。在本例中，在9×9的网格中包含5个单词。这里求得的解应该包含每个单词与其字母在网格中的位置之间的映射关系。具体代码如代码清单3-15所示。

代码清单3-15　word_search.py（续）

if __name__ == "__main__":
    grid: Grid = generate_grid(9, 9)
    words: List[str] = ["MATTHEW", "JOE", "MARY", "SARAH", "SALLY"]
    locations: Dict[str, List[List[GridLocation]]] = {}
    for word in words:
        locations[word] = generate_domain(word, grid)
    csp: CSP[str, List[GridLocation]] = CSP(words, locations)
    csp.add_constraint(WordSearchConstraint(words))
    solution: Optional[Dict[str, List[GridLocation]]] = csp.backtracking_ search()
    if solution is None:
        print("No solution found!")
    else:
        for word, grid_locations in solution.items():
            # random reverse half the time
            if choice([True, False]):
                grid_locations.reverse()
            for index, letter in enumerate(word):
                (row, col) = (grid_locations[index].row, grid_ locations[index].column)
                grid[row][col] = letter
        display_grid(grid)

上述代码的底部有一处点睛之笔，就是用单词填充网格的语句。其中随机选取了几个单词并对它们做了逆序处理。因为此示例不允许单词重叠，所以这是合理的。最终的输出应如下所示。能找到Matthew、Joe、Mary、Sarah和Sally吗？

LWEHTTAMJ
MARYLISGO
DKOJYHAYE
IAJYHALAG
GYZJWRLGM
LLOTCAYIX
PEUTUSLKO
AJZYGIKDU
HSLZOFNNR

Previous 上一篇本节目录 Next 下一篇

05-单词搜索

3.4 单词搜索

3.4　单词搜索