30-第4步_保存图像，找到前一张漫画

选择背景色：黄橙洋红淡粉水蓝草绿白色选择字体：宋体黑体微软雅黑楷体选择字体大小：小中大特恢复默认

第4步：保存图像，找到前一张漫画

让你的代码看起来像这样：

#! python3
# downloadXkcd.py - Downloads every single XKCD comic.
import requests, os, bs4
--snip--
 # Save the image to ./xkcd.
 imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)),'wb')
 for chunk in res.iter_content(100000):
 imageFile.write(chunk)
 imageFile.close()
 # Get the Prev button's url.
 prevLink = soup.select('a[rel="prev"]')[0]
 url = 'https://' + prevLink.get('href')
print('Done.')

这时，漫画的图像文件保存在变量 res 中。你需要将图像数据写入硬盘的文件。

你需要为本地的图像文件准备一个文件名，并将其传递给 open() 。 comicUrl 的值类似 'http://imgs.****/comics/heartbleed_explanation.png' 。你可能注意到，它看起来很像文件路径。实际上，调用 os.path.basename() 时传入 comicUrl ，它只返回URL的最后部分： 'heartbleed_explanation.png' 。当将图像保存到硬盘时，你可以用它作为文件名。用 os.path.join() 连接这个名称和 xkcd 文件夹的名称，这样程序就会在Windows操作系统下使用倒斜杠 （\） ，在macOS和Linux操作系统下使用正斜杠（/）。既然你最后得到了文件名，就可以调用 open() ，并用 'wb' （写二进制）模式打开一个新文件。

回忆一下之前的内容，保存利用 Requests 下载的文件时，你需要循环处理 iter_content() 方法的返回值。 for 循环中的代码将一段图像数据写入文件（每次最多10万字节），然后关闭该文件。图像现在保存到硬盘。

选择器 'a[rel="prev"]' 识别出 rel 属性中设置为 prev 的 <a> 元素，利用这个 <a> 元素的 href 属性可取得前一张漫画的URL，然后将它保存在 url 中。接着， while 循环针对这张漫画，再次开始整个下载过程。

这个程序的输出看起来像这样：

Downloading page https://...
Downloading image https:///comics/phone_alarm.png...
Downloading page https:///1358/...
Downloading image https://imgs./comics/nro.png...
Downloading page https:///1357/...
Downloading image https://imgs./comics/free_speech.png...
Downloading page https:///1356/...
Downloading mage https://imgs./comics/orbital_mechanics.png...
Downloading page https:///1355/...
Downloading image https://imgs./comics/airplane_message.png...
Downloading page https:///1354/...
Downloading image https://imgs./comics/heartbleed_explanation.png...
--snip--

这个项目是一个很好的例子，说明程序可以自动顺着链接从网络上抓取大量的数据。你可以从Beautiful Soup的文档了解它的更多功能。

Previous 上一篇本节目录 Next 下一篇