No code needed: Web Scraper, a browser extension that scrapes data automatically
Web Scraper is a browser extension that makes data scraping easy: even if you can't write code, you can use it to scrape website data. I have covered it before in "No code needed: use the Chrome extension Web Scraper to scrape Zhihu hot lists/topics/answers/columns and Douban movies". For example, to grab all of 木鱼水心's videos on Bilibili, you can directly import the sitemap JSON below:
{"_id":"bilibili_videos","startUrl":["https://space.bilibili.com/927587/video?tid=0&pn=[1-42:1]&keyword=&order=pubdate"],"selectors":[{"id":"row","parentSelectors":["_root"],"type":"SelectorElement","selector":"li.small-item","multiple":true},{"id":"视频标题","parentSelectors":["row"],"type":"SelectorText","selector":"a.title","multiple":false,"regex":""},{"id":"视频链接","parentSelectors":["row"],"type":"SelectorElementAttribute","selector":"a.cover","multiple":false,"extractAttribute":"href"},{"id":"视频封面","parentSelectors":["row"],"type":"SelectorElementAttribute","selector":"a.cover div.b-img picture img","multiple":false,"extractAttribute":"src"},{"id":"视频播放量","parentSelectors":["row"],"type":"SelectorText","selector":".play span","multiple":false,"regex":""},{"id":"视频长度","parentSelectors":["row"],"type":"SelectorText","selector":" a.cover span.length","multiple":false,"regex":""},{"id":"发布时间","parentSelectors":["row"],"type":"SelectorText","selector":"span.time","multiple":false,"regex":""}]}

The exported Excel file contains each video's title, link, cover image, play count, duration, and publish date; from 2013 to 2023 he published more than 1,200 videos.
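One thing to note before analyzing the export: Bilibili shows play counts as display strings like "3.4万" rather than plain numbers. A hedged sketch of normalizing them (the helper name is mine; it assumes the usual 万 = 10,000 and 亿 = 100,000,000 suffixes, and the exact display format may vary):

```python
def parse_play_count(text: str) -> int:
    """Convert a Bilibili-style play count ("3.4万", "9876") to an integer."""
    text = text.strip()
    if text.endswith("万"):            # 万 = 10,000
        return round(float(text[:-1]) * 10_000)
    if text.endswith("亿"):            # 亿 = 100,000,000
        return round(float(text[:-1]) * 100_000_000)
    return int(text)                   # already a plain number

# e.g. summing the play counts of a few exported rows
total = sum(parse_play_count(s) for s in ["3.4万", "9876", "1.2亿"])
```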

Zhihu answers and articles work the same way. In principle, any data you can see on a web page can be scraped.
{"_id":"zhihu_zhuanlan","startUrl":["https://www.zhihu.com/people/zhi-shi-ku-21-42/posts?page=[1-4]"],"selectors":[{"id":"row","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.List-item","multiple":true,"delay":0},{"id":"知乎标题","type":"SelectorText","parentSelectors":["row"],"selector":"h2.ContentItem-title","multiple":false,"regex":"","delay":0},{"id":"知乎链接","type":"SelectorElementAttribute","parentSelectors":["row"],"selector":"h2.ContentItem-title span a ","multiple":false,"extractAttribute":"href","delay":0}]}

Updated again for 2023: batch-download WeChat Official Account articles (content/topics/images/covers/videos/audio) and export them to PDF, with per-article stats including reads, likes, "Wow" (在看) counts, and comment counts.
Weibo's image hosting is acting up again and has stopped working, so while dealing with it I also wrote a batch-download tool for Weibo images/videos/posts/articles.