A Python Crawler for Qidian (起点中文网)

Date: 2023-05-25 05:59:12

Note: for learning purposes only. Do not use it for anything else.

import os
import re

import requests
from lxml import etree

head = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/78.0.3904.97 Safari/537.36"
}


def get_page(book_id):
    """Fetch the chapter fields and save every chapter to a text file."""
    # The site host is missing from the URLs in this listing; requests needs full URLs.
    # Chapter-list endpoint: https:/ajax/book/category?bookId=1013414929
    b_url = '/ajax/book/category?bookId='
    # Chapter content: /chapter/9KwLON5H3DQKgXB091LLaA2/EftiSjrby1j6ItTi_ILQ7A2
    b_p_url = '/chapter/'
    # Novel title:   //*[@id="j_textWrap"]/div/div/h1
    # Chapter title: //div//h3/span/text()[1]
    # Content:       //div//p/span/text()
    url = b_url + book_id
    try:
        r = requests.get(url, headers=head)
        r.raise_for_status()
        r.encoding = 'utf-8'
        print('Chapter list fetched.')
        # Path of each chapter detail page (the "cU" field)
        page_list = re.findall(
            r'{"uuid":\d+,"cN":".+?","uT":".+?","cnt":\d+,"cU":"(.+?)","id":\d+,"sS":\d}',
            r.text)
        # Create a folder named after the novel
        n_url = '/info/' + book_id
        r1 = requests.get(n_url, headers=head)
        r1.raise_for_status()
        r1.encoding = 'utf-8'
        novel_name = etree.HTML(r1.text).xpath('/html/body/div/div[6]/div[1]/div[2]/h1/em/text()')[0]
        os.mkdir('./%s' % novel_name)
        # d_url = b_p_url + page_list[0]
        # r2 = requests.get(d_url, headers=head)
        # r2.encoding = 'utf-8'
        # ttt = etree.HTML(r2.text).xpath('//div//h3/span/text()[1]')[1]
        # j = '\n'
        # content = j.join(ttt)
        # print(content)
        # print(ttt)
        for each in page_list:
            d_url = b_p_url + each
            try:
                # Fixed: the keyword argument is "headers", not "header"
                r2 = requests.get(d_url, headers=head)
                r2.raise_for_status()
                r2.encoding = 'utf-8'
                page = etree.HTML(r2.text)
                # Content: one string per paragraph, joined with newlines
                ttt = page.xpath('//div[@class="read-content j_readContent"]/p/text()')
                content = '\n'.join(ttt)
                # Chapter title
                p_name = page.xpath('//div//h3/span/text()[1]')[1]
                # Write the chapter to <novel_name>/<chapter title>.txt
                with open('./%s/%s.txt' % (novel_name, p_name), 'w', encoding='utf-8') as f:
                    f.write(content)
                print("finish")
            except Exception as results:
                print(results)
    except Exception as result:
        print(result)


def main():
    # u = input('Paste the link to the novel\'s Qidian home page here (end it with +): ')
    u = 'https:/ajax/book/category?bookId=1013414929'
    # Extract the book id, e.g. from /info/1013414929#catlog
    b_id = re.findall(r'(\d+)', u)[0]
    get_page(b_id)


if __name__ == '__main__':
    main()
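The two regular expressions in the script can be tried without touching the network. The following is a minimal sketch that assumes the chapter-list response contains records shaped like the pattern in get_page(). The `sample` string is a hypothetical fragment invented for illustration, not real site output, and the helper names `extract_book_id` and `extract_chapter_paths` are not part of the original script.

```python
import re

# Same pattern as in get_page(): capture the "cU" field of each chapter record.
CHAPTER_RE = re.compile(
    r'{"uuid":\d+,"cN":".+?","uT":".+?","cnt":\d+,"cU":"(.+?)","id":\d+,"sS":\d}'
)


def extract_book_id(url):
    """Return the first run of digits in the URL, as main() does."""
    return re.findall(r'(\d+)', url)[0]


def extract_chapter_paths(text):
    """Return the "cU" value of every chapter record found in text."""
    return CHAPTER_RE.findall(text)


# Hypothetical response fragment (assumption): two records with made-up values.
sample = (
    '{"cs":['
    '{"uuid":1,"cN":"Chapter 1","uT":"2019-01-01 00:00:00","cnt":2000,'
    '"cU":"aaa/bbb","id":101,"sS":1},'
    '{"uuid":2,"cN":"Chapter 2","uT":"2019-01-02 00:00:00","cnt":2100,'
    '"cU":"ccc/ddd","id":102,"sS":1}'
    ']}'
)

print(extract_book_id('/info/1013414929#catlog'))
print(extract_chapter_paths(sample))
```

Because the pattern depends on the exact field order and on there being no whitespace between fields, it stops matching as soon as the response layout changes. If the endpoint returns valid JSON, parsing it with the standard json module would likely be more robust.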
