Scraping news from web pages (I want to combine the code of two libraries, e.g. newspaper and beautifulsoup4)

优采云 Published: 2022-01-06 00:09


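I have been trying to scrape news headlines from a news website. For this I came across two Python libraries, newspaper and beautifulsoup4. Using BeautifulSoup, I have been able to get all the links to news articles from a particular news site. With the code below, I have been able to extract the headline of a news article from a single link.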

from newspaper import Article

url = "https://www.ndtv.com/india-news/tamil-nadu-government-reverses-decision-to-reopen-schools-from-november-16-for-classes-9-12-news-agency-pti-2324199"

article = Article(url)
article.download()  # fetch the page HTML
article.parse()     # extract title, text and metadata
print(article.title)

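I would like to combine the code of the two libraries (newspaper and beautifulsoup4) so that I can feed all the links output by BeautifulSoup into newspaper's url argument and get the headlines for all of those links. Below is the BeautifulSoup code with which I can extract all the links to news articles.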

from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
import requests

parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = requests.get("https://www.ndtv.com/coronavirus?pfrom=home-mainnavgation")

# Prefer the encoding declared in the HTML, fall back to the HTTP header
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding

soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)

# Print the target of every anchor tag that has an href attribute
for link in soup.find_all('a', href=True):
    print(link['href'])
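One possible way to combine the two snippets is sketched below: collect the hrefs with BeautifulSoup, normalize them, then pass each URL to newspaper's `Article`. The helper names `same_site_links`, `collect_links` and `fetch_titles` are hypothetical and not part of either library. The sketch also assumes that only http(s) links on the same host are of interest, so it does not separate real article pages from navigation links; that would need site-specific filtering.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def same_site_links(hrefs, base_url):
    """Resolve hrefs against base_url; keep unique http(s) links on the same host."""
    base_host = urlparse(base_url).netloc
    seen = set()
    links = []
    for href in hrefs:
        # Make relative links absolute and drop any #fragment part
        absolute, _ = urldefrag(urljoin(base_url, href.strip()))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skips javascript:, mailto:, etc.
        if parsed.netloc != base_host:
            continue  # assumption: only links on the same site matter
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def collect_links(page_url):
    """Download page_url and return the filtered links found in its anchor tags."""
    # Third-party imports are kept local so same_site_links works without them
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get(page_url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.content, "html.parser")
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    return same_site_links(hrefs, page_url)


def fetch_titles(urls):
    """Return (url, title) pairs; pages that fail to download or parse are skipped."""
    from newspaper import Article

    titles = []
    for url in urls:
        article = Article(url)
        try:
            article.download()
            article.parse()
        except Exception as exc:  # deliberately broad: one bad page should not stop the loop
            print(f"skipped {url}: {exc}")
            continue
        titles.append((url, article.title))
    return titles
```

With requests, beautifulsoup4 and newspaper installed, calling `fetch_titles(collect_links("https://www.ndtv.com/coronavirus?pfrom=home-mainnavgation"))` would attempt every same-site link on that page, so slicing the list first (for example `links[:10]`) keeps a trial run short.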
