Recipe website content (a paginated recipe library with images)

Published by 优采云: 2022-01-10 16:04


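Step 1: Know the tools you will need

1. requests library: fetches web page content

2. BeautifulSoup library: parses the page and extracts the content you want

3. selenium library: drives a real browser, so pages behave just as they would for a real user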

Step 2: Code walkthrough

Taking the Gourmet website as an example, the first task is to collect all the recipe links on a listing page.

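from bs4 import BeautifulSoup

url = []  # module-level list that collects every recipe link found

def each_page(html):
    # Parse the page source that was passed in ('lxml' needs the lxml package).
    soup = BeautifulSoup(html, 'lxml')
    # Find every element whose class is search2015_cpitem.
    items = soup.find_all(class_='search2015_cpitem')
    for li in items:
        # The first <a> inside each item holds the recipe link.
        url.append(li.find('a').get('href'))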

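Open the browser's inspector and you will see many tags with the class search2015_cpitem in the page source. That is because the page lists many recipes, so find_all() is used to fetch all of them. The loop then takes the a tag inside each result and reads its href attribute, which gives the recipe link.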
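To make the find_all() / find('a') / get('href') chain concrete, here is a rough standard-library approximation that runs without installing anything. It is not BeautifulSoup and not the author's code: ItemLinkParser, the sample markup, and the /recipe/ paths are all made up for illustration, and "take the next link after an item opens" only approximates li.find('a').

```python
from html.parser import HTMLParser

class ItemLinkParser(HTMLParser):
    """Record the first <a href> that follows each element of a given class."""

    def __init__(self, item_class):
        super().__init__()
        self.item_class = item_class
        self.armed = False   # True while we are waiting for an item's first link
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if self.item_class in classes:
            # A new recipe item starts: wait for its first link.
            self.armed = True
        elif self.armed and tag == 'a' and attrs.get('href'):
            self.links.append(attrs['href'])
            self.armed = False   # ignore further links in the same item

# Hypothetical listing markup, loosely modelled on the class name in the article.
SAMPLE = '''
<ul>
  <li class="search2015_cpitem"><a href="/recipe/1.html"><img src="a.jpg"></a>
      <a href="/user/9">author</a></li>
  <li class="search2015_cpitem"><a href="/recipe/2.html">two</a></li>
  <li class="other"><a href="/ad.html">ad</a></li>
</ul>
'''

parser = ItemLinkParser('search2015_cpitem')
parser.feed(SAMPLE)
print(parser.links)  # ['/recipe/1.html', '/recipe/2.html']
```

Only the first link of each item is kept, and links outside search2015_cpitem elements are ignored, which is the same outcome the BeautifulSoup loop aims for.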

Step 3: Pagination

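As the figure shows, most recipe listings span more than one page, so the scraper has to turn pages to collect every URL automatically. First inspect the page to see how the next-page control is marked up; paging only works reliably once you know exactly which button to target.

The figure shows that the page source contains the keyword 下一页 ("next page"), so paging can key off that text. The code is as follows: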

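from selenium import webdriver
from selenium.webdriver.common.by import By

def next_page():
    # fenlei is the list of category start URLs; the original does not show it.
    for i in fenlei:
        browser = webdriver.Chrome()
        browser.get(i)
        while True:
            # Collect the links on the current page before moving on.
            each_page(browser.page_source)
            if '下一页' not in browser.page_source:
                # Last page of this category.
                browser.close()
                break
            # find_element_by_link_text was removed in Selenium 4.3.
            browser.find_element(By.LINK_TEXT, '下一页').click()
    return url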

The main job of this code is to call each_page on every page so that the URLs from all pages are collected.
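The control flow of that loop can be tried without a browser. The sketch below is an assumption-laden stand-in, not the author's code: FAKE_SITE, recipe_links, next_target, and crawl are hypothetical names, the pages are plain strings, and regular expressions are used only because the fake markup is trivial. It also adds a visited set, which the original does not have, so that a next-page link pointing back to an earlier page cannot cause an endless loop.

```python
import re

# Hypothetical three-page listing; the last page has no "下一页" link.
FAKE_SITE = {
    'p1': '<a href="/r/1">a</a><a href="p2">下一页</a>',
    'p2': '<a href="/r/2">b</a><a href="p3">下一页</a>',
    'p3': '<a href="/r/3">c</a>',
}

def recipe_links(html):
    # Stand-in for each_page(): pull out the recipe hrefs on one page.
    return re.findall(r'href="(/r/[^"]+)"', html)

def next_target(html):
    # Stand-in for locating the "下一页" link; None means this is the last page.
    m = re.search(r'href="([^"]+)">下一页<', html)
    return m.group(1) if m else None

def crawl(start, site):
    links, seen, page = [], set(), start
    while page is not None and page not in seen:
        seen.add(page)
        html = site[page]
        links.extend(recipe_links(html))   # collect first...
        page = next_target(html)           # ...then move on, or stop
    return links

print(crawl('p1', FAKE_SITE))  # ['/r/1', '/r/2', '/r/3']
```

As in next_page, the links of the final page are collected before the loop ends, so no page is skipped.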

Collecting URLs is not the end goal, of course: each URL then has to be visited to extract the recipe data.

import requests
from bs4 import BeautifulSoup

def get_message(urls):
    # Returns, in order: tongjititle (recipe name), tongjind (difficulty),
    # tongjikw (flavor), tongjiprsj (cooking time), s (ingredients), l (steps).
    l = ''
    shicaizhu = ''
    shicaifu = ''
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
    response = requests.get(urls, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Recipe name
    if soup.find(id='tongji_title') is None:
        tongjititle = ''
    else:
        tongjititle = soup.find(id='tongji_title').string
    # Difficulty
    if soup.find(id='tongji_nd') is None:
        tongjind = ''
    else:
        tongjind = soup.find(id='tongji_nd').string
    # Flavor
    if soup.find(id='tongji_kw') is None:
        tongjikw = ''
    else:
        tongjikw = soup.find(id='tongji_kw').string
    # Cooking time
    if soup.find('li', class_='w270 bb0 br0') is None:
        tongjiprsj = None
    else:
        tongjiprsj = soup.find('li', class_='w270 bb0 br0').contents[1].text
    # Ingredients: auxiliary (fuliao) and main (zhuliao); the last match of each wins.
    for fuliao in soup.find_all(class_='yl fuliao clearfix'):
        shicaifu = fuliao.find(class_='clearfix')
    for zhuliao in soup.find_all(class_='yl zl clearfix'):
        shicaizhu = zhuliao.find(class_='clearfix')
    # Cooking steps
    for ls in soup.find_all(class_='content clearfix'):
        l = l + ls.contents[1].string + ls.contents[3].text
    l = l.replace('\n', '')
    # Combine the ingredient blocks; either one may be missing.
    zhu = shicaizhu.text.replace('\n', '') if shicaizhu else ''
    fu = shicaifu.text.replace('\n', '') if shicaifu else ''
    s = (zhu + fu) or '没有食材'  # "no ingredients"
    return tongjititle, tongjind, tongjikw, tongjiprsj, s, l

This code extracts the information for a single recipe.
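To process every collected URL, the single-recipe function can be applied to the whole list and the results written out. The article does not say how the data is stored, so the following is only one possible sketch: CSV output, the column names in FIELDS, save_recipes, and fake_fetch are all assumptions. fake_fetch stands in for get_message so the example runs offline.

```python
import csv
import io

# Hypothetical column names matching the six values get_message returns.
FIELDS = ['name', 'difficulty', 'flavor', 'time', 'ingredients', 'steps']

def save_recipes(urls, fetch, out):
    """Call fetch(url) for each URL and write one CSV row per recipe.

    Returns the list of URLs whose fetch raised an exception.
    """
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    failed = []
    for u in urls:
        try:
            row = fetch(u)
        except Exception:
            # One broken page should not abort the whole run.
            failed.append(u)
            continue
        # Missing values (None) become empty cells.
        writer.writerow(['' if v is None else v for v in row])
    return failed

def fake_fetch(u):
    # Offline stand-in for get_message(); raises for one URL on purpose.
    if u.endswith('bad'):
        raise ValueError('unparseable page')
    return ('Dish ' + u[-1], 'easy', 'salty', None, 'egg', 'fry')

buf = io.StringIO()
failed = save_recipes(['/r/1', '/r/bad', '/r/2'], fake_fetch, buf)
print(failed)  # ['/r/bad']
```

With the real functions this would be called as save_recipes(next_page(), get_message, f), where f is a file opened with open(path, 'w', newline='', encoding='utf-8').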

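Those three modules make up the main functionality. Try implementing it yourself, and if you have a good idea, you are welcome to share it.

Discussion QQ group: 515458373

Project address: %E7%BE%8E%E9%A3%9F%E6%9D%B0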
