Scraping web page data with PHP (using PHP XPath to parse and extract data from PHP web pages for web development)

优采云 Published: 2021-12-07 05:01

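When scraping web page data for a PHP web project with Selenium, driving the Selenium driver through the command-line interface alone, without relying on a helper module or library, makes the code verbose and the scraping relatively slow. For that reason, the scraping code can use a tool to save the scraped data and write it to a MySQL database. When images are scraped and stored this way, information such as each image's dimensions and border can be recorded as well.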
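As a rough illustration of recording image dimensions, the sketch below reads the width and height from the header of a PNG file using only the Python standard library. The function name `png_size` and the sample bytes are hypothetical and not part of the original article; the sketch handles PNG only and assumes the image bytes have already been downloaded.

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def png_size(data: bytes):
    """Return (width, height) read from a PNG's IHDR chunk, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type b'IHDR',
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b'IHDR':
        return None
    width, height = struct.unpack('>II', data[16:24])
    return width, height

# Hypothetical demo input: only the first 24 bytes of a PNG, built in memory.
sample = (PNG_SIGNATURE + struct.pack('>I', 13) + b'IHDR'
          + struct.pack('>II', 640, 480))
print(png_size(sample))       # (640, 480)
print(png_size(b'not a png')) # None
```

The returned pair could then be written to the database next to the image path. Other formats such as JPEG or GIF store their dimensions differently and would need their own parsing or an imaging library.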

Today's article uses XPath to parse the data in a PHP web page, pull out the content we want, such as links and images, and save it. Let's take a look:

    import os

    import requests
    from lxml import etree

    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/48.2802.132 Safari/537.36'
    }

    def load_file(url):
        """Fetch the page, save it to a local file, and return its HTML (None on failure)."""
        filename = './libai/ued.php'  # path of the saved file
        try:
            r = requests.get(url, headers=HEADERS, timeout=10)
            r.encoding = 'utf-8'  # decode the response body as UTF-8
            print('Request succeeded')
        except requests.RequestException:
            print('Request failed')
            return None
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(r.text)
        return r.text

    def load_img(html):
        """Return the link and image paths found in the page via XPath."""
        tree = etree.HTML(html)
        return tree.xpath('//a/@href') + tree.xpath('//img/@src')

    def main():
        url = ';cat=img'  # the URL is truncated in the source
        html = load_file(url)
        if html:
            for path in load_img(html):
                print(path)

    if __name__ == '__main__':
        main()

As the code above shows, the page with its links, images and so on is saved to a directory; the image paths parsed out of it can then be stored in the database.
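The parse-then-store step described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's own method: it uses the standard library's `xml.etree.ElementTree` (which supports only a limited XPath subset and needs well-formed markup) in place of a full HTML parser, and an in-memory `sqlite3` database as a stand-in for MySQL so that it runs without a server. The function names, table name, and sample markup are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def extract_paths(markup: str):
    """Return (links, images) found in well-formed markup using XPath-style queries."""
    root = ET.fromstring(markup)
    links = [a.get('href') for a in root.findall('.//a[@href]')]
    images = [img.get('src') for img in root.findall('.//img[@src]')]
    return links, images

def store_images(conn, paths):
    """Insert image paths into a hypothetical `images` table, skipping duplicates."""
    conn.execute('CREATE TABLE IF NOT EXISTS images (path TEXT PRIMARY KEY)')
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.executemany('INSERT OR IGNORE INTO images (path) VALUES (?)',
                     [(p,) for p in paths])
    conn.commit()

# Hypothetical sample page, kept well-formed so ElementTree can parse it.
SAMPLE = """<html><body>
<a href="/list?cat=img">Gallery</a>
<img src="/img/a.png" alt="a"/>
<img src="/img/b.png" alt="b"/>
<img alt="no source"/>
</body></html>"""

conn = sqlite3.connect(':memory:')  # stand-in for a MySQL connection
links, images = extract_paths(SAMPLE)
store_images(conn, images)
rows = [r[0] for r in conn.execute('SELECT path FROM images ORDER BY path')]
print(links)
print(rows)
```

Real pages are often not well-formed, so in practice a tolerant parser such as lxml's `etree.HTML` would likely take the place of `ET.fromstring`, and a MySQL client connection would take the place of `sqlite3`.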
