Two ways to scrape web page content with jQuery, plus PhantomJS + axios

优采云 · Published: 2022-09-20 12:18

There are two ways to implement web page scraping with jQuery:

1. Inherit the prequery method of webqueryinternet.gethtmlstream('js') and request the page content based on the code passed in.

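2. Use a third-party JS library that embeds the page content, as HTML, into JS and then converts it back into a string, which gives you page scraping. Scraping the JS part means scraping the jQuery source itself, which is very hard: you generally have to write your own script to parse it, and that is not efficient. Ajax can be used for this now, but an ajax fetch only retrieves the JS and offers no way to extract the CSS. For that reason, PhantomJS + axios is used to scrape the page content.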
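As a rough illustration of the idea in point 2, the sketch below takes a page's HTML and lists the stylesheet and script URLs it references. It is not the article's code: `extractAssets` and `fetchAssets` are hypothetical helper names, the regular expressions are a simplification that only handles ordinary link and script tags, and axios is assumed to be installed (it is loaded lazily, so the parsing part runs without it).

```javascript
// Hypothetical helper: list stylesheet and script URLs referenced by an HTML string.
// Regex-based parsing is a simplification; a real scraper would use an HTML parser.
function extractAssets(html) {
  const stylesheets = [];
  const scripts = [];

  // Match every <link ...> tag, then keep those with rel="stylesheet".
  for (const tag of html.match(/<link\b[^>]*>/gi) || []) {
    const href = /\bhref\s*=\s*["']([^"']+)["']/i.exec(tag);
    if (/\brel\s*=\s*["']?stylesheet["']?/i.test(tag) && href) {
      stylesheets.push(href[1]);
    }
  }

  // Match every <script ...> opening tag that carries a src attribute.
  for (const tag of html.match(/<script\b[^>]*>/gi) || []) {
    const src = /\bsrc\s*=\s*["']([^"']+)["']/i.exec(tag);
    if (src) scripts.push(src[1]);
  }

  return { stylesheets, scripts };
}

// Fetch a page and extract its assets. `httpGet` is injectable so the function
// can be exercised without network access; by default it uses axios (assumed installed).
async function fetchAssets(url, httpGet) {
  const get = httpGet || ((u) => require("axios").get(u, { responseType: "text" }));
  const response = await get(url);
  return extractAssets(String(response.data));
}
```

Keeping the parsing step separate from the network call means it can be checked against a saved HTML string before any real page is requested.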

First, create a window in the browser. I will create a browser named phantomjs, using this code:

    const phantomjs = new webdriver.phantomjs({
      profileurl: phantomjs.url.create('phantomjs'),
      pages: [],
      documentopen: function (request, response) {
        console.log('Request: ' + request.url);
      }
    });

Then write the page-scraping methods in the web browser.
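For comparison, here is a minimal sketch of opening a page and logging each request with PhantomJS's own `webpage` module. It deliberately swaps in that module for the constructor shown above, whose options (`profileurl`, `pages`, `documentopen`) I cannot match to a PhantomJS API I know. The helper name `formatRequest`, the script name, and the placeholder URL are assumptions made for illustration.

```javascript
// Sketch of a PhantomJS script (run as: phantomjs fetch_page.js <url>).
// PhantomJS runs an older JavaScript engine, so this uses ES5 syntax only.

// Hypothetical helper: format one outgoing request for the log.
function formatRequest(requestData) {
  return 'Request: ' + requestData.method + ' ' + requestData.url;
}

// The `phantom` global exists only inside PhantomJS; the guard lets the helper
// above be loaded elsewhere (for example, in Node) without errors.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var targetUrl = system.args[1] || 'http://example.com/'; // placeholder URL

  // Called for every resource the page requests (HTML, JS, CSS, images).
  page.onResourceRequested = function (requestData) {
    console.log(formatRequest(requestData));
  };

  page.open(targetUrl, function (status) {
    if (status === 'success') {
      console.log(page.content); // the page's HTML once loading has finished
    } else {
      console.log('Failed to load ' + targetUrl);
    }
    phantom.exit(status === 'success' ? 0 : 1);
  });
}
```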

1. The URL is requested with the GET method:

    const queryobject = {
      // Get all HTML elements on this page
      // A name ending in object.link means the content is an HTML element. Long anchor links are not supported.
    }
    const result = {};
    // The data URL is the page URL
    const dataurl = {
      // Get the page source information
      fileurl: '/webstorm/sharp_xml.js'
    }
    const myheader = 'title'
    const resultheader = {}
    // Get the full page content
    fromjson.node_env.array.map(
      const obj = new Date('y-m-d-1'),
      const type = 'string',
      const pageslink = {
        maxratio: type.getratio(),
        maxrows: 0,
        maxly: 0,
        alllocations: type.getratio(),
        origin: type.getratio(),
        relativeregion: 0,
        followregion: 1,
        origin: type.getratio(),
        dropload: type.getratio(),
        nocached: null,
        useragent: null,
        scrollfilename: '',
        // Get the address
        filename: '',
        // Remove the hidden parent field
        usercontent: '',
        // Remove the default image field
        pageurl: ''
      }
    )
    // Get the first page's content
    fromjson.node_env.array.map(
      const obj = new Date('y-m-d-1'),
      const clienturl = '',
      const contenturl = ''
    )
    // Set username = ''
    exports.clienturl = '/webstorm/sharp_xml.js'
    exports.clienturl = ''
    exports.myheader = ''
    exports.m.
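A plain GET request with axios can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the listing above: `buildGetConfig` and `getPageHtml` are hypothetical helpers, the option names `userAgent`, `noCache`, and `query` only loosely mirror the `useragent`, `nocached`, and `pageurl` fields in that listing, and axios is assumed to be installed.

```javascript
// Hypothetical helper: build an axios request config for a plain GET.
function buildGetConfig(pageUrl, options) {
  const opts = options || {};
  const headers = {};
  if (opts.userAgent) headers["User-Agent"] = opts.userAgent;
  if (opts.noCache) headers["Cache-Control"] = "no-cache";

  return {
    method: "get",
    url: pageUrl,
    params: opts.query || {},   // appended to the URL as a query string
    headers: headers,
    responseType: "text",       // keep the body as a raw HTML string
    timeout: 10000,             // milliseconds; arbitrary value for this sketch
  };
}

// Send the request with axios (assumed to be installed) and return the HTML.
async function getPageHtml(pageUrl, options) {
  const axios = require("axios"); // loaded lazily so buildGetConfig works without it
  const response = await axios.request(buildGetConfig(pageUrl, options));
  return response.data;
}
```

A call such as `getPageHtml('http://example.com/', { userAgent: 'demo-agent' })` would then resolve to the page source as a string, which can be passed on to whatever parsing step follows.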
