Scraping web page content with jQuery (jquery.js can scrape all site names and code)
优采云 · Published: 2021-12-06 02:02
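Scraping web page content with jQuery: popular-fullpage.js. The key point is to change the names. Search by position to collect the site page names that exist in the page, or the page names that appear in the code. Then use JavaScript regular expressions on the page source to pull out the href of each link, parse those hrefs, and store them in a database. In practice this is similar to fetching a page with Python's requests library and querying it with XPath (requests itself has no XPath support; a parser such as lxml provides it). This is how popular-fullpage.js can scrape all of the site name code.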
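The link-extraction step described above can be sketched in plain JavaScript. This is a minimal sketch, not popular-fullpage.js itself: the sample HTML, the base URL https://example.com/, and the helpers extractHrefs and saveLinks are hypothetical, and an in-memory array stands in for the database. A regular expression is only adequate for simple, well-formed markup; a real DOM parser is more robust.

```javascript
// Minimal sketch: pull href values out of raw HTML with a regular expression,
// resolve them against the page URL, and collect them for storage.
// The HTML string, base URL, and the in-memory "db" array are hypothetical.

function extractHrefs(html, baseUrl) {
  // Matches <a ... href="..."> or href='...', case-insensitive.
  const linkRe = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  const hrefs = [];
  for (const match of html.matchAll(linkRe)) {
    const raw = match[2].trim();
    // Skip empty values, in-page anchors, and javascript: pseudo-links.
    if (!raw || raw.startsWith("#") || raw.startsWith("javascript:")) continue;
    // Resolve relative links such as "/blog" against the page URL.
    hrefs.push(new URL(raw, baseUrl).href);
  }
  return hrefs;
}

// Stand-in for "store in the database": de-duplicate and keep in memory.
function saveLinks(db, hrefs) {
  for (const href of hrefs) {
    if (!db.includes(href)) db.push(href);
  }
  return db;
}

const html = `
  <ul>
    <li><a href="/blog">Blog</a></li>
    <li><a class="doc" href='https://example.com/docs'>Docs</a></li>
    <li><a href="#top">Top</a></li>
    <li><a href="/blog">Blog again</a></li>
  </ul>`;

const db = saveLinks([], extractHrefs(html, "https://example.com/"));
console.log(db);
// db now holds two absolute URLs: https://example.com/blog and https://example.com/docs
```

Swapping the array for a real database only changes saveLinks; the extraction step stays the same.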
Structured data can be extracted from the HTML: the page ships some JavaScript that parses the page and converts it into a JSON file.
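As a rough illustration of turning page HTML into JSON, here is a sketch that runs in Node.js without a browser. It assumes regex matching is good enough for the sample markup; the function names and the title, headings, and links fields are hypothetical choices, not a format defined by the page's own script.

```javascript
// Minimal sketch: turn a page's HTML into a structured JSON document.
// Regexes stand in for a real DOM parser so the example runs in plain Node.js;
// the field names (title, headings, links) are hypothetical choices.

// Return the first capture group of a non-global regex, or null.
function firstMatch(re, text) {
  const m = text.match(re);
  return m ? m[1].trim() : null;
}

// Return the first capture group of every match of a global regex.
function allMatches(re, text) {
  return [...text.matchAll(re)].map((m) => m[1].trim());
}

function pageToJson(html) {
  const data = {
    title: firstMatch(/<title[^>]*>([\s\S]*?)<\/title>/i, html),
    headings: allMatches(/<h[1-3][^>]*>([\s\S]*?)<\/h[1-3]>/gi, html),
    links: allMatches(/<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi, html),
  };
  // Pretty-print with two-space indentation before writing to a .json file.
  return JSON.stringify(data, null, 2);
}

const html = `<html><head><title>Example blog</title></head>
<body><h1>Latest posts</h1><h3>Archive</h3>
<a href="/post/1">Post 1</a> <a href="/post/2">Post 2</a></body></html>`;

console.log(pageToJson(html));
```

In a browser, the same fields would normally be read from the live DOM instead of with regexes; the JSON.stringify step is unchanged.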
Regular-expression parsing:
Here, popular_title=['hello','new','blog','blogspot','doc','document','docs','general','editor','imgdata','sticky','subtitle','tag','timeline','target','gravity','clipboard','contents','content','div','div-content','a','ul','li','span','span','p','table','table-all','l','w','li','span','cell','cell-content','cell-repeat','child','children','a','td','td-layout','a','a','span','p','img','img','img-src','img-target','img','span','span','l','r','r','r','span','p','bl','br','text','bu','span','span','col','ul','li','li','div','div-id','div-val','div-all','div','div-btn','el','el-col','id','li','a','td','td-layout','td-title','div-style','li','span','r','r','h1','ji','crlf','li','h3','ri','span','cr','span','cord','tr','td','td-input','td-text','td-span','td-layout','td-position','td-title','td-text','td-line','td-col','span','span','li','span','span','td-content','span','span','p','span','crlf','li','img','img-src','img-target','img-rel','img-position','tr','span','ji'].
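The original does not say exactly how popular_title is used. One plausible reading, offered here only as an assumption, is that each entry is a tag name or CSS class name to look for in the page. The sketch below de-duplicates a short excerpt of the list and counts matching opening tags and class tokens with regular expressions; countNames, escapeRegExp, and the sample HTML are hypothetical.

```javascript
// Minimal sketch: use a list of names (an excerpt of popular_title) to count
// matching element tags and class tokens in an HTML string with regexes.
// Treating each entry as "tag name or CSS class" is an assumption.

const popularTitle = ["div", "a", "span", "li", "span", "td-layout", "img-src", "li"];

// Escape characters that have special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function countNames(html, names) {
  const unique = [...new Set(names)]; // the original list repeats many entries
  // Collect every individual token from all class="..." attributes.
  const classValues = [...html.matchAll(/\bclass\s*=\s*["']([^"']*)["']/gi)]
    .flatMap((m) => m[1].split(/\s+/).filter(Boolean));
  const counts = {};
  for (const name of unique) {
    // Opening tags only: "<li>" or "<li " matches, "</li>" and "<link" do not.
    const tagRe = new RegExp(`<${escapeRegExp(name)}(?=[\\s>/])`, "gi");
    const tagHits = (html.match(tagRe) || []).length;
    const classHits = classValues.filter((c) => c === name).length;
    if (tagHits + classHits > 0) counts[name] = tagHits + classHits;
  }
  return counts;
}

const html = `<div class="td-layout"><ul><li><a href="/x">x</a></li>
<li><span class="img-src big">y</span></li></ul></div>`;

console.log(countNames(html, popularTitle));
```

For this sample markup the result counts li twice and div, a, span, td-layout, and img-src once each; ul is absent because it is not in the excerpt.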