Scraping web page content with JS (how a crawler can get and view the HTML source after the JS has executed)

优采云, published: 2022-01-03 15:06


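How can a crawler get the HTML source after the page's JS has run? For example, when I click Query on a page, a table is generated automatically to hold the data, but right-clicking and choosing View Source does not show the table that the JS generated.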

You can debug this with Firefox. Reference URL

There the generated table is visible. In View Source, however, the numbers do not appear.

A solution found online:

Set the WebBrowser's Url so that the retrieved source is loaded into webBrowser.Document. Once webBrowser.DocumentCompleted has fired, reading webBrowser.Document should give you what you need.

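I tried pressing F12 and choosing [Cache] > [Clear for this domain...] from the menu, and the problem went away: I got the complete HTML as it is after the JS has run. The cache has to be cleared manually before every run, otherwise the post-JS data is still missing. That gave me the lead I needed: clearing the cache in code solved the problem.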

This problem bothered me for two days before I finally found a fix:

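/// <summary>
/// For JS-driven pages: get the page content after the scripts have run. Firefox's "Inspect Element" shows the same content.
/// </summary>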

// Requires: using System; using System.IO; using System.Windows.Forms;
private void PrintHelpPage()
{
    // Create a WebBrowser instance.
    WebBrowser webBrowserForPrinting = new WebBrowser();

    // Add an event handler that runs after the document has loaded.
    webBrowserForPrinting.DocumentCompleted +=
        new WebBrowserDocumentCompletedEventHandler(PrintDocument);

    // Clearing the cache is the key step and must not be skipped;
    // otherwise the data produced by the JS is not returned.
    string cachePath = Environment.GetFolderPath(Environment.SpecialFolder.InternetCache); // cache folder path
    DirectoryInfo di = new DirectoryInfo(cachePath);

    // Walk every subfolder and delete the files inside.
    foreach (FileInfo fi in di.GetFiles("*.*", SearchOption.AllDirectories))
    {
        try
        {
            fi.Delete();
        }
        catch { } // Skip files that are locked or cannot be deleted.
    }

    // Set the Url property to load the document.
    webBrowserForPrinting.Url = new Uri("http://218.23.98.205:8080/aqi/components/aqi/explainDay.jsp");
}

private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser browser = (WebBrowser)sender;

    //MessageBox.Show("000");
    //foreach (HtmlElement he in ((WebBrowser)sender).Document.GetElementById("sljaqi"))
    //{
    //    //if (he.GetAttribute("classname") == "co_yl")
    //    //{
    //    //    // Then, based on the page's markup, break out the information you need.
    //    //}
    //    MessageBox.Show(he.OuterText);
    //    MessageBox.Show(he.Name);
    //}

    // GetElementById returns null when the element is not in the document,
    // so check before reading InnerHtml.
    HtmlElement element = browser.Document.GetElementById("sljaqi");
    MessageBox.Show(element != null ? element.InnerHtml : "Element \"sljaqi\" was not found.");

    // Print the document now that it is fully loaded.
    //browser.Print();

    // Dispose the WebBrowser now that the task is complete.
    browser.Dispose();
}
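The handler above reads the element as soon as DocumentCompleted fires. On some pages the script may fill the element a little later, so a variant that polls for the element could be useful. The following is only a minimal sketch under stated assumptions, not the original author's method: the class name RenderedHtmlFetcher, the method Fetch, the 250 ms polling interval and the timeout parameter are all hypothetical names and values chosen for illustration. It runs the WebBrowser on its own STA thread with a message loop, so it can also be called from code that has no form.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper (not from the original post).
static class RenderedHtmlFetcher
{
    // Loads `url` in an off-screen WebBrowser and returns the InnerHtml of the
    // element with id `elementId`, or null if it is still missing or empty
    // when `timeout` has elapsed.
    public static string Fetch(string url, string elementId, TimeSpan timeout)
    {
        string result = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            using (var timer = new System.Windows.Forms.Timer { Interval = 250 })
            {
                DateTime deadline = DateTime.UtcNow + timeout;

                // Poll the live DOM until the script has filled the element.
                timer.Tick += (s, e) =>
                {
                    HtmlElement el = browser.Document?.GetElementById(elementId);
                    bool ready = el != null && !string.IsNullOrEmpty(el.InnerHtml);
                    if (ready)
                    {
                        result = el.InnerHtml;
                    }
                    if (ready || DateTime.UtcNow >= deadline)
                    {
                        timer.Stop();
                        Application.ExitThread(); // ends the message loop below
                    }
                };

                browser.Navigate(url);
                timer.Start();
                Application.Run(); // message loop required by WebBrowser and Timer
            }
        });

        // The WebBrowser control must be created on an STA thread.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return result;
    }
}
```

A call might look like `string html = RenderedHtmlFetcher.Fetch("http://218.23.98.205:8080/aqi/components/aqi/explainDay.jsp", "sljaqi", TimeSpan.FromSeconds(15));`. This sketch does not clear the cache; if stale data is the cause, as described above, the cache-clearing loop would still need to run before the call.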
