List algorithm for rule-free scrapers (scraping a web page with C#: %E7%80%%E6% )

优采云 · Published: 2021-11-01 21:18


I am scraping a web page with C#: %E7%AE%80%E7%A7%B0%E5%8F%98%E5%8A%A8%E6%97%A5%E6%98%AF2010%E5%B9%B4%E4%BB%A5%E6%9D%A5&queryarea=

The request used to return HTML containing the data, so the token value could be scraped from that HTML.

Now, however, it only returns:

“

window.location.href="http://search.10jqka.com.cn/stockpick/search?typed=1&preParams=&ts=1&f=1&qs=result_rewrite&selfsectsn=&querytype=stock&searchfilter=&tid=stockpick&w=%E7%AE%80%E7%A7%B0%E5%8F%98%E5%8A%A8%E6%97%A5%E6%98%AF2010%E5%B9%B4%E4%BB%A5%E6%9D%A5&queryarea=";

”

How can this problem be solved?

Below is the method I am using. Fetching the same URL with System.Net.WebClient returns an empty result.

using System.IO;
using System.Net;
using System.Text;

public string GetMoths(string url, string WebCodeStr)
{
    Encoding WebCode = Encoding.GetEncoding(WebCodeStr);

    // Raise the connection limit before the request is created.
    ServicePointManager.DefaultConnectionLimit = 200;

    HttpWebRequest wReq = (HttpWebRequest)WebRequest.Create(url);
    wReq.KeepAlive = false;
    wReq.UserAgent = @"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50215;)";
    wReq.Method = "GET";
    wReq.Timeout = 30000; // page timeout: 30 seconds

    try
    {
        // Choose the encoding and read the stream in the same method;
        // splitting them can leave StreamReader unable to read the stream.
        // The using blocks close the response, stream, and reader, which
        // avoids the leaked connections that lead to timeouts.
        using (HttpWebResponse wResp = (HttpWebResponse)wReq.GetResponse())
        using (Stream respStream = wResp.GetResponseStream())
        using (StreamReader reader = new StreamReader(respStream, WebCode))
        {
            return reader.ReadToEnd(); // read from the current position to the end
        }
    }
    catch (WebException)
    {
        return null; // the request failed
    }
}
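
One thing that may be worth trying: the body that comes back is a JavaScript redirect, and HttpWebRequest does not execute scripts, so the redirect is never followed. The sketch below pulls the target out of window.location.href and requests it again with a shared CookieContainer. It rests on an assumption the post does not confirm, namely that the server sets a cookie on the first response and serves the data once that cookie is sent back. If the site needs real script execution this will not help. The names RedirectFollowingFetcher, ExtractRedirectUrl, Fetch, and maxHops are made up for illustration, and the User-Agent string is arbitrary.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class RedirectFollowingFetcher
{
    // Matches window.location.href="..." (single or double quotes) and captures the URL.
    private static readonly Regex ScriptRedirect = new Regex(
        @"window\.location\.href\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the redirect target if the body contains a script redirect, otherwise null.
    public static string ExtractRedirectUrl(string body)
    {
        if (string.IsNullOrEmpty(body)) return null;
        Match m = ScriptRedirect.Match(body);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Fetches url and follows up to maxHops script redirects, reusing one cookie jar.
    public static string Fetch(string url, Encoding encoding, int maxHops = 3)
    {
        // Assumption: the site sets a cookie on the first hop and checks it on the next.
        var cookies = new CookieContainer();
        string body = null;

        for (int hop = 0; hop <= maxHops; hop++)
        {
            var req = (HttpWebRequest)WebRequest.Create(url);
            req.CookieContainer = cookies; // send back cookies from earlier hops
            req.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
            req.Timeout = 30000;

            using (var resp = (HttpWebResponse)req.GetResponse())
            using (var reader = new StreamReader(resp.GetResponseStream(), encoding))
            {
                body = reader.ReadToEnd();
            }

            string next = ExtractRedirectUrl(body);
            if (next == null) return body; // no script redirect: treat this as the real page

            // Resolve relative targets against the current URL; absolute targets pass through.
            url = new Uri(new Uri(url), next).ToString();
        }

        return body; // still redirecting after maxHops; return the last body for inspection
    }
}
```

It would be called as RedirectFollowingFetcher.Fetch(url, Encoding.UTF8) in place of GetMoths. If the last body still contains the redirect script, the server is probably checking something other than cookies, such as a Referer header or a value computed in script.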
