Fetching Web Page Data with Java

优采云 Published: 2022-01-12 22:12


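I used to write crawlers in Python; this time I am using Java. The code is more verbose, but code completion is more comfortable in a statically typed language. Fetching a page's source is the most basic step of any crawler.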

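We use Apache's commons-httpclient package to fetch pages. Three packages are required: commons-httpclient, commons-codec, and commons-logging. With Maven, adding the following dependency is enough, since the other two are pulled in transitively: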

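<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>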

The core code is as follows:

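import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

import java.io.IOException;

public class Main {

    // Fetches the page at the given URL and returns its source, or null on failure.
    public static String readUrl(String url) {
        // GET is the usual method for retrieving a page.
        GetMethod method = new GetMethod(url);
        String res = null;
        try {
            new HttpClient().executeMethod(method);
            // Decode the raw bytes as UTF-8 directly, instead of round-tripping
            // through the platform default charset.
            res = new String(method.getResponseBody(), "UTF-8");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Always return the connection to the client.
            method.releaseConnection();
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(readUrl("http://blog.zzkun.com"));
    }
}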
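The decoding step in readUrl matters because not every page is served as UTF-8. As a minimal sketch, assuming the server declares its encoding in the Content-Type response header, the helper below picks the charset from a header value and falls back to UTF-8 otherwise. The names PageDecoder, charsetOf, and decode are hypothetical and not part of commons-httpclient; the code uses only the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PageDecoder {

    // Extracts the charset named in a Content-Type header value such as
    // "text/html; charset=ISO-8859-1"; falls back to UTF-8 when none is usable.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Case-insensitive check for a "charset=" parameter.
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: use the default below.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the raw response bytes with the charset declared by the server.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }
}
```

For instance, PageDecoder.decode(bytes, "text/html; charset=ISO-8859-1") would decode the body as ISO-8859-1, while a null or unrecognized header value falls back to UTF-8.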

Running Main against this blog site prints the page's HTML source to the console.
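Once the source is available as a string, a crawler typically pulls specific data out of it. The following is only a rough sketch, assuming the fetched page contains a `<title>` element; TitleExtractor and extractTitle are hypothetical names. A regular expression is enough for a single simple tag, but it is not a real HTML parser and would be fragile for anything more structured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TitleExtractor {

    // Case-insensitive; DOTALL lets the title text span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of the first <title> element, or null if there is none.
    static String extractTitle(String html) {
        if (html == null) {
            return null;
        }
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Passing the string returned by readUrl to TitleExtractor.extractTitle would yield the page title, or null if the fetch failed or the page has no title.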
