Fetching web page data in C/C++ (MFC's built-in library on Windows, as requested; libcurl directly on Linux)

Published by 优采云: 2021-10-26 23:04

I recently helped write a small program that fetches the content of a given web page. It turns out to be easy to do.

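There are two approaches. On Windows (the platform I was asked to target), you can use the library that ships with MFC, because configuring libcurl there is rather tedious. On Linux, you can use the powerful libcurl directly, where it is easy to set up.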

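First, the Windows case. I found code online that uses the MFC library. To use it in a console program, the project's character set has to be changed to Multi-Byte: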

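#include <afx.h>
#include <afxinet.h>   // CInternetSession, CHttpFile
#include <stdio.h>

int main()
{
    CInternetSession session("HttpClient");
    const char *url = "http://www.baidu.com";

    // OpenURL returns a CStdioFile*; for an http:// URL the object is a CHttpFile.
    CHttpFile *pfile = (CHttpFile *)session.OpenURL(url);
    if (pfile == NULL)
        return 1;

    DWORD dwStatusCode = 0;
    pfile->QueryInfoStatusCode(dwStatusCode);
    if (dwStatusCode == HTTP_STATUS_OK)
    {
        CString content;
        CString data;
        // Read the response line by line.
        while (pfile->ReadString(data))
        {
            content += data + "\r\n";
        }
        content.TrimRight();
        // Cast explicitly: a CString object should not be passed through "..." as is.
        printf(" %s\n ", (LPCTSTR)content);
    }

    pfile->Close();
    delete pfile;
    session.Close();
    return 0;
}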

Second, the Linux case, using libcurl directly:

1. Download libcurl on Linux and unpack it:

# wget https://curl.haxx.se/download/curl-7.54.0.tar.gz
# tar -zxf curl-7.54.0.tar.gz

2. Enter the unpacked directory, then build and install:

# cd curl-7.54.0/
# ./configure
# make
# make install

3. Check the installed libcurl version with the following command:

# curl --version

That completes the installation.

Below is a simple way to fetch the content of a web page:

test.cpp

#include <cstdio>
#include <string>
#include <curl/curl.h>

using namespace std;

// libcurl calls this once per received chunk; each chunk is appended to the
// string passed in through CURLOPT_WRITEDATA.
static size_t downloadCallback(void *buffer, size_t sz, size_t nmemb, void *writer)
{
    string *psResponse = (string *)writer;
    size_t size = sz * nmemb;
    psResponse->append((char *)buffer, size);
    return size;
}

int main()
{
    string strUrl = "http://www.baidu.com";
    string strTmpStr;

    CURL *curl = curl_easy_init();
    if (curl == NULL)
    {
        printf("curl_easy_init failed\n");
        return 1;
    }
    curl_easy_setopt(curl, CURLOPT_URL, strUrl.c_str());
    curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 2L);   // whole-transfer timeout, in seconds
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, downloadCallback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &strTmpStr);

    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);

    string strRsp;
    if (res != CURLE_OK)
    {
        strRsp = "error";
    }
    else
    {
        strRsp = strTmpStr;
    }

    printf("strRsp is |%s|\n", strRsp.c_str());
    return 0;
}

Compile and run it with the following commands:

# g++ -o http test.cpp -lcurl
# ./http
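
A note on why the write callback appends instead of assigning: libcurl may call it several times for one response, each time with one chunk of the body. The sketch below needs neither the network nor libcurl. `feedChunks` is a hypothetical stand-in, not a libcurl function; it calls the same callback once per chunk, roughly the way libcurl would during `curl_easy_perform`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same signature and logic as the write callback in test.cpp.
static size_t downloadCallback(void *buffer, size_t sz, size_t nmemb, void *writer)
{
    std::string *psResponse = static_cast<std::string *>(writer);
    size_t size = sz * nmemb;
    psResponse->append(static_cast<char *>(buffer), size);
    return size; // returning a different count makes libcurl abort the transfer
}

// Hypothetical helper (not part of libcurl): hands each chunk to the callback
// separately and returns whatever the callback accumulated.
std::string feedChunks(const std::vector<std::string> &chunks)
{
    std::string body;
    for (const std::string &chunk : chunks) {
        std::string copy = chunk; // the callback takes a non-const buffer
        size_t handled = downloadCallback(&copy[0], 1, copy.size(), &body);
        if (handled != copy.size())
            break; // a real transfer would stop here with CURLE_WRITE_ERROR
    }
    return body;
}
```

Calling `feedChunks` with the pieces `"<html>"`, `"<body>hi</body>"` and `"</html>"` returns the three pieces joined in order, which is what `strTmpStr` holds after a real transfer.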

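libcurl is very powerful; this example only fetches a web page. When I first started working, I used libcurl for downloads, uploads, resumable transfers, and more.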
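
As one illustration of resumable transfers: a resumed download usually starts from the size of the partial file already on disk. The sketch below uses only the standard library. `resumeOffset` is a hypothetical helper name, and the way its result is handed to libcurl is an assumption about typical usage, not code from the program above.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the byte offset at which a download should
// resume, i.e. the size of the partial file, or 0 if the file cannot be opened.
long long resumeOffset(const std::string &path)
{
    // Open at the end of the file so tellg() reports its size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in)
        return 0; // nothing downloaded yet: start from the beginning
    std::streamoff size = in.tellg();
    return size < 0 ? 0 : static_cast<long long>(size);
}
```

With libcurl, the returned value would typically be cast to `curl_off_t` and set through the `CURLOPT_RESUME_FROM_LARGE` option before `curl_easy_perform`, with the output file opened in append mode.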
