
How do I run a Java web crawler program?

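Use the HttpClient or HtmlUnit toolkit. Either one can fetch web pages for a crawler. With HtmlUnit, for example, the following ordinary Java program fetches a page and prints its source. To compile and run it, put the HtmlUnit 2.x library (Maven artifact net.sourceforge.htmlunit:htmlunit) and its dependencies on the classpath: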

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HelloHtmlUnit {

    public static void main(String[] args) throws Exception {
        // Create a WebClient; try-with-resources closes it (and all its windows) when done
        try (WebClient webClient = new WebClient()) {
            // HtmlUnit's CSS and JavaScript support is limited, so turn both off
            webClient.getOptions().setJavaScriptEnabled(false);
            webClient.getOptions().setCssEnabled(false);

            // Fetch the page; getPage() needs an absolute URL (http:// or https://),
            // so replace "/" with the full address of the site you want to crawl
            HtmlPage page = webClient.getPage("/");

            // Page title
            System.out.println(page.getTitleText());

            // Page source serialized as XML
            System.out.println(page.asXml());

            // Visible text of the page (newer 2.x releases deprecate asText() in favor of asNormalizedText())
            System.out.println(page.asText());
        }
    }
}
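Fetching one page is only the first step. A crawler program runs as a loop: fetch a page, extract its links, and queue the links it has not seen yet. The sketch below is one minimal way to write that loop, under several assumptions. The fetcher is any function from URL to HTML (you could back it with the HtmlUnit code above, for example by returning `page.asXml()`). Links are pulled out with a naive regex for double-quoted `href` attributes instead of a real HTML parser, and relative URLs are not resolved. The class `SimpleCrawler` and its methods are hypothetical names, and a fake in-memory site stands in for the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCrawler {

    // Naive extractor for href="..." (double quotes only); a real crawler
    // would use an HTML parser and resolve relative URLs against the page URL
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Breadth-first crawl starting at startUrl; stops after maxPages fetches.
    // Returns the URLs in the order they were fetched.
    public static List<String> crawl(String startUrl, Function<String, String> fetcher, int maxPages) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        queue.add(startUrl);
        seen.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            String html = fetcher.apply(url);
            visited.add(url);
            if (html == null) {
                continue; // fetch failed or page missing: nothing to follow
            }
            for (String link : extractLinks(html)) {
                if (seen.add(link)) { // add() returns false if already queued or visited
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Fake in-memory "web" so the example runs without network access
        Map<String, String> site = Map.of(
                "a", "<a href=\"b\">B</a> <a href=\"c\">C</a>",
                "b", "<a href=\"a\">A</a> <a href=\"d\">D</a>",
                "c", "",
                "d", "<a href=\"c\">C</a>");
        System.out.println(crawl("a", site::get, 10)); // prints [a, b, c, d]
    }
}
```

The `seen` set is what keeps the loop from fetching the same page twice when pages link back to each other, and `maxPages` bounds the crawl so it cannot run forever.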

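If you go with HttpClient, search (for example on Baidu) for a tutorial. There is also a book, 《自己動手寫網絡爬蟲》 (Write Your Own Web Crawler), which teaches crawling using Java; it is worth a look for crawler beginners.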
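For the HttpClient route, a minimal sketch follows. Note that it swaps in the JDK's built-in `java.net.http.HttpClient` (Java 11+) rather than the Apache HttpClient library, so no extra jar is needed; the Apache API differs. The class name `JdkHttpFetch`, the `MyCrawler/0.1` User-Agent string and the `https://example.com/` default URL are placeholders chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class JdkHttpFetch {

    // Build a GET request with a timeout and an explicit User-Agent
    // (the agent string is a made-up placeholder)
    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("User-Agent", "MyCrawler/0.1")
                .GET()
                .build();
    }

    // Send the request and return the body as a String; non-2xx statuses become an IOException
    public static String fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Follow ordinary redirects (e.g. http:// to https://) automatically
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        // Target URL from the command line, else a placeholder address
        String url = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(fetch(client, url));
    }
}
```

Run it with `java JdkHttpFetch https://your-target-site/` after compiling; it prints the raw HTML, which you can then parse or feed into a link-following loop.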

Copyright 2024 編程學習大全網 (Programming Learning Encyclopedia)