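# -*- coding: utf-8 -*-
# Python 2 code (print statement, urlparse module)
import requests
import urlparse
import os
from bs4 import BeautifulSoup


def process(url):
    headers = {'content-type': 'application/json',
               'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:22.0) Gecko/20100101 Firefox/22.0'}
    # Fetch the page and parse the HTML
    pageSourse = requests.get(url, headers=headers).text
    page_soup = BeautifulSoup(pageSourse)
    # Collect the href of every <a> tag
    a_all = page_soup.findAll("a")
    link_urls = [i.get('href') for i in a_all]  # some are javascript-triggered events; write the filtering yourself
    # Collect the src of every <img> tag
    img_all = page_soup.findAll("img")
    img_urls = [i.get("src") for i in img_all]
    print link_urls, img_urls
    return (link_urls, img_urls)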
The result of process("") is as follows:
[u'/', u'javascript:;', u'javascript:;', u'javascript:;', u'/', u'javascript:;', u'/v2/?login&tpl=mn&u=%2F', u'/?cid=002540', u'', u'', u'', u'', u'', u'/v2/?login&tpl=mn&u=%2F', u'/gaoji/preferences.html', u'/more/', u'/ns?cl=2&rn=20&tn=news&word=', u'/f?kw=&fr=wwwt', u'/q?ct=17&pn=0&tn=ikaslist&rn=10&word=&fr=wwwt', u'/search?fr=ps&ie=utf-8&key=', u'/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=', u'/v?ct=301989888&rn=20&pn=0&db=0&s=25&ie=utf-8&word=', u'/m?word=&fr=ps01000', u'/search?word=&lm=0&od=0&ie=utf-8', u'//www.baidu.com/more/', u'/', u'//www.baidu.com/cache/sethelp/help.html', u'', u'', u'/duty/', u'/'] [u'//www.baidu.com/img/bd_logo1.png', u'//www.baidu.com/img/baidu_jgylogo3.gif']
If you have any questions, feel free to ask. If you are satisfied with the answer, please accept it.
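The code comment leaves the filtering of javascript-triggered links to the reader. One possible way to do it is sketched below. It is only an illustration, not part of the original answer: the helper name clean_urls and the base URL http://www.example.com/ are hypothetical, and the sketch is written for Python 3, where urljoin lives in urllib.parse (in Python 2 it is in the urlparse module that the code above imports). It drops empty and javascript: entries and resolves the remaining relative URLs against the page address.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin


def clean_urls(base_url, urls):
    """Drop empty and javascript: entries, then resolve the rest against base_url."""
    cleaned = []
    for url in urls:
        if not url:  # None (tag has no such attribute) or empty string
            continue
        url = url.strip()
        if not url or url.lower().startswith("javascript:"):
            continue
        # urljoin handles '/path', '//host/path' and absolute URLs alike
        cleaned.append(urljoin(base_url, url))
    return cleaned


# Hypothetical base URL, with a few entries shaped like the output above.
sample = ["/", "javascript:;", "", None, "/more/", "//www.baidu.com/img/bd_logo1.png"]
print(clean_urls("http://www.example.com/", sample))
```

With this in place, the two lists returned by process could each be passed through clean_urls together with the URL that was fetched, so that callers only see absolute, followable addresses.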