欧美三级电影完整|亚洲一二三四久久|性爱视频精品一区二区免费在线观看|国产精品啪啪视频|婷婷六月综合操人妻视频网站|99爱免费视频在线观看|美女一级片在线观看|北京熟女88av|免费看黄色A级电影|欧美黄色毛片儿

Python 爬蟲(chóng)框架 looter

2023-04-12

我們常用的pyspider,scrapy就不多介紹了,今天咱們玩looter框架的爬蟲(chóng),其實(shí)爬蟲(chóng)很有意思,看看下面的代碼就秒懂。



安裝


先安裝好python3,需要3.6以上,然后執(zhí)行 pip install looter


λ looter -h
Looter, a python package designed for web crawler lovers :)
Author: alphardex  QQ:2582347430
If any suggestion, please contact me.
Thank you for cooperation!

Usage:
  looter genspider  [--async]
  looter shell []
  looter (-h | --help | --version)

Options:
  -h --help        Show this screen.
  --version        Show version.
  --async          Use async instead of concurrent.

圖片爬蟲(chóng)


λ looter shell https://konachan.com/post

Available objects:
    url           The url of the site you crawled.
    res           The response of the site.
    tree          The element source tree to be parsed.

Available functions:
    fetch         Send HTTP request to the site and parse it as a tree. [has async version]
    view          View the page in your browser. (test rendering)
    links         Get the links of the page.
    save          Save what you crawled as a file. (json or csv)

Examples:
    Get all the 
  • elements of a
      table: >>> items = tree.css('ul li') Get the links with a regex pattern: >>> items = links(res, pattern=r'.*/(jpeg|image)/.*') For more info, plz refer to documentation: [looter]: https://looter.readthedocs.io/en/latest/ imgs = tree.css('a.directlink::attr(href)').extract() >>> imgs[1:10] ['https://konachan.com/jpeg/c67d38b73df6e32199127998fc0f3338/Konachan.com%20-%20283270%20ass%20bed%20blush%20breasts%20clover_%28sakura_gamer%29%20game_cg%20nipples%20pussy_juice%20red_hair%20sakura_gamer%20wanaca%20winged_cloud.jpg', 'https://konachan.com/image/a0952daaf9aa94cd676901203680fec4/Konachan.com%20-%20283269%20aliasing%20anus%20azur_lane%20blush%20breasts%20cum%20gray_hair%20group%20long_hair%20nipples%20nude%20penis%20pussy%20rak_%28kuraga%29%20red_eyes%20twintails%20uncensored.jpg', 'https://konachan.com/image/e8ea71c93a895d87338ebf17e3aef5b3/Konachan.com%20-%20283268%20aliasing%20anthropomorphism%20azur_lane%20blush%20breasts%20gray_hair%20group%20long_hair%20nipples%20nude%20penis%20pussy%20rak_%28kuraga%29%20red_eyes%20sex%20twintails%20uncensored.jpg', 'https://konachan.com/image/8ffb6f968ffe372ea90a339934a9749d/Konachan.com%20-%20283267%20bed%20blush%20brown_eyes%20brown_hair%20condom%20inanaki_shiki%20long_hair%20navel%20no_bra%20open_shirt%20original%20panties%20tie%20underwear.jpg', 'https://konachan.com/jpeg/0d1de5c59eaf6fc717d63912e076de1d/Konachan.com%20-%20283266%20ass%20bed%20black_hair%20brown_eyes%20long_hair%20matsuzaki_miyuki%20original%20ponytail%20shorts.jpg', 'https://konachan.com/jpeg/7b34654c53e43879f20a8fd642c32acc/Konachan.com%20-%20283264%20aqua_eyes%20bed%20blonde_hair%20blush%20breasts%20censored%20dark_skin%20navel%20nipples%20no_bra%20original%20penis%20pubic_hair%20pussy%20sex%20shirt_lift%20spread_legs%20tan_lines.jpg', 'https://konachan.com/image/00a0eb43c07e9361679b5389e284ef7f/Konachan.com%20-%20283263%20ass%20ball%20brown_eyes%20cameltoe%20dress%20erect_nipples%20gray_hair%20kokkoro%20loli%20panties%20pizanuko%20pointed_ears%20princess_connect%21%20underwear%20upskirt%20wristwear.jpg', 'https://konachan.com/jpeg/889214118e9a891c63f0cb759d809775/Konachan.com%20-%20283262%202girls%20animal%20bow%20brown_eyes%20brown_hair%20clouds%20dress%20feathers%20flowers%20gloves%20green_eyes%20headdress%20idolmaster%20loli%20ribbons%20rose%20short_hair%20sky%20tiara.jpg', 'https://konachan.com/image/c7a3f7f9d6a2c1dc17c4c13733f72aed/Konachan.com%20-%20283261%20bikini_top%20black_hair%20blue_eyes%20boots%20chain%20flat_chest%20gloves%20hoodie%20inosia%20kuroi_mato%20long_hair%20magic%20navel%20scar%20shorts%20signed%20sword%20twintails%20weapon.jpg'] Path('konachan.txt').write_text('\n'.join(imgs)) wget -i konachan.txt
  • 抓取 v2


    import time
    import looter as lt
    from pprint import pprint
    from concurrent import futures
    
    domain = 'https://www.v2ex.com'
    total = []
    
    
    def crawl(url):
        tree = lt.fetch(url)
        items = tree.css('#TopicsNode .cell')
        for item in items:
            data = {}
            data['title'] = item.css('span.item_title a::text').extract_first()
            data['author'] = item.css('span.small.fade strong a::text').extract_first()
            data['source'] = f"{domain}{item.css('span.item_title a::attr(href)').extract_first()}"
            reply = item.css('a.count_livid::text').extract_first()
            data['reply'] = int(reply) if reply else 0
            pprint(data)
            total.append(data)
        time.sleep(1)
    
    
    if __name__ == '__main__':
        tasklist = [f'{domain}/go/python?p={n}' for n in range(1, 10)]
        [crawl(task) for task in tasklist]
        lt.save(total, name='v2ex.csv', sort_by='reply', order='desc')

    抓取10頁(yè)python主題的數(shù)據(jù),按照回復(fù)數(shù)倒序排列


    ,author,reply,source,title
    0,chinesehuazhou,127,https://www.v2ex.com/t/562327#reply127,10 行 Python 代碼,批量壓縮圖片 500 張,簡(jiǎn)直太強(qiáng)大了(內(nèi)有公號(hào)宣傳,不喜勿進(jìn))
    1,chinesehuazhou,103,https://www.v2ex.com/t/557286#reply103,len(x) 擊敗 x.len(),從內(nèi)置函數(shù)看 Python 的設(shè)計(jì)思想(內(nèi)有公號(hào)宣傳,不喜勿進(jìn))
    2,nfroot,73,https://www.v2ex.com/t/555249#reply73,面對(duì) Python 的強(qiáng)大和難用性表示深深的迷茫,莫非打開(kāi)方式不對(duì)?
    3,css3,58,https://www.v2ex.com/t/554724#reply58,你們用什么工具來(lái)管理 Python 的庫(kù)???
    4,Northxw,54,https://www.v2ex.com/t/558529#reply54,花式反爬之某眾點(diǎn)評(píng)網(wǎng)
    5,akmonde,48,https://www.v2ex.com/t/559926#reply48,Python 項(xiàng)目移植到其他機(jī)器,要求全 Linux 系統(tǒng)適配
    6,kayseen,47,https://www.v2ex.com/t/562683#reply47,這道 Python 題目有大神會(huì)做嗎?
    7,hellomacos,41,https://www.v2ex.com/t/562413#reply41,老生常談的問(wèn)題:如何學(xué)好 Python

    本文僅代表作者觀點(diǎn),版權(quán)歸原創(chuàng)者所有,如需轉(zhuǎn)載請(qǐng)?jiān)谖闹凶⒚鱽?lái)源及作者名字。

    免責(zé)聲明:本文系轉(zhuǎn)載編輯文章,僅作分享之用。如分享內(nèi)容、圖片侵犯到您的版權(quán)或非授權(quán)發(fā)布,請(qǐng)及時(shí)與我們聯(lián)系進(jìn)行審核處理或刪除,您可以發(fā)送材料至郵箱:service@tojoy.com