Scrapy crawl of Qichacha (qcc.com) triggers user verification after repeated requests. What can I do?

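Looking at my crawl output, the first requests still returned data, but after crawling for a while the site started responding with 412. When I open the failing link, it asks me to complete a user verification. How do I solve this?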

import scrapy
import xlrd
from qcc_excel.items import QccExcelItem


class QccExcel1Spider(scrapy.Spider):
    name = 'qcc_excel_1'
    allowed_domains = ['qcc.com']

    # start_urls = ['http://qcc.com/']
    def start_requests(self):
        # Note: xlrd 2.0+ reads only .xls files; opening .xlsx requires xlrd < 2.0.
        excel = xlrd.open_workbook('company.xlsx')
        work_book = excel.sheet_by_name('Sheet1')
        for row in work_book.get_rows():
            # The company name is taken from the first column of each row.
            company_list_url = 'https://www.qcc.com/web/search?key={}'.format(row[0].value)
            yield scrapy.Request(url=company_list_url, callback=self.parse)

    def parse(self, response, *args, **kwargs):
        # Link to the detail page of the first search result.
        second_url = response.xpath(
            '//table[@class="ntable ntable-list"]/tr[1]/td[3]/div/div/span/a/@href').extract_first()
        if second_url is None:
            # No result link (for example a verification page): skip instead of crashing.
            self.logger.warning('No detail link found on %s', response.url)
            return
        # response.follow also resolves relative links against the current page.
        yield response.follow(second_url, callback=self.handle_second_url)

    def handle_second_url(self, response):
        # Extract the fields of interest from the company detail table.
        info = QccExcelItem()
        info["location"] = response.xpath('//table[@class="ntable"]/tr[6]/td[4]/text()').extract_first()
        info["trades"] = response.xpath('//table[@class="ntable"]/tr[6]/td[2]/text()').extract_first()
        info["business"] = response.xpath('//table[@class="ntable"]/tr[10]/td[2]/text()').extract_first()
        yield info

后知后觉469874 2022-04-20

Source: Scrapy Crawler Project in Practice, 2-4 Parsing the detail page and launching the crawler project

1 answer

好帮手慕凡 2022-04-20 16:19:53

Hello!

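Sending too many requests triggers the site's anti-scraping protection. Commercial sites usually run fairly complex anti-scraping measures, which makes them hard to crawl, and each measure needs its own countermeasure. At this point your crawler has already been flagged. You could try using IP proxies: https://class.imooc.com/lesson/2198#mid=55544. If the protection is too complex to get past that way, you will need to study counter-anti-scraping techniques aimed at this specific site. Happy learning~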
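
The IP proxy suggestion could be sketched roughly as the downloader middleware below. This is only a minimal sketch under assumptions, not a tested solution for qcc.com: the proxy addresses are placeholders, `PROXY_POOL` is a custom setting name invented for this example, and the module path `qcc_excel.middlewares` is assumed from the project name in the question. The delay and throttling values are arbitrary starting points, and a site with strong protection may still ask for verification.

```python
import random

# Placeholder proxy endpoints (assumption): replace with proxies you actually have.
PROXY_POOL = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]


class RandomProxyMiddleware:
    """Downloader middleware that assigns a randomly chosen proxy to each request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_POOL is a custom setting name assumed for this sketch,
        # not a built-in Scrapy setting.
        return cls(crawler.settings.getlist("PROXY_POOL", PROXY_POOL))

    def process_request(self, request, spider):
        if self.proxies:
            # Scrapy's built-in HttpProxyMiddleware honors request.meta["proxy"].
            request.meta["proxy"] = random.choice(self.proxies)
        # Returning None lets Scrapy continue processing the request normally.
        return None


# Settings sketch, e.g. assigned to `custom_settings` on the spider.
CUSTOM_SETTINGS = {
    "DOWNLOADER_MIDDLEWARES": {
        # Assumed module path; 350 runs before the built-in proxy middleware (750).
        "qcc_excel.middlewares.RandomProxyMiddleware": 350,
    },
    "DOWNLOAD_DELAY": 3,                  # wait between requests (seconds)
    "AUTOTHROTTLE_ENABLED": True,         # adapt the delay to server latency
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # avoid parallel bursts on one site
}
```

To try it, the class would go in the project's `middlewares.py` and the dictionary would be set as `custom_settings = CUSTOM_SETTINGS` on `QccExcel1Spider`. Slowing the crawl down is included alongside the proxies because the block appeared only after many requests in a short time.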
