为什么我获取到信息是这样子的啊,是被反爬了吗。怎么解决啊

为什么我获取到信息是这样子的啊,是被反爬了吗。怎么解决啊

<!doctype html>
<!--[if lt IE 7]><html class="ie ie6"><![endif]-->
<!--[if IE 7]><html class="ie ie7"><![endif]-->
<!--[if IE 8]><html class="ie ie8"><![endif]-->
<!--[if IE 9]><html class="ie9"><![endif]-->
<!--[if (gt IE 9)|!(IE)]><!--><html class="standard"><!--<![endif]-->
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    
    <title>BOSS直聘</title>
    <script>var staticPath="https://static.zhipin.com/zhipin-geek/chat/v365/"</script>
    <link rel="shortcut icon" href="https://www.zhipin.com/favicon.ico">
    <link rel="stylesheet" href="https://static.zhipin.com/v2/web/geek/css/prop-mall.css">
    
    <script src="https://z.zhipin.com/H5/js/plugins/web-report.min-1.28.js"></script>
    <script>try{performanceReport({action:"action_js_monitor",appKey:"ObsSRskiryhn60pf",isAjax:!(-1<navigator.userAgent.indexOf("Atlantis"))})}catch(t){}</script>
    
  <link href="https://static.zhipin.com/zhipin-geek/chat/v365/static/css/vendor.4e4fcdef.css" rel="stylesheet" crossorigin="anonymous"><link href="https://static.zhipin.com/zhipin-geek/chat/v365/static/css/app.4e4fcdef.css" rel="stylesheet" crossorigin="anonymous"></head>

  <body>
    <script>!function(c,b,d,a){with(c[a]||(c[a]={}),c[a].config={pid:"h9614n96f5@7f36ca53ae12344",appType:"web",imgUrl:"https://arms-retcode.aliyuncs.com/r.png?",disableHook:!0,enableSPA:!0},b)with(body)with(insertBefore(createElement("script"),firstChild))setAttribute("crossorigin","",src=d)}(window,document,"https://retcode.alicdn.com/retcode/bl.js","__bl")</script>
    <div id="app">
      <div class="data-tips">
        <div class="page-loading">
          <span class="component-b"></span>
          <span class="component-o"></span>
          <span class="component-s1"></span>
          <span class="component-s2"></span>
          <p class="gray">加载中,请稍候</p>
        </div>
      </div>
    </div>
    <input type="hidden" id="page_key_name" value="">
    <script src="https://static.zhipin.com/library/js/lib/jquery-1.12.2.min.js" crossorigin="anonymous"></script>
    <script src="https://static.zhipin.com/v2/web/common/mqtt-v2.1.min.js" crossorigin="anonymous"></script>
    <script>function get_share_datas_from_html_inapp(){var e="shdefault",t=getQueryString("shid"),n=getQueryString("sid");t?e=t:n&&(e=n);var r="pkdefault",a="ppdefault",u=document.getElementById("page_key_name");if(u){var i=u.value;if(i){var l=i.split("|");1==l.length?r=l[0]:2<=l.length&&(r=l[0],a=l[1])}}var g=new Array;return g.shid=e,g.pk=r,g.pp=a,g}function getQueryString(e){var t=new RegExp("(^|&)"+e+"=([^&]*)(&|$)"),n=window.location.search.substr(1).match(t);return null!=n?unescape(n[2]):null}</script>
    <script>var _T=_T||[];!function(){var t=document.createElement("script");t.src="https://static.zhipin.com/library/js/analytics/ka.zhipin.min.js";var e=document.getElementsByTagName("script")[0];e.parentNode.insertBefore(t,e)}()</script>
    
    <script src="https://static.zhipin.com/library/js/lib/vue-core-v1.0.0.min.js"></script>
    
    <script src="https://static.zhipin.com/library/js/utils/lottie_light-v5.9.4.min.js"></script>
    
  <script src="https://static.zhipin.com/zhipin-geek/chat/v365/static/js/vendor.4e4fcdef.js" crossorigin="anonymous"></script><script src="https://static.zhipin.com/zhipin-geek/chat/v365/static/js/app.4e4fcdef.js" crossorigin="anonymous"></script><script src="https://static.zhipin.com/zhipin-geek/chat/v365/static/js/boss-ui.4e4fcdef.js" crossorigin="anonymous"></script></body>
  
</html>

https://img1.sycdn.imooc.com//climg/651ecb1009d1a24712020579.jpg

正在回答 回答被采纳积分+1

登陆购买课程后可参与讨论,去登陆

1回答
好帮手慕小猿 2023-10-06 15:40:35

同学,你好!是同学没找对链接。boss不适合使用spider 爬取,boss 搜索职位使用的是接口。使用接口返回的数据,链接如下:

https://img1.sycdn.imooc.com//climg/651fb92f09a9f93b17340954.jpg

找对url ,请求时添加headers 可请求成功,参考代码如下:

import requests,json
headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.289 Safari/537.36',
         'Cookie':"lastCity=101030100; __g=-; __l=l=%2Fwww.zhipin.com%2Ftianjin%2F&r=&g=&s=3&friend_source=0; wd_guid=b1063da9-69c1-4dc4-9634-03917ed74432; historyState=state; _bl_uid=gjlv6n60ety9eIpF23pCqX51LLRC; __c=1696576196; __a=70432027.1696576196..1696576196.4.1.4.4; __zp_stoken__=a086eSGQtC0FXKF54dQZZaU1GE34nN0hnRDx5HF9RNxpgaWw%2FXlYbN00qU3llBCg0X2dsWCFYZH4PYW8IUEdcIGB2C35qBhI7ZD1JFzUjRisqWkMqcSYAS0RDMgYaH0pxRwMGUlBDRAAcagMsUwdyPyotKBl1ZzRPI0VDHyJ4T0QoPR5HJXUzHCRWEQxYJngONX1sWGAGFg%3D%3D"}
response=requests.get("https://www.zhipin.com/wapi/zpgeek/search/joblist.json?scene=1&query=python&city=101030100&experience=&payType=&partTime=&degree=&industry=&scale=&stage=&position=&jobType=&salary=&multiBusinessDistrict=&multiSubway=&page=1&pageSize=30",headers=headers)
a=json.loads(response.text)
print(a['zpData']['jobList'])

返回数据如下:

https://img1.sycdn.imooc.com//climg/651fb9bd09de922218350654.jpg

祝学习愉快~

  • 提问者 吴老师在线叛逃 #1
    请问这里所说的使用接口返回的数据是什么意思呢
    2023-10-07 00:02:06
  • 同学,你好!接口是向目标url 发送请求,返回json格式的字符串。就和我们爬虫爬取数据是一样的:向url 发送请求,返回请求的内容。不同的是发送url 返回的是字符串,接口返回的是json 格式的字符串

    祝学习愉快~

    2023-10-07 07:09:50
  • 同学,你好!接口是向目标url 发送请求,返回json格式的字符串。就和我们爬虫爬取数据是一样的:向url 发送请求,返回请求的内容。不同的是发送url 返回的是字符串,接口返回的是json 格式的字符串

    祝学习愉快~

    2023-10-07 07:09:50
问题已解决,确定采纳
还有疑问,暂不采纳

恭喜解决一个难题,获得1积分~

来为老师/同学的回答评分吧

0 星
请稍等 ...
意见反馈 帮助中心 APP下载
官方微信

在线咨询

领取优惠

免费试听

领取大纲

扫描二维码,添加
你的专属老师