Two questions: (1) why won't XPath Helper open on some web pages, and (2) if it won't open, where should I try out XPath expressions first?

Teacher, (1) why won't XPath Helper open on some web pages?

(2) If it won't open, where should I try out my XPath expressions first?


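Hello teacher, here is how I ran into the problem. ((3) Please also check whether my approach is right.)

1) I request https://www.guazi.com/www/buy twice; the second request returns the data, which I read with requests.get(...).text. (I tried saving the returned response.text to an HTML file, but the page would not open in the browser, let alone let me run XPath expressions on it.)

2) I parse the returned HTML with guazi_html = lxml.etree.HTML(response.text), which turns it into an HTML element tree.

3) I convert the tree back to text with lxml.etree.tostring(guazi_html).decode(), pretty-print the HTML, and save it to a new file named test2.html.

4) I want to open test2.html in the browser and use the XPath Helper extension on it, writing XPath expressions to locate the city names and URLs I need, but XPath Helper will not open.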
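
One possible reason the saved file looks odd in a browser: by default `lxml.etree.tostring` serializes as XML with ASCII output, which matches what test2.html contains (self-closed tags such as `<div id="cityLeft"/>` and numeric references such as `&#20840;&#22269;` in place of Chinese text). A minimal sketch, assuming a small stand-in snippet rather than the real page, of asking lxml for HTML serialization instead:

```python
from lxml import etree

# Stand-in for response.text; the real page is much larger.
sample = '<html><body><div id="cityLeft"></div><p>全国</p></body></html>'
tree = etree.HTML(sample)

# Default call: XML serialization, ASCII output (what step 3 above does).
as_xml = etree.tostring(tree).decode()

# HTML serialization, returned directly as a str.
as_html = etree.tostring(tree, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

This only brings the saved markup closer to what a browser expects; whether XPath Helper then opens on the local file is a separate question.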

Please see the screenshot below. (There is a character limit, so I deleted the latter part of the file.)

http://img1.sycdn.imooc.com//climg/5e5bda6e09037f2a13650678.jpg

Here is the test2.html file:

<html>&#13;

&#13;

<head _tracker="{&quot;pagetype&quot;:&quot;list&quot;,&quot;city&quot;:&quot;www&quot;,&quot;qpres&quot;:&quot;247324030100119552&quot;,&quot;cpres&quot;:&quot;search cpres&quot;,&quot;expids&quot;:&quot;{ranker_id=0, predictor_id=15, retriever_id=0, rewriter_id=0, rank_sorter_id=1}&quot;,&quot;line&quot;:&quot;c2c&quot;,&quot;platform&quot;:&quot;web&quot;,&quot;ca_city&quot;:&quot;nj&quot;}">

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

<meta name="renderer" content="webkit"/>

<title>&#12304;&#20840;&#22269;&#20108;&#25163;&#36710;&#12305;&#20840;&#22269;&#20108;&#25163;&#36710;&#20132;&#26131;&#24066;&#22330;_&#20840;&#22269;&#20108;&#25163;&#36710;&#25253;&#20215;_&#20840;&#22269;&#20108;&#25163;&#36710;&#24066;&#22330;-&#20840;&#22269;&#29916;&#23376;&#20108;&#25163;&#36710;</title>

<meta name="keywords" content="&#20840;&#22269;&#20108;&#25163;&#36710;,&#20840;&#22269;&#20108;&#25163;&#36710;&#20132;&#26131;&#24066;&#22330;,&#20840;&#22269;&#20108;&#25163;&#36710;&#25253;&#20215;,&#20840;&#22269;&#20108;&#25163;&#36710;&#24066;&#22330;"/>

<meta name="description" content="&#20840;&#22269;&#20108;&#25163;&#36710;,&#20840;&#22269;&#20108;&#25163;&#36710;&#20132;&#26131;&#24066;&#22330;,&#20840;&#22269;&#20108;&#25163;&#36710;&#25253;&#20215;,&#20840;&#22269;&#20108;&#25163;&#36710;&#24066;&#22330;&#25552;&#20379;&#20840;&#22269;&#20108;&#25163;&#36710;&#25253;&#20215;&#31561;&#20108;&#25163;&#36710;&#20449;&#24687;&#12290;&#29916;&#23376;&#20108;&#25163;&#36710;&#20026;&#24744;&#25552;&#20379;&#20840;&#22269;&#20108;&#25163;&#36710;&#36710;&#28304;,&#20080;&#21334;&#20108;&#25163;&#36710;&#23601;&#19978;&#29916;&#23376;&#20108;&#25163;&#36710;&#12290;"/>


    <meta http-equiv="P3P" content="CP=&quot;CAO PSA OUR&quot;"/>

    <meta name="baidu-site-verification" content="3r3nh4dkLA"/>

    <meta name="360-site-verification" content="f045c917619b6b3dc82ad5f699a09474"/>

    <meta name="google-site-verification" content="FQph3WEY6ZqNqVXCB5PT4_u8f-WjfF14l2OOdFiOEmg"/>

    <meta http-equiv="Cache-Control" content="no-transform "/>

    <meta name="sogou_site_verification" content="qY406sTreO"/>

    <meta name="shenma-site-verification" content="7b096264bff0cf1031a570c37abed00c_1476775946"/>

    <link rel="shortcut icon" type="image/x-icon" href="https://www.guazi.com/favicon.ico" media="screen"/>

</head>

<script>

        var logged = false;

    var host = 'www.guazi.com';

    var termUrl = 'https://image.guazistatic.com/gz01190926/17/21/21c7f57e0f6b77e4b7dd3b608e44d04f.pdf';

    var privacyUrl = 'https://image.guazistatic.com/gz01191219/14/22/29a34d8443b4a998e9b96592797f27d5.pdf';

    window.userPhone = '';

    window.phone400 = '400-023-1529';

</script>

<script>&#13;

    var subInitInfo = {"brand":0,"series":0,"type":0,"price":0,"licenseDate":0,"roadHaul":0,"licenseCity":-1,"colour":0,"gearbox":0,"country":0,"emission":0};&#13;

    var handpickSelect = false;&#13;

    var financeType = 0;&#13;

    var listlogArr = {"city_filter":"-1","num":121212};&#13;

    var cityDomain = 'www';&#13;

</script>&#13;

    <link rel="canonical" href="https://www.guazi.com/www/buy"/>&#13;

&#13;

&#13;

<script type="text/javascript" src="//cli-sta.guazistatic.com/c2c_web/base.bbda0bee2fcd9ac614ac.js"/><script type="text/javascript" src="//cli-sta.guazistatic.com/c2c_web/list_v4.abb8c7171ff71e6200d7.js"/><body class="list">&#13;

&#13;

&#13;

&#13;

<input type="hidden" id="skipKindNew" value="0"/>&#13;

&#13;

<input type="hidden" id="clueData" data-puid="" data-city-id="-1"/>&#13;

&#13;

&#13;

&#13;

&#13;

<div id="jstop" class="j-header header-2 ">&#13;

    <div class="header">&#13;

        <h1>&#13;

            <a href="/www/" title="&#29916;&#23376;&#20108;&#25163;&#36710;">&#29916;&#23376;&#20108;&#25163;&#36710;</a>&#13;

        </h1>&#13;

                <div class="city">&#13;

            &#13;

            <p class="city-curr">&#13;

                &#20840;&#22269;<i/>&#13;

            </p>&#13;

        &#13;

        </div>&#13;

                <div class="uc js-uc js-uc-new" data-gzlog="tracking_type=click&amp;eventid=1015123400000003">&#13;

                <a href="javascript:" class="uc-my" id="js-login-new">&#30331;&#24405;</a>&#13;

                <div class="uc-app" style="display:none">&#13;

                    <a href="/www/userstore" class="js-loginElem1" data-gzlog="tracking_type=click&amp;eventid=1015123400000004">&#25910;&#34255;&#36710;&#36742;</a>&#13;

                    <a href="/www/userreduce" class="js-loginElem2" data-gzlog="tracking_type=click&amp;eventid=1015123400000005">&#38477;&#20215;&#25552;&#37266;</a>&#13;

                    <a href="/www/userhistory" class="js-loginElem3" data-gzlog="tracking_type=click&amp;eventid=1015123400000006">&#27983;&#35272;&#35760;&#24405;</a>&#13;

                    <a href="javascript:;" class="js-logout js-loginElem4" data-gzlog="tracking_type=click&amp;eventid=1015123400000007">&#36864;&#20986;</a>&#13;

                    <i/>&#13;

                </div>&#13;

        </div>&#13;

        <div class="header-phone">&#13;

            &#13;

            &#28909;&#32447;&#30005;&#35805;  400-023-1529        </div>&#13;

                <div class="nav-list">&#13;

            <a class="fl" href="https://www.maodou.com?ca_s=xcsop_guazipc&amp;ca_n=topbar" data-gzlog="tracking_type=click&amp;eventid=1015083000000001" title="&#27611;&#35910;&#26032;&#36710;" target="_blank">&#27611;&#35910;&#26032;&#36710;</a>&#13;

            <a class="fr " href="https://jr.guazi.com/www/?jr_from=web_loanindex&amp;platform=web" data-gzlog="tracking_type=click&amp;eventid=0010000000000011">&#29916;&#23376;&#37329;&#34701;</a>&#13;

            <a class="fl " href="/www/intro/" data-gzlog="tracking_type=click&amp;eventid=0010000000000010">&#29916;&#23376;&#26381;&#21153;</a>&#13;

            <a class="fl " href="/www/sell/?clueS=01" data-gzlog="tracking_type=click&amp;eventid=0010050000000009">&#25105;&#35201;&#21334;&#36710;</a>&#13;

            <a class="fl active" href="/www/buy/" data-gzlog="tracking_type=click&amp;eventid=0010000000000008">&#25105;&#35201;&#20080;&#36710;</a>&#13;

            <a class="fl " href="/www/" data-gzlog="tracking_type=click&amp;eventid=0010000000000007">&#39318;&#39029;</a>&#13;

        </div>&#13;

    </div>&#13;

</div>&#13;

&#13;

&#13;

&#13;

&#13;

&#13;

<div class="pop-box pop-login" id="login1"/>&#13;

<iframe name="guazi_login" style="display: none;"/>&#13;

&#13;

&#13;

&#13;

&#13;

&#13;

<div class="pop-mask"/>&#13;

&#13;

&#13;

<script>&#13;

    var domain = 'www';&#13;

    var cityId = "-1";&#13;

    var cityName = "\u5168\u56fd";&#13;

    var disOtherCity = 1;&#13;

</script>&#13;

&#13;

&#13;

&#13;

<input type="hidden" value="{&quot;city_id&quot;:&quot;-1&quot;}" id="listFilterNew"/>&#13;

<script>&#13;

    $(document).ready(function(){&#13;

&#13;

        if($("#listFilterNew").val() != ''){&#13;

            var data = $.extend(&#13;

                {&#13;

                    tracking_type : 'show',&#13;

                    eventid:'901545643109',&#13;

                },&#13;

                JSON.parse($("#listFilterNew").val())&#13;

            );&#13;

            window.tracker.send(data);&#13;

        }&#13;

    });&#13;

</script>&#13;

&#13;

&#13;

<div class="city-box-parent">&#13;

    <div class="city-scroll">&#13;

    <div class="city-box all-city">&#13;

        <div class="city-box-left">&#13;

            <dl class="bdb-n">&#13;

                <dt class="green-tit">&#160;</dt>&#13;

                <dd>&#13;

                                    </dd>&#13;

            </dl>&#13;

            <div id="cityLeft"/>&#13;

        </div>&#13;

        <div class="city-box-right">&#13;

            <dl class="bdb-s">&#13;

                <dt class="green-tit">&#28909;&#38376;</dt>&#13;

                <dd>&#13;

                                                                        <a data-gzlog="tracking_type=click&amp;eventid=0020060000000021" href="/www/buy" title="&#20840;&#22269;&#20108;&#25163;&#36710;">&#20840;&#22269;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=bj" class="" href="/bj/buy" title="&#21271;&#20140;&#20108;&#25163;&#36710;">&#21271;&#20140;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=sh" class="" href="/sh/buy" title="&#19978;&#28023;&#20108;&#25163;&#36710;">&#19978;&#28023;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=gz" class="" href="/gz/buy" title="&#24191;&#24030;&#20108;&#25163;&#36710;">&#24191;&#24030;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=sz" class="" href="/sz/buy" title="&#28145;&#22323;&#20108;&#25163;&#36710;">&#28145;&#22323;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=cd" class="" href="/cd/buy" title="&#25104;&#37117;&#20108;&#25163;&#36710;">&#25104;&#37117;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=cq" class="" href="/cq/buy" title="&#37325;&#24198;&#20108;&#25163;&#36710;">&#37325;&#24198;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=hz" class="" href="/hz/buy" title="&#26477;&#24030;&#20108;&#25163;&#36710;">&#26477;&#24030;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=su" class="" href="/su/buy" title="&#33487;&#24030;&#20108;&#25163;&#36710;">&#33487;&#24030;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=sy" class="" href="/sy/buy" title="&#27784;&#38451;&#20108;&#25163;&#36710;">&#27784;&#38451;</a>&#13;

                                                                                                <a data-gzlog="tracking_type=click&amp;eventid=0020060000000019&amp;select_city=wh" class="" href="/wh/buy" title="&#27494;&#27721;&#20108;&#25163;&#36710;">&#27494;&#27721;</a>&#13;

                                                            </dd>&#13;

            </dl>&#13;

            <div id="cityRight"/>&#13;

        </div>&#13;

    </div>&#13;

    </div>&#13;

</div>&#13;

<script>&#13;

    $().ready(function initCity(){&#13;

        showCity();&#13;

        }&#13;

    );&#13;

    //&#26174;&#31034;&#22478;&#24066;&#36873;&#25321;&#26694;&#13;

    function showCity() {&#13;

        var url = 'buy';&#13;

        var cityLeft = {"A":[{"id":102089,"domain":"anji","name":"\u5b89\u5409","firstC":"A","active":false},{"id":100651,"domain":"anyue","name":"\u5b89\u5cb3","firstC":"A","active":false},{"id":1001901,"domain":"anyanxian","name":"\u5b89\u9633\u53bf","firstC":"A","active":false},{"id":39,"domain":"anshun","name":"\u5b89\u987a","firstC":"A","active":false},{"id":57,"domain":"anshan","name":"\u978d\u5c71","firstC":"A","active":false},{"id":109,"domain":"anyang","name":"\u5b89\u9633","firstC":"A","active":false},{"id":127,"domain":"anqing","name":"\u5b89\u5e86","firstC":"A","active":false},{"id":184,"domain":"ankang","name":"\u5b89\u5eb7","firstC":"A","active":false}],"B":[{"id":1001523,"domain":"baoying","name":"\u5b9d\u5e94","firstC":"B","active":false},{"id":6,"domain":"baoding","name":"\u4fdd\u5b9a","firstC":"B","active":false},{"id":12,"domain":"bj","name":"\u5317\u4eac","firstC":"B","active":false},{"id":41,"domain":"bijie","name":"\u6bd5\u8282","firstC":"B","active":false},{"id":89,"domain":"baishan","name":"\u767d\u5c71","firstC":"B","active":false},{"id":91,"domain":"baicheng","name":"\u767d\u57ce","firstC":"B","active":false},{"id":122,"domain":"binzhou","name":"\u6ee8\u5dde","firstC":"B","active":false},{"id":125,"domain":"bengbu","name":"\u868c\u57e0","firstC":"B","active":false},{"id":146,"domain":"baotou","name":"\u5305\u5934","firstC":"B","active":false},{"id":152,"domain":"bayanchuoer","name":"\u5df4\u5f66\u6dd6\u5c14","firstC":"B","active":false},{"id":168,"domain":"baiyin","name":"\u767d\u94f6","firstC":"B","active":false},{"id":178,"domain":"baoji","name":"\u5b9d\u9e21","firstC":"B","active":false},{"id":281,"domain":"bazhong","name":"\u5df4\u4e2d","firstC":"B","active":false},{"id":286,"domain":"benxi","name":"\u672c\u6eaa","firstC":"B","active":false},{"id":314,"domain":"bozhou","name":"\u4eb3\u5dde","firstC":"B","active":false},{"id":317,"domain":"beihai","name":"\u5317\u6d77","firstC":"B","active":false}],"C":[{"id":102866,"domain":"changshou",
"name":"\u957f\u5bff","firstC":"C","active":false},{"id":1002275,"domain":"zengdu","name":"\u66fe\u90fd","firstC":"C","active":false},{"id":1002073,"domain":"chunan","name":"\u6df3\u5b89","firstC":"C","active":false},{"id":1002082,"domain":"cangnan","name":"\u82cd\u5357","firstC":"C","active":false},{"id":8,"domain":"chengde","name":"\u627f\u5fb7","firstC":"C","active":false},{"id":9,"domain":"cangzhou","name":"\u6ca7\u5dde","firstC":"C","active":false},{"id":15,"domain":"cq","name":"\u91cd\u5e86","firstC":"C","active":false},{"id":45,"domain":"cd","name":"\u6210\u90fd","firstC":"C","active":false},{"id":69,"domain":"changzhou","name":"\u5e38\u5dde","firstC":"C","active":false},{"id":84,"domain":"cc","name":"\u957f\u6625","firstC":"C","active":false},{"id":128,"domain":"chuzhou","name":"\u6ec1\u5dde","firstC":"C","active":false},{"id":148,"domain":"chifeng","name":"\u8d64\u5cf0","firstC":"C","active":false},{"id":158,"domain":"changzhi","name":"\u957f\u6cbb","firstC":"C","active":false},{"id":204,"domain":"cs","name":"\u957f\u6c99","firstC":"C","active":false},{"id":210,"domain":"changde","name":"\u5e38\u5fb7","firstC":"C","active":false},{"id":211,"domain":"chenzhou","name":"\u90f4\u5dde","firstC":"C","active":false},{"id":236,"domain":"chuxiong","name":"\u695a\u96c4","firstC":"C","active":false},{"id":289,"domain":"chaoyang","name":"\u671d\u9633","firstC":"C","active":false},{"id":315,"domain":"\u6c60\u5dde","name":"\u6c60\u5dde","firstC":"C","active":false}],"D":[{"id":100596,"domain":"dayi","name":"\u5927\u9091","firstC":"D","active":false},{"id":100587,"domain":"dujiangyan","name":"\u90fd\u6c5f\u5830","firstC":"D","active":false},{"id":101581,"domain":"danyang","name":"\u4e39\u9633","firstC":"D","active":false},{"id":100949,"domain":"shanxian","name":"\u5355\u53bf","firstC":"D","active":false},{"id":100763,"domain":"yangshan","name":"\u7800\u5c71","firstC":"D","active":false},{"id":1002007,"domain":"dengfeng","name":"\u767b\u5c01","firstC":"D","active":false},{
"id":1002110,"domain":"dongyang","name":"\u4e1c\u9633","firstC":"D","active":false},{"id":1001550,"domain":"dongtai","name":"\u4e1c\u53f0","firstC":"D","active":false},{"id":1001575,"domain":"donghai","name":"\u4e1c\u6d77","firstC":"D","active":false},{"id":1002729,"domain":"duyun","name":"\u90fd\u5300","firstC":"D","active":false},{"id":1001207,"domain":"dongyuan","name":"\u4e1c\u6e90","firstC":"D","active":false},{"id":1001687,"domain":"dingzhou","name":"\u5b9a\u5dde","firstC":"D","active":false},{"id":24,"domain":"dg","name":"\u4e1c\u839e","firstC":"D","active":false},{"id":48,"domain":"deyang","name":"\u5fb7\u9633","firstC":"D","active":false},{"id":53,"domain":"dazhou","name":"\u8fbe\u5dde","firstC":"D","active":false},{"id":56,"domain":"dl","name":"\u5927\u8fde","firstC":"D","active":false},{"id":59,"domain":"dandong","name":"\u4e39\u4e1c","firstC":"D","active":false},{"id":98,"domain":"daqing","name":"\u5927\u5e86","firstC":"D","active":false},{"id":117,"domain":"dongying","name":"\u4e1c\u8425","firstC":"D","active":false},{"id":156,"domain":"datong","name":"\u5927\u540c","firstC":"D","active":false},{"id":237,"domain":"dali","name":"\u5927\u7406","firstC":"D","active":false},{"id":308,"domain":"dezhou","name":"\u5fb7\u5dde","firstC":"D","active":false}],"E":[{"id":100506,"domain":"emeishan","name":"\u5ce8\u7709\u5c71","firstC":"E","active":false},{"id":150,"domain":"eerduosi","name":"\u9102\u5c14\u591a\u65af","firstC":"E","active":false},{"id":331,"domain":"enshi","name":"\u6069\u65bd","firstC":"E","active":false}],"F":[{"id":102864,"domain":"fuling","name":"\u6daa\u9675","firstC":"F","active":false},{"id":1002886,"domain":"fengjie","name":"\u5949\u8282","firstC":"F","active":false},{"id":100997,"domain":"lvliang","name":"\u6c7e\u9633","firstC":"F","active":false},{"id":100793,"domain":"fengyang","name":"\u51e4\u9633","firstC":"F","active":false},{"id":1002533,"domain":"fuan","name":"\u798f\u5b89","firstC":"F","active":false},{"id":1001513,"domain":"fengxian
","name":"\u4e30\u53bf","firstC":"F","active":false},{"id":20,"domain":"foshan","name":"\u4f5b\u5c71","firstC":"F","active":false},{"id":58,"domain":"fushun","name":"\u629a\u987a","firstC":"F","active":false},{"id":75,"domain":"fz","name":"\u798f\u5dde","firstC":"F","active":false},{"id":129,"domain":"fuyang","name":"\u961c\u9633","firstC":"F","active":false},{"id":223,"domain":"\u629a\u5dde","name":"\u629a\u5dde","firstC":"F","active":false},{"id":318,"domain":"fangchenggang","name":"\u9632\u57ce\u6e2f","firstC":"F","active":false}],"G":[{"id":100758,"domain":"guangde","name":"\u5e7f\u5fb7","firstC":"G","active":false},{"id":100609,"domain":"guli","name":"\u53e4\u853a","firstC":"G","active":false},{"id":1001689,"domain":"gaobeidian","name":"\u9ad8\u7891\u5e97","firstC":"G","active":false},{"id":100452,"domain":"gongzhuling","name":"\u516c\u4e3b\u5cad","firstC":"G","active":false},{"id":1001576,"domain":"guanyun","name":"\u704c\u4e91","firstC":"G","active":false},{"id":1001858,"domain":"gushi","name":"\u56fa\u59cb","firstC":"G","active":false},{"id":16,"domain":"gz","name":"\u5e7f\u5dde","firstC":"G","active":false},{"id":36,"domain":"gy","name":"\u8d35\u9633","firstC":"G","active":false},{"id":134,"domain":"gl","name":"\u6842\u6797","firstC":"G","active":false},{"id":137,"domain":"guigang","name":"\u8d35\u6e2f","firstC":"G","active":false},{"id":220,"domain":"ganzhou","name":"\u8d63\u5dde","firstC":"G","active":false},{"id":275,"domain":"guangyuan","name":"\u5e7f\u5143","firstC":"G","active":false},{"id":278,"domain":"guangan","name":"\u5e7f\u5b89","firstC":"G","active":false}],"H":[{"id":102869,"domain":"hechuan","name":"\u5408\u5ddd","firstC":"H","active":false},{"id":1001193,"domain":"haifeng","name":"\u6d77\u4e30","firstC":"H","active":false},{"id":100733,"domain":"huoqiu","name":"\u970d\u90b1","firstC":"H","active":false},{"id":1001068,"domain":"hejin","name":"\u6cb3\u6d25","firstC":"H","active":false},{"id":1003039,"domain":"huangyuan","name":"\u6e5f\u6e90","
firstC":"H","active":false},{"id":100935,"domain":"haiyang","name":"\u6d77\u9633","firstC":"H","active":false},{"id":1001493,"domain":"haian","name":"\u6d77\u5b89","firstC":"H","active":false},{"id":1001242,"domain":"huaiji","name":"\u6000\u96c6","firstC":"H","active":false},{"id":1001903,"domain":"huaxian","name":"\u6ed1\u53bf","firstC":"H","active":false},{"id":1001859,"domain":"huangchuan","name":"\u6f62\u5ddd","firstC":"H","active":false},{"id":4,"domain":"handan","name":"\u90af\u90f8","firstC":"H","active":false},{"id":11,"domain":"hengshui","name":"\u8861\u6c34","firstC":"H","active":false},{"id":23,"domain":"huizhou","name":"\u60e0\u5dde","firstC":"H","active":false},{"id":26,"domain":"hz","name":"\u676d\u5dde","firstC":"H","active":false},{"id":30,"domain":"huzhou","name":"\u6e56\u5dde","firstC":"H","active":false},{"id":64,"domain":"huludao","name":"\u846b\u82a6\u5c9b","firstC":"H","active":false},{"id":72,"domain":"huaian","name":"\u6dee\u5b89","firstC":"H","active":false},{"id":93,"domain":"hrb","name":"\u54c8\u5c14\u6ee8","firstC":"H","active":false},{"id":107,"domain":"hebi","name":"\u9e64\u58c1","firstC":"H","active":false},{"id":123,"domain":"hf","name":"\u5408\u80a5","firstC":"H","active":false},{"id":140,"domain":"hechi","name":"\u6cb3\u6c60","firstC":"H","active":false},{"id":145,"domain":"nmg","name":"\u547c\u548c\u6d69\u7279","firstC":"H","active":false},{"id":182,"domain":"hanzhong","name":"\u6c49\u4e2d","firstC":"H","active":false},{"id":195,"domain":"huangshi","name":"\u9ec4\u77f3","firstC":"H","active":false},{"id":207,"domain":"hengyang","name":"\u8861\u9633","firstC":"H","active":false},{"id":268,"domain":"heyuan","name":"\u6cb3\u6e90","firstC":"H","active":false},{"id":310,"domain":"huainan","name":"\u6dee\u5357","firstC":"H","active":false},{"id":311,"domain":"huaibei","name":"\u6dee\u5317","firstC":"H","active":false},{"id":313,"domain":"huangshan","name":"\u9ec4\u5c71","firstC":"H","active":false},{"id":328,"domain":"huanggang","name":"
\u9ec4\u5188","firstC":"H","active":false},{"id":336,"domain":"huaihua","name":"\u6000\u5316","firstC":"H","active":false},{"id":338,"domain":"heze","name":"\u83cf\u6cfd","firstC":"H","active":false}],"J":[{"id":102868,"domain":"jiangjin","name":"\u6c5f\u6d25","firstC":"J","active":false},{"id":100866,"domain":"juxian","name":"\u8392\u53bf","firstC":"J","active":false},{"id":1003272,"domain":"jiyang","name":"\u6d4e\u9633","firstC":"J","active":false},{"id":1002953,"domain":"jingbian","name":"\u9756\u8fb9","firstC":"J","active":false},{"id":1002069,"domain":"jiande","name":"\u5efa\u5fb7","firstC":"J","active":false},{"id":100987,"domain":"jiaocheng","name":"\u4ea4\u57ce","firstC":"J","active":false},{"id":1001547,"domain":"jinhu","name":"\u91d1\u6e56","firstC":"J","active":false},{"id":100509,"domain":"jiajiang","name":"\u5939\u6c5f","firstC":"J","active":false},{"id":100334,"domain":"jinghong","name":"\u666f\u6d2a","firstC":"J","active":false},{"id":1002102,"domain":"jiangshan","name":"\u6c5f\u5c71","firstC":"J","active":false},{"id":1002027,"domain":"jinyun","name":"\u7f19\u4e91","firstC":"J","active":false},{"id":21,"domain":"jiangmen","name":"\u6c5f\u95e8","firstC":"J","active":false},{"id":29,"domain":"jiaxing","name":"\u5609\u5174","firstC":"J","active":false},{"id":32,"domain":"jinhua","name":"\u91d1\u534e","firstC":"J","active":false},{"id":60,"domain":"jinzhou","name":"\u9526\u5dde","firstC":"J","active":false},{"id":85,"domain":"jilin","name":"\u5409\u6797","firstC":"J","active":false},{"id":100,"domain":"jiamusi","name":"\u4f73\u6728\u65af","firstC":"J","active":false},{"id":106,"domain":"jiaozuo","name":"\u7126\u4f5c","firstC":"J","active":false},{"id":113,"domain":"jn","name":"\u6d4e\u5357","firstC":"J","active":false},{"id":159,"domain":"jincheng","name":"\u664b\u57ce","firstC":"J","active":false},{"id":161,"domain":"jinzhong","name":"\u664b\u4e2d","firstC":"J","active":false},{"id":173,"domain":"jiuquan","name":"\u9152\u6cc9","firstC":"J","active":fals
e},{"id":198,"domain":"jingzhou","name":"\u8346\u5dde","firstC":"J","active":false},{"id":200,"domain":"jingmen","name":"\u8346\u95e8","firstC":"J","active":false},{"id":215,"domain":"jingdezhen","name":"\u666f\u5fb7\u9547","firstC":"J","active":false},{"id":217,"domain":"jiujiang","name":"\u4e5d\u6c5f","firstC":"J","active":false},{"id":221,"domain":"jian","name":"\u5409\u5b89","firstC":"J","active":false},{"id":272,"domain":"jieyang","name":"\u63ed\u9633","firstC":"J","active":false},{"id":304,"domain":"jining","name":"\u6d4e\u5b81","firstC":"J","active":false}],"K":[{"id":101199,"domain":"kaiping","name":"\u5f00\u5e73","firstC":"K","active":false},{"id":1001564,"domain":"kunshan","name":"\u6606\u5c71","firstC":"K","active":false},{"id":1002713,"domain":"kaili","name":"\u51ef\u91cc","firstC":"K","active":false},{"id":225,"domain":"km","name":"\u6606\u660e","firstC":"K","active":false},{"id":293,"domain":"kaifeng","name":"\u5f00\u5c01","firstC":"K","active":false}],"L":[{"id":102037,"domain":"linhai","name":"\u4e34\u6d77","firstC":"L","active":false},{"id":100538,"domain":"langzhong","name":"\u9606\u4e2d","firstC":"L","active":false},{"id":1003281,"domain":"liuyang","name":"\u6d4f\u9633","firstC":"L","active":false},{"id":100729,"domain":"lixin","name":"\u5229\u8f9b","firstC":"L","active":false},{"id":100103988,"domain":"lujiang","name":"\u5e90\u6c5f","firstC":"L","active":false},{"id":1002463,"domain":"liangzhou","name":"\u51c9\u5dde","firstC":"L","active":false},{"id":1001719,"domain":"laoting","name":"\u4e50\u4ead","firstC":"L","active":false},{"id":100929,"domain":"longkou","name":"\u9f99\u53e3","firstC":"L","active":false},{"id":1002105,"domain":"longyou","name":"\u9f99\u6e38","firstC":"L","active":false},{"id":100931,"domain":"laizhou","name":"\u83b1\u5dde","firstC":"L","active":false},{"id":100930,"domain":"laiyang","name":"\u83b1\u9633","firstC":"L","active":false},{"id":1001504,"domain":"liyang","name":"\u6ea7\u9633","firstC":"L","active":false},{"id":1005
69,"domain":"linshui","name":"\u90bb\u6c34","firstC":"L","active":false},{"id":1002312,"domain":"lixian","name":"\u6fa7\u53bf","firstC":"L","active":false},{"id":1002376,"domain":"leiyang","name":"\u8012\u9633","firstC":"L","active":false},{"id":1001216,"domain":"lianzhou","name":"\u8fde\u5dde","firstC":"L","active":false},{"id":10,"domain":"langfang","name":"\u5eca\u574a","firstC":"L","active":false},{"id":37,"domain":"liupanshui","name":"\u516d\u76d8\u6c34","firstC":"L","active":false},{"id":47,"domain":"luzhou","name":"\u6cf8\u5dde","firstC":"L","active":false},{"id":50,"domain":"leshan","name":"\u4e50\u5c71","firstC":"L","active":false},{"id":62,"domain":"liaoyang","name":"\u8fbd\u9633","firstC":"L","active":false},{"id":71,"domain":"lianyungang","name":"\u8fde\u4e91\u6e2f","firstC":"L","active":false},{"id":82,"domain":"longyan","name":"\u9f99\u5ca9","firstC":"L","active":false},{"id":104,"domain":"luoyang","name":"\u6d1b\u9633","firstC":"L","active":false},{"id":110,"domain":"luohe","name":"\u6f2f\u6cb3","firstC":"L","active":false},{"id":132,"domain":"luan","name":"\u516d\u5b89","firstC":"L","active":false},{"id":133,"domain":"liuzhou","name":"\u67f3\u5dde","firstC":"L","active":false},{"id":141,"domain":"laibin","name":"\u6765\u5bbe","firstC":"L","active":false},{"id":164,"domain":"linfen","name":"\u4e34\u6c7e","firstC":"L","active":false},{"id":166,"domain":"lz","name":"\u5170\u5dde","firstC":"L","active":false},{"id":213,"domain":"loudi","name":"\u5a04\u5e95","firstC":"L","active":false},{"id":230,"domain":"lijiang","name":"\u4e3d\u6c5f","firstC":"L","active":false},{"id":232,"domain":"lincang","name":"\u4e34\u6ca7","firstC":"L","active":false},{"id":285,"domain":"lishui","name":"\u4e3d\u6c34","firstC":"L","active":false},{"id":307,"domain":"linyi","name":"\u4e34\u6c82","firstC":"L","active":false},{"id":309,"domain":"liaocheng","name":"\u804a\u57ce","firstC":"L","active":false},{"id":323,"domain":"lvliang1","name":"\u5415\u6881","firstC":"L","active":fals
e},{"id":324,"domain":"lonnan","name":"\u9647\u5357","firstC":"L","active":false}],"M":[{"id":100728,"domain":"mengcheng","name":"\u8499\u57ce","firstC":"M","active":false},{"id":1002966,"domain":"mianxian","name":"\u52c9\u53bf","firstC":"M","active":false},{"id":100601,"domain":"miyi","name":"\u7c73\u6613","firstC":"M","active":false},{"id":22,"domain":"maoming","name":"\u8302\u540d","firstC":"M","active":false},{"id":49,"domain":"mianyang","name":"\u7ef5\u9633","firstC":"M","active":false},{"id":126,"domain":"maanshan","name":"\u9a6c\u978d\u5c71","firstC":"M","active":false},{"id":266,"domain":"meizhou","name":"\u6885\u5dde","firstC":"M","active":false},{"id":279,"domain":"meishan","name":"\u7709\u5c71","firstC":"M","active":false},{"id":302,"domain":"mudanjiang","name":"\u7261\u4e39\u6c5f","firstC":"M","active":false}]};&#13;

        var cityRight = {"N":[{"id":101664,"domain":"nankang","name":"\u5357\u5eb7","firstC":"N","active":false},{"id":1002058,"domain":"ninghai","name":"\u5b81\u6d77","firstC":"N","active":false},{"id":27,"domain":"nb","name":"\u5b81\u6ce2","firstC":"N","active":false}

        strCityLeft = '';&#13;

        strCityRight = '';&#13;

&#13;

        if (cityLeft) {&#13;

            $.each(cityLeft, function (keyLeft, objLeft) {&#13;

                strCityLeft += '&lt;dl&gt;&lt;dt&gt;' + keyLeft + '&lt;/dt&gt;&lt;dd&gt;';&#13;

                $.each(objLeft, function (keyLeftInfo, objLeftInfo) {&#13;

                    strCityLeft += '&lt;a data-gzlog="tracking_type=click&amp;amp;eventid=0020060000000017&amp;amp;select_city=' + objLeftInfo.domain + '" class="' + (objLeftInfo.active ? 'on' : '') + '" href="/' + objLeftInfo.domain + '/' + url + '" title="' + objLeftInfo.name + '&#20108;&#25163;&#36710;"&gt;' + objLeftInfo.name + '&lt;/a&gt;';&#13;

                });&#13;

                strCityLeft += '&lt;/dd&gt;&lt;/dl&gt;';&#13;

            });&#13;

        }&#13;

        if (cityRight) {&#13;

            $.each(cityRight, function (keyRight, objRight) {&#13;

                strCityRight += '&lt;dl&gt;&lt;dt&gt;' + keyRight + '&lt;/dt&gt;&lt;dd&gt;';&#13;

                $.each(objRight, function (keyRightInfo, objRightInfo) {&#13;

                    strCityRight += '&lt;a data-gzlog="tracking_type=click&amp;amp;eventid=0020060000000017&amp;amp;select_city=' + objRightInfo.domain + '" class="' + (objRightInfo.active ? 'on' : '') + '" href="/' + objRightInfo.domain + '/' + url + '" title="' + objRightInfo.name + '&#20108;&#25163;&#36710;"&gt;' + objRightInfo.name + '&lt;/a&gt;';&#13;

                });&#13;

                strCityRight += '&lt;/dd&gt;&lt;/dl&gt;';&#13;

            });&#13;

        }&#13;

&#13;

        $('#cityLeft').html(strCityLeft);&#13;

        $('#cityRight').html(strCityRight);&#13;

    }&#13;

</script>&#13;

&#13;

<div class="top-banner-app"/>&#13;

&#13;

<div class="list-wrap js-post">&#13;

    &#13;

    &#13;

    <div class="crumbs-search" id="bread">&#13;

                <div class="crumbs">&#13;

        <a href="//www.guazi.com/www/">&#29916;&#23376;&#20108;&#25163;&#36710;</a>&gt;&#20840;&#22269;&#20108;&#25163;&#36710;        </div>&#13;

        &#13;

        <div class="search js-search">&#13;

            <div class="search-box suggestion_widget" data-default-count="9">&#13;

                <input type="text" class="search-input js_search_input_index" placeholder="&#25628;&#32034;&#24744;&#24819;&#35201;&#30340;&#36710;" data-role="keywordInput" name="keyword" autocomplete="off" data-domain="www"/>&#13;

                <button class="search-btn" data-gzlog="tracking_type=click&amp;eventid=0020070000000022" type="button"/>&#13;

                <input type="hidden" value="www" name="hiddenCity"/>&#13;

            </div>&#13;

            <ul class="search-select" style="display: none;">&#13;

                <li class="select-tit">&#28909;&#38376;&#25512;&#33616;</li>&#13;

                <li>&#22823;&#20247;</li>&#13;

                <li>&#22823;&#20247;</li>&#13;

                <li>&#22823;&#20247;</li>&#13;

                <li>&#22823;&#20247;</li>&#13;

            </ul>&#13;

        </div>&#13;

    </div>&#13;

    &#13;

    &#13;


            


                        

</body>&#13;

</html>



3 answers

Hello,

1. XPath Helper is a browser extension; it should work on any website and should not simply fail to open.

http://img1.sycdn.imooc.com//climg/5e5ca24909b3186613570329.jpg

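2. If it really will not open, check your browser extensions; beyond that you will need to track the cause down on your side.

3. Going by the HTML you provided, your XPath looks correct; the values may be missing because the HTML is incomplete.

I suggest running the code below and then extracting the data again from the HTML it produces: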

import re

# PyExecJS: lets Python run the JavaScript that computes the cookie
import execjs
import requests
from lxml import etree

# The city-list page we request
url = 'https://www.guazi.com/www/buy'
# Leave the Cookie header out of the first request; otherwise the site can use it to spot and block us.
# If you build this dict from copied browser headers with a regex, watch out for stray spaces in the values.
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Connection": "keep-alive",
    "Host": "www.guazi.com",
    "Referer": "https://www.guazi.com/www/buy",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3610.2 Safari/537.36",
}
response = requests.get(url=url, headers=header)
# Set the encoding used to decode the response
response.encoding = 'utf-8'
if '正在打开中,请稍后' in response.text:
    # The anti-crawler page calls anti('<string>','<key>'); capture both arguments
    value_search = re.compile(r"anti\('(.*?)','(.*?)'\);")
    match = value_search.search(response.text)
    string = match.group(1)
    key = match.group(2)
    # Read the JavaScript file we reverse-engineered from the site
    with open('guazi.js', 'r', encoding='utf-8') as f:
        f_read = f.read()
    # Compile the JavaScript source with execjs and call its anti() function
    js = execjs.compile(f_read)
    js_return = js.call('anti', string, key)
    cookie_value = 'antipas=' + js_return
    header['Cookie'] = cookie_value
    # Second request, now carrying the computed cookie
    response_second = requests.get(url=url, headers=header)
    response_second.encoding = 'utf-8'

    with open('guazi.html', 'w', encoding='utf-8') as f:
        f.write(response_second.text)

Then get the city list with XPath and a regular expression:

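import re

from lxml import etree

# Read back the page saved by the previous script
with open('guazi.html', 'r', encoding='utf-8') as f:
    r = f.read()

guazi_html = etree.HTML(r)
# Text of a <script> that is the third <script> child of its parent
script_js = guazi_html.xpath("//script[3]/text()")[0]
# Capture each {...} object literal that is followed by a semicolon
city_search = re.compile(r'({.*?});')
city = city_search.findall(script_js)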
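
The strings captured this way are still raw JSON text. As a rough sketch of the next step, assuming the script text has the same shape as the `cityLeft` and `cityRight` assignments in the pasted test2.html, the JSON can be decoded and turned into city names and link paths. The `parse_cities` helper and the shortened sample below are illustrative, not part of the course code; the `/<domain>/buy` path follows the `href` that the page's own `showCity()` script builds.

```python
import json
import re

# Shortened stand-in for script_js; field names mirror the data in test2.html.
script_js = r'''
    var url = 'buy';
    var cityLeft = {"A":[{"id":39,"domain":"anshun","name":"\u5b89\u987a","firstC":"A","active":false}],"B":[{"id":12,"domain":"bj","name":"\u5317\u4eac","firstC":"B","active":false}]};
    var cityRight = {"N":[{"id":27,"domain":"nb","name":"\u5b81\u6ce2","firstC":"N","active":false}]};
'''

def parse_cities(script_text):
    """Return (name, path) pairs from the cityLeft/cityRight objects."""
    cities = []
    pattern = r"var city(?:Left|Right) = (\{.*?\});"
    for raw in re.findall(pattern, script_text, flags=re.S):
        groups = json.loads(raw)  # {"A": [{...}, ...], "B": [...]}
        for entries in groups.values():
            for entry in entries:
                cities.append((entry["name"], "/" + entry["domain"] + "/buy"))
    return cities

cities = parse_cities(script_js)
print(cities)
```

`json.loads` also decodes the `\uXXXX` escapes, so the names come back as readable Chinese text.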

If this solved your problem, please accept the answer. Happy learning ^_^

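  • 霸气小肆毛 (asker) #1
    Teacher, it seems XPath Helper won't open on any local HTML file. Could you take another look? 0.0
    2020-03-02 16:15:05
  • 好帮手乔木 replying to asker 霸气小肆毛 #2
    Hello: XPath Helper cannot be used on local content; you could search for an online XPath evaluator instead. Happy learning ^_^
    2020-03-02 19:24:16
  • 霸气小肆毛 (asker) replying to 好帮手乔木 #3
    OK, thank you, teacher.
    2020-03-02 19:30:57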
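
Besides an online evaluator, another place to try expressions is a Python session with lxml, which the course code already uses. (Chrome extensions generally do not run on file:// pages unless "Allow access to file URLs" is switched on for that extension, which may be why XPath Helper stays closed on local files.) A minimal sketch, assuming a small stand-in for the saved page and a hypothetical `try_xpath` helper:

```python
from lxml import etree

# Stand-in for the contents of a saved page such as guazi.html.
html_text = '''
<html><body>
  <dl class="bdb-s">
    <dt class="green-tit">热门</dt>
    <dd>
      <a href="/bj/buy" title="北京二手车">北京</a>
      <a href="/sh/buy" title="上海二手车">上海</a>
    </dd>
  </dl>
</body></html>
'''

def try_xpath(text, expression):
    """Parse HTML text and return whatever the XPath expression selects."""
    return etree.HTML(text).xpath(expression)

names = try_xpath(html_text, "//dl[@class='bdb-s']/dd/a/text()")
links = try_xpath(html_text, "//dl[@class='bdb-s']/dd/a/@href")
print(list(zip(names, links)))
```

An empty list means the expression matched nothing, which is easier to spot here than through an IndexError raised by `[0]`.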
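Asker 霸气小肆毛 2020-03-02 19:40:33

Teacher, would this approach work for the city names and car brand names?

My idea is to collect all the car brand names and all the city names, then loop over the two in a nested loop.

http://img1.sycdn.imooc.com//climg/5e5cf04209fa8e8212910635.jpg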

http://img1.sycdn.imooc.com//climg/5e5cf081097bb4a912650529.jpg

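  • Hello: you can implement it the way you describe, but when you fetch the data later you will need to extract it according to the scheme you defined. Happy learning ^_^
    2020-03-03 09:42:23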
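
The nested-loop idea could look roughly like the sketch below. The city domains come from test2.html, but the brand slugs, the `build_tasks` helper, and the `/<city>/<brand>/` URL pattern are assumptions made for illustration and would need to be checked against the real site.

```python
# Hypothetical inputs: (display name, URL segment) pairs.
cities = [("北京", "bj"), ("上海", "sh")]
brands = [("大众", "dazhong"), ("宝马", "baoma")]

def build_tasks(cities, brands, base="https://www.guazi.com"):
    """Pair every city with every brand and build one task per pair."""
    tasks = []
    for city_name, city_domain in cities:          # outer loop: cities
        for brand_name, brand_slug in brands:      # inner loop: brands
            tasks.append({
                "city": city_name,
                "brand": brand_name,
                # Assumed URL pattern, not confirmed by the thread.
                "url": f"{base}/{city_domain}/{brand_slug}/",
            })
    return tasks

tasks = build_tasks(cities, brands)
for task in tasks:
    print(task["city"], task["brand"], task["url"])
```

Keeping the city and brand names alongside each URL is one way to know, when the data is extracted later, which pair a given page belongs to.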
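Asker 霸气小肆毛 2020-03-02 00:32:52

Teacher, I want to get the data in cityLeft from test2.html.

These two XPath expressions should not come back empty. The first I copied from the browser; the second I wrote myself.

Could you help me see what is wrong? Printing rest fails with IndexError: list index out of range in both cases, so nothing is being matched. What am I getting wrong?

1) rest = guazi_html.xpath("/html/body/div/div/script/text()")[0]

2) rest = guazi_html.xpath("//div[@class='city-box-parent']/div[@class='city-scroll']/script/text()")[0]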
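
Judging from the pasted test2.html, the `<script>` that defines `cityLeft` comes after the closing tag of the `city-box-parent` div rather than inside it, which would explain why both expressions select nothing and `[0]` raises IndexError. A minimal sketch, assuming a stand-in document with the same nesting, that compares the two expressions with one that selects the script by its content:

```python
from lxml import etree

# Stand-in with the same nesting as test2.html: the <script> follows the
# city-box-parent <div> as a sibling; it is not inside it.
html_text = '''
<html><body>
  <div class="city-box-parent">
    <div class="city-scroll">
      <div id="cityLeft"></div>
    </div>
  </div>
  <script>
    var cityLeft = {"A":[{"domain":"anshun"}]};
  </script>
</body></html>
'''

tree = etree.HTML(html_text)

# Both expressions from the question look for a <script> nested inside the divs.
inside_1 = tree.xpath("/html/body/div/div/script/text()")
inside_2 = tree.xpath(
    "//div[@class='city-box-parent']/div[@class='city-scroll']/script/text()"
)

# Selecting the script by its content does not depend on where it sits.
by_content = tree.xpath("//script[contains(text(), 'cityLeft')]/text()")

print(len(inside_1), len(inside_2), len(by_content))
```

Checking the length of the result before indexing makes an empty match visible without an exception.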

