{"id":1646,"date":"2022-05-17T03:22:12","date_gmt":"2022-05-17T03:22:12","guid":{"rendered":"https:\/\/www.ipcpu.com\/?p=1646"},"modified":"2022-07-31T03:23:04","modified_gmt":"2022-07-31T03:23:04","slug":"python-requests-encoding","status":"publish","type":"post","link":"https:\/\/c.ipcpu.com\/2022\/05\/python-requests-encoding\/","title":{"rendered":"Encoding Problems When Scraping Pages with Python requests"},"content":{"rendered":"

Encoding Problems When Scraping Pages with Python requests.md<\/p>\n

The Problem<\/h2>\n

Quite often, the requests library reports a page's encoding as ISO-8859-1, and the text only reads correctly once the encoding is set to UTF-8 by hand. Why is that? Let's look at what the requests documentation says.<\/p>\n

https:\/\/requests.readthedocs.io\/en\/latest\/user\/advanced\/<\/a>
\nThe relevant passage is quoted below.
\n<begin quote>
\nWhen you receive a response, Requests makes a guess at the encoding to use for decoding the response when you access the Response.text attribute. Requests will first check for an encoding in the HTTP header, and if none is present, will use charset_normalizer or chardet to attempt to guess the encoding.
\nIf chardet is installed, requests uses it, however for python3 chardet is no longer a mandatory dependency. The chardet library is an LGPL-licensed dependency, and some users of requests cannot depend on mandatory LGPL-licensed dependencies.
\nWhen you install requests without specifying the [use_chardet_on_py3] extra, and chardet is not already installed, requests uses charset-normalizer (MIT-licensed) to guess the encoding.
\nThe only time Requests will not guess the encoding is if no explicit charset is present in the HTTP headers and the Content-Type header contains text. In this situation, RFC 2616 specifies that the default charset must be ISO-8859-1. Requests follows the specification in this case. If you require a different encoding, you can manually set the Response.encoding property, or use the raw Response.content.
\n<end quote>
\nThe last paragraph is the important one: if the response carries the HTTP header Content-Type: text\/html with no charset parameter, requests does not guess at all and simply reports the encoding as ISO-8859-1.<\/p>\n
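
The header rule described above can be sketched with the standard library alone. This is a simplified model of the rule, not the actual requests source; the function name encoding_from_content_type is hypothetical.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    # Hypothetical helper modelling the rule quoted above (not requests' own code).
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_param('charset')
    if charset:
        # An explicit charset in the header always wins.
        return str(charset)
    if content_type.split(';')[0].strip().lower().startswith('text/'):
        # RFC 2616 default for text types when no charset is given.
        return 'ISO-8859-1'
    # Anything else would be left to content-based guessing.
    return None


print(encoding_from_content_type('text/html'))
print(encoding_from_content_type('text/html; charset=utf-8'))
print(encoding_from_content_type('image/png'))
```

The first call is the problem case: a text type with no charset falls straight through to ISO-8859-1 without inspecting the body.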

When does a bare Content-Type: text\/html show up? Very often. The most typical case is an nginx server with no charset configured.<\/p>\n
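
To make the symptom concrete, here is a small sketch of what happens when UTF-8 bytes are decoded as ISO-8859-1, which is what reading the text of such a response amounts to. The sample string is made up for illustration.

```python
text = '中文编码'

# Bytes a UTF-8 server would send on the wire.
raw = text.encode('utf-8')

# Decoding with the fallback never fails, because ISO-8859-1 maps every
# byte to some character, but the result is unreadable.
garbled = raw.decode('iso-8859-1')

# The damage is reversible: re-encode as ISO-8859-1, then decode as UTF-8.
recovered = garbled.encode('iso-8859-1').decode('utf-8')

print(garbled != text, recovered == text)
```

Because no exception is raised, the wrong encoding usually goes unnoticed until the garbled text is displayed or stored.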

Note: this refers to the Content-Type field in the HTTP headers, not the <meta http-equiv=\"Content-Type\" content=\"text\/html; charset=utf-8\" \/><\/code> element inside the HTML page. They are two different fields. A browser first uses the charset from the HTTP Content-Type header; when the header declares none, it looks for a charset declared in the page's meta element while parsing and switches to that encoding automatically.<\/em><\/p>\n
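
As a rough illustration of the second field, the charset declared inside the page can be read with the standard-library HTML parser. This is a minimal sketch; MetaCharsetParser and charset_from_meta are hypothetical names, and real browsers follow a more elaborate sniffing procedure.

```python
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    # Hypothetical helper: remembers the first charset declared by a meta element.
    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.charset:
            return
        attrs = dict(attrs)
        if attrs.get('charset'):
            # HTML5 form: <meta charset=utf-8>
            self.charset = attrs['charset'].strip()
        elif (attrs.get('http-equiv') or '').lower() == 'content-type':
            # Legacy form: content='text/html; charset=utf-8'
            content = (attrs.get('content') or '').lower()
            value = content.partition('charset=')[2].split(';')[0].strip()
            self.charset = value or None


def charset_from_meta(raw, scan_bytes=2048):
    # Only the start of the document is scanned; the tags themselves are ASCII.
    parser = MetaCharsetParser()
    parser.feed(raw[:scan_bytes].decode('ascii', errors='replace'))
    return parser.charset


page = b'''<head><meta http-equiv='Content-Type' content='text/html; charset=gb2312'></head>'''
print(charset_from_meta(page))
```

A value found this way could be assigned to Response.encoding before reading the text, although the declared charset is not guaranteed to match the actual bytes.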

This inconvenience, caused by sticking to the RFC, has drawn plenty of complaints.
\n<\/p>\n

Recommended Approach<\/h2>\n

Now that the cause is clear, here is how to deal with it.
\nrequests provides apparent_encoding, which guesses the encoding from the response body, so we can decode with apparent_encoding directly.<\/p>\n

\n
import requests\n\nr = requests.get(url='http:\/\/www.stats.gov.cn\/')\n# Encoding derived from the HTTP headers\nprint(r.encoding)\n# Encoding guessed from the response body\nprint(r.apparent_encoding)\n# Decode the raw bytes with the guessed encoding\nprint(r.content.decode(r.apparent_encoding))<\/code><\/pre>\n<\/div>\n

That said, we are usually dealing with Chinese pages, and Chinese has three related encodings: GB2312, GBK, and GB18030. GB18030 is the newest and most complete of the three and is a superset of the other two. Sometimes a site is detected as GB2312 yet contains characters that only GB18030 can decode. So whenever one of these Chinese encodings is detected, decoding as GB18030 is the safer choice.<\/p>\n

\n
import requests\n\nr = requests.get(url='http:\/\/www.stats.gov.cn\/tjsj\/tjbz\/tjyqhdmhcxhfdm\/2017\/45\/14\/25\/451425202.html')\n\nencode_type = r.apparent_encoding\n# GB18030 is a superset of GB2312 and GBK, so upgrade any of them\nif encode_type and encode_type.upper() in ['GB2312', 'GBK', 'GB18030']:\n    encode_type = 'GB18030'\n\nhtml = r.content\nprint(html.decode(encode_type))<\/code><\/pre>\n<\/div>\n
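
To see why upgrading to GB18030 is safe, the following sketch checks the superset relationship offline with Python's built-in codecs. The sample characters are chosen for illustration, and decodes_as is a hypothetical helper.

```python
legacy = '中文'   # both characters exist in GB2312
newer = '镕'      # present in GBK and GB18030, but not in GB2312

# Bytes produced by the older encodings decode unchanged as GB18030.
assert legacy.encode('gb2312').decode('gb18030') == legacy
assert newer.encode('gbk').decode('gb18030') == newer


def decodes_as(raw, encoding):
    # Hypothetical helper: True if the bytes decode without error.
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A page labelled GB2312 that also contains a GBK-only character.
raw = (legacy + newer).encode('gb18030')
print(decodes_as(raw, 'gb2312'), decodes_as(raw, 'gb18030'))
```

Decoding with the wider codec costs nothing for pages that really are GB2312 and avoids errors on pages that stray outside it.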

Some people call chardet.detect()<\/code> to do the guessing, but that means installing chardet separately, which is redundant, so it is not recommended.<\/p>\n

When reposting, please credit: IPCPU-\u7f51\u7edc\u4e4b\u8def<\/a> » Encoding Problems When Scraping Pages with Python requests<\/a><\/p>","protected":false},"excerpt":{"rendered":"

[…]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[59],"_links":{"self":[{"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/posts\/1646"}],"collection":[{"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/comments?post=1646"}],"version-history":[{"count":1,"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/posts\/1646\/revisions"}],"predecessor-version":[{"id":1647,"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/posts\/1646\/revisions\/1647"}],"wp:attachment":[{"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/media?parent=1646"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/categories?post=1646"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/c.ipcpu.com\/wp-json\/wp\/v2\/tags?post=1646"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}