Python数据处理
by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
Figure 11-8: Network tab with many pages
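You can see immediately that this page is handling many more requests. Click any request to see the content it loads. The requests appear in order on the timeline in the Network tab, which helps you understand how to scrape and process the page to get the content you need.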
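Clicking through the requests shows that most of the content loads after the initial page load. Click the initial page request and you will find it holds very little. The first question to ask is: does a JavaScript request, or some other request, load the content as JSON? If so, that may be a convenient "shortcut" for our script.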
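One way to answer that question without clicking every request is to export the Network tab as a HAR file (most browser developer tools offer this) and filter it for JSON responses. The sketch below is an illustration, not something from the book: it assumes the standard HAR layout (`log.entries[].request` and `log.entries[].response.content.mimeType`), and the helper name `find_json_requests` and the sample data are made up.

```python
import json


def find_json_requests(har):
    """Return (method, url) pairs for entries whose response looks like JSON."""
    matches = []
    for entry in har["log"]["entries"]:
        # mimeType may carry a charset suffix, so check for a substring.
        mime_type = entry["response"]["content"].get("mimeType", "")
        if "json" in mime_type.lower():
            request = entry["request"]
            matches.append((request["method"], request["url"]))
    return matches


# Tiny made-up HAR fragment; a real export contains many more fields.
sample_har = json.loads("""
{"log": {"entries": [
  {"request": {"method": "GET", "url": "https://example.com/"},
   "response": {"content": {"mimeType": "text/html"}}},
  {"request": {"method": "GET", "url": "https://example.com/api/items"},
   "response": {"content": {"mimeType": "application/json; charset=utf-8"}}}
]}}
""")

print(find_json_requests(sample_har))
```

With a real export you would load the file with `json.load(open("page.har"))` (a hypothetical filename) in place of `sample_har`.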
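You already know how to parse and read JSON (Chapter 3), so if the Network tab shows a URL whose JSON response holds the data you need, you can request that URL and parse the data directly from the response. Take note of any headers you may need to send with the request (listed in the Headers section of the Network tab) in order to get the correct response.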
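A minimal sketch of that approach, using only the standard library, might look like the following. The URL and header values are placeholders, not taken from any real site: replace them with what the Network tab actually shows. The function names are also hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint and headers: copy the real values from the Network tab.
JSON_URL = "https://example.com/api/items.json"
HEADERS = {
    "Accept": "application/json",
    "X-Requested-With": "XMLHttpRequest",
}


def build_request(url, headers):
    """Create a GET request carrying the headers seen in the Network tab."""
    return Request(url, headers=headers)


def parse_json_body(body):
    """Decode a raw response body (bytes) into Python objects."""
    return json.loads(body.decode("utf-8"))


def fetch_json(url, headers, timeout=10):
    """Send the request and parse the JSON response."""
    with urlopen(build_request(url, headers), timeout=timeout) as response:
        return parse_json_body(response.read())
```

Calling `fetch_json(JSON_URL, HEADERS)` would return the parsed data as Python dictionaries and lists. If the server answers with an error or with HTML instead of JSON, a missing header is a likely cause, so compare the request against the one the browser sent.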
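If no simple JSON URL matches the information you need, or the information is scattered across several requests and would have to be stitched together by hand, then you can be sure you need a browser-based approach to scrape the site. Browser-based scraping lets you read the page as you see it rather than each individual request. It is also useful when you must interact with a drop-down menu, or perform a series of browser actions, before the content can be scraped properly.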
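The Network tab helps you find which requests carry the content you need and whether a good alternative data source exists. Next we will look at JavaScript to see whether it also offers some ideas for our scraper.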
11.2.3 Console: Interacting with JavaScript
Now that we have analyzed the page's markup and structure ...
Publisher Resources

ISBN: 9787115459190