Book description
There's a wealth of data online, but sorting and gathering it by hand can be tedious and time-consuming. Rather than click through page after endless page, why not let bots do the work for you? Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions.
Table of contents
- Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL
- About the Author
- About the Technical Reviewer
- Acknowledgments
- Introduction
- I. Fundamental Concepts and Techniques
  - 1. What’s in It for You?
  - 2. Ideas for Webbot Projects
  - 3. Downloading Web Pages
  - 4. Basic Parsing Techniques
  - 5. Advanced Parsing with Regular Expressions
    - Pattern Matching, the Key to Regular Expressions
    - PHP Regular Expression Types
    - Learning Patterns Through Examples
    - Regular Expressions of Particular Interest to Webbot Developers
    - When Regular Expressions Are (or Aren’t) the Right Parsing Tool
    - Final Thoughts
  - 6. Automating Form Submission
  - 7. Managing Large Amounts of Data
- II. Projects
  - 8. Price-Monitoring Webbots
  - 9. Image-Capturing Webbots
  - 10. Link-Verification Webbots
  - 11. Search-Ranking Webbots
  - 12. Aggregation Webbots
  - 13. FTP Webbots
  - 14. Webbots That Read Email
  - 15. Webbots That Send Email
  - 16. Converting a Website into a Function
- III. Advanced Technical Considerations
  - 17. Spiders
  - 18. Procurement Webbots and Snipers
  - 19. Webbots and Cryptography
  - 20. Authentication
  - 21. Advanced Cookie Management
  - 22. Scheduling Webbots and Spiders
  - 23. Scraping Difficult Websites with Browser Macros
  - 24. Hacking iMacros
  - 25. Deployment and Scaling
- IV. Larger Considerations
  - 26. Designing Stealthy Webbots and Spiders
  - 27. Proxies
  - 28. Writing Fault-Tolerant Webbots
  - 29. Designing Webbot-Friendly Websites
  - 30. Killing Spiders
  - 31. Keeping Webbots out of Trouble
- A. PHP/CURL Reference
  - Creating a Minimal PHP/CURL Session
  - Initiating PHP/CURL Sessions
  - Setting PHP/CURL Options
    - CURLOPT_URL
    - CURLOPT_RETURNTRANSFER
    - CURLOPT_REFERER
    - CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS
    - CURLOPT_USERAGENT
    - CURLOPT_NOBODY and CURLOPT_HEADER
    - CURLOPT_TIMEOUT
    - CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR
    - CURLOPT_HTTPHEADER
    - CURLOPT_SSL_VERIFYPEER
    - CURLOPT_USERPWD and CURLOPT_UNRESTRICTED_AUTH
    - CURLOPT_POST and CURLOPT_POSTFIELDS
    - CURLOPT_VERBOSE
    - CURLOPT_PORT
  - Executing the PHP/CURL Command
  - Closing PHP/CURL Sessions
- B. Status Codes
- C. SMS Gateways
- Index
- About the Author
- Colophon
Product information
- Title: Webbots, Spiders, and Screen Scrapers, 2nd Edition
- Author(s):
- Release date: March 2012
- Publisher(s): No Starch Press
- ISBN: 9781593273972
You might also like
book
Learning Modern Linux
If you use Linux in development or operations and need a structured approach to help you …
book
Programming Rust, 2nd Edition
Systems programming provides the foundation for the world's computation. Writing performance-sensitive code requires a programming language …
book
How Linux Works, 3rd Edition
Unlike some operating systems, Linux doesn’t try to hide the important bits from you—it gives you …
book
The Linux Command Line, 2nd Edition
The Linux Command Line takes you from your very first terminal keystrokes to writing full programs …