Hoovering a website
It is often useful to scan a website and extract information from specific tags. This basic mechanism can be used to trawl the web in search of useful bits of information. At other times you need to get a list of <IMG> tags and their SRC attribute, or <A> tags and the corresponding HREF attribute. The possibilities are endless.
How to do it...
- First of all, we need to grab the contents of the target website. At first glance it seems that we should make a cURL request, or simply use file_get_contents(). The problem with these approaches is that we will end up having to do a massive amount of string manipulation, most likely having to make inordinate use of the dreaded regular expression. In order to avoid all of ...
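One way to sidestep the string handling described above is to hand the markup to a parser. The following is a minimal sketch using PHP's built-in DOMDocument class; the text here is cut off before it names its own solution, so treat this as an illustration and not necessarily the recipe's method. The helper name extractAttributes() and the sample markup are assumptions made for the example.

```php
<?php
// Hypothetical helper (not from the original text): returns the value of
// $attr for every <$tag> element in $html that carries that attribute.
function extractAttributes(string $html, string $tag, string $attr): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid, so buffer libxml warnings
    // instead of letting them be emitted as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        // Skip elements without the attribute, such as <a name="top">.
        if ($node->hasAttribute($attr)) {
            $values[] = $node->getAttribute($attr);
        }
    }
    return $values;
}

// Inline sample markup so the sketch runs without network access.
$html = '<html><body>'
      . '<a href="/about">About</a>'
      . '<img src="/logo.png" alt="Logo">'
      . '<a href="https://example.com/">Example</a>'
      . '<a name="top">No link</a>'
      . '</body></html>';

$links  = extractAttributes($html, 'a', 'href');
$images = extractAttributes($html, 'img', 'src');

print_r($links);
print_r($images);
```

To run this against a live page, the markup could be fetched first with file_get_contents() and passed in as $html, or loaded directly with DOMDocument::loadHTMLFile(). Either way the parser, not a regular expression, does the work of locating tags and attributes.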