PHP Cookbook

by David Sklar, Adam Trachtenberg
November 2002
Intermediate to advanced
640 pages
16h 33m
English
O'Reilly Media, Inc.

11.8. Extracting Links from an HTML File

Problem

You need to extract the URLs that are specified inside an HTML document.

Solution

Use the pc_link_extractor( ) function shown in Example 11-2.

Example 11-2. pc_link_extractor( )

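function pc_link_extractor($s) {
  $a = array();
  // [^>]*? keeps the attribute scan inside a single <a> tag, and the
  // s modifier lets the linked text span more than one line
  if (preg_match_all('/<a\s+[^>]*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/is',
                     $s,$matches,PREG_SET_ORDER)) {
    foreach($matches as $match) {
      // $match[1] is the link target; $match[2] is the linked text
      array_push($a,array($match[1],$match[2]));
    }
  }
  return $a;
}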

For example:

$links = pc_link_extractor($page);

Discussion

The pc_link_extractor( ) function returns an array. Each element of that array is itself a two-element array. The first element is the target of the link, and the second element is the text that is linked. For example:

$html=<<<END
Click <a href="http://www.oreilly.com">here</a> to visit a computer book
publisher. Click <a href="http://www.sklar.com">over here</a> to visit
a computer book author.
END;

$a = pc_link_extractor($html);
print_r($a);

Array
(
    [0] => Array
        (
            [0] => http://www.oreilly.com
            [1] => here
        )

    [1] => Array
        (
            [0] => http://www.sklar.com
            [1] => over here
        )

)
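Because the second element is whatever appears between the opening and closing tags, it can itself contain markup. The following is a minimal sketch of walking an array in the shape shown above and reducing each entry to plain text. The sample data is made up for illustration (the <b> tags are not in the example above), and the "text -> URL" output format is just one possible choice.

```php
<?php
// Sample data in the shape returned by pc_link_extractor().
// The <b> markup in the second entry is invented for this sketch.
$a = array(
  array('http://www.oreilly.com', 'here'),
  array('http://www.sklar.com', '<b>over</b> here'),
);

$lines = array();
foreach ($a as $link) {
  // Each entry is a two-element array: target first, linked text second
  list($url, $text) = $link;
  // strip_tags() drops any markup nested inside the linked text
  $lines[] = strip_tags($text) . ' -> ' . $url;
}

print implode("\n", $lines) . "\n";
```

If the values are echoed back into an HTML page rather than printed as plain text, passing them through htmlspecialchars( ) first is the safer choice.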

The regular expression in pc_link_extractor( ) doesn’t match every link. It misses links that are constructed with JavaScript or that use some hexadecimal escapes, for example, but it should work on most reasonably well-formed HTML.
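When the regular expression is too brittle for a given page, one alternative is to let an HTML parser find the anchors. The sketch below is not part of the recipe: it assumes PHP 5 or later with the DOM extension enabled, and dom_link_extractor( ) is a hypothetical name chosen to mirror pc_link_extractor( ). It still sees only links present in the markup, so links built by JavaScript at runtime remain out of reach.

```php
<?php
// Hypothetical helper (not from the book): extract links with the
// DOM extension instead of a regular expression.
function dom_link_extractor($html) {
  $links = array();
  if ($html === '') {
    return $links;            // loadHTML() rejects an empty string
  }
  // Real-world HTML is often invalid; keep parser warnings quiet
  $previous = libxml_use_internal_errors(true);
  $doc = new DOMDocument();
  $loaded = $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  if (!$loaded) {
    return $links;
  }
  foreach ($doc->getElementsByTagName('a') as $anchor) {
    // Skip named anchors that carry no href attribute
    if ($anchor->hasAttribute('href')) {
      $links[] = array($anchor->getAttribute('href'),
                       $anchor->textContent);
    }
  }
  return $links;
}

$html = 'Click <a class="ext" href="http://www.oreilly.com">here</a> or '
      . 'jump to <a name="top">the top</a>.';
print_r(dom_link_extractor($html));
```

One difference in behavior to keep in mind: textContent returns the linked text with any nested tags removed, whereas the regular expression captures the raw markup between the opening and closing tags.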

See Also

Recipe 13.8 for information on capturing text inside HTML tags; documentation on preg_match_all( ) at http://www.php.net/preg-match-all.

Publisher Resources

ISBN: 1565926811