Let’s say you want to acquire some data from the web site of one of your business partners. There are lots of ways to retrieve a web page:
“By hand,” that is, by pointing your web browser to the right location.
Using a scripting language such as Perl, which, incidentally, has lots of available gizmos and gadgets to interpret the data once you retrieve it.
Via a command-line utility such as GNU wget (one of my favorite utilities).
Using Oracle’s built-in package UTL_HTTP.
Since this is a book about PL/SQL, guess which method I’ll be discussing!
If you’re running Oracle Database 11g Release 2 or later, you will need to set up a network ACL to permit outbound connections to any desired remote hosts, as mentioned in the previous section.
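As a reminder of what that setup can look like, here is a minimal sketch using the Oracle Database 11g Release 2 interface of DBMS_NETWORK_ACL_ADMIN. The ACL file name (oreilly-web.xml) and the grantee (SCOTT) are hypothetical placeholders, and the host and port simply match the example URL used in this section. You would run it as a suitably privileged user and substitute your own values. Later releases offer a newer interface (append_host_ace) and deprecate these calls.

```plsql
BEGIN
   -- Create an ACL that grants the "connect" privilege to one schema.
   -- 'oreilly-web.xml' and 'SCOTT' are hypothetical names for illustration.
   DBMS_NETWORK_ACL_ADMIN.create_acl (
      acl         => 'oreilly-web.xml',
      description => 'Allow outbound HTTP to www.oreilly.com',
      principal   => 'SCOTT',
      is_grant    => TRUE,
      privilege   => 'connect');

   -- Attach the ACL to the remote host and restrict it to port 80.
   DBMS_NETWORK_ACL_ADMIN.assign_acl (
      acl        => 'oreilly-web.xml',
      host       => 'www.oreilly.com',
      lower_port => 80,
      upper_port => 80);

   COMMIT;
END;
/
```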
Let’s start with a relatively simple means of coding the retrieval of a web page. This first method, which slices up the web page and puts the slices into an array, actually predates Oracle’s support of CLOBs.
One of the first functions that Oracle ever released in the UTL_HTTP package retrieves a web page into consecutive elements of an associative array. Usage can be pretty simple:
DECLARE
   page_pieces UTL_HTTP.html_pieces; -- array of VARCHAR2(2000)
BEGIN
   page_pieces := UTL_HTTP.request_pieces(url => 'http://www.oreilly.com/');
END;
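To get a feel for what comes back, a minimal sketch like the following loops over the returned array and reports the size of each element. It assumes server output is enabled in your session (for example, SET SERVEROUTPUT ON in SQL*Plus) and that the ACL discussed earlier permits the connection. The loop and the message text are illustrative additions.

```plsql
DECLARE
   page_pieces   UTL_HTTP.html_pieces;   -- array of VARCHAR2(2000)
BEGIN
   page_pieces := UTL_HTTP.request_pieces (url => 'http://www.oreilly.com/');

   -- The array is indexed consecutively starting at 1.
   FOR i IN 1 .. page_pieces.COUNT
   LOOP
      DBMS_OUTPUT.put_line (
         'Piece ' || i || ': ' || LENGTH (page_pieces (i)) || ' characters');
   END LOOP;
END;
/
```

The pieces are cut by size rather than by line breaks or tags, so a word or an HTML element can begin in one element and finish in the next.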
This format is not terribly fun to work with, because the 2,000-byte boundaries are unrelated to anything you would find in the text of the page. So if you have ...