Modifying HREF Attributes
We’re halfway there: we’ve modified all our filenames to be consistently lowercase and to end in .html. Now we just need to edit the HREF attributes of the links inside those HTML files to reflect those changes. To do that, we will need to write a new script that can open each file in a list passed to it, make changes to that file, and save the changes back to disk.
Even more than the renaming-files example we just finished, this one exposes us to a real risk of accidentally doing bad things to our data. Again, please make sure you have a good backup before proceeding. Also, see Parsing HTML with Regexes Considered Harmful for a discussion of some of the limitations of the approach presented here.
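To make the shape of such a script concrete, here is a minimal sketch in Python. It is an illustration only, not the script this chapter develops. It assumes the renaming step lowercased every filename and turned .htm endings into .html, that the files are UTF-8 text, and that only local links should change. The names fix_target, rewrite_hrefs, and process_file are hypothetical, and the regex-based matching has exactly the limitations the warning above describes.

```python
import re
import sys
from pathlib import Path

# Matches href="..." or href='...'; the attribute name may be in any case.
# A regex like this is a rough approximation, not a real HTML parser.
HREF_RE = re.compile(r'(href\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)

# A leading scheme such as http:, https:, or mailto: marks a non-local link.
SCHEME_RE = re.compile(r'[a-zA-Z][a-zA-Z0-9+.-]*:')


def fix_target(target):
    """Lowercase a local link target and normalize .htm/.html to .html.

    Assumption: this mirrors what the renaming step did to the files.
    """
    # Leave absolute URLs, protocol-relative URLs, and bare fragments alone.
    if SCHEME_RE.match(target) or target.startswith(('#', '//')):
        return target
    # Split off any query string or fragment so only the path is changed.
    path, rest = re.match(r'([^?#]*)(.*)', target, re.DOTALL).groups()
    path = re.sub(r'\.html?$', '.html', path.lower())
    return path + rest


def rewrite_hrefs(html):
    """Return html with every local HREF value passed through fix_target."""
    def replace(match):
        prefix, quote, target = match.groups()
        return prefix + quote + fix_target(target) + quote
    return HREF_RE.sub(replace, html)


def process_file(filename):
    """Read one file, rewrite its links, and save it only if it changed."""
    path = Path(filename)
    original = path.read_text(encoding="utf-8")
    updated = rewrite_hrefs(original)
    if updated != original:
        path.write_text(updated, encoding="utf-8")


if __name__ == "__main__":
    # Each command-line argument is treated as a file to edit in place.
    for name in sys.argv[1:]:
        process_file(name)
```

Saved under a hypothetical name such as fix_hrefs.py, the sketch could be run as `python fix_hrefs.py *.html`. Because it overwrites files in place, it is worth trying on a copy of the site first.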