Chapter 5. Maintaining Your Collections

Hacks #90-93

It’s rare that one script will solve all your data-grubbing needs. You might want, weekly, to know about new movies being listed on Amazon.com, grab a summary page from IMDB, find the last five movies that each actor or actress starred in, and then image-search and download pictures of them. On the other hand, you might be graphing important information and need to automatically grab the data every hour, day, or week. And what if you’re downloading or mirroring data with wget [Hack #26]?

We have these great tools to automate our information needs, but how do we then automate the running of said tools? Where is our meta-automation?

Hack #90. Using cron to Automate Tasks

Run scripts on a repetitive basis with the cron utility.

There will come a time when you’ve created a script so perfect for your day-to-day life that it becomes absolutely imperative to run on a regular basis. Sure, you could run it manually during your morning routine, but if you can automate the retrieval of data with scraping, why not automate the execution too?

Meet cron, a Unix utility whose life revolves around running things every minute, hour, day, week, month, or year. Give it a command or script and a schedule and let it go. Each user on the system can automate his own tasks with no restrictions: hear the date spoken every minute, have a backup performed every three days at 12:15, or automatically open your email every day at 7:00 A.M. and then again ...
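
As a rough sketch, the three schedules mentioned above might be written as the following crontab entries, which you would normally add with crontab -e. The script paths are hypothetical placeholders that do not come from the original text. The */3 step syntax is an extension found in common cron implementations such as Vixie cron, so treat it as an assumption about your system rather than something every cron guarantees.

```
# Fields: minute  hour  day-of-month  month  day-of-week  command

# Every minute: run a (hypothetical) script that speaks the date.
* * * * * /home/you/bin/speak-date.sh

# 12:15 on every third day of the month (days 1, 4, 7, ...):
# run a (hypothetical) backup script.
15 12 */3 * * /home/you/bin/backup.sh

# 7:00 A.M. every day: run a (hypothetical) script that opens your email.
0 7 * * * /home/you/bin/open-mail.sh
```

Note that the day-of-month step restarts at the beginning of each month, so the second entry is only approximately "every three days": a run on the 31st is followed by another on the 1st.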
