Chapter 4. The Simplest Way to Manage Data
At some point, IT staff realize that transferring data from one place to another can become complicated. A natural reaction to this problem is to write a script. Scripts are easy to write, and they start out easy to understand and manage. The problem is that things change: the data, its purpose, the technology, the policies, and the people responsible all change over time. Eventually, you will spend the bulk of your time editing the script to keep it working. The simplest way to manage data, then, is to use standards and tools to build automated data pipelines that make it easier to adapt to these inevitable changes.
The Data Pipeline
Make it standard practice to access remote data using web APIs. A standard interface makes it easy to adapt when the storage technology supporting the remote data source changes. Ideally, if data is accessed as a web service, you won’t notice if the storage mechanism changes from, say, a mainframe to a relational database.
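As a minimal sketch of this idea, the Python example below puts two different storage mechanisms behind the same HTTP endpoint and reads both with identical client code. Everything in it is a hypothetical illustration rather than part of any particular product: the `/customers/<id>` path, the record layouts, and the function names are all assumptions, and error handling is omitted.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical "mainframe" storage: fixed-width records (6-char id, then name).
FIXED_WIDTH_RECORDS = {"42": "42    Ada Lovelace"}

# Hypothetical "relational" storage: rows of (id, name).
TABLE_ROWS = [("42", "Ada Lovelace")]


def mainframe_backend(customer_id):
    record = FIXED_WIDTH_RECORDS[customer_id]
    return {"id": record[:6].strip(), "name": record[6:].strip()}


def relational_backend(customer_id):
    for row_id, name in TABLE_ROWS:
        if row_id == customer_id:
            return {"id": row_id, "name": name}
    raise KeyError(customer_id)


def make_handler(backend):
    """Build a request handler that serves JSON from the given backend."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Assumed URL shape: /customers/<id>
            customer_id = self.path.rstrip("/").split("/")[-1]
            body = json.dumps(backend(customer_id)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    return Handler


def serve(backend):
    """Start a local web service on a free port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(backend))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_customer(base_url, customer_id):
    """Client code: it only knows the web API, not the storage behind it."""
    opener = build_opener(ProxyHandler({}))  # ignore any proxy settings
    with opener.open(f"{base_url}/customers/{customer_id}") as response:
        return json.load(response)


def demo():
    results = []
    for backend in (mainframe_backend, relational_backend):
        server = serve(backend)
        base_url = f"http://127.0.0.1:{server.server_address[1]}"
        results.append(fetch_customer(base_url, "42"))
        server.shutdown()
        server.server_close()
    return results
```

Calling `demo()` fetches the same customer twice, once from each backend. Because `fetch_customer` depends only on the URL and the JSON shape, swapping the storage mechanism requires no change on the consuming side.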
Use a standard tool to schedule, execute, and monitor data transfer. If all goes well, the scope of your AI efforts (and your data needs) will grow over time. The number of data sources will increase. The response times and update intervals of data sources will change depending on demand. Using a single, automated data ingestion tool makes it easier to keep up with data pipeline operations, discover when data ingestion jobs are failing, and take ...
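To make the schedule, execute, and monitor responsibilities concrete, here is a small Python sketch of what such a tool does internally. It is not a substitute for a real ingestion tool; the `IngestionJob` class, the job names, and the intervals are hypothetical, and time is passed in as a plain number so the behavior is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class IngestionJob:
    name: str
    interval_seconds: float
    run: Callable[[], int]  # returns the number of records ingested
    last_run: Optional[float] = None
    last_error: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def is_due(self, now: float) -> bool:
        # A job is due if it has never run or its interval has elapsed.
        return self.last_run is None or now - self.last_run >= self.interval_seconds


def run_due_jobs(jobs: List[IngestionJob], now: float) -> None:
    """Execute every job that is due and record the outcome."""
    for job in jobs:
        if not job.is_due(now):
            continue
        job.last_run = now
        try:
            count = job.run()
            job.last_error = None
            job.history.append(f"ok: {count} records")
        except Exception as exc:  # one failing source must not stop the others
            job.last_error = str(exc)
            job.history.append(f"failed: {exc}")


def failing_jobs(jobs: List[IngestionJob]) -> List[str]:
    """Monitoring view: names of jobs whose most recent run failed."""
    return [job.name for job in jobs if job.last_error is not None]


# Hypothetical data sources with different update intervals.
def load_orders() -> int:
    return 3


def load_inventory() -> int:
    raise ConnectionError("source unreachable")


jobs = [
    IngestionJob("orders", interval_seconds=60, run=load_orders),
    IngestionJob("inventory", interval_seconds=300, run=load_inventory),
]

run_due_jobs(jobs, now=0)   # both jobs run for the first time
run_due_jobs(jobs, now=60)  # only "orders" is due again
```

After these two passes, `failing_jobs(jobs)` reports the `inventory` job, while the `orders` history shows two successful runs. Keeping schedules, execution, and failure status in one place is what lets a single tool answer "which ingestion jobs are failing?" as the number of sources grows.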