O'Reilly logo

Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration by Jos van Dongen, Roland Bouman, Matt Casters

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 6. Data Extraction

The first step in an ETL process is getting data from one or more data sources. As we discussed in Chapters 1 and 5, this is a demanding task because of the complexity and variety of these different data sources. In a traditional data warehouse environment, data is usually extracted from an organization's transaction systems, such as financial applications or ERP systems. Most of these systems store their data in a relational database such as MySQL, Oracle, or SQL Server. As challenging as this may be from a functional point of view (we'll take a closer look at ERP systems later in this chapter), technically it's pretty straightforward to connect to a MySQL database using a JDBC driver and extract data from it. It gets more interesting when the database isn't relational and there's also no driver available to connect to it. In those situations you often end up having the data delivered in a flat file format such as a comma-separated ASCII file. An even trickier variation to this topic is data that is owned by someone else and is stored outside the corporate firewall, perhaps by a client or vendor company. In that case, a direct connection is usually not feasible so getting flat files might be the only option. In the case of data stored on the Internet, even flat files are not an option. (Imagine yourself calling Google, asking the company to FTP you some data set on a regular basis.) As you'll see later in this chapter, Kettle provides several ways to ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required