The architecture just described can be implemented with a variety of available technologies. The following table summarizes some of them, relating each choice of technology stack to its use case, along with notes and highlights for each:
Platform service | Technology stack | Use case | Highlights |
Ingestion | Talend | First-time ingestion. | Batch and real time. |
Ingestion | Sqoop | Bulk data transfer between Hadoop and relational data stores. | Two-way replication with both snapshots and incremental updates. |
Ingestion | Flume | Online analytics applications; collects log data for storage on HDFS. | Reliable, available, maintains central list of ongoing data flows. |
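
The difference between a snapshot and an incremental update, mentioned in the highlights for bulk transfer, can be illustrated with a minimal sketch. This is not Sqoop itself: it uses Python's built-in `sqlite3` module as a stand-in relational store, and the table name `events`, its columns, and the helper functions are hypothetical names chosen for illustration. The idea it mimics is tracking a monotonically increasing check column and fetching only rows beyond the last imported value.

```python
import sqlite3


def snapshot(conn, table):
    """Full copy: read every row, as in a first-time bulk import."""
    return conn.execute(
        f"SELECT id, payload FROM {table} ORDER BY id"
    ).fetchall()


def incremental(conn, table, last_value):
    """Fetch only rows whose id is greater than the last imported id."""
    rows = conn.execute(
        f"SELECT id, payload FROM {table} WHERE id > ? ORDER BY id",
        (last_value,),
    ).fetchall()
    # Advance the watermark only if new rows were found.
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last


def demo():
    # In-memory database standing in for the relational source (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])

    full = snapshot(conn, "events")      # initial full import
    last = full[-1][0]                   # remember the highest id seen

    conn.execute("INSERT INTO events VALUES (3, 'c')")
    delta, last = incremental(conn, "events", last)  # only the new row
    conn.close()
    return full, delta, last
```

Calling `demo()` performs one full import followed by one incremental import that picks up only the row added in between. A real ingestion tool would also need to handle updated and deleted rows, which this sketch ignores.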