9 Advanced ingestion: finding data sources and building your own
This chapter covers
- Finding third-party data sources for ingestion
- Understanding the benefits of building your own data source
- Building your own data source
- Building a JavaBean data source
In many use cases, I have had to get data from nontraditional data sources into Apache Spark. Imagine that your data is in an enterprise resource planning (ERP) package, and you want to ingest it via the ERP’s REST API. Of course, you could create a standalone application that dumps all the data into a CSV or JSON file and then ingest the file or files, but you don’t want to deal with the life cycle of each file. When will you be able to delete it? Who has access to it? Can the disk be full ...