November 2017
Beginner to intermediate
290 pages
7h 34m
English
Data scientists explore data, often in an ad hoc manner. Data engineers build data pipelines that run continuously, so operability and production readiness are important concerns. Although the two camps look very different and use different tools and methods, large organizations often have both working toward a common objective. It is also not uncommon for successful data science experiments to eventually result in data engineering projects.
An important reason for the popularity of Apache Spark is that its barrier to entry is perceived as low and that the platform is not seen as limited to Java/Scala developers and distributed systems experts. Spark was first to offer support ...