14
Building a Data Pipeline in PyCharm
The term data pipeline generally denotes a step-wise process of collecting, processing, and analyzing data. It is widely used in industry to describe a reliable workflow that takes raw data and turns it into actionable insights. Some data pipelines operate at massive scale, such as a marketing technology (MarTech) company ingesting millions of data points from Kafka streams, storing them in large data stores such as Hadoop or ClickHouse, and then cleansing, enriching, and visualizing that data. Other times the data is smaller but far more impactful, as in the project we'll be working on in this chapter.
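To make the collect-process-analyze flow concrete, here is a minimal, self-contained Python sketch (not taken from the chapter; the function names and sample records are hypothetical) that chains the three stages together:

from statistics import mean


def collect():
    """Collect raw data; in a real pipeline this might read from Kafka,
    a database, or flat files. Here we return hard-coded sample records."""
    return [
        {"user": "alice", "clicks": "12"},
        {"user": "bob", "clicks": "7"},
        {"user": "carol", "clicks": None},  # a dirty record to be filtered out
    ]


def process(raw_records):
    """Cleanse and normalize: drop incomplete rows and cast types."""
    return [
        {"user": r["user"], "clicks": int(r["clicks"])}
        for r in raw_records
        if r["clicks"] is not None
    ]


def analyze(records):
    """Produce a simple aggregate insight from the cleaned data."""
    return {"average_clicks": mean(r["clicks"] for r in records)}


if __name__ == "__main__":
    print(analyze(process(collect())))  # {'average_clicks': 9.5}

Real pipelines swap each stage for something far heavier (stream consumers, distributed storage, dashboards), but the shape stays the same: each stage takes the previous stage's output as its input.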
In this chapter, we will learn about the following topics: