Introducing Apache NiFi for dataflows

Apache NiFi automates dataflows: it receives data from any source, such as Twitter, Kafka, or databases, sends it to a data processing system, such as Hadoop or Spark, and finally delivers it to data storage systems, such as HBase, Cassandra, and other databases. Problems can arise at any of these three layers, such as a system being down or data production and consumption rates being out of sync. Apache NiFi addresses these dataflow challenges by providing the following key features:

  • Guaranteed delivery with write-ahead logs
  • Data buffering with Back Pressure and Pressure Release
  • Prioritized queuing, such as oldest first, newest first, or largest first
  • Configurations for low latency, high throughput, ...
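
To make the buffering and prioritization ideas in this list concrete, here is a minimal Python sketch of a bounded queue that refuses new items once a threshold is reached (back pressure), accepts them again once items are drained, and releases items in a configurable order. It is a toy illustration only, not NiFi's implementation or API: FlowItem, BoundedPriorityQueue, offer, poll, and the three ordering functions are hypothetical names chosen for this example.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class FlowItem:
    """Hypothetical stand-in for a unit of data moving through a flow."""
    payload: bytes
    created_at: float


class BoundedPriorityQueue:
    """Toy queue: refuses new items when full (back pressure) and
    releases items in the order given by a priority function."""

    def __init__(self, max_items, priority):
        self.max_items = max_items
        self.priority = priority   # the item with the smallest key leaves first
        self._heap = []
        self._seq = count()        # tie-breaker: equal keys leave in arrival order

    def offer(self, item):
        """Accept the item, or return False once the threshold is reached."""
        if len(self._heap) >= self.max_items:
            return False
        heapq.heappush(self._heap, (self.priority(item), next(self._seq), item))
        return True

    def poll(self):
        """Remove and return the highest-priority item, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# Three orderings corresponding to the ones named in the list above.
def oldest_first(item):
    return item.created_at


def newest_first(item):
    return -item.created_at


def largest_first(item):
    return -len(item.payload)


# Usage: a queue of capacity 2 that releases the largest payload first.
queue = BoundedPriorityQueue(max_items=2, priority=largest_first)
accepted = [
    queue.offer(FlowItem(b"ab", 1.0)),
    queue.offer(FlowItem(b"abcd", 2.0)),
    queue.offer(FlowItem(b"x", 3.0)),    # refused: the queue is full
]
first = queue.poll()                      # the 4-byte item leaves first
retry = queue.offer(FlowItem(b"x", 3.0))  # accepted now that space was freed
```

In this sketch the producer learns about a full queue from the return value of offer and can slow down or retry later, which is the essence of back pressure. A real dataflow system would typically also bound the queue by total data size and persist its contents, which this example leaves out.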
