Apache Kafka is selected for its strengths as a messaging system. The following advantages make it well suited to our Data Lake implementation:
- High throughput: Kafka can handle high-velocity, high-volume data on modest hardware, sustaining a throughput of thousands of messages per second.
- Low latency: Kafka handles these messages with very low latency, in the range of milliseconds, as demanded by most new use cases.
- Fault tolerant: Kafka is inherently resistant to node/machine failures within a cluster.
- Durability: The data/messages are persisted on disk, making them durable, and messages are also replicated ...