Artificial Intelligence for Big Data
by Anand Deshpande, Manish Kumar, Albenzo Coletta, Giancarlo Zaccone
Hive
Apache Hive is a data warehouse built on top of Hadoop. It provides an SQL-like interface to data residing on HDFS, and its queries are executed as MapReduce (MR), Tez, or Spark jobs on the Hadoop cluster. Hive supports compressed columnar storage formats such as ORC and, in releases before 3.0, indexing for faster queries. In the context of cyber security, Hive can be used to store aggregate views of the various logs generated by CI applications.
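As a rough illustration of the aggregate log views mentioned above, the following Python sketch builds two HiveQL statements: one that creates an ORC-backed summary table and one that fills it with hourly event counts. The table names (ci_raw_logs, ci_log_hourly_summary) and the columns (source_host, severity, event_time) are hypothetical and do not come from the book. The sketch only produces the query text; submitting it to a cluster would need a Hive client, which is not shown here.

```python
# Minimal sketch: generate HiveQL for an ORC-backed aggregate view of logs.
# All table and column names below are hypothetical, chosen for illustration.

RAW_TABLE = "ci_raw_logs"            # assumed table of raw log events
AGG_TABLE = "ci_log_hourly_summary"  # assumed table for the aggregate view


def create_aggregate_table_sql(table: str = AGG_TABLE) -> str:
    """Return HiveQL DDL for a compressed ORC table of hourly log counts."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  source_host STRING,\n"
        "  severity STRING,\n"
        "  event_hour STRING,\n"
        "  event_count BIGINT\n"
        ")\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('orc.compress' = 'ZLIB')"
    )


def populate_aggregate_table_sql(raw: str = RAW_TABLE, agg: str = AGG_TABLE) -> str:
    """Return HiveQL that rebuilds the hourly summary from the raw log table."""
    return (
        f"INSERT OVERWRITE TABLE {agg}\n"
        "SELECT source_host, severity,\n"
        "       date_format(event_time, 'yyyy-MM-dd HH') AS event_hour,\n"
        "       COUNT(*) AS event_count\n"
        f"FROM {raw}\n"
        "GROUP BY source_host, severity, date_format(event_time, 'yyyy-MM-dd HH')"
    )


if __name__ == "__main__":
    # Print both statements so they can be reviewed or pasted into a Hive shell.
    print(create_aggregate_table_sql() + ";\n")
    print(populate_aggregate_table_sql() + ";")
```

Hive would run the second statement as a batch job on the cluster, so a summary built this way suits periodic reporting rather than immediate alerting.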
While batch-processing frameworks such as MR on Hadoop process very large volumes of data efficiently, they are not suitable for securing mission CIs. Such CI systems require real-time (or at least near-real-time) processing of streaming or micro-batch data for quick alerts, ...