O'Reilly logo

Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Data modalities

From a modality perspective, all data can be grouped into three categories: structured, semi-structured, and unstructured. The modality is independent of the data source, organization, or storage technologies. In fact, different representations, organizations, and storage technologies perform well with, at the most, one modality. It is very difficult to efficiently support more than one modality.

  • Structured data is usually stored in databases, Oracle, HBase, Cassandra, and so on. Relational tables are the most commonly used organization and storage mechanism. Usually, structured data formats, data types, and sizes are fixed and well known.
  • Semi-structured data, as the name implies, has enough structure; however, there is also variability ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required