Chapter 9. Advanced Table Designs

Having covered the basics of table design in the previous chapter, we now turn to advanced design considerations for storing some commonly encountered types of data in Accumulo. Examples include time series, graph, geospatial, and feature vector data, among others.

Time-Ordered Data

Reading and writing data in time order is a common requirement. In a previous example, we ordered email messages in reverse time order within a particular folder belonging to a particular user account. Some applications want to access data primarily in time order. That is, the first and most important element of the data is the time component. Examples include time series such as stock data, application logs, and series of events captured by sensors.

We could simply use a timestamp as the row ID of a table. Rows will be sorted in increasing time order, and retrieving the data for one timestamp or a range of timestamps is straightforward.
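As a rough illustration of this layout, the sketch below simulates a sorted table in plain Python rather than calling the Accumulo API. The helper names `row_id` and `scan` and the sample events are hypothetical. It assumes timestamps are epoch milliseconds, zero-padded to a fixed width of 13 digits, because Accumulo sorts row IDs lexicographically by bytes, and unpadded decimal strings would not sort in numeric order.

```python
import bisect

def row_id(epoch_millis: int) -> str:
    # Zero-pad to a fixed width (13 digits is an assumption that fits
    # present-day epoch milliseconds) so that lexicographic order
    # matches numeric, and therefore time, order.
    return f"{epoch_millis:013d}"

# Hypothetical events, deliberately out of order on arrival.
events = [
    (1700000003000, "c"),
    (1700000001000, "a"),
    (1700000004000, "d"),
    (1700000002000, "b"),
]

# A sorted list of (row ID, value) pairs stands in for the table.
table = sorted((row_id(ts), value) for ts, value in events)

def scan(table, start_millis: int, end_millis: int):
    """Return entries whose timestamp lies in [start_millis, end_millis]."""
    keys = [key for key, _ in table]
    lo = bisect.bisect_left(keys, row_id(start_millis))
    hi = bisect.bisect_right(keys, row_id(end_millis))
    return table[lo:hi]

in_range = [value for _, value in scan(table, 1700000001500, 1700000003000)]
print(in_range)
```

Because the keys are kept sorted, a range of timestamps maps to one contiguous slice of the table, which is what makes the range retrieval straightforward.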

But using a simple timestamp as the row ID of a table can be problematic when it comes to writing the data, because new data often arrives with timestamps that only ever increase. If we simply order our data this way, all new data will always be written to the end of the table, specifically to the last tablet, whose range extends from some timestamp we’ve already seen up to positive infinity (Figure 9-1). Because a tablet is hosted by only one tablet server at a time, that single server absorbs the entire write load while the others receive no new writes.

Hotspot in time-ordered table
Figure 9-1. Hotspot in time-ordered ...
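The hotspot can be reproduced with a small simulation. This is a sketch in plain Python, not Accumulo code: the split points, the helper `tablet_for`, and the timestamp values are all hypothetical, and it assumes each tablet holds the rows greater than the previous split point and up to and including its own, with the last tablet holding everything beyond the final split point.

```python
import bisect
from collections import Counter

# Hypothetical split points, using the same fixed-width encoding for row IDs.
# Tablet i holds rows in (split_points[i-1], split_points[i]]; the last
# tablet (index len(split_points)) holds every row after the final split.
split_points = ["0000000001000", "0000000002000", "0000000003000"]

def tablet_for(row: str) -> int:
    # bisect_left finds the first split point >= row, which is the tablet
    # whose inclusive end row covers this row ID.
    return bisect.bisect_left(split_points, row)

# New data arrives with timestamps that only ever increase, all later
# than anything already stored in the table.
new_rows = [f"{ts:013d}" for ts in range(3001, 3101)]

writes_per_tablet = Counter(tablet_for(row) for row in new_rows)
print(writes_per_tablet)
```

Every one of the 100 simulated writes lands in the last tablet, and the three earlier tablets receive none, which is the imbalance the figure depicts.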
