Gauging oil prices

Now that our data store holds a substantial amount of data (we can always add more using the preceding Spark job), we will query that data through the GeoMesa API to get the rows ready for our learning algorithm. We could of course read the raw GDELT files directly, but the query method that follows is a useful tool to have available.

Using the GeoMesa query API

The GeoMesa query API enables us to query for results based upon spatio-temporal attributes, whilst also leveraging the parallelization of the data store, in this case Accumulo with its iterators. We can use the API to build SimpleFeatureCollections, which we can then parse to realize GeoMesa SimpleFeatures and ultimately the raw data that matches our query. ...
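To make the idea of a spatio-temporal query concrete, here is a minimal sketch of the kind of predicate such a query carries. It only builds the filter text in ECQL, the filter language accepted by GeoTools-based stores such as GeoMesa; it does not connect to Accumulo or call the GeoMesa API itself. The helper name `ecql_bbox_during`, the attribute names `geom` and `dtg`, the bounding box, and the date range are all assumptions chosen for illustration, not values taken from the text.

```python
from datetime import datetime, timezone


def ecql_bbox_during(geom_attr, date_attr, min_lon, min_lat, max_lon, max_lat, start, end):
    """Return an ECQL predicate combining a bounding box with a time window.

    Hypothetical helper: it only formats text and performs basic sanity checks.
    """
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("bounding box minimums must not exceed maximums")
    if start >= end:
        raise ValueError("start must be earlier than end")

    # ECQL expects ISO 8601 instants; normalise both ends to UTC first.
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_txt = start.astimezone(timezone.utc).strftime(fmt)
    end_txt = end.astimezone(timezone.utc).strftime(fmt)

    bbox = f"BBOX({geom_attr}, {min_lon}, {min_lat}, {max_lon}, {max_lat})"
    during = f"{date_attr} DURING {start_txt}/{end_txt}"
    return f"{bbox} AND {during}"


# Assumed attribute names and an arbitrary one-month window over the whole globe.
query_filter = ecql_bbox_during(
    "geom", "dtg",
    -180.0, -90.0, 180.0, 90.0,
    datetime(2016, 1, 1, tzinfo=timezone.utc),
    datetime(2016, 2, 1, tzinfo=timezone.utc),
)
print(query_filter)
```

In a real job, a string like this would be parsed into a filter and handed to the data store's feature source, which returns the matching feature collection; the spatial and temporal clauses are what allow the store to narrow the scan before results reach the client.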
