Part III. Transforming Data
MarkLogic offers multiple ways to represent data. At one level, everything is represented as a document, but thanks to its wide variety of indexes, MarkLogic also supports SPARQL queries and updates on RDF triples, as well as SQL queries on rows extracted from document data.
This flexible representation provides one of MarkLogic's biggest benefits: data modeling is not an up-front activity, but an iterative one. With a relational database, a schema must be built before data can be ingested. This means that each data field's type, format, cardinality, and relationships to other pieces of data must be established before the meaningful work of building an application (and delivering business value) can begin.
Iterative data modeling means that we load data in the form in which it is made available, then make adjustments to it as needed to address current requirements.
The Envelope Pattern
A common design pattern for integrating data from multiple sources into MarkLogic is called the Envelope Pattern. The content is preserved in its original form, but is wrapped in an extra layer of XML or JSON (depending on how it's being stored). We can then identify a piece of information that each source represents differently and record it in a common, canonical form in each document's envelope. The approach often looks something like this:
<envelope><canonical><published>2017-11-02</published></canonical><article><title>The Title of an Article</title> ...
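To make the pattern concrete, here is a minimal sketch of building such an envelope outside the database. Inside MarkLogic this kind of transformation would normally be written in XQuery or server-side JavaScript, so this is not MarkLogic's API. The source names (source_a, source_b, source_c), their date formats, and the pubDate element are hypothetical, chosen only to show one value arriving in different shapes and being recorded once in a canonical ISO 8601 form while the original article is kept untouched.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical per-source date formats; real sources will differ.
SOURCE_DATE_FORMATS = {
    "source_a": "%Y-%m-%d",   # e.g. 2017-11-02
    "source_b": "%m/%d/%Y",   # e.g. 11/02/2017
    "source_c": "%d %B %Y",   # e.g. 2 November 2017
}


def canonical_date(raw: str, source: str) -> str:
    """Parse a source-specific date string and return it as YYYY-MM-DD."""
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


def wrap_in_envelope(original_xml: str, source: str, date_path: str) -> str:
    """Wrap an original XML document in an <envelope> with a <canonical> section.

    The original element is appended unchanged; only <canonical> is new.
    """
    original = ET.fromstring(original_xml)
    raw_date = original.findtext(date_path)
    if raw_date is None:
        raise ValueError(f"no date found at {date_path!r}")

    envelope = ET.Element("envelope")
    canonical = ET.SubElement(envelope, "canonical")
    ET.SubElement(canonical, "published").text = canonical_date(raw_date, source)
    envelope.append(original)  # original content preserved as-is
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    doc = ("<article><title>The Title of an Article</title>"
           "<pubDate>11/02/2017</pubDate></article>")
    print(wrap_in_envelope(doc, "source_b", "pubDate"))
```

Running the script prints `<envelope><canonical><published>2017-11-02</published></canonical><article><title>The Title of an Article</title><pubDate>11/02/2017</pubDate></article></envelope>`. Queries and indexes can then target the single canonical published element, whichever source a document came from.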