
MarkLogic Cookbook

by Dave Cassel
O'Reilly Media, Inc., March 2018
ISBN: 9781491994603
Intermediate to advanced · 34 pages · 1h 33m · English

Foreword
Preface
Acknowledgments
I. Implementing XQuery: Practical Solutions to Real-World Problems
1. Peak Performance
   Assert Query Mode: Problem, Solution, Discussion
   Fast Distinct Values: Problem, Solution, Discussion
2. Fun with Maps
   Check Whether Two Maps Are Equal: Problem, Solution, Discussion
   Find the Intersection of a Sequence of Maps: Problem, Solution, Discussion
   Apply a Function to All Values in a Map: Problem, Solution, Discussion
3. Document Security
   List User Permissions on a Document: Problem, Solution, Discussion
   Get Permissions with Role Names: Problem, Solution, Discussion
4. Working with Documents
   Generate a Unique ID: Problem, Solution, Discussion
   Find Binary Documents: Problem, Solution, Discussion
   Find Recently Modified Binary Documents: Problem, Solution, Discussion
5. The Task Server
   Cancel Active Tasks on the Task Server: Problem, Solution, Discussion
   Cancel Active and Queued Tasks on the Task Server: Problem, Solution, Discussion
6. Administration
   Find Hostnames in a Cluster: Problem, Solution, Discussion
   Find Current and Effective MarkLogic Versions During Rolling Upgrade: Problem, Solution, Discussion
II. Documents, Triples, and Values: Powering Search
7. Document Searches
   Search by Root Element: Problem, Solution, Discussion, See Also
   Find Documents That Are Missing an Element: Problem, Solution, Discussion, See Also
8. Scoring Search Results
   Sort Results to Promote Recent Documents: Problem, Solution, Discussion, See Also
   Weigh Matches Based on Document Parts: Problem, Solution, Discussion, See Also
9. Understanding Your Data and How It Gets Used
   Logging Search Requests: Problem, Solution, Discussion, See Also
   Count Documents in Directories: Problem, Solution, Discussion, See Also
10. Searching with the Optic API
   Paging Over Results: Problem, Solution, Discussion, See Also
   Group By: Solution, Discussion, See Also
   Extract Content from Retrieved Documents: Problem, Solution, Discussion, See Also
   Select Documents Based on Criteria in Joined Documents: Problem, Solution, Discussion, See Also
III. Transforming Data
11. Input Transformations
   Changing Date Format: Problem, Solution, Discussion
   Converting Binaries to Base64 Strings and Back: Problem, Solution, Discussion, See Also
   Ingesting an Aggregate JSON File with Many Documents Inside: Problem, Solution, Discussion
12. Tokenization
   Tokenizing Social Security Numbers: Problem, Solution, Discussion
13. Template-Driven Extraction
   Searching on Derived Data: Problem, Solution, Discussion, See Also
   Using an IRI Namespace with TDE: Problem, Discussion, See Also
14. Redaction
   Redacting Credit Card Numbers, Replacing with Digits: Problem, Solution, Discussion, See Also
   Redacting ICD10 Codes: Problem, Solution, Discussion

Part III. Transforming Data

MarkLogic offers multiple ways to represent data. At one level, everything is represented as a document, but due to a wide variety of indexes, MarkLogic also supports SPARQL queries and updates on RDF triples, as well as SQL queries on rows extracted from document data.

This flexible representation provides one of MarkLogic’s biggest benefits: data modeling is not an up-front activity but an iterative one. With a relational database, a schema must be built before data can be ingested. This means that for each data field, its type, format, cardinality, and relationships to other pieces of data must be established before the meaningful work of building an application, and of delivering business value, can begin.

Iterative data modeling means that we load data in the form in which it is made available, then make adjustments to it as needed to address current requirements.

The Envelope Pattern

A common design pattern for integrating data from multiple sources into MarkLogic is the Envelope Pattern. The content is preserved in its original form but wrapped in an extra layer of XML or JSON, matching the format in which the document is stored. We can then identify a piece of information that different sources represent in different ways, such as a publication date, and record it in a single canonical form in each document’s envelope. The approach often looks something like this:

<envelope>
  <canonical>
    <published>2017-11-02</published>
  </canonical>
  <article>
    <title>The Title of an Article</title> ...
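To make the pattern concrete, here is a minimal, language-neutral sketch in Python using only the standard library. It is not MarkLogic's API; inside MarkLogic you would write the equivalent in XQuery or server-side JavaScript. It wraps an unmodified source document in an `<envelope>` and records a canonical ISO 8601 `<published>` date. The function names `to_canonical_date` and `wrap_in_envelope`, the `pubDate` source element, and the list of accepted date formats are hypothetical assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical date formats seen in different sources (an assumption, not from the book).
SOURCE_DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y"]


def to_canonical_date(raw: str) -> str:
    """Try each known source format and return the date as ISO 8601 (YYYY-MM-DD)."""
    for fmt in SOURCE_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {raw!r}")


def wrap_in_envelope(original: ET.Element, published_path: str) -> ET.Element:
    """Wrap an unmodified source element in <envelope> with a <canonical> section.

    published_path is a hypothetical ElementTree path to the source's own date element.
    """
    envelope = ET.Element("envelope")
    canonical = ET.SubElement(envelope, "canonical")
    raw = original.findtext(published_path)
    if raw is not None:
        ET.SubElement(canonical, "published").text = to_canonical_date(raw)
    envelope.append(original)  # the original content is preserved as-is
    return envelope


if __name__ == "__main__":
    source = ET.fromstring(
        "<article><title>The Title of an Article</title>"
        "<pubDate>11/02/2017</pubDate></article>"
    )
    print(ET.tostring(wrap_in_envelope(source, "pubDate"), encoding="unicode"))
```

Running the script prints `<envelope><canonical><published>2017-11-02</published></canonical><article><title>The Title of an Article</title><pubDate>11/02/2017</pubDate></article></envelope>`. Because the source element is left untouched, a later requirement can add another canonical field without reshaping the original data, which is the iterative modeling described above.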

Publisher Resources

ISBN: 9781491994610