Chapter 11. Input Transformations

Data modeling is an iterative process with MarkLogic: we load data as is and then revise it over time. Sooner or later, that means modifying the structure of the data. The recipes in this chapter show how to change documents using tools like MarkLogic Content Pump (MLCP), REST API transforms, or CORB2, an open source bulk processing tool.

Changing Date Format

Problem

You’re loading documents whose dates are in a nonstandard format, and each document has multiple dates. You want to fix them during ingest.

Solution

Applies to MarkLogic versions 8 and higher

Here’s the code for an MLCP transform that converts the dates in a specified list of JSON properties to ISO 8601 (YYYY-MM-DD) format as JSON documents are ingested. This code can also be used with REST input transforms or CORB2 jobs.

Save this content in dateTransform.sjs. Add it to your modules database, as described in the documentation.

// Recurse through a JSON document, applying
// function f to any JSON property whose property
// name is in the array keys.
function applyToProperty(obj, keys, f) {
  for (var i in obj) {
    if (!obj.hasOwnProperty(i)) {
      continue;
    }
    else if (typeof obj[i] === 'object') {
      applyToProperty(obj[i], keys, f);
    }
    else if (keys.indexOf(i) !== -1) {
      obj[i] = f.call(this, obj[i]);
    }
  }
}

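// Convert a date string that the Date constructor can parse
// into YYYY-MM-DD, the date portion of its ISO 8601 (UTC) form.
function fixDate(value) {
  return new Date(value).toISOString().substring(0, 10);
}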

// This is the MLCP transform function. Fix any date with the
// property name(s) specified by context.transform_param. ...
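To see what the two helper functions above do on their own, here is a minimal sketch that runs them against an in-memory object in any JavaScript engine, outside of MLCP. The sample document and its property names (orderDate, shipDate) are invented for illustration and are not part of the recipe. The helpers are repeated from the listing so that the sketch is self-contained.

```javascript
// Helpers repeated from the recipe so that this sketch runs on its own.
function applyToProperty(obj, keys, f) {
  for (var i in obj) {
    if (!obj.hasOwnProperty(i)) {
      continue;
    }
    else if (typeof obj[i] === 'object') {
      applyToProperty(obj[i], keys, f);
    }
    else if (keys.indexOf(i) !== -1) {
      obj[i] = f.call(this, obj[i]);
    }
  }
}

function fixDate(value) {
  return new Date(value).toISOString().substring(0, 10);
}

// Hypothetical sample document; the property names are assumptions
// made for this example only.
var doc = {
  title: 'Sample order',
  orderDate: 'Sun, 05 Mar 2017 12:00:00 GMT',
  shipping: {
    shipDate: 'Tue, 07 Mar 2017 12:00:00 GMT',
    carrier: 'ACME'
  }
};

// Rewrite only the listed properties, at any depth; other values are untouched.
applyToProperty(doc, ['orderDate', 'shipDate'], fixDate);

console.log(JSON.stringify(doc));
```

The sample dates carry an explicit GMT zone on purpose. toISOString always reports UTC, so a date string without a time zone, which engines such as V8 typically read as local time, can come out one day off depending on the server's zone. A value that Date cannot parse at all makes toISOString throw a RangeError, so it may be worth validating the input before relying on fixDate in a real load.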
