Chapter 1. Peak Performance
Many MarkLogic installations store large amounts of data, but still provide fast searches. The key to performance is understanding how MarkLogic works—specifically understanding query and update modes, and the use of indexes. These two recipes help ensure you’re getting the speed you need for your applications.
Assert Query Mode
Problem
All MarkLogic requests run in either query or update mode, based on a static analysis of the code. The mode is important, because query requests are able to run without locking database content. Accidentally running in update mode is a common cause of requests running slower than expected.
Verify that a MarkLogic statement is running in query mode.
Solution
Applies to MarkLogic versions 7 and higher
Place this snippet as early in the code path as you can to make sure it is executed before MarkLogic spends too much time on other parts of your request:
let $assert-query-mode as xs:unsignedLong := xdmp:request-timestamp()
If a request that includes this line is run in update mode, then this error will be thrown:
> XDMP-AS: (err:XPTY0004) let $assert-query-mode as xs:unsignedLong := xdmp:request-timestamp() -- Invalid coercion: () as xs:unsignedLong
Discussion
Sometimes MarkLogic’s static analysis may see something that triggers update mode, even if that was not the developer’s intent. The code in this recipe will throw an exception if it is run as an update, making the problem easy to notice. Once the exception appears, find the code that caused the statement to run as an update. If the statement really should run as an update, remove the assertion. If the update can be removed or isolated into an xdmp:invoke() call, do that to allow the statement to run as a query. With xdmp:invoke(), we can set the isolation option to different-transaction, which separates the update from the main request.
See the Transaction Type section of the Application Developer’s Guide for more information about query or update modes.
Note that we don’t need the same approach for Server-side JavaScript (SJS). With SJS, there is no static analysis; the developer must explicitly declare update mode.
It’s important to see that we can’t just call xdmp:request-timestamp() and get the same effect. The magic is in the as xs:unsignedLong clause: because it is present, MarkLogic expects the value to be an unsigned long, or convertible to one. In update mode the function returns the empty sequence, so the conversion can’t happen, and the error is thrown.
The name is important too, because it makes the code self-documenting. What we don’t want is for a developer to run into this exception and realize that it can be “fixed” by removing the as xs:unsignedLong, or by changing it to as xs:unsignedLong? (making it optional). The presence of the word assert in the name provides a clue that we’re expecting something here, and silencing the message would be contrary to the original developer’s intent.
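To make the difference concrete, the following sketch puts the three variants side by side. It relies only on the behavior described above: xdmp:request-timestamp() yields the empty sequence in update mode. The variable names $ts and $ts-optional are illustrative, not part of the recipe.

```xquery
xquery version "1.0-ml";

(: No type declaration: in update mode $ts is simply the empty
   sequence, and no error is raised. :)
let $ts := xdmp:request-timestamp()

(: Optional type: the "?" accepts the empty sequence, so this also
   passes silently in update mode. :)
let $ts-optional as xs:unsignedLong? := xdmp:request-timestamp()

(: Required type: in update mode the empty sequence cannot be coerced
   to xs:unsignedLong, so XDMP-AS is thrown here. :)
let $assert-query-mode as xs:unsignedLong := xdmp:request-timestamp()

return $assert-query-mode
```

Only the last binding acts as an assertion; the first two are exactly the "fixes" that the variable name is meant to discourage.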
What do you do if this exception gets thrown? If that’s happening, MarkLogic sees that updates might be made. Check whether those updates can be made in a different transaction using xdmp:invoke or xdmp:invoke-function. Consider whether those updates need to be made at all. If updates really should be part of a request, you can remove the assertion, but make sure you aren’t locking too many documents.
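As a rough sketch of isolating an update, the statement below keeps the assertion and hands the write to a separate module through xdmp:invoke. The module path /lib/record-audit.xqy, its external variable $when, and the word query are all hypothetical; substitute whatever your application uses.

```xquery
xquery version "1.0-ml";

(: Fails fast with XDMP-AS if this statement is analyzed as an update. :)
let $assert-query-mode as xs:unsignedLong := xdmp:request-timestamp()

(: The write lives in a separate, hypothetical module, so this
   statement contains no update call for static analysis to find. :)
let $_ :=
  xdmp:invoke(
    "/lib/record-audit.xqy",                      (: hypothetical module :)
    (xs:QName("when"), fn:current-dateTime()),    (: hypothetical variable :)
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>
  )

(: Hypothetical read-only work, still running in query mode. :)
return xdmp:estimate(cts:search(fn:doc(), cts:word-query("example")))
```

The invoked module would declare `$when` as an external variable and perform the insert itself. It is analyzed on its own, so it can run as an update while the calling statement remains a query.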
Fast Distinct Values
Problem
You want to quickly find the distinct values in a particular element or JSON property.
Solution
Build a range index on the element or property, then call:
let $ref :=
  (: call one of the cts:*-reference functions to create a
     reference to your index :)
return cts:values($ref)
Required Index
Range index on the target element or property.
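For illustration, here is one way the placeholder might be filled in. It assumes a string element range index on a full-name element in no namespace; that index is an assumption made for this example, not something the recipe prescribes.

```xquery
xquery version "1.0-ml";

(: Assumption: a string element range index exists on full-name
   (no namespace). For other index kinds, the reference could be
   built with one of these instead:
     cts:json-property-reference("full-name")
     cts:path-reference("/content/author/full-name") :)
let $ref := cts:element-reference(xs:QName("full-name"))
return cts:values($ref)
```

The reference has to match an index that exists in the database configuration; if no matching range index is found, MarkLogic raises an error instead of falling back to scanning documents.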
Discussion
Wanting a list of the distinct values in an element or property is a common problem. Developers who are new to MarkLogic often turn to fn:distinct-values(), like this:
fn:distinct-values(/content/author/full-name)
While this approach will work fine for small numbers of values, it doesn’t scale. As written, MarkLogic will retrieve all fragments that the /content/author/full-name path matches, put the full-name elements into a sequence, and pass that to fn:distinct-values(). Because fn:distinct-values() operates on atomic values, each element is atomized (for untyped content, this yields its string value). The function then loops through every value it was given to find the unique ones.
Consider a database that has just 1,000 matching documents, but only 10 distinct values. Even such a small example is enough to illustrate how much effort MarkLogic has to waste by loading all 1,000 fragments to get those 10 values. To see how many fragments MarkLogic would need to load to answer this query on your data, run this in Query Console: xdmp:plan(/content/author/full-name), substituting your XPath for /content/author/full-name.
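If only the fragment count is of interest, xdmp:estimate is a possible complement to reading the plan output. This is a swapped-in technique rather than the one the recipe names, and the path is the same illustrative one used above.

```xquery
xquery version "1.0-ml";

(: Returns the number of fragments the indexes select for this path,
   without loading any of them. Substitute your own XPath. :)
xdmp:estimate(/content/author/full-name)
```

Because the estimate is resolved from the indexes alone, it is cheap to run even when the path matches a large number of fragments.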
Conversely, if a range index is available, then the work has already been done. An element range index on full-name, or a path range index on /content/author/full-name, will have a list of distinct values, along with identifiers of fragments that hold the values. By calling cts:values(), we directly access the index and don’t need to load any of the fragments.
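Because the index also records which fragments hold each value, the same call can report how often each value occurs. The sketch below assumes a path range index on /content/author/full-name. The options frequency-order and limit=10 are standard cts:values options, chosen here only for illustration.

```xquery
xquery version "1.0-ml";

(: Assumption: a path range index on /content/author/full-name. :)
let $ref := cts:path-reference("/content/author/full-name")

(: The ten most frequent values, read directly from the index. :)
for $name in cts:values($ref, (), ("frequency-order", "limit=10"))
return fn:concat($name, " (", cts:frequency($name), ")")
```

Each result pairs a value with the number of fragments containing it, all without loading a single document.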