XQuery, 2nd Edition

by Priscilla Walmsley

December 2015

Intermediate to advanced

762 pages

English

O'Reilly Media, Inc.

Chapter 19. Regular Expressions

Regular expressions are patterns that describe strings. XQuery accepts them as arguments to four built-in functions: matches, which determines whether a string matches a pattern; replace, which replaces the parts of a string that match a pattern; tokenize, which splits a string into tokens wherever a delimiter pattern matches; and analyze-string, which splits a string into its matching and non-matching parts. This chapter explains the regular expression syntax used by XQuery.
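As a quick illustration, the query below calls each of the four functions on small, arbitrary input strings. It is a minimal sketch that assumes an XQuery 3.0 or later processor (analyze-string is not available in XQuery 1.0). The pattern \d+ matches one or more digits.

```xquery
(: matches: true if any part of the input matches the pattern; true here :)
matches("Order 123", "\d+"),
(: replace: replaces each matching substring; returns "Order #" :)
replace("Order 123", "\d+", "#"),
(: tokenize: splits on the delimiter pattern; returns ("a", "b", "c") :)
tokenize("a, b,c", ",\s*"),
(: analyze-string: returns an fn:analyze-string-result element with
   an fn:non-match child for "Order " and an fn:match child for "123" :)
analyze-string("Order 123", "\d+")
```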

The Structure of a Regular Expression

The regular expression syntax of XQuery is based on that of XML Schema, with some additions, such as the ^ and $ anchors, reluctant quantifiers, and back-references. A regular expression (or regex) is built from three kinds of parts: atoms, quantifiers, and branches.
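To see how the parts fit together, the pattern below (an arbitrary example) contains all three. (cat|dog) is a parenthesized sub-expression, itself treated as an atom, with two branches separated by |; s is an atom; and ? is a quantifier that makes the s optional. The ^ and $ anchors, which XQuery adds to the XML Schema syntax, require the pattern to match the entire string rather than just part of it.

```xquery
(: test three candidate strings against the same pattern :)
for $s in ("cat", "dogs", "cow")
return matches($s, "^(cat|dog)s?$")
(: returns true, true, false :)
```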

Atoms

An atom is the most basic unit of a regular expression. It might describe a single character, such as d, or an escape sequence that stands for any one of a group of characters, such as \s (any whitespace character) or \p{Lu} (any uppercase letter). It could also be a character class expression that represents a range of characters or a choice among several characters, such as [a-z]. These kinds of atoms are described later in this chapter.
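Each kind of atom mentioned above can be tried on its own. In this sketch every pattern consists of a single atom, with ^ and $ added so that the atom must match the whole one-character input. The input characters are arbitrary examples.

```xquery
matches("d", "^d$"),        (: normal character: true :)
matches("D", "^d$"),        (: false: matching is case-sensitive by default :)
matches(" ", "^\s$"),       (: \s, any whitespace character: true :)
matches("Q", "^\p{Lu}$"),   (: \p{Lu}, any uppercase letter: true :)
matches("m", "^[a-z]$")     (: [a-z], a character class expression: true :)
```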

Quantifiers

A quantifier appears directly after an atom and indicates how many times a string matching that atom may appear, making the atom required, optional, or repeating. For example, to indicate that the letter d must appear one or more times, you can use the expression d+, where the + means “one or more.” The different quantifiers are listed in Table 19-1.
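The + quantifier can be checked directly. The sketch below applies d+ to a few arbitrary inputs, then compares it with three other quantifiers (?, *, and {2,3}) on the same input, dd. It assumes an XQuery 3.0 or later processor, because it uses the || string concatenation operator.

```xquery
(: d+ requires at least one d :)
matches("ddd", "^d+$"),          (: true :)
matches("", "^d+$"),             (: false :)
(: without anchors, d+ matches a run of d's inside a longer string :)
replace("adddb", "d+", "-"),     (: "a-b" :)
(: compare quantifiers on the input "dd" :)
for $q in ("d?", "d*", "d+", "d{2,3}")
return $q || ": " || matches("dd", "^" || $q || "$")
(: returns "d?: false", "d*: true", "d+: true", "d{2,3}: true" :)
```

Only d? fails on dd, because ? allows at most one occurrence of the atom.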