| Namespace | Description |
|---|---|
| System.Xml | Basic XML input and output with XmlReader and XmlWriter, DOM with XmlNode and its subclasses, many XML utility classes |
| System.Xml.Schema | Constraint of XML via XML Schema with XmlSchemaObject and its subclasses |
| System.Xml.Serialization | Serialization to plain XML and SOAP with XmlSerializer |
| System.Xml.XPath | Navigation of XML via XPath with XPathDocument, XPathExpression, and XPathNavigator |
| System.Xml.Xsl | |
…XmlNode. Derived from XmlNode are XmlAttribute, XmlDocument, XmlDocumentFragment, XmlEntity, XmlLinkedNode, and XmlNotation. In turn, XmlLinkedNode has a number of subclasses that serve specific purposes (XmlCharacterData, XmlDeclaration, XmlDocumentType, XmlElement, XmlEntityReference, and XmlProcessingInstruction). Several of these key types also have further subclasses; XmlCharacterData, for example, is the base class of XmlText, XmlComment, XmlCDataSection, XmlWhitespace, and XmlSignificantWhitespace. In each case, the final subclass of each inheritance branch has a name that is meaningful to anyone familiar with XML.

[Figure: XmlNode inheritance hierarchy]
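To see the hierarchy in practice, here is a minimal sketch that loads a small document into an XmlDocument and reports, for each node, its concrete XmlNode subclass next to the value of its NodeType property. The class name NodeTypeLister and the sample document are illustrative only, and the sketch assumes a .NET version with generics (2.0 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Illustrative class; the name NodeTypeLister is hypothetical, not from the text.
public class NodeTypeLister {
    // Returns "ClassName NodeType" for the node and every descendant.
    public static List<string> Describe(XmlNode node) {
        List<string> result = new List<string>();
        Walk(node, result);
        return result;
    }

    private static void Walk(XmlNode node, List<string> result) {
        result.Add(node.GetType().Name + " " + node.NodeType);
        // Attributes are not child nodes in the DOM, so visit them separately.
        if (node.Attributes != null) {
            foreach (XmlAttribute attribute in node.Attributes) {
                result.Add(attribute.GetType().Name + " " + attribute.NodeType);
            }
        }
        foreach (XmlNode child in node.ChildNodes) {
            Walk(child, result);
        }
    }

    public static void Main() {
        XmlDocument document = new XmlDocument();
        document.LoadXml("<?xml version='1.0'?><!-- sample -->" +
            "<inventory id='1'><item>widget</item><![CDATA[a < b]]></inventory>");
        foreach (string line in Describe(document)) {
            Console.WriteLine(line);
        }
    }
}
```

For the sample document this lists XmlDocument, XmlDeclaration, XmlComment, XmlElement, XmlAttribute, XmlElement, XmlText, and XmlCDataSection, each paired with the matching XmlNodeType name.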
XmlNode subclasses are also represented by the
members of the XmlNodeType enumeration:
Element, Attribute,
Text, CDATA,
EntityReference, Entity,
ProcessingInstruction, Comment,
Document, DocumentType,
DocumentFragment, Notation,
Whitespace, and
SignificantWhitespace, plus XmlDeclaration and the special
pseudo-node types None, EndElement, and EndEntity, which an
XmlReader can report but no XmlNode ever does. Each XmlNode
instance has a NodeType property, which returns an
XmlNodeType that represents the type of the
instance. An XmlNodeType value is also returned by
the NodeType property of
XmlReader, as discussed in Chapters 2, 3, and 4.

…the System.Xml namespace to help you read XML, whether you wish to deal with it as a stream of events or to load the data into your own data structures. In this chapter, we take a look at XmlReader, its subclasses, and the associated .NET types and interfaces. I also discuss when it is appropriate to use XmlReader instead of other methods of reading XML, and describe the differences between pull parsers and push parsers.

…System.IO namespace. The basic object used for reading and writing data, regardless of the source, is the Stream object. Stream is an abstract base class that represents a sequence of bytes; it has a Read( ) method to read bytes from the Stream, a Write( ) method to write bytes to the Stream, and a Seek( ) method to set the current position within the Stream. Not all instances or subclasses of Stream support all these operations; for example, you cannot write to a FileStream representing a read-only file, and you cannot Seek( ) to a position in a NetworkStream. The properties CanRead, CanWrite, and CanSeek can be interrogated to determine whether the respective operations are supported by the instance of Stream you're dealing with.

…Stream type's subclasses and the methods each type supports.
| Type | Length | Position | Flush( ) | Read( ) | Seek( ) | Write( ) |
|---|---|---|---|---|---|---|
| System.IO.BufferedStream | | | | | | |
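Because the capability flags, rather than the static type, tell you which operations are safe, code that accepts an arbitrary Stream typically checks them before calling Write( ) or Seek( ). A minimal sketch using MemoryStream, chosen only because it needs no file or network; the class name StreamCapabilities is hypothetical.

```csharp
using System;
using System.IO;

// Illustrative class; the name StreamCapabilities is hypothetical.
public class StreamCapabilities {
    // Reports which operations a Stream instance supports.
    public static string Describe(Stream stream) {
        return string.Format("CanRead={0} CanWrite={1} CanSeek={2}",
            stream.CanRead, stream.CanWrite, stream.CanSeek);
    }

    public static void Main() {
        // A default MemoryStream supports reading, writing, and seeking.
        MemoryStream memory = new MemoryStream();
        Console.WriteLine(Describe(memory));

        // Constructed with writable = false, the stream refuses writes,
        // so CanWrite should be checked before calling Write().
        MemoryStream readOnly = new MemoryStream(new byte[] { 1, 2, 3 }, false);
        Console.WriteLine(Describe(readOnly));
        if (readOnly.CanSeek) {
            readOnly.Seek(1, SeekOrigin.Begin);
            Console.WriteLine("Byte at position 1: {0}", readOnly.ReadByte());
        }
    }
}
```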
XmlReader
is an abstract base class that provides an event-based, read-only,
forward-only XML pull parser (I'll discuss each of
these terms shortly). XmlReader has three concrete
subclasses, XmlTextReader,
XmlValidatingReader, and
XmlNodeReader, which enable you to read XML from a
file, a Stream, or an XmlNode.
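As a rough illustration of the pull model, the following sketch asks an XmlTextReader for one node at a time with Read( ) and then inspects the reader's NodeType, Name, and Value properties to see what it is positioned on. The class and method names are hypothetical, and the sketch targets .NET 2.0 or later.

```csharp
using System;
using System.IO;
using System.Xml;

// Illustrative class; PullLoop and Summarize are hypothetical names.
public class PullLoop {
    // After each successful Read(), the reader's properties describe the
    // current node. Read() does not visit attributes; MoveToNextAttribute()
    // would be needed for those.
    public static string Summarize(string xml) {
        StringWriter output = new StringWriter();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        // Report no whitespace nodes, so only markup and text appear.
        reader.WhitespaceHandling = WhitespaceHandling.None;
        while (reader.Read()) {
            string detail = reader.NodeType == XmlNodeType.Text ? reader.Value : reader.Name;
            output.Write("{0}:{1};", reader.NodeType, detail);
        }
        reader.Close();
        return output.ToString();
    }

    public static void Main() {
        Console.WriteLine(Summarize("<inventory>\n  <item sku='1'>widget</item>\n</inventory>"));
    }
}
```

The caller, not the parser, decides when to advance; a push parser would instead call into your code for each event.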
You can also extend XmlReader to read other, non-XML data formats and deal with them as if they were XML (you'll learn how to do this in Chapter 4).

XmlReader provides only the most essential functionality for reading XML documents. It does not, for example, validate XML (that's what XmlValidatingReader does) or expand XML entities into their respective character data (though XmlTextReader does). This does not mean that XML read from a text file cannot be validated at all; you can validate XML from any source by using the XmlValidatingReader constructor that takes an XmlReader object as a parameter, as I'll demonstrate.

…XmlReader again, with a little explanation.

…XmlReader, events are delivered by querying XmlReader's properties after calling its Read( ) method.

XmlReader, as its name implies, can only read XML. For writing XML, there is an XmlWriter class, which I will discuss in Chapter 3.

…XmlDocument (which I'll discuss in Chapter 5) or XPathDocument (which I'll discuss in Chapter 6).

…XmlReader implementations. I've also discussed the pull parser pattern used by the .NET XML parser and how it differs from a push parser.

…XmlReader. In the next chapter, I'll show you the other side of the XML I/O picture by introducing XmlWriter.

…XmlReader, I'll start by taking a general look at how data is written in .NET. I've already covered input, and output is very similar in that most operations involve the Stream class. After a general introduction to how the writing process works, I'll show you a quick and simple way of writing text to a file.

…File and FileInfo objects in Chapter 2. In this section, I'll focus on writing to a file using the same objects.

File has a Create( ) method. This method takes a filename as a parameter and returns a FileStream, so the most basic way to create and write to a file is fairly intuitive. Stream and its subclasses implement a variety of Write( ) methods, including one that writes an array of bytes to the Stream. The following code snippet creates a file named myfile.txt and writes the text ".NET & XML" to it:

using System.IO;

// the byte values are the ASCII codes for ".NET & XML"
byte [ ] buffer = new byte [ ] {46,78,69,84,32,38,32,88,77,76};
string filename = "myfile.txt";
FileStream stream;
stream = File.Create(filename);
stream.Write(buffer,0,buffer.Length);
// closing the stream flushes the bytes and releases the file handle
stream.Close( );

…Stream; ordinarily, you wouldn't hardcode an array of bytes like that. I'll show you a more typical way of encoding a string as a byte array in a moment.
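One common approach, sketched here under the assumption that UTF-8 is acceptable (for these characters it produces the same bytes as ASCII), is to let System.Text.Encoding produce the byte array. This is not necessarily the technique the text goes on to present; the class and method names are hypothetical, and File.ReadAllText requires .NET 2.0 or later.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative class; EncodeAndWrite and ToBytes are hypothetical names.
public class EncodeAndWrite {
    // Converts text to bytes instead of hardcoding the byte values.
    public static byte[] ToBytes(string text) {
        return Encoding.UTF8.GetBytes(text);
    }

    public static void Main() {
        byte[] buffer = ToBytes(".NET & XML");
        // The using statement closes the FileStream even if Write() throws.
        using (FileStream stream = File.Create("myfile.txt")) {
            stream.Write(buffer, 0, buffer.Length);
        }
        Console.WriteLine(File.ReadAllText("myfile.txt"));
    }
}
```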
byte [ ] buffer = new byte [ ] {46,78,69,84,32,38,32,88,77,76};
string filename = "myfile.txt";
FileStream stream;
if (File.Exists(filename)) {
// it already exists, let's append to it
stream = File.OpenWrite(filename);
stream.Seek(0,SeekOrigin.End);
} else {
// it does not exist, let's create it
stream = File.Create(filename);
}
stream.Write(buffer,0,buffer.Length);
stream.Close( );

XmlWriter
is an abstract base class
that defines the interface for creating XML output programmatically.
It contains methods such as WriteStartElement( )
and WriteEndElement( ) to write data.
XmlWriter maintains the state of the XML document
as it writes, so it knows which start element or attribute to close
when you call WriteEndElement( ) or
WriteEndAttribute( ).

XmlTextWriter is the subclass of XmlWriter that provides support for writing XML to any Stream, filename, or TextWriter. In addition to all the required features of an XmlWriter, XmlTextWriter allows you to set the formatting of the output, using the Formatting, Indentation, IndentChar, Namespaces, and QuoteChar properties.

The XmlTextWriter formatting properties are described in Table 3-6.

| Property | Type | Description |
|---|---|---|
| Formatting | System.Xml.Formatting | Specify Formatting.None if the XML is to be produced without indentation, or Formatting.Indented to produce indented XML. Formatting.Indented makes for more readable output; the two outputs differ only in the whitespace inserted between elements. |
| Indentation | int | If Formatting is set to Formatting.Indented, Indentation specifies the number of characters by which to indent each successive level of markup. |
| IndentChar | char | |
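The following sketch shows XmlWriter's state tracking and several XmlTextWriter formatting properties together: WriteEndElement( ) takes no element name, because the writer remembers which element is open. The class name IndentedWriter and the sample elements are illustrative only.

```csharp
using System;
using System.IO;
using System.Xml;

// Illustrative class; the name IndentedWriter is hypothetical.
public class IndentedWriter {
    // Writes a small document using the Formatting, Indentation,
    // IndentChar, and QuoteChar properties of XmlTextWriter.
    public static string Write() {
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);
        writer.Formatting = Formatting.Indented;
        writer.Indentation = 2;
        writer.IndentChar = ' ';
        writer.QuoteChar = '\'';
        writer.WriteStartElement("inventory");
        writer.WriteStartElement("item");
        writer.WriteAttributeString("sku", "123");
        writer.WriteString("widget");
        // The writer tracks open elements, so this closes <item>...
        writer.WriteEndElement();
        // ...and this closes <inventory>.
        writer.WriteEndElement();
        writer.Close();
        return buffer.ToString();
    }

    public static void Main() {
        Console.WriteLine(Write());
    }
}
```

Because <item> contains text, the writer does not indent inside it; indentation is applied only between elements.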
…XmlReader and XmlWriter types. The next chapter will show you how to read and write non-XML data as though it were XML.

…XmlReader and XmlWriter provided in the .NET class libraries, you're ready to learn how to implement your own custom types to handle some more complex scenarios. By combining XmlReader and XmlWriter, you can work with information stored in other formats as if it were XML, mixing and matching formats as you find appropriate for your projects.

Although the XmlReader class allows you to read standard XML
syntax, there are alternative XML syntaxes that serve specialized
purposes. There are XML syntaxes that do not use slashes and angle
brackets, and some of these are considered to be more human-readable
and less verbose than standard XML. Most of these alternative XML
formats still retain all the functionality of standard XML. Other
common non-XML formats contain structures you can treat as XML
structures when convenient.

…XmlReader by writing a custom XmlReader subclass. One advantage of writing your own XmlReader subclass is that you can use your custom XmlReader wherever you would use any of the built-in XmlReaders. For example, even if the underlying data isn't formatted using standard XML syntax, you can pass any instance of a custom XmlReader to XmlDocument.Load( ) to load the XML document into a DOM (more on XmlDocument in Chapter 5). You could load a DOM tree from the data, use XPath to query the data, and even transform the data with XSLT, even though the original data does not look anything like XML.

…XmlReader for it that presents its content in a way that looks like XML. In this chapter, you'll learn how to write a custom XmlReader implementation that will enable you to read data formatted in PYX, a line-oriented XML format, as if it were XML.

…XmlPyxReader, you first need to
understand PYX syntax. PYX is a line-oriented XML syntax, developed
by Sean McGrath, which reflects XML's SGML heritage.
PYX is based on the Element Structure Information Set (ESIS), a popular alternative syntax for SGML.

…XmlTextWriter is very simple if you want to write your XML in standard angle-bracket syntax. But since you learned how to read PYX earlier in this chapter, you should learn how to write PYX here.

…XmlPyxWriter. After you look over the code, I'll highlight some important bits.

using System;
using System.Collections;
using System.Globalization;
using System.IO;
using System.Xml;
public class XmlPyxWriter : XmlWriter {
// constructors
public XmlPyxWriter(TextWriter writer) {
this.writer = writer;
}
public XmlPyxWriter(Stream stream) {
this.writer = new StreamWriter(stream);
}
public XmlPyxWriter(string filename) {
this.writer = new StreamWriter(filename);
}
// private instance variables
private TextWriter writer;
private WriteState writeState = WriteState.Start;
private XmlSpace xmlSpace = XmlSpace.Default;
private string xmlLang = CultureInfo.CurrentCulture.ThreeLetterISOLanguageName;
private Stack elementNames = new Stack( );
// private instance methods
private void Write(string text) {
writer.WriteLine("-{0}", text);
if (writeState == WriteState.Element) {
writeState = WriteState.Content;
}
}
private void Write(char ch) {
writer.WriteLine("-{0}", ch);
if (writeState == WriteState.Element) {
writeState = WriteState.Content;
}
}
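private void Write(char [ ] buffer, int index, int count) {
// build a string from the requested slice; passing the array itself
// as a format argument would print "System.Char[]"
writer.WriteLine("-{0}", new string(buffer, index, count));
if (writeState == WriteState.Element) {
writeState = WriteState.Content;
}
}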
// properties from XmlWriter
public override WriteState WriteState {
get { return writeState; }
}
public override XmlSpace XmlSpace {
get { return xmlSpace; }
}
public override string XmlLang {
get { return xmlLang; }
}
// methods from XmlWriter
public override void WriteEndDocument( ) {
// no-op
}
public override void WriteComment(string text) {
// no-op
}
public override void WriteStartDocument( ) {
writeState = WriteState.Prolog;
}
public override void WriteStartDocument(bool standalone) {
writeState = WriteState.Prolog;
}
public override void WriteDocType(string name, string pubid, string sysid, string subset){
writeState = WriteState.Prolog;
}
public override void WriteStartElement(string prefix, string localName, string ns) {
// a PYX start tag is "(" followed directly by the name, with no trailing space
writer.WriteLine("({0}", localName);
elementNames.Push(localName);
writeState = WriteState.Element;
}
public override void WriteEndElement( ) {
writer.WriteLine("){0}", elementNames.Pop( ));
}
public override void WriteFullEndElement( ) {
WriteEndElement( );
}
public override void WriteStartAttribute(string prefix, string localName, string ns) {
writer.Write("A{0} ",localName);
writeState = WriteState.Attribute;
}
public override void WriteEndAttribute( ) {
writer.WriteLine( );
writeState = WriteState.Element;
}
public override void WriteProcessingInstruction(string name, string text) {
writer.WriteLine("?{0} {1}", name, text);
}
public override void WriteEntityRef(string name) {
char ch = ' ';
switch (name) {
case "amp":
ch = '&';
break;
case "lt":
ch = '<';
break;
case "gt":
ch = '>';
break;
case "quot":
ch = '"';
break;
case "apos":
ch = '\'';
break;
}
Write(ch);
}
public override void WriteCData(string text) {
Write(text);
}
public override void WriteCharEntity(char ch) {
Write(ch);
}
public override void WriteWhitespace(string ws) {
Write(ws);
}
public override void WriteString(string text) {
if (writeState == WriteState.Attribute) {
writer.Write("{0}", text);
} else {
Write(text);
}
}
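public override void WriteSurrogateCharEntity(char lowChar, char highChar) {
// a surrogate pair is one character: high surrogate first, on a single text line
Write(new string(new char [ ] {highChar, lowChar}));
}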
public override void WriteChars(char [ ] buffer, int index, int count) {
Write(buffer, index, count);
}
public override void WriteRaw(char [ ] buffer, int index, int count) {
Write(buffer, index, count);
}
public override void WriteRaw(string data) {
Write(data);
}
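public override void WriteBase64(byte [ ] buffer, int index, int count) {
// encode the bytes as Base64 text rather than decoding them as characters
Write(Convert.ToBase64String(buffer, index, count));
}
public override void WriteBinHex(byte [ ] buffer, int index, int count) {
// BitConverter produces "0A-1B-..."; removing the dashes leaves BinHex text
Write(BitConverter.ToString(buffer, index, count).Replace("-", ""));
}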
public override void Close( ) {
writer.Close( );
writeState = WriteState.Closed;
}
public override void Flush( ) {
writer.Flush( );
}
public override string LookupPrefix(string ns) {
return string.Empty;
}
public override void WriteNmToken(string name) {
writer.Write(name);
}
public override void WriteName(string name) {
writer.Write(name);
}
public override void WriteQualifiedName(string localName, string ns) {
writer.Write(localName);
}
}

…XmlReader and XmlWriter types to read one particular alternative XML syntax, and how to use them in programs that think they're reading and writing XML. You can think of other applications: besides other alternative XML syntaxes, such as YAML (Yet Another Markup Language) and James Clark's Compact Syntax for RELAX NG, you could read data from formats completely unrelated to XML, such as CSV files, DBF files, and even databases and filesystems.

…XmlReader and XmlWriter as they are combined with higher-level XML functionality, starting with the Document Object Model.

XmlReader allows you to access XML data in a read-only, forward-only manner, but sometimes you need to access XML in a non-sequential manner. For example, you may want to change the order of a couple of elements somewhere in the middle of the document tree. For this purpose, the World Wide Web Consortium developed the Document Object Model (DOM).

…Document, DocumentFragment, DocumentType, EntityReference, Element, Attr, ProcessingInstruction, Comment, Text, CDATASection, Entity, and Notation. Some of these node types can have subnodes, and the DOM specifies which types of subnodes each node type can have. To handle collections of nodes, the DOM also specifies a NodeList object and, for dictionaries of nodes keyed by their names, the NamedNodeMap object. Figure 5-1 shows the DOM inheritance hierarchy.
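Reordering elements, the example given above, is a good case for the DOM: once the whole tree is in memory, a node can simply be moved. A minimal sketch, with hypothetical class and method names, that moves the root element's second child in front of its first using XmlNode.InsertBefore( ):

```csharp
using System;
using System.Xml;

// Illustrative class; ReorderElements and SwapFirstTwoChildren are hypothetical names.
public class ReorderElements {
    // InsertBefore() removes a node that is already in the tree from its old
    // position before inserting it, so the node is moved, not copied.
    public static string SwapFirstTwoChildren(string xml) {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        XmlElement root = document.DocumentElement;
        XmlNode first = root.FirstChild;
        if (first == null || first.NextSibling == null) {
            return root.OuterXml; // fewer than two children: nothing to swap
        }
        root.InsertBefore(first.NextSibling, first);
        return root.OuterXml;
    }

    public static void Main() {
        Console.WriteLine(SwapFirstTwoChildren("<order><b/><a/><c/></order>"));
    }
}
```

A forward-only XmlReader could not do this without buffering the elements itself.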
using System;
using System.Xml;
class DomFeatureChecker {
private static readonly string [ ] versions = new string [ ] {
"1.0", "2.0" };
private static readonly string [ ] features = new string [ ] {
"Core", "XML", "HTML", "Views", "Stylesheets", "CSS",
"CSS2", "Events", "UIEvents", "MouseEvents", "MutationEvents",
"HTMLEvents", "Range", "Traversal" };
public static void Main(string[ ] args) {
XmlImplementation impl = new XmlImplementation( );
foreach (string version in versions) {
foreach (string feature in features) {
Console.WriteLine("{0} {1}={2}", feature, version,
impl.HasFeature(feature, version));
}
}
}
}

The HasFeature( ) method of the XmlImplementation class returns true if the given feature is implemented. If you run this program with the .NET Framework version 1.0 or 1.1, you'll see the following output:

Core 1.0=False
XML 1.0=True
HTML 1.0=False
Views 1.0=False
Stylesheets 1.0=False
CSS 1.0=False
CSS2 1.0=False
Events 1.0=False
UIEvents 1.0=False
MouseEvents 1.0=False
MutationEvents 1.0=False
HTMLEvents 1.0=False
Range 1.0=False
Traversal 1.0=False
Core 2.0=False
XML 2.0=True
HTML 2.0=False
Views 2.0=False
Stylesheets 2.0=False
CSS 2.0=False
CSS2 2.0=False
Events 2.0=False
UIEvents 2.0=False
MouseEvents 2.0=False
MutationEvents 2.0=False
HTMLEvents 2.0=False
Range 2.0=False
Traversal 2.0=False
…XmlDocument instances simultaneously to manage an inventory system.

…System.Xml.XPath namespace.

…XmlDocument in memory, you could choose to navigate through its nodes by using XmlNodeReader to read each node and take some action if it is of the desired type. Or, you could recursively iterate through its child nodes, interrogating each child node's node type and name, until you reach the one you're interested in. Or, you could use XPath.

…System.Xml.XPath namespace, and how it allows you to use XPath in your .NET applications. Finally, I'll go through some examples using XPath.

…http://www.w3.org/TR/xpath.

The System.Xml.XPath namespace is relatively small, containing only five classes, six enumerations, and one interface. There are two ways to select nodes from an XML document with XPath. The first, which was introduced in Chapter 5, uses the SelectNodes( ) and SelectSingleNode( ) methods of XmlNode. The second way uses the XPathNavigator class, obtained by calling XmlNode.CreateNavigator( ) or XPathDocument.CreateNavigator( ).

XmlNode defines two methods, with two overloads each, to allow navigation via XPath.
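The two approaches can be compared side by side. This sketch selects the same nodes from a made-up inventory document once through XmlNode.SelectNodes( ) on an XmlDocument and once through an XPathNavigator created from an XPathDocument. The class, method, and element names are hypothetical, and the sketch assumes .NET 2.0 or later for List<string>.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Illustrative class; all names here are hypothetical.
public class TwoWaysToSelect {
    const string Sample = "<inventory><item sku='1'>widget</item><item sku='2'>gadget</item></inventory>";

    // Approach 1: SelectNodes() on a DOM returns an XmlNodeList.
    public static List<string> WithSelectNodes() {
        XmlDocument document = new XmlDocument();
        document.LoadXml(Sample);
        List<string> result = new List<string>();
        foreach (XmlNode node in document.SelectNodes("/inventory/item")) {
            result.Add(node.InnerText);
        }
        return result;
    }

    // Approach 2: an XPathNavigator from CreateNavigator() returns an iterator.
    public static List<string> WithNavigator() {
        XPathDocument document = new XPathDocument(new StringReader(Sample));
        XPathNavigator navigator = document.CreateNavigator();
        XPathNodeIterator iterator = navigator.Select("/inventory/item");
        List<string> result = new List<string>();
        while (iterator.MoveNext()) {
            result.Add(iterator.Current.Value);
        }
        return result;
    }

    public static void Main() {
        Console.WriteLine(string.Join(",", WithSelectNodes().ToArray()));
        Console.WriteLine(string.Join(",", WithNavigator().ToArray()));
    }
}
```

XPathDocument is read-only and optimized for queries, while XmlDocument can also be edited, which is one reason to choose one over the other.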
SelectSingleNode( ) returns a single XmlNode that matches the given XPath expression, and SelectNodes( ) returns an XmlNodeList.

SelectSingleNode( ) returns a single XmlNode that matches the given XPath expression. If more than one node matches the expression, the first one is returned; the definition of "first" depends on the order of the axis used. The context node of the XPath query is set to the XmlNode instance on which the method is invoked.

One overload of SelectSingleNode( ) takes just the XPath expression. The other takes the XPath expression and an XmlNamespaceManager, which is used to resolve any namespace prefixes in the XPath expression.

using System;
using System.Xml;
using System.Xml.XPath;
public class XPathQuery {
public static void Main(string [ ] args) {
string filename = args[0];
string xpathExpression = args[1];
XmlDocument document = new XmlDocument( );
document.Load(filename);
XmlTextWriter writer = new XmlTextWriter(Console.Out);
writer.Formatting = Formatting.Indented;
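XmlNode node = document.SelectSingleNode(xpathExpression);
// SelectSingleNode( ) returns null when no node matches the expression
if (node != null) {
node.WriteTo(writer);
} else {
Console.Error.WriteLine("No node matches {0}", xpathExpression);
}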
writer.Close( );
}
}
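XPathQuery uses the one-argument overload, so it cannot address elements that live in a namespace. The sketch below shows the two-argument overload instead; the namespace URI urn:example:inventory, the prefix inv, and the class name are invented for illustration. The prefix registered with the XmlNamespaceManager only has to map to the same URI; it need not match the prefix (if any) used in the document.

```csharp
using System;
using System.Xml;

// Illustrative class; NamespacedQuery and the namespace URI are hypothetical.
public class NamespacedQuery {
    public static string FindFirstItem(string xml) {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xml);
        // Map the prefix used in the XPath expression to a namespace URI.
        XmlNamespaceManager manager = new XmlNamespaceManager(document.NameTable);
        manager.AddNamespace("inv", "urn:example:inventory");
        XmlNode node = document.SelectSingleNode("/inv:inventory/inv:item", manager);
        return node == null ? null : node.InnerText;
    }

    public static void Main() {
        // A default namespace in the document; "/inventory/item" would match nothing here.
        string xml = "<inventory xmlns='urn:example:inventory'>" +
            "<item>widget</item><item>gadget</item></inventory>";
        Console.WriteLine(FindFirstItem(xml));
    }
}
```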