Practical RDF

By Shelley Powers



Table of Contents

Chapter 1: RDF: An Introduction
The Resource Description Framework (RDF) is an extremely flexible technology, capable of addressing a wide variety of problems. Because of its enormous breadth, people often come to RDF thinking that it's one thing and find later that it's much more. One of my favorite parables is about the blind people and the elephant. If you haven't heard it, the story goes that six blind people were asked to identify what an elephant looked like from touch. One felt the tusk and thought the elephant was like a spear; another felt the trunk and thought the elephant was like a snake; another felt a leg and thought the elephant was like a tree; and so on, each basing his definition of an elephant on his own unique experiences.
RDF is very much like that elephant, and we're very much like the blind people, each grabbing at a different aspect of the specification, with our own interpretations of what it is and what it's good for. And we're discovering what the blind people discovered: not all interpretations of RDF are the same. Therein lies both the challenge of RDF as well as the value.
The main RDF specification web site is at http://www.w3.org/RDF/. You can access the core working group's efforts at http://www.w3.org/2001/sw/RDFCore/. In addition, there's an RDF Interest Group forum that you can monitor or join at http://www.w3.org/RDF/Interest/.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The Semantic Web and RDF: A Brief History
RDF is based within the Semantic Web effort. According to the W3C (World Wide Web Consortium) Semantic Web Activity Statement:
The Resource Description Framework (RDF) is a language designed to support the Semantic Web, in much the same way that HTML is the language that helped initiate the original Web. RDF is a framework for supporting resource description, or metadata (data about data), for the Web. RDF provides common structures that can be used for interoperable XML data exchange.
Though not as well known as other specifications from the W3C, RDF is actually one of the older specifications, with the first working draft produced in 1997. The earliest editors, Ora Lassila and Ralph Swick, established the foundation on which RDF rested—a mechanism for working with metadata that promotes the interchange of data between automated processes. Regardless of the transformations RDF has undergone and its continuing maturing process, this statement forms its immutable purpose and focal point.
In 1999, the first recommended RDF specification, the RDF Model and Syntax Specification (usually abbreviated as RDF M&S), again coauthored by Ora Lassila and Ralph Swick, was released. A candidate recommendation for the RDF Schema Specification, coedited by Dan Brickley and R.V. Guha, followed in 2000. In order to open up a previously closed specification process, the W3C also created the RDF Interest Group, providing a view into the RDF specification process for interested people who were not a part of the RDF Core Working Group.
As work proceeded on the RDF specification, discussions continued about the concepts behind the Semantic Web. At the time, the main difference seen between the existing Web and the newer, smarter Web was that, rather than leaving a large amount of data disorganized and hard to access, something such as RDF would allow data to be organized into knowledge statements: assertions about resources accessible on the Web. In a Scientific American article published in May 2001, Tim Berners-Lee wrote:
The Specifications
As stated earlier, the RDF specification was originally released as one document, the RDF Model and Syntax, or RDF M&S. However, it soon became apparent that this single document tried to cover too much material, leaving confusion and too many questions in its wake. Thus, a new effort was started to address the issues with the original specification and, ideally, eliminate the confusion. This work resulted in an updated specification and the release of six new documents: RDF Concepts and Abstract Syntax, RDF Semantics, RDF/XML Syntax Specification (revised), RDF Vocabulary Description Language 1.0: RDF Schema, the RDF Primer, and the RDF Test Cases.
The RDF Concepts and Abstract Syntax and the RDF Semantics documents provide the fundamental framework behind RDF: the underlying assumptions and structures that distinguish RDF from other metadata models (such as the relational data model). These documents provide both validity and consistency to RDF, a way of verifying that data structured in a certain way will always be compatible with other data using the same structures. The RDF model exists independently of any representation of RDF, including RDF/XML.
The RDF/XML syntax, described in the RDF/XML Syntax Specification (revised), is the recommended serialization technique for RDF. Though several tools and APIs can also work with N-Triples (described in Chapter 2) or N3 notation (described in Chapter 3), most implementations of and discussions about RDF, including this book, focus on RDF/XML.
The RDF Vocabulary Description Language defines and constrains an RDF/XML vocabulary. It isn't a replacement for XML Schema or the use of DTDs; rather, it's used to define specific RDF vocabularies and to specify how the elements of a vocabulary relate to each other. An RDF Schema isn't required for valid RDF (neither is a W3C XML Schema or an XML 1.0 Document Type Definition, or DTD), but it does help prevent confusion when people want to share a vocabulary.
When to Use and Not Use RDF
RDF is a wonderful technology, and I'll be at the front in its parade of fans. However, I don't consider it a replacement for other technologies, and I don't consider its use appropriate in all circumstances. Just because data is on the Web, or accessed via the Web, doesn't mean it has to be organized with RDF. Forcing RDF into uses that don't realize its potential will only result in a general pushback against RDF in its entirety, including pushback in uses in which RDF positively shines.
This, then, raises the question: when should we, and when should we not, use RDF? More specifically, since much of RDF focuses on its serialization to RDF/XML, when should we use RDF/XML and when should we use non-RDF XML?
As the final edits for this book were in progress, a company called Semaview published a graphic depicting the differences between XML and RDF/XML (found at http://www.semaview.com/c/RDFvsXML.html). Among those listed was one about the tree-structured nature of XML, as compared to RDF's much flatter triple-based pattern. XML is hierarchical, which means that all related elements must be nested within the elements they're related to. RDF does not require this nested structure.
To demonstrate this difference, consider a web resource, which has a history of movement on the Web. Each element in that history has an associated URL, representing the location of the web resource after the movement has occurred. In addition, there's an associated reason why the resource was moved, resulting in this particular event. Recording these relationships in non-RDF XML results in an XML hierarchy four layers deep:
<?xml version="1.0"?>
<resource>
  <uri>http://burningbird.net/articles/monsters3.htm</uri>
  <history>
    <movement>
      <link>http://www.yasd.com/dynaearth/monsters3.htm</link>
      <reason>New Article</reason>
    </movement>
  </history>
</resource>
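To make the contrast between nesting and flat statements concrete, here is a minimal Python sketch that is not part of the book. It parses the XML record above with the standard library, confirms the four layers of nesting, and then restates the same facts as flat (subject, property, value) tuples. The property names "history", "link", and "reason" are simply reused from the element names, and the "_:movementN" labels are hypothetical stand-ins for an identifier that ties each event back to the resource.

```python
import xml.etree.ElementTree as ET

# The nested, non-RDF XML record from the text (XML declaration omitted).
DOC = """<resource>
  <uri>http://burningbird.net/articles/monsters3.htm</uri>
  <history>
    <movement>
      <link>http://www.yasd.com/dynaearth/monsters3.htm</link>
      <reason>New Article</reason>
    </movement>
  </history>
</resource>"""


def depth(element):
    """Count element layers from this element down to its deepest leaf."""
    return 1 + max((depth(child) for child in element), default=0)


def flatten(resource):
    """Restate the nested record as flat (subject, property, value) tuples.

    The "_:movementN" labels are hypothetical local identifiers, not
    anything defined by the book or by the RDF specifications.
    """
    subject = resource.findtext("uri")
    triples = []
    for n, movement in enumerate(resource.iter("movement"), start=1):
        event = f"_:movement{n}"
        triples.append((subject, "history", event))
        triples.append((event, "link", movement.findtext("link")))
        triples.append((event, "reason", movement.findtext("reason")))
    return triples


root = ET.fromstring(DOC)
layers = depth(root)
flat = flatten(root)
for triple in flat:
    print(triple)
```

In the flat form, no statement has to sit inside another; the shared identifier is what relates the movement event to the resource.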
In RDF/XML, you can associate two separate XML structures with each other through a Uniform Resource Identifier (URI, discussed in Chapter 2). With the URI, you can link one XML structure to another without having to embed the second structure directly within the first:
Some Uses of RDF/XML
The first time I saw RDF/XML was when it was used to define the table of contents (TOC) structures within Mozilla, when Mozilla was first being implemented. Since then, I've been both surprised and pleased at how many implementations of RDF and RDF/XML exist.
One of the primary users of RDF/XML is the W3C itself, in its effort to define a Web Ontology Language based on RDF/XML. Being primarily a data person and not a specialist in markup, I wasn't familiar with some of the concepts associated with RDF when I first started exploring its use and meaning. For instance, there were references to ontology again and again, and since my previous exposure to this word had to do with biology, I was a bit baffled. However, ontology in the sense of RDF and the Semantic Web is, according to dictionary.com, "An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them."
As mentioned previously, RDF provides a structure that allows us to make assertions using XML (and other serialization techniques). However, there is an interest in taking this further and expanding on it, by creating just such an ontology based on the RDF model, in the interest of supporting more advanced agent-based technologies. An early effort toward this is the DARPA Agent Markup Language program, or DAML. The first implementation of DAML, DAML+OIL, is tightly integrated with RDF.
A new effort at the W3C, the Web Ontology Working Group, is working on creating a Web Ontology Language (OWL) derived from DAML+OIL and based in RDF/XML. The following quote from the OWL Use Cases and Requirements document, one of many the Ontology Working Group is creating, defines the relationship between XML, RDF/XML, and OWL:
The Semantic Web will build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. The next element required for the Semantic Web is a Web ontology language which can formally describe the semantics of classes and properties used in web documents. In order for machines to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema.
Related Technologies
Several complementary technologies are associated with RDF. As previously discussed, the most common technique to serialize RDF data is via RDF/XML, so influences on XML are likewise influences on RDF. However, other specifications and technologies also impact on, and are impacted by, the ongoing RDF efforts.
Though not a requirement for RDF/XML, you can use XML Schemas and DTDs to formalize the XML structure used within a specific instance of RDF/XML. There's also been considerable effort to map XML Schema data types to RDF, as you'll see in the next several chapters.
One issue that arises again and again with RDF is where to include the XML. For instance, if you create an RDF document to describe an HTML page resource, should the RDF be in a separate file or contained within the HTML document? I've seen RDF embedded in HTML and XML using a variety of tricks, but the consensus seems to be heading toward defining the RDF in a separate file and then linking it within the HTML or XHTML document. Chapter 3 takes a closer look at issues related to merging RDF with other formats.
A plethora of tools and utilities work with RDF/XML. Chapter 7 covers some of these. In addition, several different APIs in a variety of languages, such as Perl, Java, Python, C, C++, and so on, can parse, query, and generate RDF/XML. The remainder of the second section of the book explores some of the more stable or representative of these, including a look at Jena, a Java-based API, RAP (RDF API for PHP), Redland's multilanguage RDF API, Perl and Python APIs and tools, and so on.
Going Forward
The RDF Core Working Group spent considerable time ensuring that the RDF specifications answered as many questions as possible. There is no such thing as a perfect specification, but the group did its best under the constraints of maintaining connectivity with its charter and existing uses of RDF/XML.
RDF/XML has been used in enough different applications that I consider it to be at a release level with the publication of the current RDF specification documents. In fact, I think you'll find that the RDF specification will be quite stable in its current form after the documents are released; it's important that the RDF specification be stabilized so that we can begin to build on it. Based on this hoped-for stability, you can use the specification, including RDF/XML, in your applications and be comfortable about future compatibility.
We're also seeing more and more interest in and use of RDF and its associated RDF/XML serialization in the world. I've seen APIs in all major programming languages, including Java, Perl, PHP, Python, C#, C++, C, and so on. Not only that, but there's a host of fun and useful tools to help you edit, parse, read, or write your RDF/XML documents. And most of these tools, utilities, APIs, and so on are free for you to download and incorporate into your current work.
With the release of the RDF specification documents, RDF's time has come, and I'm not just saying that because I wrote this book. I wrote this book because I believe that RDF is now ready for prime time.
Now, time to get started.
Chapter 2: RDF: Heart and Soul
RDF's purpose is fairly straightforward: it provides a means of recording data in a machine-understandable format, allowing for more efficient and sophisticated data interchange, searching, cataloging, navigation, classification, and so on. It forms the cornerstone of the W3C effort to create the Semantic Web, but its use isn't restricted to this specific effort.
Perhaps because RDF is a description for a data model rather than a description of a specific data vocabulary, or perhaps because it has a foothold in English, logic, and even in human reasoning, RDF has a strong esoteric element to it that can be intimidating to a person wanting to know a little more about it. However, RDF is based on a well-defined set of rules and constraints that governs its format, validity, and use. Approaching RDF through the specifications is a way of grounding RDF, putting boundaries around the more theoretical concepts.
This chapter takes a look at two RDF specification documents that exist at opposite ends of the semantic spectrum: the RDF Concepts and Abstract Syntax and the RDF Semantics documents. In these documents we're introduced to the concepts and underlying strategy that form the basis of the RDF/XML that we'll focus on in the rest of the book. In addition, specifically within the Semantics document, we'll be exposed to the underlying meaning behind each RDF construct. Though not critical to most people's use of RDF, especially RDF/XML, the Semantics document ensures that all RDF consumers work from the same basic understanding; therefore, some time spent on this document, primarily in overview, is essential.
Both documents can be accessed directly online, so I'm not going to duplicate the information contained in them in this chapter. Instead, we'll take a look at some of the key elements and unique concepts associated with RDF.
The RDF Concepts and Abstract Syntax document can be found at http://www.w3.org/TR/rdf-concepts/. The RDF Semantics document can be found at http://www.w3.org/TR/rdf-mt/.
The Search for Knowledge
Occasionally, I like to write articles about non-Internet-related topics, such as marine biology or astronomy. One of my more popular articles is on Architeuthis Dux—the giant squid. The article is currently located at http://burningbird.net/articles/monsters1.htm.
According to the web profile statistics for this article, it receives a lot of visitors based on searches performed in Google, a popular search engine. When I go to the Google site, though, to search for the article based on the term giant squid, I find that I get a surprising number of links back. The article was listed on page 13 of the search results (with 10 links to a page). First, though, were several links about a production company, the Jules Verne novel 20,000 Leagues Under the Sea, something to do with a comic book character called the Giant Squid, as well as various other assorted and sundry references such as a recipe for cooking giant squid steaks (as an aside, giant squids are ammonia based and inedible).
For the most part, each link does reference the giant squid as a marine animal; however, the context doesn't match my current area of interest: finding an article that explores the giant squid's roots in mythology.
I can refine my search, specifying separate keywords such as giant, squid, and mythology to make my article appear on page 6 of the list of links—along with links to a Mexican seafood seller offering giant squid meat slabs and a listing of books that discuss a monster called the Giant Squid that oozes green slime.
The reason we get so many links back when searching for specific resources is that most search engines match on keywords rather than searching for a resource within the context of a specific interest. The search engines' data is gathered by automated agents, also called robots or web spiders, that traverse the Web via in-page links, pulling keywords either from HTML meta tags or directly from the page contents themselves.
A better approach for classifying resources such as the giant squid article would be to somehow attach information about the context of the resource. For instance, the article is part of a series comparing two legendary creatures: the giant squid and the Loch Ness Monster. It explores what makes a creature legendary, as well as current and past efforts to find living representatives of either creature. All of this information forms a description of the resource, a
The RDF Triple
Three is a magical number. For instance, three legs are all you need to create a stable stool, and a transmitter and two receivers are all you need to triangulate a specific transmission point. You can create a perfect sphere with infinitely small triangles. (Triangles are a very useful geometric shape, also used to find the heights of mountains and the distances between stars.)
RDF is likewise based on the principle that three is a magic number—in this case, that three pieces of information are all that's needed in order to fully define a single bit of knowledge. Within the RDF specification, an RDF triple documents these three pieces of information in a consistent manner that ideally allows both human and machine consumption of the same data. The RDF triple is what allows human understanding and meaning to be interpreted consistently and mechanically.
Of the three pieces of information, the first is the subject. A property such as name can belong to a dog, cat, book, plant, person, car, nation, or insect. To make finite such an infinite universe, you must set boundaries, and that's what subject does for RDF. The second piece of information is the property type or just plain property. There are many facts about any individual subject; for instance, I have a gender, a height, a hair color, an eye color, a college degree, relationships, and so on. To define which aspect of me we're interested in, we need to specifically focus on one property.
If you look at the intersection of subject and property, you'll find the final bit of information quietly waiting to be discovered—the value associated with the property. X marks the spot. I (subject) have a name (property), which is Shelley Powers (property value). I (subject) have a height (property), which is five feet eleven inches (property value). I (subject) also have a location (property), which is St. Louis (property value). Each of these assertions adds to a picture that is me; the more statements defined, the better the picture. Stripping away the linguistic filler, each of these statements can be written as an RDF triple.
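The three statements above can be written down as plain data. The following is a minimal Python sketch, not taken from the book: the Triple type and the describe helper are hypothetical names, and the bare string "I" stands in for the subject, where real RDF would use a URI.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: a subject, a property (predicate), and a value."""
    subject: str
    predicate: str
    value: str


# The assertions from the text with the linguistic filler stripped away.
statements = [
    Triple("I", "name", "Shelley Powers"),
    Triple("I", "height", "five feet eleven inches"),
    Triple("I", "location", "St. Louis"),
]


def describe(subject, triples):
    """Collect every property/value pair asserted about one subject."""
    return {t.predicate: t.value for t in triples if t.subject == subject}


picture = describe("I", statements)
for prop, value in picture.items():
    print(f"I have a {prop}, which is {value}")
```

Each added Triple fills in one more property in the picture of the subject, which is the sense in which more statements give a better picture.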
The Basic RDF Data Model and the RDF Graph
The RDF Core Working Group decided on the RDF graph—a directed labeled graph—as the default method for describing RDF data models for two reasons. First, as you'll see in the examples, the graphs are extremely easy to read. There is no confusion about what is a subject and what are the subject's property and this property's value. Additionally, there can be no confusion about the statements being made, even within a complex RDF data model.
The second reason the Working Group settled on RDF graphs as the default description technique is that there are RDF data models that can be represented in RDF graphs, but not in RDF/XML.
The addition of rdf:nodeIDs, discussed in Chapter 3, provided some of the necessary syntactic elements that allow RDF/XML to record all RDF graphs. However, RDF/XML still can't encode graphs whose properties (predicates) cannot be recorded as namespace-qualified XML names, or QNames. For more on QNames, see XML in a Nutshell, Second Edition (O'Reilly).
The RDF directed graph consists of a set of nodes connected by arcs, forming a pattern of node-arc-node. Additionally, the nodes come in three varieties: uriref, blank nodes, and literals.
A uriref node consists of a Uniform Resource Identifier (URI) reference that provides a specific identifier unique to the node. There's been discussion that a uriref must point to something that's accessible on the Web (i.e., provide a location of something that when accessed on the Internet returns something). However, there is no formal requirement that urirefs have a direct connectivity with actual web resources. In fact, if RDF is to become a generic means of recording data, it can't restrict urirefs to being "real" data sources.
Blank nodes are nodes that don't have a URI. When identifying a resource is meaningful, or the resource is identified within the specific graph, a URI is given for that resource. However, when identification of the resource doesn't exist within the specific graph at the time the graph was recorded, or it isn't meaningful, the resource is diagrammed as a blank node.
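The node-arc-node pattern and the three kinds of node can be sketched as small Python data classes. This is an illustration under stated assumptions rather than an API from any RDF library: the class names are invented here, and the "history" predicate URI is hypothetical, chosen only to give the blank node something to hang from.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    uri: str  # identifier only; it need not resolve to anything on the Web


@dataclass(frozen=True)
class BlankNode:
    label: str  # local label, meaningful only inside one graph


@dataclass(frozen=True)
class Literal:
    text: str


Node = Union[URIRef, BlankNode, Literal]


@dataclass(frozen=True)
class Arc:
    """One node-arc-node statement; the arc itself is labeled with a uriref."""
    source: Node
    predicate: URIRef
    target: Node


PSTCN = "http://burningbird.net/postcon/elements/1.0/"
article = URIRef("http://burningbird.net/articles/monsters3.htm")
event = BlankNode("movement1")

graph = {
    Arc(article, URIRef(PSTCN + "title"), Literal("Architeuthis Dux")),
    Arc(article, URIRef(PSTCN + "history"), event),  # hypothetical predicate
}


def node_kinds(arcs):
    """Count the distinct nodes of each variety in a set of arcs."""
    nodes = {a.source for a in arcs} | {a.target for a in arcs}
    return Counter(type(n).__name__ for n in nodes)


kinds = node_kinds(graph)
print(kinds)
```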
URIs
Since an understanding of urirefs is central to working with RDF, we'll take a moment to look at what makes a valid URI—the identifiers contained within a uriref and used to identify specific predicates.
Resources can be accessed with different protocols and using different syntaxes, such as using http:// to access a resource as a web page and ftp:// to access another resource using FTP. However, one thing each approach shares is the need to access a specific object given a unique name or identifier. URIs provide a common syntax for naming a resource regardless of the protocol used to access the resource. Best of all, the syntax can be extended to meet new needs and include new protocols.
URIs are related to URLs (Uniform Resource Locators) in that a URL is a specific instance of a URI scheme based on a known protocol, commonly the Hypertext Transfer Protocol (HTTP). URIs, and URLs for that matter, can include either a complete location or path to a resource or a partial or relative path. The URI can optionally include a fragment identifier, separated from the URI by a pound sign (#). In the following example, http://burningbird.net/articles/monsters3.htm is the URI and introduction is the fragment:
http://burningbird.net/articles/monsters3.htm#introduction
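As a quick check of that split, Python's standard urllib.parse module can separate the fragment identifier from the rest of the URI. This is only a sketch for illustration and is not part of the book.

```python
from urllib.parse import urldefrag

# urldefrag splits a URI reference at the pound sign.
uri, fragment = urldefrag(
    "http://burningbird.net/articles/monsters3.htm#introduction"
)
print(uri)
print(fragment)
```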
A URI is only an identifier. The object identified doesn't have to physically exist on the Web, and you don't have to specify a resolvable protocol such as http:// or ftp://, though you can if you like. Instead, you could use something as different as a UUID (Universally Unique Identifier) referencing a COM or other technology component that exists locally on the same machine or within a network of machines. In fact, a fundamental difference between a URL and a URI is that a URL is a location of an object, while a URI can function as a name or a location. URIs also differ from URNs (Uniform Resource Names) because URIs can refer to a location as well as a name, while URNs refer to globally unique names.
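To illustrate the point that an identifier need not be a location, the sketch below (not from the book) uses Python's standard uuid module to build a UUID-based name and compares its scheme with that of an http:// URL. Deriving the UUID from the article's URL with uuid5 is an arbitrary choice made here so the example is repeatable; the text does not prescribe any particular way of minting UUIDs.

```python
import uuid
from urllib.parse import urlparse

locator = "http://burningbird.net/articles/monsters3.htm"

# A name with no resolvable protocol: a UUID written in URN form.
name = uuid.uuid5(uuid.NAMESPACE_URL, locator).urn


def scheme_of(identifier):
    """Return the scheme portion of a URI (the part before the first colon)."""
    return urlparse(identifier).scheme


print(scheme_of(locator), locator)
print(scheme_of(name), name)
```

Both strings are usable as identifiers; only the first also says where to fetch something.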
The RDF specification constrains all urirefs to be absolute or partial URIs. An absolute URI would be equivalent to the URL:
RDF Serialization: N3 and N-Triples
Though RDF/XML is the serialization technique used in the rest of this book, another serialization technique supported by many RDF applications and tools is N-Triples. This format breaks an RDF graph into its separate triples, one on each line. Regardless of the shorthand technique used within RDF/XML, N-Triples generated from the same RDF graph always come out the same, making it an effective way of validating the processing of an RDF/XML document. For instance, the test cases in the RDF Test Cases document, part of the RDF specification, are given in both the RDF/XML format and the N-Triples format to ensure that the RDF/XML (and the underlying RDF concepts) are consistently interpreted.
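The "one triple per line" idea, and its use for checking that two readings of a document agree, can be sketched in a few lines of Python. This is a simplified illustration, not a conforming N-Triples writer: the term function guesses the kind of term from its spelling (an assumption made for brevity), handles only basic escaping, and the sorting step is added here purely to make comparison easy; it is not part of the format.

```python
def term(value):
    """Render one term: URIs in angle brackets, blank nodes as-is,
    anything else as a quoted literal. The guess-by-prefix rule is a
    shortcut for this sketch, not how a real serializer decides."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    if value.startswith("_:"):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """One line per triple, each ending in ' .', sorted for comparison."""
    return sorted(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


ARTICLE = "http://burningbird.net/articles/monsters3.htm"
PSTCN = "http://burningbird.net/postcon/elements/1.0/"

# The same two statements, gathered in a different order.
first = [
    (ARTICLE, PSTCN + "title", "Architeuthis Dux"),
    (ARTICLE, PSTCN + "author", "Shelley Powers"),
]
second = list(reversed(first))

lines = to_ntriples(first)
print("\n".join(lines))
print(lines == to_ntriples(second))
```

Because every statement ends up on its own line in one fixed spelling, two serializations of the same graph can be compared line by line.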
Though other techniques for serialization exist, as has been previously discussed, the only serialization technique officially adopted by the RDF specifications is RDF/XML.
N-Triples itself is based on another notation, called N3.
RDF/XML is the official serialization technique for RDF data, but another notation is also used frequently, which is known as N3 or Notation3. It's important you know how to read it; however, since this book is focusing on RDF/XML, we'll look only briefly at N3 notation.
N3 exists independent of RDF and can extend RDF in such a way as to violate the semantics of the underlying RDF graph. Some prefer N3 to RDF/XML; I am not one of them, primarily because I believe RDF/XML is a more comfortable format for people more used to markup (such as XML or HTML).
The basic structure of an N3 triple is:
subject predicate object .
In this syntax, the subject, predicate, and object are separated by spaces, and the triple is terminated with a period (.). An actual example of N3 would be:
<http://weblog.burningbird.net/fires/000805.htm> <http://purl.org/dc/elements/1.1/creator> "Shelley" .
In this example, the absolute URIs are surrounded by angle brackets. To simplify this even further, namespace-qualified XML names (QNames) can be used instead of the full namespace, as long as the namespaces are declared somewhere within the document. If QNames are used, the angle brackets are omitted for the predicates:
Talking RDF: Lingo and Vocabulary
Right at this moment, you have enough understanding of the RDF graph to progress into the RDF/XML syntax in the next chapter. However, if you follow any of the conversations related to RDF, some terms and concepts might cause confusion. Before ending this chapter on the RDF graph, I thought I would spend some time on these potentially confusing concepts.
In any RDF graph, a subgraph of the graph would be a subset of the triples contained in the graph. As I said earlier, each triple is uniquely its own RDF graph, in its own right, and can actually be modeled within a separate directed graph. In Figure 2-3, the triple represented by the following is a subgraph of the entire set of N-Triples representing the entire graph:
<http://burningbird.net/articles/monsters3.htm> <http://burningbird.net/postcon/elements/1.0/title> "Architeuthis Dux" .
Taking this concept further, a union of two or more RDF graphs is a new graph, which the Model document calls a merge of the graphs. For instance, Figure 2-4 shows one graph containing exactly one RDF triple (one statement).
Figure 2-4: RDF graph with exactly one triple
Adding the following triple results in a new merged graph, as shown previously in Figure 2-3. Since both triples share the same subject, as determined by the URI, the mergence of the two attaches the two different triples to the same subject:
<http://burningbird.net/articles/monsters3.htm> <http://burningbird.net/postcon/elements/1.0/author> "Shelley Powers" .
Now, if the subjects differed, the merged graph would still be valid—there is no rule or regulation within the RDF graph that insists that all nodes be somehow connected with one another. All the RDF graph insists on is that the triples are valid and that the RDF used with each is valid. Figure 2-5 shows an RDF graph of two merged graphs that have disconnected nodes.
Figure 2-5: Merged RDF graph with disconnected nodes
Blank nodes are never merged in a graph because there is no way of determining whether two nodes are the same—one can't assume similarity because of artificially generated identifiers. The only components that are merged are urirefs and literals (because two literals that are syntactically the same can be assumed to be the same). In fact, when tools are given two graphs to merge and each graph contains blank nodes, each blank node is given a unique identifier in order to separate it from the others before the mergence.
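The subgraph and merge ideas map naturally onto sets. The sketch below is a simplified model, not code from the book or from any RDF tool: a graph is a Python set of (subject, predicate, object) tuples, blank nodes are recognized by a "_:" prefix (an assumption of this sketch), and the merge function and its "g0", "g1" tags are hypothetical names.

```python
ARTICLE = "http://burningbird.net/articles/monsters3.htm"
PSTCN = "http://burningbird.net/postcon/elements/1.0/"


def rename_blank_nodes(graph, tag):
    """Give every blank node in one graph a label unique to that graph."""
    def rename(term):
        return f"_:{tag}_{term[2:]}" if term.startswith("_:") else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}


def merge(*graphs):
    """Union the triples after keeping each graph's blank nodes apart."""
    merged = set()
    for index, graph in enumerate(graphs):
        merged |= rename_blank_nodes(graph, f"g{index}")
    return merged


# Two one-triple graphs that share a subject uriref.
title_graph = {(ARTICLE, PSTCN + "title", "Architeuthis Dux")}
author_graph = {(ARTICLE, PSTCN + "author", "Shelley Powers")}
combined = merge(title_graph, author_graph)

# Two graphs that happen to reuse the same blank node label.
left = {("_:b1", PSTCN + "title", "First")}
right = {("_:b1", PSTCN + "title", "Second")}
apart = merge(left, right)

print(title_graph <= combined)          # subgraph test is a subset test
print({s for s, _, _ in combined})      # one shared subject
print(sorted(s for s, _, _ in apart))   # two distinct blank nodes
```

Triples with matching urirefs attach to the same subject simply because the strings are equal, while the relabeling step keeps blank nodes from different graphs from being treated as one node.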
Chapter 3: The Basic Elements Within the RDF/XML Syntax
The usability of RDF is heavily dependent on the portability of the data defined in the RDF models and its ability to be interchanged with other data. Unfortunately, recording the RDF data in a graph—the default RDF documentation format—is not the most efficient means of storing or retrieving this data. Instead, transporting RDF data, a process known as serialization, usually occurs with RDF/XML.
Originally, the RDF model and the RDF/XML syntax were incorporated into one document, the Resource Description Framework (RDF) Model and Syntax Specification. However, when the document was updated, the RDF model was separated from the document detailing the RDF/XML syntax. Chapter 2 covered the RDF abstract model, graph, and semantics; this chapter provides a general introduction to the RDF/XML model and syntax (RDF M&S).
The original RDF M&S Specification can be found at http://www.w3.org/TR/REC-rdf-syntax/. The updated RDF/XML Syntax Specification (revised) can be found at http://www.w3.org/TR/rdf-syntax-grammar/.
Some RDF-specific aspects of RDF/XML at first make it seem overly complex when compared to non-RDF XML. However, keep in mind that RDF/XML is nothing more than well-formed XML, with an overlay of additional constraints that allow for easier interchange, collection, and mergence of data from multiple models. In most implementations, RDF/XML is parsable with straight XML technology and can be manipulated manually if you so choose. It's only when the interchangeability of the data is important and the data can be represented only by more complex data structures and relationships that the more formalized elements of RDF become necessary. And in those circumstances, you'll be glad that you have the extra capability.
All examples listed in the chapter are validated using the W3C's RDF Validator, located at http://www.w3.org/RDF/Validator/.
Additional content appearing in this section has been removed.
Serializing RDF to XML
Serialization converts an object into a persistent form. The RDF/XML syntax provides a means of documenting an RDF model in a text-based format, literally serializing the model using XML. This means that the content must meet both the requirements for well-formed XML and the additional constraints of RDF. However, before showing you some of these constraints, let's walk through an example of using RDF/XML.
RDF doesn't require XML-style validity, just well-formedness. RDF/XML parsers and validators do not use DTDs or XML Schemas to ensure that the XML used is valid. Norman Walsh wrote a short article for xml.com on what it means for an XML document to be well formed and/or valid; it explains the two concepts in more detail. See it at http://www.xml.com/pub/a/98/10/guide3.html.
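To make the distinction concrete, here is a minimal Python sketch of a well-formedness check. It assumes Python's standard xml.etree.ElementTree module, which parses XML without validating it against a DTD or schema; the helper name is_well_formed is hypothetical, and Python is used only as a convenient way to experiment.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


properly_nested = "<a><b>18</b></a>"
badly_nested = "<a><b>18</a></b>"  # closing tags in the wrong order
```

A document that passes this check may still break the extra RDF constraints, so treat it as the first of the two hurdles rather than as an RDF/XML validator.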
In Chapter 2, I discussed an article I wrote on the giant squid. Now, consider attaching context to it. Among the information that could be exposed about the article is that it explores the idea of the giant squid as a legendary creature from myths and lore; it discusses the current search efforts for the giant squid; and it provides physical characteristics of the creature. Putting this information into a paragraph results in the following:
The article on giant squids, titled "Architeuthis Dux," at 
http://burningbird.net/articles/monsters3.htm, written by Shelley Powers, explores 
the giant squid's mythological representation as the legendary Kraken as well 
as describing current efforts to capture images of a live specimen. In addition, 
the article also provides descriptions of a giant squid's physical 
characteristics. It is part of a four-part series, described at 
http://burningbird.net/articles/monsters.htm and entitled "A Tale of Two 
Monsters." 
Reinterpreting this information into a set of statements, each with a specific predicate (property or fact) and its associated value, I come up with the following list:
  • The article is uniquely identified by its URI,
Additional content appearing in this section has been removed.
RDF Blank Nodes
It would be easy to extrapolate a lot of meaning about blank nodes but, bottom line, a blank node represents a resource that isn't currently identified. As with the infamous null value from the relational data model, there could be two reasons why the identifying URI is absent: either the value will never exist (isn't meaningful) or the value could exist but doesn't at the moment (currently missing).
Most commonly, a blank node—known as a bnode, or occasionally anonymous node—is used when a resource URI isn't meaningful. An example of this could be a representation of a specific individual (since most of us don't think of humans with URIs).
In the RDF directed graph, a blank node is drawn as an oval (it is a resource), with either no value in the oval or a computer-generated identifier. The RDF/XML Validator generates an identifier, which it places within the blank node to distinguish it from other blank nodes within the graph. Most tools generate such identifiers for blank nodes to differentiate them.
In Example 3-8, bio attributes are grouped within an enclosing PostCon bio resource. Since the bio doesn't have its own URI, a blank node represents it within the model.
Example 3-8. Blank node within RDF model
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">

  <rdf:Description rdf:about="monsters1.htm">
    <pstcn:bio>
      <rdf:Description>
        <pstcn:title>Tale of Two Monsters: Legends</pstcn:title>
        <pstcn:description>
          Part 1 of four-part series on cryptozoology, legends,
          Nessie the Loch Ness Monster and the giant squid.
        </pstcn:description>
        <pstcn:created>1999-08-01T00:00:00-06:00</pstcn:created>
        <pstcn:creator>Shelley Powers</pstcn:creator>
      </rdf:Description>
    </pstcn:bio>
  </rdf:Description>
</rdf:RDF>
Running this example through the RDF Validator gives the directed graph shown in Figure 3-6 (modified to fit within the page).
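If you'd like to see roughly what a tool does with the nested, URI-less description, the following Python sketch walks a trimmed copy of the example (only the creator property is kept) and emits triples. It is a toy, not an RDF/XML parser: it assumes only nested rdf:Description blocks with literal properties, uses the standard library, and the "_:genid" label format is an arbitrary assumption. The function name triples_from is hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/"
  xml:base="http://burningbird.net/articles/">
  <rdf:Description rdf:about="monsters1.htm">
    <pstcn:bio>
      <rdf:Description>
        <pstcn:creator>Shelley Powers</pstcn:creator>
      </rdf:Description>
    </pstcn:bio>
  </rdf:Description>
</rdf:RDF>"""


def triples_from(rdfxml):
    """Return (subject, predicate, object) tuples for a simple document."""
    root = ET.fromstring(rdfxml)
    base = root.get(XML_BASE, "")
    fresh = count(1)
    out = []

    def describe(node):
        about = node.get(f"{{{RDF}}}about")
        if about is not None:
            subject = urljoin(base, about)
        else:
            # No rdf:about: a blank node, so invent a label for it.
            subject = f"_:genid{next(fresh)}"
        for prop in node:
            # ElementTree tags look like '{namespace}local'.
            predicate = prop.tag[1:].replace("}", "")
            nested = prop.find(f"{{{RDF}}}Description")
            obj = describe(nested) if nested is not None else prop.text
            out.append((subject, predicate, obj))
        return subject

    for description in root.findall(f"{{{RDF}}}Description"):
        describe(description)
    return out


triples = triples_from(DOC)
```

The article resource ends up pointing at the generated blank-node label through the bio predicate, and the creator literal hangs off that same label.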
Additional content appearing in this section has been removed.
URI References
All predicates within RDF/XML are given as URIs, and most resources—other than those that are treated as blank nodes—are also given URIs. A basic grounding of URIs was given in Chapter 2, but this section takes a look at how URIs are used within the RDF/XML syntax.
Not all URI references in a document are full URIs. It's not uncommon for relative URI references to be given, which then need to be resolved against a base URI. In the previous examples, the full resource URI is given within the rdf:about attribute. Instead of the full URI, the example could use a relative URI reference, which resolves to the base document's URI concatenated with the relative URI reference. In the following, the relative URI reference is "#somevalue":
  <rdf:Description rdf:about="#somevalue">
then becomes http://burningbird.net/articles/somedoc.htm#somevalue if the containing document is http://burningbird.net/articles/somedoc.htm. To resolve correctly, the relative URI reference must be given with the format of pound sign (#) followed by the reference ("#somevalue").
Normally, when a full URI is not provided for a specific resource, the owning document's URL is considered the base document for forming full URIs given relative URI references. So if the document is http://burningbird.net/somedoc.htm, the URI base is considered to be this document, and changes of the document name or URL change the URI for the resource.
With xml:base, you can specify a base document that's used to generate full URIs when given relative URI references, regardless of the URL of the owning document. This means that your URIs can be consistent regardless of document renaming and movement.
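The resolution rules just described can be tried out with Python's standard urllib.parse.urljoin. This is a sketch under simplifying assumptions: xml:base is treated as a single absolute URI that overrides the document URL, the helper name resolve is hypothetical, and the example.org address is a made-up location for a moved copy of the document.

```python
from urllib.parse import urljoin


def resolve(reference, document_url, xml_base=None):
    """Resolve a relative URI reference; xml:base wins when it is given."""
    base = xml_base if xml_base is not None else document_url
    return urljoin(base, reference)


doc = "http://burningbird.net/articles/somedoc.htm"
moved = "http://example.org/archive/copy.htm"  # hypothetical new location
base = "http://burningbird.net/articles/"

print(resolve("#somevalue", doc))
# http://burningbird.net/articles/somedoc.htm#somevalue

print(resolve("monsters1.htm", moved))
# http://example.org/archive/monsters1.htm

print(resolve("monsters1.htm", moved, xml_base=base))
# http://burningbird.net/articles/monsters1.htm
```

The last two calls show the point of xml:base: without it, moving the document changes the resource's URI; with it, the URI stays put.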
The xml:base attribute is added to the RDF/XML document, usually in the same element tag where you list your namespaces (though it can be placed anywhere). Redefining Example 3-6 with xml:base and using a relative URI reference would give you the RDF/XML shown in Example 3-10.
Example 3-10. Using xml:base to define the base document for relative URI references
Additional content appearing in this section has been removed.
Representing Structured Data with rdf:value
Not all data relations in RDF represent straight binary connections between resource and object value. Some data values, such as measurement, have both a value and additional information that determines how you treat that value. In the following RDF/XML:
<pstcn:lastEdited>18</pstcn:lastEdited>
the statement is ambiguous because we don't know exactly what 18 means. Is it 18 days? Months? Hours? Did a person identified by the number 18 edit it?
To represent more structured data, you can include the additional information directly in the value:
<pstcn:lastEdited>18 days</pstcn:lastEdited>
However, this type of intelligent data then requires that systems know enough to split the value from its qualifier, and this goes beyond what should be required of RDF parsers and processors. Instead, you could define a second vocabulary element to capture the qualifier, such as:
<pstcn:lastEdited>18</pstcn:lastEdited>
<pstcn:lastEditedUnit>day</pstcn:lastEditedUnit>
This works, but unfortunately, there is a disconnect between the value and the unit because the two are only indirectly related based on their relationship with the resource. So the syntax is then refined, which is where rdf:value enters the picture. When dealing with structured data, the rdf:value predicate includes the actual value of the structure—it provides a signal to the processor that the data itself is included in this field, and all other members of the structure are qualifiers and additional information about the structure.
Redefining the data would then result in:
<pstcn:lastEdited rdf:parseType="Resource">
  <rdf:value>18</rdf:value>
  <pstcn:lastEditedUnit>day</pstcn:lastEditedUnit>
</pstcn:lastEdited>
Now, not only do we know that we're dealing with structured data, we know what the actual value, the kernel of the data so to speak, is by the use of rdf:value. You could use your own predicate, but rdf:value is global in scope—it crosses all RDF vocabularies—making its use much more attractive if you're concerned about combining your vocabulary data with other data.
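As a rough illustration of why this helps a consuming application, the Python sketch below reads the structure and separates the main value from its qualifiers without any string splitting. It assumes the standard xml.etree.ElementTree module and a standalone copy of the snippet with its namespace declarations added; the function name read_structured_value is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PSTCN = "http://burningbird.net/postcon/elements/1.0/"

SNIPPET = f"""<pstcn:lastEdited xmlns:rdf="{RDF}" xmlns:pstcn="{PSTCN}"
                  rdf:parseType="Resource">
  <rdf:value>18</rdf:value>
  <pstcn:lastEditedUnit>day</pstcn:lastEditedUnit>
</pstcn:lastEdited>"""


def read_structured_value(element):
    """Return (main value, qualifiers) for a structured property."""
    value = None
    qualifiers = {}
    for child in element:
        if child.tag == f"{{{RDF}}}value":
            value = child.text  # rdf:value marks the kernel of the data
        else:
            qualifiers[child.tag] = child.text
    return value, qualifiers


value, qualifiers = read_structured_value(ET.fromstring(SNIPPET))
```

Because rdf:value is the same predicate in every vocabulary, the same function would pick out the main value of a structure from a vocabulary it has never seen.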
Additional content appearing in this section has been removed.
The rdf:type Property
One general piece of information that is consistent about an RDF resource—outside of the URI to uniquely identify it—is the resource or class type. In the examples shown thus far, this value could implicitly be "Web Resource" to refer to all of the resources, or could be explicitly set to "article" for articles. All these would be correct, depending on how generically you want to define the resource and the other properties associated with the resource. To explicitly define the resource type, you would use the RDF rdf:type property.
Usually the rdf:type property is associated at the same level of granularity as the other properties. As the resources defined using RDF in this chapter all have properties associated more specifically with an article than a web resource, the RDF type property would be "article" or something similar.
In the next section, covering RDF containers, we will learn that the resource type for an RDF container would be the type of container rather than the type of the contained property or resource. Again, the type is equivalent to the granularity of the resource being described, and with containers, the resource is a canister (or group) of resources or properties rather than a specific resource or property.
The value of the RDF rdf:type property is a URI identifying an rdfs:Class-typed resource (rdfs:Class is described in detail in Chapter 5). To demonstrate how to attach an explicit type to a resource, Example 3-13 shows the resource defined in the RDF/XML for Example 3-1, but this time explicitly defining an RDF Schema element for the resource.
Example 3-13. Demonstrating the explicit resource property type
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">
  <rdf:Description rdf:about="http://burningbird.net/articles/monsters3.htm">
    <pstcn:author>Shelley Powers</pstcn:author>
    <pstcn:title>Architeuthis Dux</pstcn:title>
    <rdf:type rdf:resource="http://burningbird.net/postcon/elements/1.0/Article" />
Additional content appearing in this section has been removed.
RDF/XML Shortcuts
An RDF/XML shortcut is just what it sounds like—an abbreviated technique you can use to record one specific characteristic of an RDF model within RDF/XML. In the last section, we looked at using a shortcut to embed a resource's type with the resource definition. Other RDF/XML shortcuts you can use include:
  • Separate predicates can be enclosed within the same resource block.
  • Nonrepeating properties can be created as resource attributes.
  • Empty resource properties do not have to be formally defined with description blocks.
The first shortcut or abbreviated syntax, enclosing all predicates (properties) for the same subject within that subject block, is so common that it's unlikely you'll find RDF/XML files that repeat the resource for each property. However, the RDF/XML in Example 3-1 is equivalent to that shown in Example 3-15.
Example 3-15. Fully separating each RDF statement into separate XML block
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:pstcn="http://burningbird.net/postcon/elements/1.0/">
  <rdf:Description rdf:about="http://burningbird.net/articles/monsters3.htm">
    <pstcn:author>Shelley Powers</pstcn:author>
  </rdf:Description>
  <rdf:Description rdf:about="http://burningbird.net/articles/monsters3.htm">
    <pstcn:title>Architeuthis Dux</pstcn:title>
  </rdf:Description>
</rdf:RDF>
If you try this RDF/XML within the RDF Validator, you'll get exactly the same model as you would with the RDF/XML from Example 3-1.
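You can convince yourself of the equivalence without the Validator. The Python sketch below extracts triples from a separated document and from a combined one and compares the two sets. Several things here are assumptions: the combined document is my own minimal single-block form rather than a copy of an example from this chapter, both blocks are taken to name the same subject URI, and the extractor (the hypothetical literal_triples) handles only flat rdf:Description blocks with literal properties.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PSTCN = "http://burningbird.net/postcon/elements/1.0/"
ARTICLE = "http://burningbird.net/articles/monsters3.htm"
HEADER = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:pstcn="{PSTCN}">'

combined = f"""{HEADER}
  <rdf:Description rdf:about="{ARTICLE}">
    <pstcn:author>Shelley Powers</pstcn:author>
    <pstcn:title>Architeuthis Dux</pstcn:title>
  </rdf:Description>
</rdf:RDF>"""

separated = f"""{HEADER}
  <rdf:Description rdf:about="{ARTICLE}">
    <pstcn:author>Shelley Powers</pstcn:author>
  </rdf:Description>
  <rdf:Description rdf:about="{ARTICLE}">
    <pstcn:title>Architeuthis Dux</pstcn:title>
  </rdf:Description>
</rdf:RDF>"""


def literal_triples(rdfxml):
    """Collect (subject, predicate, literal) triples from flat blocks."""
    triples = set()
    for node in ET.fromstring(rdfxml).findall(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        for prop in node:
            # ElementTree tags look like '{namespace}local'.
            predicate = prop.tag[1:].replace("}", "")
            triples.add((subject, predicate, prop.text))
    return triples


same_model = literal_triples(combined) == literal_triples(separated)
```

Since a graph is a set of triples, how the statements were grouped in the XML leaves no trace in the model, which is also why a round trip back to RDF/XML need not reproduce the original layout.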
The RDF/XML from Examples 3-1 and 3-15 also demonstrates that you can generate an RDF graph from RDF/XML, but when you then convert the graph back into RDF/XML, you won't always get the same RDF/XML that you started with. In this example, the graph for both RDF/XML documents would most likely convert back to the document shown in Example 3-1, rather than the one shown in Example 3-15.
For the second instance of abbreviated syntax, we'll again return to RDF/XML in Example 3-1. Within this document, each of the resource properties is listed within a separate XML element. However, using the second abbreviated syntax—nonrepeating properties can be created as resource attributes—properties that don't repeat and are literals can be listed directly in the resource element, rather than listed out as separate formal predicate statements.
Additional content appearing in this section has been removed.
More on RDF Data Types
RDF data types were discussed in Chapter 2, but their impact extends beyond just the RDF abstract model and concepts. RDF data types have their own XML constructs within the RDF/XML specification.
For instance, you can use the xml:lang attribute to specify a language for each RDF/XML element. In the examples in this English-language book, the value would be "en", and would be included within an element as follows:
<pstcn:reason xml:lang="en">First in the series</pstcn:reason>
You can find out more about xml:lang at http://www.w3.org/TR/REC-xml#sec-lang-tag.
You can also specify a general type for a predicate object with rdf:parseType. We've seen rdf:parseType of "Resource", but you can also use rdf:parseType of "Literal":
<pstcn:reason xml:lang="en" rdf:parseType="Literal"><h1>Reason</h1></pstcn:reason>
By using rdf:parseType="Literal", you are telling the RDF/XML parser to treat the contents of a predicate as a literal value rather than parse it out for new RDF/XML elements. This allows you to embed XML into an element that is not parsed.
Some implementations of RDF/XML specifically recommend using rdf:parseType="Literal" as a way of including unparsed XML within a document, to bypass having to formalize the XML into valid RDF/XML syntax. This attribute was never intended to bypass best practices. If the data contained in the element is recurring, best practice would be to formalize the XML into RDF/XML and incorporate it into the vocabulary or create a new vocabulary.
RDF also allows for typed literals, which contain a reference to the data type of the literal compatible with the XML Schema data types. In the N3 notation, the typed literal would look similar to the following, as pulled from the RDF Primer:
ex:index.html  exterms:creation-date  "1999-08-16"^^xsd:date .
The format of the literal string is the value (1999-08-16), the ^^ separator, and the data type URI, given here as the qualified name xsd:date for the XML Schema date type.
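To show what a processor has to do with such a string, here is a small Python sketch that splits a typed literal into its value and data type and then converts the value. It is deliberately naive: it assumes the data type is written as a prefix:name pair, that the caller supplies the prefix mapping, and that the value contains no escaped quotes. The function name parse_typed_literal is hypothetical.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def parse_typed_literal(token, prefixes):
    """Split '"value"^^prefix:name' into (value, full data type URI)."""
    lexical, separator, datatype = token.rpartition("^^")
    if not separator:
        raise ValueError(f"not a typed literal: {token!r}")
    prefix, _, local = datatype.partition(":")
    return lexical.strip('"'), prefixes[prefix] + local


lexical, datatype = parse_typed_literal('"1999-08-16"^^xsd:date', {"xsd": XSD})

# Convert only when the data type is one this sketch understands.
created = date.fromisoformat(lexical) if datatype == XSD + "date" else lexical
```

The need for this kind of string handling is exactly the "intelligence built into the string" that makes the notation awkward for XML-based tooling.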
As interesting as this format is, one could see how this approach lacks some popularity, primarily because of the intelligence built directly into the string, which can be missed depending on the XML parser that forms the basis of the RDF/XML parser. Luckily, within RDF/XML, the data type is specified as an attribute of the element, using the
Additional content appearing in this section has been removed.
RDF/XML: Separate Documents or Embedded Blocks
By convention, RDF/XML files are stored as separate documents and given the extension of .rdf (just rdf for Mac systems). The associated MIME type for an RDF/XML document is: application/rdf+xml.
There's been considerable discussion about embedding RDF within other documents, such as within non-RDF XML and HTML. I've used RDF embedded within HTML pages, and I know other applications that have done the same.
The problem with embedding, particularly within HTML documents, is that it's not a simple matter to separate the RDF/XML from the rest of the content. If the RDF/XML used consists of a resource and its associated properties listed as attributes of the resource, this isn't a problem. An example of this would be:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description
    about="http://burningbird.net/cgi-bin/mt-tb.cgi?tb_id=121"
    dc:title="Good RSS"
    dc:identifier="http://weblog.burningbird.net/archives/000619.php"
    dc:subject="Technology"
    dc:description="Mark Pilgrim and Sam Ruby created an RSS Validator for us to use 
to validate our RSS feeds, and Bill Kearney was kind enough to host it. Many 
appreciations, folks. I ran the Validator against my RSS feeds (both Userland..."
    dc:creator="shelley"
    dc:date="2002-10-22T09:46:26-06:00" />
</rdf:RDF>
This is RDF/XML that's generated by a weblogging tool called Movable Type (found at http://movabletype.org). It's used for the tool's trackback feature, which allows webloggers to notify each other when they reference each other's posts in their own.
All of the data is contained in RDF/XML element attributes. Because every property is included as an attribute, no element has visible XML content for the HTML parser to parse and display in the page.
This is pretty handy, but not all RDF/XML can use the abbreviated syntax that allows us to convert RDF properties to XML attributes. In those cases, the approach I use to embed RDF within an HTML document is to include it within
Additional content appearing in this section has been removed.
Chapter 4: Specialized RDF Relationships: Reification, Containers, and Collections
Reification, collections, and containers deserve separate coverage from the rest of the RDF/XML syntax, primarily because these constructs have caused the most controversy and confusion. And most of this has to do with meaning.
It isn't precisely clear what is happening, for instance, when I use reification syntax within an RDF/XML document. Am I making a statement about a statement? Am I claiming a special truth for the statement? Or how about the use of a collection or container—is there an interpretation of the relationship of the items within the groups that extends beyond the fact that the items are grouped?
During the process of revamping the RDF specification, the RDF Working Group at one time actually pushed for the removal of containers because the semantics associated with them could be easily emulated using rdf:type. There was also less than general approbation for the concept of reification, which no one seemed to be quite happy with. However, the group kept containers and reification, as well as adding in collections, but with a caveat: no additional semantics are attached to these constructs other than those that are carefully delimited within the RDF documentation. Any additional interpretation would then be between the RDF toolmaker and the people who built the RDF vocabularies and used the tools. However, even within this, there is common acceptance of additional semantics, particularly as semantics relate to containers; of that, one can almost be guaranteed.
In this chapter, we'll not only look more closely at the physical aspects of reification, collections, and containers, we'll also look at what they "mean," intended or otherwise.
Additional content appearing in this section has been removed.
Containers
As I was writing this book, the RDF Working Group issued a document titled "Refactoring RDF/XML Syntax" detailing modifications to the RDF Model and Syntax Specification. One of the major changes to the specification was a modification related to RDF containers, the subject of this section. However, since the recommended modifications were fairly extensive, they couldn't be covered within a note.
I rewrote this section of the book only to have the Working Group somewhat reverse itself as to the legitimacy of containers—containers would be included in the RDF/XML syntax, but their meaning would be constrained.
To ensure a proper perspective of containers, the next section contains an overview of containers as they were modeled in the original specification; a section detailing the changes from the refactoring follows. Finally, at the end I summarize containers as they are understood in the newest release of the RDF Syntax Specification.
Resource properties can occur singly or in groups. To this point, we've looked at recording only individual properties, but RDF needs to record multiply occurring properties.
The creators of the RDF syntax were aware of this and created the concept of RDF Containers specifically for handling multiple resources or for handling multiple literals (properties). Each of the several types of RDF Containers has different behaviors and constraints.
This section covers containers as implemented in the first release of