Convert between XML and native Python data structures with jxmlease

A new module for easing the pain of XML conversion.

By Jonathan Looney
May 19, 2016
Cricket boundary rope (source: Oliver Brown via Wikimedia Commons)

When Stacy Smith and I started writing Automating Junos Administration, we defined our target audience (network engineers and network automation programmers), and we concluded that they would probably feel most comfortable using Python for their automation scripting. Yes, the language choice is sometimes constrained by the tools. And, yes, some might prefer other languages. But we felt that Python was the language most of our readers would have in common. So, we set out to write our scripting examples in Python, where possible. And, because the Junos software includes a rich XML API, we needed a good way to process XML in Python. In this article, I describe jxmlease, a new open-source Python module for converting between XML and native Python data structures. I explain why we created it and how you may be able to use it to simplify the handling of XML data in your Python scripts.

Tackling the challenges of XML conversion

Early in the process of writing the book, we realized that we needed to have an easier way to process XML in Python. The lxml module is extremely feature-rich; however, it uses custom data structures that don’t map well to Python data structures. For example, if you want to get the value of the “c” element in Example 1, you can’t just evaluate lxml_object['a']['b']['c'].

Example 1:

  <a>
      <b>
          <c>foo</c>
      </b>
  </a>
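
To make the contrast concrete, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for lxml (the two share a similar element API). The variable names are illustrative only. It shows that an element is searched by path rather than indexed by tag name:

```python
import xml.etree.ElementTree as ET

# The document from Example 1.
xml = "<a><b><c>foo</c></b></a>"

# fromstring() returns the root element, which here is <a>.
root = ET.fromstring(xml)

# Elements are not dictionaries keyed by tag name, so you search for
# descendants by path instead of chaining ['b']['c'] lookups.
value = root.find("b/c").text
print(value)

# A dictionary-style lookup on an element raises TypeError.
try:
    root["b"]
    dict_lookup_worked = True
except TypeError:
    dict_lookup_worked = False
print(dict_lookup_worked)
```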

Don’t get me wrong: the lxml module is a great module. But, it is something that an XML power-user will love much more than a casual XML user.

We looked for other Python modules that would convert XML to native Python data structures, and we found a pretty good one called xmltodict. The xmltodict module converts XML data to native Python data structures using one of the XML-to-JSON conversion algorithms. But it is worth noting that none of the algorithms for converting XML to JSON is particularly great. They all suffer from a couple of problems:

  1. They need to convert XML (which contains both data and metadata) to JSON (which contains only data). That means that the encoding of the data needs to change to accommodate the metadata (or, even the possibility of metadata).
  2. XML documents don’t contain data types. This means that the entity that does the conversion to JSON may not be able to distinguish single-member lists from a scalar variable. As a result, the conversion may make bad assumptions. (For example, it may always use lists, even when not necessary, or it may fail to use lists when there is only a single member.)
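
Both problems show up in even a tiny converter. The following is a deliberately naive sketch built on the standard library (it is not the algorithm used by xmltodict or any other module, and naive_convert is a hypothetical name). It drops attributes entirely, and it returns a scalar or a list depending only on how many sibling elements happen to share a tag:

```python
import xml.etree.ElementTree as ET


def naive_convert(element):
    """Convert an element to dicts and strings, ignoring attributes."""
    children = list(element)
    if not children:
        # Leaf element: keep only the text; any attributes are lost.
        return element.text
    result = {}
    for child in children:
        value = naive_convert(child)
        if child.tag not in result:
            # First time this tag is seen: store a scalar.
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            # Second time this tag is seen: promote the scalar to a list.
            result[child.tag] = [result[child.tag], value]
    return result


one = naive_convert(ET.fromstring(
    '<users><user role="admin">alice</user></users>'))
two = naive_convert(ET.fromstring(
    "<users><user>alice</user><user>bob</user></users>"))

print(one)  # {'user': 'alice'}
print(two)  # {'user': ['alice', 'bob']}
```

Code that consumes the result has to check whether 'user' holds a string or a list, and it has no way to learn that alice had a role attribute.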

The xmltodict module (while quite good at what it does) suffers from the same problems that other XML-to-JSON conversion algorithms face. Again, please understand me: The xmltodict module is a good module. But it gets a little bit complicated when you throw in metadata and single-member lists. And Junos XML data has both.

jxmlease is born

To meet the needs of our readers, we decided to create a new open-source project called jxmlease. Our goal was to make using XML documents with both data and metadata in Python just as easy as using JSON documents with only data. I think we succeeded.

jxmlease is a Python module that converts XML data into Python objects in a way that preserves the structure of the original XML data, while also maintaining the metadata. It has features to ease the processing of the data (such as handling variable-length lists, or converting lists into dictionaries based on a key). It also allows you to reverse the process, easily converting normal Python objects into XML data, while optionally appending metadata.

How jxmlease works

One of the important realizations we made was that Python objects also have metadata. So, we can represent the XML data as normal Python objects, but store the XML metadata as metadata in the resulting Python objects.
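
The idea can be sketched with an ordinary str subclass. This is only an illustration of the principle, not jxmlease's actual implementation; the class name AttrStr and the attribute name xml_attrs are hypothetical:

```python
class AttrStr(str):
    """A string that also carries XML attributes (hypothetical class)."""

    def __new__(cls, text, xml_attrs=None):
        obj = super().__new__(cls, text)
        # Store the XML attributes on the instance, beside the string value.
        obj.xml_attrs = dict(xml_attrs or {})
        return obj


# Models <z changed="true">foo</z>.
z = AttrStr("foo", {"changed": "true"})

print(z == "foo")              # behaves like a normal string
print(z.upper())               # normal string methods still work
print(z.xml_attrs["changed"])  # the metadata travels with the value
```

Because the object is still a string, code that only cares about the data never has to know the metadata is there.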

Using jxmlease, you can easily convert XML data to Python data structures. Here, the ‘xml’ variable contains the XML document shown in Example 2 (below). You convert it to Python data objects and print it:

  >>> import jxmlease
  >>> root = jxmlease.parse(xml)
  >>> root.prettyprint()
  {'a': {'b': {'z': 'foo'},
         'c': {'d': {'z': 'bar'}},
         'e': {'z': 'baz'}}}

But, you can still access the metadata, too:

  >>> root['a']['b']['z'].get_xml_attr("changed")
  'true'

Note: If you are using Python 2, you may see that the strings have a u prefix. (For example, you might see u'true' instead of 'true'.) This is Python 2’s way of representing Unicode strings.

And, because the generated objects are subclasses of Python dict, list, and unicode string objects, you can use the normal Python tools and methods to work with them.

Example 2:

  <a>
      <b>
          <z changed="true">foo</z>
      </b>
      <c>
          <d>
              <z changed="true">bar</z>
          </d>
      </c>
      <e>
          <z>baz</z>
      </e>
  </a>

We also have extensions to ease interaction with Junos configuration. Take this XML snippet:

  <configuration>
    <routing-instances>
      <instance>
        <name junos:key="key">foo</name>
        <instance-type>virtual-router</instance-type>
        <routing-options>
          <autonomous-system>
            <as-number>3.3</as-number>
          </autonomous-system>
        </routing-options>
      </instance>
      <instance>
        <name junos:key="key">bar</name>
        <instance-type>virtual-router</instance-type>
        <routing-options>
          <autonomous-system>
            <as-number>3.3</as-number>
          </autonomous-system>
        </routing-options>
      </instance>
    </routing-instances>
  </configuration>

The code will represent this XML fragment as a normal dictionary, with a list of instances. However, we have the key information available to us (see the junos:key="key" attributes above). So, the code will let us transform the list of instances into a dictionary with the correct keys:

  >>> ri = root['configuration']['routing-instances']
  >>> ri['instance'].jdict().prettyprint(depth=2)
  {'bar': {'instance-type': 'virtual-router',
           'name': 'bar',
           'routing-options': {...}},
   'foo': {'instance-type': 'virtual-router',
           'name': 'foo',
           'routing-options': {...}}}
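
Conceptually, this transformation re-indexes a list of dictionaries by one of their fields. The sketch below shows that idea with plain Python objects. It is not how jxmlease implements jdict(): the key_by helper is hypothetical, and it takes the key name as an argument, whereas the module finds the key from the junos:key attributes:

```python
# A simplified stand-in for the parsed list of <instance> elements.
instances = [
    {"name": "foo", "instance-type": "virtual-router"},
    {"name": "bar", "instance-type": "virtual-router"},
]


def key_by(items, key):
    """Build a dictionary that maps each item's key field to the item."""
    return {item[key]: item for item in items}


by_name = key_by(instances, "name")

# Look up an instance directly by name instead of scanning the list.
print(by_name["bar"]["instance-type"])
```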

Like the mechanical XML-to-JSON transformation programs, jxmlease can’t distinguish between single-member lists and scalar values. We solve this by providing a list() method which a developer can use. If the XML already contained a list of objects with the same tag, the list() method simply returns a list. If the XML only contained a single element with that tag, the list() method returns a single-member list. This lets developers write their programs to expect a list, while letting jxmlease worry about standardizing the data.
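
The normalization itself is simple. Here is a minimal sketch of the idea using plain Python values; the as_list function is a hypothetical helper, not part of jxmlease:

```python
def as_list(value):
    """Return value unchanged if it is a list; otherwise wrap it in one."""
    return value if isinstance(value, list) else [value]


# The same field parsed from two documents: one element, then two.
one = {"user": "alice"}
two = {"user": ["alice", "bob"]}

# The calling code always iterates and never checks the type itself.
seen = []
for doc in (one, two):
    for user in as_list(doc["user"]):
        seen.append(user)

print(seen)
```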

If you only want certain data from the XML file you are parsing, it is easy to extract that data through a parse-time generator. We wrote our book using the DocBook XML data structure. The SLAX examples are in <screen language="slax"> nodes. I used this code to extract the SLAX examples from one of the chapters I wrote. It loops over all <screen> elements. Within the loop, it saves any <screen> element with a language attribute of “slax”:

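  import jxmlease

  def extract_slax(filename):
      """Return every <screen language="slax"> node found in the file."""
      rv = []
      # The with statement closes the file even if parsing raises an error.
      with open(filename, "r") as fp:
          # In generator mode, parse() yields a tuple for each matching
          # element; only the parsed node (the last item) is needed here.
          for _, _, node in jxmlease.parse(fp, generator="screen"):
              if node.get_xml_attr('language', '') == 'slax':
                  rv.append(node)
      return rv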
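
For comparison, the same streaming pattern can be sketched with only the standard library. This is an analogue rather than jxmlease code: it uses xml.etree.ElementTree.iterparse, and the sample document and the function name extract_slax_stdlib are made up for the illustration:

```python
import io
import xml.etree.ElementTree as ET

# A made-up DocBook-like fragment with two SLAX examples.
docbook = b"""<chapter>
  <screen language="slax">var $x = 1;</screen>
  <screen language="python">x = 1</screen>
  <para>Some text.</para>
  <screen language="slax">var $y = 2;</screen>
</chapter>"""


def extract_slax_stdlib(source):
    """Collect the text of every <screen language="slax"> element."""
    found = []
    # iterparse() yields each element once its end tag has been read,
    # so matches are available while the document is still being parsed.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "screen" and elem.get("language", "") == "slax":
            found.append(elem.text)
    return found


examples = extract_slax_stdlib(io.BytesIO(docbook))
print(examples)
```

Unlike the jxmlease version, this sketch returns plain strings, so the attributes of each matching element are not carried along with the result.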

On the output side, you can easily convert Python objects to XML data:

  >>> my_data = {'a': {'b': 'foo', 'c': 'bar'}}
  >>> print(jxmlease.emit_xml(my_data))
  <?xml version="1.0" encoding="utf-8"?>
  <a>
      <c>bar</c>
      <b>foo</b>
  </a>

Getting started with jxmlease

We think this module will help anyone who is using XML data in Python. We particularly think this module has the potential to make it much easier for users to interact with the Junos software’s XML API. In fact, we think this will be even easier than using the JSON format provided by the Junos software.

The module has been released as open-source software. For more information, you can view the project’s GitHub page, which includes a link to the documentation. To get started using the module, you can install it via pip.

Post topics: Operations