Parsing a Search String

The explosion of data made possible by the technologies of the Internet and the World Wide Web has led to the emergence of applications and services for searching and organizing that mass of data. A typical interface to a search service is a string of keywords that is used to retrieve web pages of interest to the searcher. Services such as Google have very simplified search interfaces, in which each separate word is assumed to be a potential keyword, and the search engine will look for pages containing any of the given keywords (perhaps ranking the pages by the number of keywords present on the page).

In this application, I am going to describe a more elaborate search string interface, with support for AND, OR, and NOT keyword qualifiers. A keyword may be a single word delimited by whitespace, or a quoted string. Quoting is needed when the keyword contains spaces or non-alphanumeric characters, or when the search keyword or phrase includes one of the special qualifier words AND, OR, or NOT. Here are a few sample search phrases for us to parse:

    wood and blue or red
    wood and (blue or red)
    (steel or iron) and "lime green"
    not steel or iron and "lime green"
    not(steel or iron) and "lime green"

These phrases describe objects in the simple universe depicted in Figure 1.


Figure 1. The universe of all possible things
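To make the intended reading of the sample phrases concrete, here is a minimal sketch in plain Python. It is not pyparsing code and it is not the grammar developed in this chapter. It is a hand-written recursive-descent parser that assumes the conventional precedence: NOT binds most tightly, then AND, then OR. The names `tokenize` and `parse`, and the nested-list output shape, are illustrative choices only.

```python
import re

# A token is a double-quoted string, a parenthesis, or a bare run of
# characters containing no whitespace, parentheses, or quotes.
TOKEN_RE = re.compile(r'"([^"]*)"|([()])|([^\s()"]+)')
OPERATORS = {"and", "or", "not"}


def tokenize(text):
    """Return (kind, value) pairs; kind is 'word', 'op', or 'paren'."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastindex == 1:
            # Quoted keyword: always a plain word, even if it spells AND/OR/NOT.
            tokens.append(("word", m.group(1)))
        elif m.lastindex == 2:
            tokens.append(("paren", m.group(2)))
        elif m.group(3).lower() in OPERATORS:
            tokens.append(("op", m.group(3).lower()))
        else:
            tokens.append(("word", m.group(3)))
    return tokens


def parse(text):
    """Parse a search string into nested lists (assumed NOT > AND > OR)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        node = parse_and()
        while peek() == ("op", "or"):
            advance()
            node = [node, "or", parse_and()]
        return node

    def parse_and():
        node = parse_not()
        while peek() == ("op", "and"):
            advance()
            node = [node, "and", parse_not()]
        return node

    def parse_not():
        if peek() == ("op", "not"):
            advance()
            return ["not", parse_not()]
        return parse_atom()

    def parse_atom():
        kind, value = peek()
        if kind == "word":
            advance()
            return value
        if (kind, value) == ("paren", "("):
            advance()
            node = parse_or()
            if peek() != ("paren", ")"):
                raise ValueError("expected ')'")
            advance()
            return node
        raise ValueError(f"unexpected token: {value!r}")

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {peek()[1]!r}")
    return tree
```

Under these assumptions, `parse("wood and blue or red")` groups as `[["wood", "and", "blue"], "or", "red"]`, while `parse("wood and (blue or red)")` groups `blue or red` first. A quoted keyword such as `"lime green"` comes back as the single string `lime green`.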

We would also like to have our parser return the parsed results in a hierarchical ...
