## With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

No credit card required

# Finding the Intersection of Two Dictionaries

Credit: Andy McKay, Chris Perkins, Sami Hangaslammi

## Problem

Given two dictionaries, you need to find the set of keys that are in both dictionaries.

## Solution

Dictionaries are a good concrete representation for sets in Python, so operations such as intersections are common. Say you have two dictionaries (but pretend that they each contain thousands of items):

```some_dict = { 'zope':'zzz', 'python':'rocks' }
another_dict = { 'python':'rocks', 'perl':'\$' }```

Here’s a bad way to find their intersection that is very slow:

```intersect = []
for item in some_dict.keys(  ):
if item in another_dict.keys(  ):
intersect.append(item)
print "Intersects:", intersect```

And here’s a good way that is simple and fast:

```intersect = []
for item in some_dict.keys(  ):
if another_dict.has_key(item):
intersect.append(item)
print "Intersects:", intersect```

In Python 2.2, the following is elegant and even faster:

`print "Intersects:", [k for k in some_dict if k in another_dict]`

And here’s an alternate approach that wins hands down in speed, for Python 1.5.2 and later:

`print "Intersects:", filter(another_dict.has_key, some_dict.keys())`

## Discussion

The `keys` method produces a list of all the keys of a dictionary. It can be pretty tempting to fall into the trap of just using `in`, with this list as the righthand side, to test for membership. However, in the first example, you’re looping through all of `some_dict`, then each time looping through all of `another_dict`. If `some_dict` has N1 items, and `another_dict` has N2 items, your intersection operation will have a compute time proportional to the product of N1x N2. (O(N1x N2) is the common computer-science notation to indicate this.)

By using the `has_key` method, you are not looping on `another_dict` any more, but rather checking the key in the dictionary’s hash table. The processing time for `has_key` is basically independent of dictionary size, so the second approach is O(N1). The difference is quite substantial for large dictionaries! If the two dictionaries are very different in size, it becomes important to use the smaller one in the role of `some_dict`, while the larger one takes on the role of `another_dict` (i.e., loop on the keys of the smaller dictionary, thus picking the smaller N1).

Python 2.2 lets you iterate on a dictionary’s keys directly, with the statement:

`for key in dict`

You can test membership with the equally elegant:

`if key in dict`

rather than the equivalent but syntactically less nice `dict.has_key(key)`. Combining these two small but nice innovations of Python 2.2 with the list-comprehension notation introduced in Python 2.0, we end up with a very elegant approach, which is at the same time concise, clear, and quite speedy.

However, the fastest approach is the one that uses `filter` with the bound method `another_dict.has_key` on the list `some_dict.keys`. A typical intersection of two 500-item dictionaries with 50% overlap, on a typical cheap machine of today (AMD Athlon 1.4GHz, DDR2100 RAM, Mandrake Linux 8.1), took 710 microseconds using `has_key`, 450 microseconds using the Python 2.2 technique, and 280 microseconds using the `filter`-based way. While these speed differences are almost substantial, they pale in comparison with the timing of the bad way, for which a typical intersection took 22,600 microseconds—30 times longer than the simple way and 80 times longer than the `filter`-based way! Here’s the timing code, which shows a typical example of how one goes about measuring relative speeds of equivalent Python constructs:

```import time

def timeo(fun, n=1000):
def void(  ): pass
start = time.clock(  )
for i in range(n): void(  )
stend = time.clock(  )
overhead = stend - start
start = time.clock(  )
for i in range(n): fun(  )
stend = time.clock(  )
thetime = stend-start
return fun._ _name_ _, thetime-overhead

to500 = {}
for i in range(500): to500[i] = 1
evens = {}
for i in range(0, 1000, 2): evens[i] = 1

def simpleway(  ):
result = []
for k in to500.keys(  ):
if evens.has_key(k):
result.append(k)
return result

def pyth22way(  ):
return [k for k in to500 if k in evens]

def filterway(  ):
return filter(evens.has_key, to500.keys(  ))

def badsloway(  ):
result = []
for k in to500.keys(  ):
if k in evens.keys(  ):
result.append(k)
return result

for f in simpleway, pyth22way, filterway, badsloway:
print "%s: %.2f"%timeo(f)```

You can save this code into a `.py` file and run it (a few times, on an otherwise quiescent machine, of course) with `python -O` to check how the timings of the various constructs compare on any specific machine in which you’re interested. (Note that this script requires Python 2.2 or later.) Timing different code snippets to find out how their relative speeds compare is an important Python technique, since intuition is a notoriously unreliable guide to such relative-speed comparisons. For detailed and general instruction on how to time things, see the introduction to Chapter 17.

When applicable without having to use a `lambda` form or a specially written function, `filter`, `map`, and `reduce` often offer the fastest solution to any given problem. Of course, a clever Pythonista cares about speed only for those very, very few operations where speed really matters more than clarity, simplicity, and elegance! But these built-ins are pretty elegant in their own way, too.

We don’t have a separate recipe for the union of the keys of two dictionaries, but that’s because the task is even easier, thanks to a dictionary’s `update` method:

```def union_keys(some_dict, another_dict):
temp_dict = some_dict.copy(  )
temp_dict.update(another_dict)
return temp_dict.keys(  )```

## See Also

The Library Reference section on mapping types.

## With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

No credit card required