## With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

No credit card required

# 1.8. Checking Whether a String Contains a Set of Characters

Credit: Jürgen Hermann, Horst Hansen

## Problem

You need to check for the occurrence of any of a set of characters in a string.

## Solution

The simplest approach is clear, fast, and general (it works for any sequence, not just strings, and for any container on which you can test for membership, not just sets):

```def containsAny(seq, aset):
""" Check whether sequence seq contains ANY of the items in aset. """
for c in seq:
if c in aset: return True
return False```

You can gain a little speed by moving to a higher-level, more sophisticated approach, based on the `itertools` standard library module, essentially expressing the same approach in a different way:

```import itertools
def containsAny(seq, aset):
for item in itertools.ifilter(aset._ _contains_ _, seq):
return True
return False```

## Discussion

Most problems related to sets are best handled by using the `set` built-in type introduced in Python 2.4 (if you're using Python 2.3, you can use the equivalent `sets.Set` type from the Python Standard Library). However, there are exceptions. Here, for example, a pure set-based approach would be something like:

```def containsAny(seq, aset):
return bool(set(aset).intersection(seq))```

However, with this approach, every item in `seq` inevitably has to be examined. The functions in this recipe's Solution, on the other hand, "short-circuit": they return as soon as they know the answer. They must still check every item in `seq` when the answer is `False`—we could never affirm that no item in `seq` is a member of `aset` without examining all the items, of course. But when the answer is `True`, we often learn about that very soon, namely as soon as we examine one item that is a member of `aset`. Whether this matters at all is very data-dependent, of course. It will make no practical difference when `seq` is short, or when the answer is typically `False`, but it may be extremely important for a very long `seq` (when the answer can typically be soon determined to be `True`).

The first version of `containsAny` presented in the recipe has the advantage of simplicity and clarity: it expresses the fundamental idea with total transparency. The second version may appear to be "clever", and that is not a complimentary adjective in the Python world, where simplicity and clarity are core values. However, the second version is well worth considering, because it shows a higher-level approach, based on the `itertools` module of the standard library. Higher-level approaches are most often preferable to lower-level ones (although the issue is moot in this particular case). `itertools.ifilter` takes a predicate and an iterable, and yields the items in that iterable that satisfy the "predicate". Here, as the "predicate", we use ```aset._ _contains_ _```, the bound method that is internally called when we code `in aset` for membership testing. So, if `ifilter` yields anything at all, it yields an item of `seq` that is also a member of `aset`, so we can `return True` as soon as this happens. If we get to the statement following the `for`, it must mean the `return True` never executed, because no items of `seq` are members of `aset`, so we can `return False`.

If your application needs some function such as `containsAny` to check whether a string (or other sequence) contains any members of a set, you may also need such variants as:

```def containsOnly(seq, aset):
""" Check whether sequence seq contains ONLY items in aset. """
for c in seq:
if c not in aset: return False
return True```

`containsOnly` is the same function as `containsAny`, but with the logic turned upside-down. Other apparently similar tasks don't lend themselves to short-circuiting (they intrinsically need to examine all items) and so are best tackled by using the built-in type `set` (in Python 2.4; in 2.3, you can use `sets.Set` in the same way):

```def containsAll(seq, aset):
""" Check whether sequence seq contains ALL the items in aset. """
return not set(aset).difference(seq)```

If you're not accustomed to using the `set` (or `sets.Set`) method `difference`, be aware of its semantics: for any `set` `a`, `a.difference(b)` (just like `a-set(b)`) returns the set of all elements of `a` that are not in `b`. For example:

```>>> L1 = [1, 2, 3, 3]
>>> L2 = [1, 2, 3, 4]
>>> set(L1).difference(L2)`set([  ])`
>>> set(L2).difference(L1)
`set([4])````

which hopefully helps explain why:

```>>> containsAll(L1, L2)`False`
>>> containsAll(L2, L1)
`True````

(In other words, don't confuse `difference` with another method of `set`, `symmetric_difference`, which returns the set of all items that are in either argument and not in the other.)

When you're dealing specifically with (plain, not Unicode) strings for both `seq` and `aset`, you may not need the full generality of the functions presented in this recipe, and may want to try the more specialized approach explained in Recipe 1.10 based on strings' method `translate` and the `string.maketrans` function from the Python Standard Library. For example:

```import string
notrans = string.maketrans('', '')           # identity "translation"
def containsAny(astr, strset):
return len(strset) != len(strset.translate(notrans, astr))
def containsAll(astr, strset):
return not strset.translate(notrans, astr)```

This somewhat tricky approach relies on `strset.translate(notrans, astr)` being the subsequence of `strset` that is made of characters not in `astr`. When that subsequence has the same length as `strset`, no characters have been removed by `strset.translate`, therefore no characters of `strset` are in `astr`. Conversely, when the subsequence is empty, all characters have been removed, so all characters of `strset` are in `astr`. The `translate` method keeps coming up naturally when one wants to treat strings as sets of characters, because it's speedy as well as handy and flexible; see Recipe 1.10 for more details.

These two sets of approaches to the recipe's tasks have very different levels of generality. The earlier approaches are very general: not at all limited to string processing, they make rather minimal demands on the objects you apply them to. The approach based on the `translate` method, on the other hand, works only when both `astr` and `strset` are strings, or very closely mimic plain strings' functionality. Not even Unicode strings suffice, because the `translate` method of Unicode strings has a signature that is different from that of plain strings—a single argument (a `dict` mapping code numbers to Unicode strings or `None`) instead of two (both strings).

Recipe 1.10; documentation for the `translate` method of strings and Unicode objects, and `maketrans` function in the `string` module, in the Library Reference and Python in a Nutshell; ditto for documentation of built-in `set` (Python 2.4 only), modules `sets` and `itertools`, and the special method `_ _contains_ _`.