August 2018
Intermediate to advanced
366 pages
10h 14m
English
The SequenceMatcher provides support for marking some values as junk. You might expect this to mean that those values are ignored, but in fact that's not what happens.
Computing ratios with and without junk will return the same value in most cases:
>>> import difflib
>>> a = 'aaaaaaaaaaaaaXaaaaaaaaaa'
>>> b = 'X'
>>> difflib.SequenceMatcher(lambda c: c=='a', a, b, False).ratio()
0.08
>>> difflib.SequenceMatcher(None, a, b, False).ratio()
0.08
The 'a' characters were not ignored, even though we provided an isjunk function (the first argument to SequenceMatcher) that reports every 'a' as junk.
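One way to see why the ratio does not move is to recompute it by hand. The difflib documentation defines ratio() as 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. Junk elements are not subtracted from T. The sketch below assumes that definition; manual_ratio is a hypothetical helper written for illustration, not part of difflib.

```python
import difflib


def manual_ratio(matcher, a, b):
    """Hypothetical helper: recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    # M: matched elements summed over every matching block
    # (the final sentinel block has size 0, so it contributes nothing).
    matches = sum(block.size for block in matcher.get_matching_blocks())
    # T: combined length of both sequences; junk elements are still counted.
    total = len(a) + len(b)
    return 2.0 * matches / total if total else 1.0


a = 'aaaaaaaaaaaaaXaaaaaaaaaa'
b = 'X'

with_junk = difflib.SequenceMatcher(lambda c: c == 'a', a, b, False)
without_junk = difflib.SequenceMatcher(None, a, b, False)

for matcher in (with_junk, without_junk):
    # One matched element out of 25 in total: 2.0 * 1 / 25 == 0.08 in both cases.
    print(manual_ratio(matcher, a, b), matcher.ratio())
```

Because the 24 'a' characters stay in T whether or not they are marked as junk, the denominator is 25 in both runs and the single matching X yields the same score.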
By using .get_matching_blocks(), you can see that in both cases the only matching part of the two strings is the X, at position 13 in a and at position 0 in b:
>>> difflib.SequenceMatcher(None, ...
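What junk does change is which block the matcher picks: per the difflib documentation for find_longest_match(), the initial search for the longest block may not include any junk element, and only afterwards is that block extended with junk on both sides. The strings below are adapted from that documentation's own example and are not taken from the text above; treat this as a sketch of the documented behaviour.

```python
import difflib

a = ' abcd'
b = 'abcd abcd'

# Without junk, the longest common block is ' abcd' (a[0:5] against b[4:9]).
plain = difflib.SequenceMatcher(None, a, b)
print(plain.find_longest_match(0, len(a), 0, len(b)))
# Match(a=0, b=4, size=5)

# With spaces marked as junk, the space cannot be part of the block found by
# the initial search, so 'abcd' (a[1:5] against b[0:4]) is chosen instead.
junky = difflib.SequenceMatcher(lambda c: c == ' ', a, b)
print(junky.find_longest_match(0, len(a), 0, len(b)))
# Match(a=1, b=0, size=4)
```

So junk elements are never dropped from the sequences; they are only barred from anchoring a match, which can shift where the matching blocks land.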