Skip to Content
Python Cookbook
book

Python Cookbook

by Alex Martelli, David Ascher
July 2002
Intermediate to advanced
608 pages
15h 46m
English
O'Reilly Media, Inc.
Content preview from Python Cookbook

Retrieving a Line at Random from a File of Unknown Size

Credit: Richard Papworth

Problem

You have a file of unknown (but potentially very large) size, and you need to select one line at random from it, with a single pass on the file.

Solution

We do need to read the whole file, but we don’t have to read it all at once:

import random

def randomLine(file_object):
    "Retrieve a random line from a file, reading through the file once"
    lineNum = 0
    selected_line = ''

    while 1:
        aLine = file_object.readline(  )
        if not aLine: break
        lineNum = lineNum + 1
        # How likely is it that this is the last line of the file?
        if random.uniform(0,lineNum)<1:
            selected_line = aLine
    file_object.close(  )
    return selected_line

Discussion

Of course, a more obvious approach would be:

random.choice(file_object.readlines(  ))

But that requires reading the whole file into memory at once, which can be a problem for truly enormous files.

This recipe works thanks to an unobvious but not terribly deep theorem: when we have seen the first N lines in the file, there is a probability of exactly 1/N that each of them is the one selected so far (i.e., the one to which the selected_line variable refers at this point). This is easy to see for the last line (the one just read into the aLine variable), because we explicitly choose it with a probability of 1.0/lineNum. The general validity of the theorem follows by induction. If it was true after the first N-1 lines were read, it’s clearly still true after the Nth one is read. As we select ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Modern Python Cookbook - Second Edition

Modern Python Cookbook - Second Edition

Steven F. Lott
Python Cookbook, 3rd Edition

Python Cookbook, 3rd Edition

David Beazley, Brian K. Jones

Publisher Resources

ISBN: 0596001673Supplemental ContentCatalog PageErrata