Practical Predictive Analytics by Ralph Winters

Tips for dealing with large files

Some input files are large enough that reading them becomes slow. Here are some tips to speed up the process:

  • Use external Unix tools to split files so that they can be read in chunks. There is usually a field you can split on to produce separate files; date fields work well.
  • Consider using external tools to replace long character strings with numeric codes or shorter strings. This saves valuable memory.
  • Use input parameters to control how much data you read. You do not always have to read a file from the beginning; for example, you can start reading at row 1,000,000.
  • Do not feel obliged to always read all of the columns. Once you have determined ...
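
The last two tips (reading only a window of rows and only some of the columns) can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the book's own code. The function name `read_window`, its parameters, and the sample data are all hypothetical and exist only for this example.

```python
import csv
import io
from itertools import islice


def read_window(handle, columns, skip_rows=0, max_rows=None):
    """Yield the chosen columns from a window of data rows in a CSV stream.

    Hypothetical helper: `skip_rows` data rows are passed over, then at
    most `max_rows` rows are returned (all remaining rows if None).
    """
    reader = csv.reader(handle)
    header = next(reader)  # first line is assumed to hold column names
    keep = [header.index(name) for name in columns]
    stop = None if max_rows is None else skip_rows + max_rows
    # islice steps past the skipped rows without keeping them in memory
    for row in islice(reader, skip_rows, stop):
        yield {name: row[i] for name, i in zip(columns, keep)}


# Small in-memory stand-in for a large file (made-up data)
sample = io.StringIO(
    "date,store,notes,amount\n"
    "2017-01-01,A,opening day,10\n"
    "2017-01-02,B,,12\n"
    "2017-01-03,A,restock,7\n"
    "2017-01-04,C,,20\n"
    "2017-01-05,B,sale,15\n"
)

# Skip the first two data rows, read the next two, keep two of four columns
rows = list(read_window(sample, ["date", "amount"], skip_rows=2, max_rows=2))
for r in rows:
    print(r)
```

Skipped rows are still scanned, so the main saving is memory rather than read time. For a real file, pass an open file handle (for example `open(path, newline="")`) in place of the `io.StringIO` object. All values come back as strings and would need converting before numeric work.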
