Some input files are large enough that reading them becomes slow. Here are some tips to speed up the process:
- Use external Unix tools to split large files so that they can be read in chunks. There is usually a field you can split on to produce separate files; date fields work well.
- Consider using external tools to replace long character strings with numeric codes or shorter strings. This saves valuable memory.
- Use input parameters to control how much data you read. You do not always have to read a file from the beginning; for example, you can start reading at row 1,000,000.
- Do not feel obliged to always read all of the columns. Once you have determined ...
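
As a rough illustration of the last two tips, here is a minimal Python sketch that skips a number of rows, stops after a fixed count, and keeps only selected columns. Python is used purely as an example language; the function name `read_window`, its parameters, and the sample data are all hypothetical. Note that the skipped rows are still scanned by the reader, they are just never stored.

```python
import csv
import io
from itertools import islice


def read_window(handle, skip_rows=0, max_rows=None, columns=None):
    """Read a slice of a CSV file (hypothetical helper, for illustration only).

    Skips `skip_rows` data rows after the header, returns at most `max_rows`
    rows, and keeps only the named `columns` (all columns if None).
    """
    reader = csv.reader(handle)
    header = next(reader)  # the first line is assumed to be a header
    keep = [header.index(name) for name in (columns or header)]
    stop = None if max_rows is None else skip_rows + max_rows
    # islice walks past the skipped rows without storing them
    rows = islice(reader, skip_rows, stop)
    return [[row[i] for i in keep] for row in rows]


# A tiny in-memory sample (made-up data) standing in for a large file.
sample = io.StringIO(
    "date,city,temp\n"
    "2020-01-01,Oslo,-3\n"
    "2020-01-02,Oslo,-1\n"
    "2020-01-03,Rome,12\n"
    "2020-01-04,Rome,14\n"
)

window = read_window(sample, skip_rows=1, max_rows=2, columns=["date", "temp"])
print(window)
```

With a real file, the handle would come from something like `open(path, newline="")` instead of the in-memory sample. Only the requested rows and columns end up in memory, which is the point of the tips above.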