You can reduce costs significantly without deleting any data by reducing the relative accessibility of the data. We will cover some ways to do this next:
- Compression: Compression is your friend. Compressing data keeps all of the information at the (typically slight) penalty of increased time to access it. File formats such as Avro and Parquet, which support built-in compression codecs, can significantly reduce storage size (and therefore costs) in Hadoop clusters and S3 buckets, while often improving query performance. The performance gains require some thoughtful design of the file layout, but that is a best practice anyway. Hadoop also supports general-purpose codecs such as GZIP and Snappy. Compression should be the first thing you do to reduce storage costs (see the sketch after this list).
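
As a minimal sketch of the compression point above (assuming pandas and pyarrow are installed; the example data and file names are purely illustrative), the snippet below writes the same table as plain CSV, Snappy-compressed Parquet, and GZIP-compressed Parquet, then prints the on-disk sizes so you can compare:

```python
import os

import pandas as pd

# Illustrative, repetitive data -- real tables with repeated values compress well.
df = pd.DataFrame({
    "event": ["click", "view", "click", "view"] * 250_000,
    "user_id": list(range(1_000)) * 1_000,
    "value": [1.0, 2.5, 3.0, 4.5] * 250_000,
})

df.to_csv("events.csv", index=False)                      # uncompressed baseline
df.to_parquet("events.parquet", compression="snappy")     # columnar + Snappy: fast to read
df.to_parquet("events_gzip.parquet", compression="gzip")  # smaller still, slower to read

for path in ("events.csv", "events.parquet", "events_gzip.parquet"):
    print(f"{path}: {os.path.getsize(path) / 1_000_000:.1f} MB")
```

The exact ratios depend on the data, but the Parquet files will typically be a fraction of the CSV size, which translates directly into lower storage bills.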