Data Compression

The java.util.zip package contains classes you can use for data compression in streams or files. The classes in the java.util.zip package support two widespread compression formats: GZIP and ZIP. In this section, we’ll talk about how to use these classes. We’ll also present two useful example programs that build on what you have learned in this chapter. After that, we’ll talk about a higher-level way to work with ZIP archives—as filesystems—introduced with Java 7.

Archives and Compressed Data

The java.util.zip package provides two filter streams for writing compressed data. The GZIPOutputStream is for writing data in GZIP compressed format. The ZipOutputStream is for writing compressed ZIP archives, which can contain one or many files. To write compressed data in the GZIP format, simply wrap a GZIPOutputStream around an underlying stream and write to it. The following is a complete example that shows how to compress a file using the GZIP format, but the compressed data could just as well be sent over a network connection or to any other type of stream destination. Our GZip example is a command-line utility that compresses a file.

    import java.io.*;
    import java.util.zip.*;

    public class GZip {
      public static int sChunk = 8192;

      public static void main(String[] args) {
        if (args.length != 1) {
          System.out.println("Usage: GZip source");
          return;
        }
        // create output stream
        String zipname = args[0] + ".gz";
        GZIPOutputStream zipout;
        try {
          FileOutputStream out = new FileOutputStream(zipname);
          zipout = new GZIPOutputStream(out);
        }
        catch (IOException e) {
          System.out.println("Couldn't create " + zipname + ".");
          return;
        }
        byte[] buffer = new byte[sChunk];
        // compress the file
        try {
          FileInputStream in = new FileInputStream(args[0]);
          int length;
          while ((length = in.read(buffer, 0, sChunk)) != -1)
            zipout.write(buffer, 0, length);
          in.close();
        }
        catch (IOException e) {
          System.out.println("Couldn't compress " + args[0] + ".");
        }
        try { zipout.close(); }
        catch (IOException e) {}
      }
    }

First, we check to make sure we have a command-line argument representing a filename. We then construct a GZIPOutputStream wrapped around a FileOutputStream representing the given filename, with the .gz suffix appended. With this in place, we open the source file. We read chunks of data and write them into the GZIPOutputStream. Finally, we clean up by closing our open streams.
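The explicit cleanup at the end of GZip can also be written with the try-with-resources statement, which is available from Java 7 onward. The following is a minimal sketch of that variant, not a replacement for the example above; the class name GZipFile and the method gzip() are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of the original GZip example.
class GZipFile {
  // Compresses source into a file with ".gz" appended to its name
  // and returns the path of the compressed file.
  static Path gzip(Path source) throws IOException {
    Path target = Paths.get(source.toString() + ".gz");
    byte[] buffer = new byte[8192];
    // Both streams are closed automatically, even if a read or write fails.
    try (InputStream in = Files.newInputStream(source);
         OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
      int length;
      while ((length = in.read(buffer)) != -1)
        out.write(buffer, 0, length);
    }
    return target;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.out.println("Usage: GZipFile source");
      return;
    }
    System.out.println("Wrote " + gzip(Paths.get(args[0])));
  }
}
```

Closing the GZIPOutputStream matters here: it is what writes the GZIP trailer, so letting try-with-resources handle it removes one way to end up with a truncated file.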

Zip archives

While GZIP is a simple compression format for a stream or file, a ZIP archive is a file that is actually a collection of files, some (or all) of which may be compressed. Writing data to a ZIP archive file is a little more involved than simply wrapping a stream, but not difficult. Each item in the ZIP file is represented by a ZipEntry object. When writing to a ZipOutputStream, you’ll need to call putNextEntry() before writing the data for each item. The following example shows how to create a ZipOutputStream. You’ll notice that it starts out with a stream wrapper, just as we did when creating a GZIPOutputStream:

    ZipOutputStream zipout;
    try {
      FileOutputStream out = new FileOutputStream("archive.zip");
      zipout = new ZipOutputStream(out);
    }
    catch (IOException e) {}

Let’s say we have two files we want to write into this archive. Before we begin writing, we need to call putNextEntry() to set the name of the file within the archive and initialize the stream to the correct position for it. Here we create a simple ZipEntry with just a file name. You can set other ZIP format specific fields in ZipEntry, but most of the time, you won’t need to bother with them.

    try {
      ZipEntry entry = new ZipEntry("first.dat");
      zipout.putNextEntry(entry);
      zipout.write( ... ); // Write data for first file

      // Reuse the variable; declaring it twice would not compile
      entry = new ZipEntry("second.dat");
      zipout.putNextEntry(entry);
      zipout.write( ... ); // Write data for second file
      . . .
      zipout.close();
    }
    catch (IOException e) {}
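To see the putNextEntry() sequence end to end, here is a small self-contained sketch that writes any number of named items into a ZIP archive held in memory. The class ZipWriteSketch and its zip() method are hypothetical, and writing to a ByteArrayOutputStream instead of a file is an assumption made only to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for illustration only.
class ZipWriteSketch {
  // Builds a ZIP archive in memory with one entry per map element,
  // in the map's iteration order.
  static byte[] zip(Map<String, byte[]> files) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ZipOutputStream zipout = new ZipOutputStream(bytes)) {
      for (Map.Entry<String, byte[]> file : files.entrySet()) {
        // Start a new item, write its data, then finish the item.
        zipout.putNextEntry(new ZipEntry(file.getKey()));
        zipout.write(file.getValue());
        zipout.closeEntry();
      }
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Map<String, byte[]> files = new LinkedHashMap<>();
    files.put("first.dat", "first file".getBytes(StandardCharsets.UTF_8));
    files.put("second.dat", "second file".getBytes(StandardCharsets.UTF_8));
    System.out.println("Archive size in bytes: " + zip(files).length);
  }
}
```

The explicit closeEntry() call is optional in this loop, since putNextEntry() closes any entry that is still open, but it makes the boundaries between items easier to see.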

Decompressing Data

To decompress data in the GZIP format, simply wrap a GZIPInputStream around an underlying FileInputStream and read from it. The following example complements our earlier GZip example and shows how to decompress a GZIP file:

    import java.io.*;
    import java.util.zip.*;

    public class GUnzip {
      public static int sChunk = 8192;
      public static void main(String[] args) {
        if (args.length != 1) {
          System.out.println("Usage: GUnzip source");
          return;
        }
        // create input stream
        String zipname, source;
        if (args[0].endsWith(".gz")) {
          zipname = args[0];
          source = args[0].substring(0, args[0].length() - 3);
        }
        else {
          zipname = args[0] + ".gz";
          source = args[0];
        }
        GZIPInputStream zipin;
        try {
          FileInputStream in = new FileInputStream(zipname);
          zipin = new GZIPInputStream(in);
        }
        catch (IOException e) {
          System.out.println("Couldn't open " + zipname + ".");
          return;
        }
        byte[] buffer = new byte[sChunk];
        // decompress the file
        try {
          FileOutputStream out = new FileOutputStream(source);
          int length;
          while ((length = zipin.read(buffer, 0, sChunk)) != -1)
            out.write(buffer, 0, length);
          out.close();
        }
        catch (IOException e) {
          System.out.println("Couldn't decompress " + args[0] + ".");
        }
        try { zipin.close(); }
        catch (IOException e) {}
      }
    }

First, we check to make sure we have a command-line argument representing a filename. If the argument ends with .gz, we figure out what the filename for the uncompressed file should be. Otherwise, we use the given argument and assume the compressed file has the .gz suffix. Then we construct a GZIPInputStream wrapped around a FileInputStream that represents the compressed file. With this in place, we open the target file. We read chunks of data from the GZIPInputStream and write them into the target file. Finally, we clean up by closing our open streams.

Reading a ZIP archive mirrors writing one. When reading from a ZipInputStream, you should call getNextEntry() before reading each item. When getNextEntry() returns null, there are no more items to read. The following example shows how to create a ZipInputStream:

    ZipInputStream zipin;
    try {
      FileInputStream in = new FileInputStream("archive.zip");
      zipin = new ZipInputStream(in);
    }
    catch (IOException e) {}

Suppose we want to read two files from this archive. Before we begin reading, we need to call getNextEntry(). At the very least, the entry gives us the name of the item we are reading from the archive:

    try {
      ZipEntry first = zipin.getNextEntry();
      zipin.read( ... ); // Read the file data
    } catch (IOException e) {}

Now, you can read the contents of the first item in the archive. When you come to the end of the item, the read() method returns -1. At this point, you can call getNextEntry() again to read the second item from the archive. If you call getNextEntry() and it returns null, there are no more items and you have reached the end of the archive.
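The read-until-minus-one, then getNextEntry()-until-null pattern described above can be sketched as a pair of nested loops. The class ZipReadSketch and its readAll() method are hypothetical names, and collecting every item into a Map in memory is an assumption that only suits small archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class for illustration only.
class ZipReadSketch {
  // Reads every item from a ZIP stream and returns entry names mapped
  // to their uncompressed bytes, in archive order.
  static Map<String, byte[]> readAll(InputStream source) throws IOException {
    Map<String, byte[]> contents = new LinkedHashMap<>();
    byte[] buffer = new byte[8192];
    try (ZipInputStream zipin = new ZipInputStream(source)) {
      ZipEntry entry;
      // Outer loop: null from getNextEntry() means the end of the archive.
      while ((entry = zipin.getNextEntry()) != null) {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        int length;
        // Inner loop: -1 from read() means the end of the current item.
        while ((length = zipin.read(buffer, 0, buffer.length)) != -1)
          data.write(buffer, 0, length);
        contents.put(entry.getName(), data.toByteArray());
      }
    }
    return contents;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.out.println("Usage: ZipReadSketch archive");
      return;
    }
    try (InputStream in = new FileInputStream(args[0])) {
      for (Map.Entry<String, byte[]> item : readAll(in).entrySet())
        System.out.println(item.getKey() + ": " + item.getValue().length + " bytes");
    }
  }
}
```

Directory entries, if the archive contains any, show up in this sketch as names with an empty byte array, because read() reports the end of the item immediately.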

Zip Archive As a Filesystem

One of the benefits of the new java.nio.file package introduced with Java 7 is the ability to implement custom filesystems in Java. (We talked about the File API for the NIO file package earlier in this chapter and we’ll return to the more general NIO facilities in the next section.) Java 7 ships with one such custom filesystem implementation bundled within it: the Zip Filesystem Provider.[35] Using the Zip Filesystem Provider, we can open a ZIP archive and treat it like a filesystem: reading, writing, copying, and renaming files using all of the standard java.nio.file APIs, except that all of these operations happen inside the ZIP archive file instead of on the host computer filesystem (as you might otherwise expect).

The key to making this possible is that the NIO File API starts with a FileSystem abstraction that serves as a factory for Path objects. In our previous discussion of the NIO File API we always simply asked for the default filesystem using FileSystems.getDefault(). This time, we are going to target a particular custom filesystem type and destination by constructing a special URI for our ZIP archive. (As we’ll discuss in the networking chapters, a URI is kind of like a URL except that it can be more abstract.)

        // Construct the URI pointing to the ZIP archive
        URI zipURI = URI.create("jar:file:/Users/pat/tmp/MyArchive.zip");

        // Open or create it and write a file
        Map<String, String> env = new HashMap<>();
        env.put("create", "true");
        try ( FileSystem zipfs = FileSystems.newFileSystem( zipURI, env ) )
        {
            Path path = zipfs.getPath("/README.txt");
            OutputStream out = Files.newOutputStream( path );
            try ( PrintWriter pw = new PrintWriter( 
                new OutputStreamWriter( out ) ) ) {

                pw.println("Hello World!");
            }
        }

In this snippet, we constructed a URI for our ZIP archive using the URI.create() method and the special jar:file: prefix. (The Java JAR format is really just the ZIP format with some additional conventions.) We then used that URI with the FileSystems.newFileSystem() method to create the right kind of filesystem reference for us. The FileSystem it returns will perform all of its operations on entries within the ZIP, but otherwise will behave just like we’ve seen previously. The other argument to the newFileSystem() method is a Map containing string properties that are specific to the provider. In this case, we set the “create” property to “true,” indicating that we want the ZIP filesystem provider to create the archive if it does not already exist. To find out which properties can be passed, you’ll have to consult the documentation for the particular filesystem provider.

In our preceding snippet, we then create a Path for a file /README.txt at the root folder of the filesystem and write a string to it. Because we are using try-with-resources clauses to encapsulate opening the filesystem and writing to the file, the resources will be automatically closed for us when the operation is complete.
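As a quick check that the write really landed inside the archive, the sketch below writes /README.txt, closes the filesystem, reopens it, and reads the first line back. The class ZipFsReadBack, the method writeThenRead(), and the use of Path.toUri() to build the jar: URI are assumptions made for illustration. The archive path must not point at an existing file that is not a valid ZIP archive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class ZipFsReadBack {
  // Writes text to /README.txt inside the archive, then reopens the
  // archive and returns the first line of that file.
  static String writeThenRead(Path archive, String text) throws IOException {
    URI zipURI = URI.create("jar:" + archive.toUri());
    Map<String, String> env = new HashMap<>();
    env.put("create", "true"); // create the archive if it does not exist

    try (FileSystem zipfs = FileSystems.newFileSystem(zipURI, env);
         PrintWriter pw = new PrintWriter(new OutputStreamWriter(
             Files.newOutputStream(zipfs.getPath("/README.txt")),
             StandardCharsets.UTF_8))) {
      pw.print(text);
    }

    // Resources close in reverse order: the reader first, then the filesystem.
    try (FileSystem zipfs = FileSystems.newFileSystem(zipURI, env);
         BufferedReader reader = Files.newBufferedReader(
             zipfs.getPath("/README.txt"), StandardCharsets.UTF_8)) {
      return reader.readLine();
    }
  }

  public static void main(String[] args) throws IOException {
    // A fresh temporary directory guarantees the archive does not exist yet.
    Path archive = Files.createTempDirectory("zipfs").resolve("MyArchive.zip");
    System.out.println(writeThenRead(archive, "Hello World!"));
  }
}
```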

Other operations proceed just as with “normal” files. For example, we can move a file by creating a path for the existing file and a path for the new location and then using the standard Files.move() method.

        // Move the file
        try ( FileSystem zipfs = FileSystems.newFileSystem( zipURI, env ) )
        {
            Path path = zipfs.getPath("/README.txt");
            Path toPath = zipfs.getPath("/README2.txt");
            Files.move( path, toPath );
        }
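To confirm the effect of a move, we can list the root of the archive with the same directory APIs used for ordinary files. The following sketch moves an entry and then returns the names found at the root; the class ZipFsMove and the method moveAndList() are hypothetical, and it assumes the archive already exists and contains the entry being moved.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper class for illustration only.
class ZipFsMove {
  // Moves one entry inside an existing archive, then returns the sorted
  // names of the entries at the archive's root.
  static List<String> moveAndList(Path archive, String from, String to)
      throws IOException {
    URI zipURI = URI.create("jar:" + archive.toUri());

    try (FileSystem zipfs =
             FileSystems.newFileSystem(zipURI, new HashMap<String, String>())) {
      Files.move(zipfs.getPath(from), zipfs.getPath(to));
    }

    // Reopen the archive so the listing reflects what was saved to disk.
    List<String> names = new ArrayList<>();
    try (FileSystem zipfs =
             FileSystems.newFileSystem(zipURI, new HashMap<String, String>());
         DirectoryStream<Path> entries =
             Files.newDirectoryStream(zipfs.getPath("/"))) {
      for (Path entry : entries)
        names.add(entry.getFileName().toString());
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 3) {
      System.out.println("Usage: ZipFsMove archive from to");
      return;
    }
    System.out.println(moveAndList(Paths.get(args[0]), args[1], args[2]));
  }
}
```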


[35] The Zip Filesystem Provider is also supplied as an example along with sample source code even though it’s unclear if Oracle intends it to be a standard. But at the time of this writing, it is bundled with the JDK and JRE of Java 7 on all platforms.
