Thursday, September 3, 2009

The Best Way of Working with Zip Files in Java

Finally I found the best way of working with zip files in Java :)

In one of my hobby projects I need to work intensively with compressed files (mainly zip files). My first approach was to use the java.util.zip package, but it was too slow. So I started looking for other options.

My second approach was to create a wrapper for the 7zip executable (so I could have cross-platform support). This approach was much better, from a performance point of view, but to retrieve any info I needed to parse console output and solve a lot of problems with the java.lang.Runtime class.

Today I've finally found the best solution... but first, some really simple comparisons:

                          java.util.zip   Home Made 7zip.exe Wrapper   "Best Solution"
Extract All                       12203                         4109              4094
Get Number of Elements             4000                         4109                78
  • All the measurements are in milliseconds
  • The test was done with a 104MB zip file containing 110 jpg files.
  • The "Get Number of Elements" measurement for the "Home Made Wrapper" was also done by extracting everything and counting the number of files (I'm sorry, I was a bit lazy here)
I don't think that I need to add too much to these numbers. The integrated Java zip package is nice, but painfully slow.
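
The code behind the "Extract All" measurement is not shown in this post, so here is a minimal sketch of what such a test could look like with java.util.zip. The class and method names (ExtractAllSketch, extractAll, timeExtractAllMillis) are made up for illustration, and the sketch uses Java 7+ features (try-with-resources, java.nio.file), so it is not the code that produced the numbers above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not taken from the original project.
public class ExtractAllSketch {

    /** Extracts every entry of zipPath into targetDir and returns the number of files written. */
    public static int extractAll(Path zipPath, Path targetDir) throws IOException {
        int files = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipFile zf = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                try (InputStream in = zf.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                files++;
            }
        }
        return files;
    }

    /** Runs extractAll once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeExtractAllMillis(Path zipPath, Path targetDir) throws IOException {
        long start = System.nanoTime();
        extractAll(zipPath, targetDir);
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

A single timed run like this is only a rough indication; disk caching and JVM warm-up can change the result noticeably between runs.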

And finally it is time to show the "Best Solution" and an example comparing it with the Java zip code ;)

The winner is "7-Zip-JBinding" and, as they explain on their website:
7-Zip-JBinding is a java wrapper for 7-Zip C++ library. It allows extraction of many archive formats using a very fast native library directly from java through JNI.

Get the Number of files using java.util.zip
    public static int getSize(File file) {
        ZipFile zf = null;
        int total = -1; // -1 signals that the file could not be read

        try {
            zf = new ZipFile(file);
            total = 0;
            // Walk over all entries and count them
            for (Enumeration<? extends ZipEntry> e = zf.entries(); e.hasMoreElements();) {
                e.nextElement();
                total++;
            }
            zf.close();
        } catch (ZipException ex) {
            Logger.getLogger(Zip.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(Zip.class.getName()).log(Level.SEVERE, null, ex);
        }

        return total;
    }
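
As one of the comments below points out, ZipFile also has a size() method, so the loop is not strictly needed. A minimal sketch of that variant follows; the class name ZipCount is made up, the logging is replaced by a plain -1 return to keep it short, and it uses try-with-resources (Java 7+), which was not available when this post was written.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical helper class, shown only to illustrate ZipFile.size().
public class ZipCount {

    /** Returns the number of entries in the ZIP file, or -1 if it cannot be opened. */
    public static int getSize(File file) {
        // try-with-resources closes the ZipFile even if something fails
        try (ZipFile zf = new ZipFile(file)) {
            return zf.size();
        } catch (IOException ex) { // ZipException is a subclass of IOException
            return -1;
        }
    }
}
```

Note that size() counts every entry in the archive, so explicit directory entries are included, just as they are in the loop above.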

And the same with 7-Zip-JBinding

    public int getSize(File file) {
        RandomAccessFile randomAccessFile = null;
        ISevenZipInArchive inArchive = null;
        int total = -1;
        try {
            randomAccessFile = new RandomAccessFile(file.getAbsolutePath(), "r");
            inArchive = SevenZip.openInArchive(null, // autodetect archive type
                    new RandomAccessFileInStream(randomAccessFile));

            // The archive reports its item count directly, no iteration needed
            total = inArchive.getNumberOfItems();

        } catch (Exception e) {
            System.err.println("Error occurs: " + e);
            System.exit(1);
        } finally {
            // Close the archive first, then the underlying file
            if (inArchive != null) {
                try {
                    inArchive.close();
                } catch (SevenZipException e) {
                    System.err.println("Error closing archive: " + e);
                }
            }
            if (randomAccessFile != null) {
                try {
                    randomAccessFile.close();
                } catch (IOException e) {
                    System.err.println("Error closing file: " + e);
                }
            }
        }

        return total;
    }

As you can see the code is quite similar, but the performance is quite different.
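
Most of the 7-Zip-JBinding example is the "if" plus "try..catch" needed to close each resource. One possible way to shorten it is a small helper like the sketch below. It is only a sketch: CloseUtil and closeAll are made-up names, and it assumes every resource implements java.io.Closeable. RandomAccessFile does; whether the archive object does depends on the 7-Zip-JBinding version, which I have not checked.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper, not part of 7-Zip-JBinding or the JDK.
public class CloseUtil {

    /**
     * Closes each non-null resource in the given order. Failures are reported
     * on stderr instead of being thrown. Returns the number of failed closes.
     */
    public static int closeAll(Closeable... resources) {
        if (resources == null) {
            return 0;
        }
        int failures = 0;
        for (Closeable resource : resources) {
            if (resource == null) {
                continue; // never opened, nothing to close
            }
            try {
                resource.close();
            } catch (IOException e) {
                System.err.println("Error closing resource: " + e);
                failures++;
            }
        }
        return failures;
    }
}
```

Under that assumption, the finally block could shrink to a single call such as CloseUtil.closeAll(inArchive, randomAccessFile), which keeps the order of closing the archive before the file.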

See you.

13 comments:

  1. Can't see why, in both cases, the API providers didn't provide a method in the API to return the count of items in the given archive.
    For users to write the above code rather than use a provided method seems crazy.

  2. Hi Chris,

    Sun does not provide such a method AFAIK, but 7ZipJBinding does:

    total = inArchive.getNumberOfItems();

    In the 7ZipJBinding example, almost everything is error control ;)

  3. Hi,

    Thanks for this post. I'm proud that you like 7-Zip-JBinding ;-)

    You are right about the error handling overhead. There could be less of it. The main problem is closing the archive and the stream/file in both normal and error cases. You need an "if" and then a "try..catch" for each of them :-(

    If you have some ideas on how to make it better, please post a message in the forum on sourceforge.net:
    https://sourceforge.net/projects/sevenzipjbind/

    Best wishes,
    Boris Brodski

    Replies
    1. The problem is that 7-Zip-JBinding uses system libraries from CentOS 7; if somebody uses another version of CentOS, it doesn't work.

  4. Can't you just use the size method of the ZipFile class?

    http://java.sun.com/javase/6/docs/api/java/util/zip/ZipFile.html#size()

    The JavaDoc says:
    Returns the number of entries in the ZIP file.

  5. Per Christina,

    You're right I could use the size() method. I missed it somehow.

    Anyway, after testing it under the same conditions I don't see much difference.

    Using the size() method I get 3905 milliseconds. That's only about 100 milliseconds less than doing the loop, and still miles away from 7ZipJBinding ;)

    I want to point out that my tests were run on a single-core system, so results could change in multi-core environments.

  6. Try TrueZip https://truezip.dev.java.net/

  7. Any comments on working with gzip files?

  8. Hi Steven,

    7-Zip-JBinding supports gzip files. In fact, it should support any file format supported by 7zip ;)

  9. Any idea how Apache Commons VFS compares?

  10. I like music. I often download hundreds of tracks from the Internet, but yesterday I couldn't listen to a track inside a zip archive. I was bitterly disappointed, but fortunately my friend is a DJ and advised me to try "compressed file corrupted zip". I tried this tool and it astonished me, because the software restored my track very fast and, as far as I remember, for free. So my friend was quite right about the tool.

  11. Can we zip a file using 7-Zip-JBinding?

  12. Hi vikas,

    I'm afraid that the creation of files is still in planning, at least the last time I checked.
