Beating C with 33 Lines of Java

By Christian Köberl  |  Jun 29, 2020  |  java, linux

After reading the “Beating C with X lines of Y” posts last year, I immediately thought about trying this in my go-to language: Java. Now you might ask: “Java? Isn’t Java known for slow startup and a heavy memory footprint?” But let’s see how Java performs, especially with its latest incarnation, GraalVM.

Benchmarking

I will be using GNU time for the time and memory comparison (same as the Go version):

$ /usr/bin/time -f "%es %MKB" wc lorem_ipsum.txt
  17023  782992 4830478 lorem_ipsum.txt
0.05s 2064KB

All results are from runs on my laptop:

The “standard” approach

The initial version I came up with used the obvious way in Java - reading the file using a BufferedReader:

InputStream in = Files.newInputStream(Paths.get(fileName));
// Wrap the byte stream in a decoding reader, then buffer it for line-based reads
try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
  ...
}

(full code on GitHub)
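
For illustration, a line-based count with a BufferedReader might look roughly like the sketch below. It is not the code from the GitHub repository: the class name WcBufferedReaderSketch and the Counts holder are hypothetical, words are split with StringTokenizer's default whitespace delimiters, and adding one character per line for the terminator is a simplifying assumption (it is off for \r\n line endings and for a last line without a trailing newline).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class WcBufferedReaderSketch {
  // Hypothetical holder for the three counters.
  static final class Counts {
    long lines, words, chars;
  }

  static Counts count(Reader source) {
    Counts c = new Counts();
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      // readLine() strips the line terminator and returns null at end of input
      while ((line = reader.readLine()) != null) {
        c.lines++;
        c.chars += line.length() + 1; // assumption: a single '\n' terminator
        c.words += new StringTokenizer(line).countTokens();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return c;
  }

  public static void main(String[] args) {
    Counts c = count(new StringReader("hello world\nfoo\n"));
    System.out.println(c.lines + " " + c.words + " " + c.chars);
  }
}
```

Note that this approach allocates a new String for every line of input.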

Running some first tests with this code shows that it can’t compete with the native wc:

Implementation  Input file  Time   Max memory
wc              4.6MB       0.05s  2,064KB
wc-br.java      4.6MB       0.24s  40,188KB
wc              100MB       0.43s  2,188KB
wc-br.java      100MB       1.21s  45,036KB

Manual buffering

So let’s try a more direct approach with manual buffering (inspired by the Go version):

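import static java.lang.Character.isWhitespace;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

class wc {
  public static void main(String[] args) throws IOException {
    int lineCount = 0, wordCount = 0, charCount = 0, count;
    char[] cbuf = new char[16 * 1024]; // reusable 16K-char read buffer
    // Read from stdin when no file name is given
    InputStream is = args.length == 0 ? System.in : Files.newInputStream(Paths.get(args[0]));
    try (Reader reader = new InputStreamReader(is)) {
      // Declared outside the read loop so a word spanning two reads is counted once
      boolean prevWhitespace = true;
      while ((count = reader.read(cbuf)) >= 0) {
        for (int i = 0; i < count; i++) {
          char charVal = cbuf[i];
          charCount++;
          if (charVal == '\n') {
            lineCount++;
            prevWhitespace = true;
          } else if (isWhitespace(charVal)) {
            prevWhitespace = true;
          } else if (prevWhitespace) {
            // First non-whitespace char after whitespace starts a new word
            wordCount++;
            prevWhitespace = false;
          }
        }
      }
    }
    // printing
  }
}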

(full code on GitHub - 33 lines of code according to tokei)
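
One detail of the loop above is easy to miss: prevWhitespace lives outside the read loop, so a word that happens to be split across two read() calls is still counted only once. The following sketch isolates that behaviour. The class WcChunkSketch and its feed methods are hypothetical names introduced for this illustration; the branch logic is copied from the loop above.

```java
class WcChunkSketch {
  long lines, words, chars;
  // Carried across feed() calls, like prevWhitespace outside the while loop
  private boolean prevWhitespace = true;

  // Processes the first `count` chars of `buf`, mirroring the inner for loop
  void feed(char[] buf, int count) {
    for (int i = 0; i < count; i++) {
      char c = buf[i];
      chars++;
      if (c == '\n') {
        lines++;
        prevWhitespace = true;
      } else if (Character.isWhitespace(c)) {
        prevWhitespace = true;
      } else if (prevWhitespace) {
        words++;
        prevWhitespace = false;
      }
    }
  }

  // Convenience overload for feeding string chunks (hypothetical helper)
  void feed(String chunk) {
    feed(chunk.toCharArray(), chunk.length());
  }

  public static void main(String[] args) {
    WcChunkSketch wc = new WcChunkSketch();
    // "hello world\n" delivered in three chunks that cut both words in half
    wc.feed("hel");
    wc.feed("lo wor");
    wc.feed("ld\n");
    System.out.println(wc.lines + " " + wc.words + " " + wc.chars);
  }
}
```

If the flag were reset at the start of every chunk, the second and third chunks would each start a new word and the count would be too high.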

Let’s try this version:

Implementation  Input file  Time   Max memory
wc              4.6MB       0.05s  2,064KB
wc.java         4.6MB       0.18s  39,844KB
wc              100MB       0.43s  2,188KB
wc.java         100MB       0.46s  42,840KB

That looks a lot better for big files (almost the same time as wc), but it is still not that good for small ones. Memory usage is also more than an order of magnitude higher: ~40MB vs. ~2MB.

Enter GraalVM

GraalVM is a new ecosystem and platform for running Java and other programming languages (like JavaScript, Ruby or Python). One of the main benefits for Java programs is the ability to compile them to a native executable, which reduces startup time and memory usage.

Let’s give it a try - after installing GraalVM and native-image simply compile the Java class to a native binary with:

javac wc.java && native-image wc

which results in a native binary wc (or wc.exe) for your platform (currently cross-compilation is not supported).

Running the same code as a native binary brings major improvements. Here are the results, plus a test with a 1GB file:

Implementation    Input file  Time   Max memory
wc                4.6MB       0.05s  2,064KB
wc.java (native)  4.6MB       0.03s  7,172KB
wc                100MB       0.43s  2,188KB
wc.java (native)  100MB       0.39s  7,524KB
wc                1GB         4.20s  2,120KB
wc.java (native)  1GB         3.57s  10,968KB

Memory usage is down to about 7MB for files up to 100MB, and we now beat the time of the C implementation for all three file sizes. Still, memory consumption is roughly 3-5 times higher than that of the “real” native program.
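
To make the comparison concrete, the ratios can be computed directly from the measurements in the table above. This is only a small arithmetic sketch; the class name NativeImageRatios and the ratio helper are made up for the example, and the numbers are copied from the table.

```java
import java.util.Locale;

class NativeImageRatios {
  static double ratio(double a, double b) {
    return a / b;
  }

  public static void main(String[] args) {
    String[] inputs = {"4.6MB", "100MB", "1GB"};
    double[] wcKb = {2064, 2188, 2120};       // max memory of wc
    double[] nativeKb = {7172, 7524, 10968};  // max memory of wc.java (native)
    double[] wcSec = {0.05, 0.43, 4.20};      // elapsed time of wc
    double[] nativeSec = {0.03, 0.39, 3.57};  // elapsed time of wc.java (native)

    for (int i = 0; i < inputs.length; i++) {
      System.out.printf(Locale.ROOT, "%s: memory x%.1f, speedup x%.2f%n",
          inputs[i],
          ratio(nativeKb[i], wcKb[i]),   // how much more memory the native image uses
          ratio(wcSec[i], nativeSec[i])); // how much faster the native image runs
    }
  }
}
```

The memory ratios come out at roughly 3.4 to 5.2. For the 4.6MB file the times are so short that the two-decimal resolution of the measurement limits how much the speedup figure can be trusted.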

Summary

Java has come a long way - from applets to enterprise server applications and now microservices. Because of the J2EE/enterprise era, it is known for heavy memory usage and slow startup times, but in recent years Java has changed and adapted to the cloud-native boom. With GraalVM and native images, Java can now even compete with systems languages like C, Go or Rust, which also makes it a viable choice for building CLIs.