Sunday, February 7, 2010

Groovy++ Performance - Now we're talkin'

This is a follow-up of my last post, where I compared the runtimes of the quicksort algorithm implemented in Java, Scala, Groovy and Groovy++. Groovy++ came up with a significant improvement over Groovy. But there was a comment by ait (Thanks ait) on how to improve the performance of the Groovy++ implementation even further, by changing a single line. In fact, this improved the performance dramatically.

In the previous post I showed the de-compiled byte code of the Groovy++ implementation of the algorithm. There were two things that were different from the Java byte code: array access via ArraysMethods.getAt(index) and the integer boxing und calculation. Ait recommended to change the following line

into


Obviously, this tackles the integer handling problem. After changing this line I ran the code again and had a look at the timings. The results are amazing. Here they are for sorting 100.000 and 1.000.000 integers:

n100.0001.000.000
Java1295
Scala1295
Groovy375041700
Groovy++ (old version)5154900
Groovy++ (new version)19130


This is marvellous! Now Groovy++ is really on a par with Java and Scala!


De-compiling the byte code again shows the difference. Here's the old byte code:

And here is the new version:

Still, there is no native array access, but the integer boxing is gone, it's primitive integer calculations.

Ait (Is that you, Alex?) will come up with a post with more details. I'll provide a link here then.


Update: These are the numbers for Java, Scala and Groovy with shifting instead of division. Only Groovy can take advantage of this:
n100.0001.000.000
Java1190
Scala1290
Groovy190022000
Groovy++ (new version)19130






Saturday, February 6, 2010

Groovy++ vs. Groovy vs. Java vs. Scala - Performance Update

I've been writing about a performance comparison between Java, Scala and Groovy in this and this post, where I compared the runtimes of these languages for a straight-forward (and far from idiomatic) implementation of the quicksort algorithm. As you might have heard, there's a new kid on the block: Groovy++ a.k.a. static Groovy. I was eager to see, how this would improve Groovy's results.

So, (without any further justification of this comparison) I took the same source code for Java, Scala and Groovy as previously and took this modified version of the Groovy code for Groovy++:


So, I basically took the Groovy code, and added the @Typed annotation and some more type information. I ran this on my machine with Java 1.6.0 Update 17, Scala 2.7.7, Groovy 1.7 and Groovy++ which is based on Groovy 1.8.0-SNAPSHOT. Here are the results (in ms) for sorting 100.000 and 1.000.000 integers:

n100.0001.000.000
Java1295
Scala1295
Groovy375041700
Groovy++5154900


Obviously, while still Java and Scala are by far the fastest, Groovy++ makes a huge difference. Groovy takes about 312x the time of Java/Scala, Groovy++ just takes 42x.

While there isn't much difference in the source code of Groovy and Groovy++, there is of course a big difference in the generated byte code. I had a look at it with the Java Decompiler. The Groovy byte code contains lots of dynamic dispatch code, while there's not much in the Groovy++ byte code that indicates that it was compiled from Groovy. Here's a short excerpt:


public Object quicksort(int[] a, int L, int R) {
while (true) {
int m = ArraysMethods.getAt(a, DefaultGroovyPPMethods.div(DefaultGroovyPPMethods.box(L + R), DefaultGroovyPPMethods.box(2)).intValue());
int i = L;
int j = R;
while (i <= j) {
for (; ArraysMethods.getAt(a, i) < m; (i++));
for (; ArraysMethods.getAt(a, j) > m; (j--));
if (i <= j) {
swap(a, i, j);
(i++);
(j--);
}
}
if (L < j) quicksort(a, L, j);
if (R <= i) break; R = R; L = i; a = a; this = this; } return null;
}


This isn't even far from the byte code generated by java. There is just that autoboxing of integers, integer division and the array access with getAt(). Considering the early state of Groovy++ this looks very promising.


Update: There's is a follow-up post that shows how to tune the Groovy++ code to make it run as fast as Java and Scala (Thanks to ait, see comments).