Thursday, July 2, 2009

Java vs. Scala vs. Groovy - Performance Update

It's been quite a while since my post on the micro benchmark comparison between Java, Scala and Groovy. Since the Groovy developers tackled some performance issues with the Groovy 1.6 release, I thought, I'd give it another try. I used Groovy 1.6.3, Scala 2.7.5 and JDK 1.6.0 Update 14. I left the code that I used for the last comparison unchanged. I did not make much use of any specific language features, so it's quite a straightforward quicksort implementation, see the code below.

The Result
Before showing the result, just let me make clear, that I love the Groovy ;-) But seriously, after the anouncement, that the performance was one of the primary goals for the Groovy 1.6 release, I was a bit disappointed. We'll have to keep in mind though, that this is just a (poorly written) micro benchmark, that doesn't mean anything. And after all, it's all about the groovy language features anyway.

Nuff said, here is the result of sorting 100.000 integers:

1. Java: 45ms
2. Scala: 56ms
3. Groovy: 12800ms


And here's the code:

Java

public class QuicksortJava {

public static void swap(int[] a, int i, int j) {
int temp = a[i];
a[i] = a[j];
a[j] = temp;
}

public static void quicksort(int[] a, int L, int R) {
int m = a[(L + R) / 2];
int i = L;
int j = R;
while (i <= j) {
while (a[i] < m)
i++;
while (a[j] > m)
j--;
if (i <= j) {
swap(a, i, j);
i++;
j--;
}
}
if (L < j)
quicksort(a, L, j);
if (R > i)
quicksort(a, i, R);
}

public static void quicksort(int[] a) {
quicksort(a, 0, a.length - 1);
}

public static void main(String[] args) {

// Sample data
int[] a = new int[100000];
for (int i = 0; i < a.length; i++) {
a[i] = i * 3 / 2 + 1;
if (i % 3 == 0)
a[i] = -a[i];
}

long t1 = System.currentTimeMillis();
quicksort(a);
long t2 = System.currentTimeMillis();
System.out.println(t2 - t1);

}

}



Scala

package quicksort

object QuicksortScala {

def quicksort(xs: Array[Int]) {

def swap(i: Int, j: Int) {
val t = xs(i); xs(i) = xs(j); xs(j) = t
}

def sort1(l: Int, r: Int) {
val pivot = xs((l + r) / 2)
var i = l;
var j = r
while (i <= j) {
while (xs(i) < pivot) i += 1
while (xs(j) > pivot) j -= 1
if (i <= j) {
swap(i, j)
i += 1
j -= 1
}
}
if (l < j) sort1(l, j)
if (r > i) sort1(i, r)
}
sort1(0, xs.length - 1)
}

def main(args : Array[String]) {
var a : Array[Int] = new Array[Int](100000)
var i : Int = 0
for (e <- a) {
a(i) = i*3/2+1;
if (i%3==0) a(i) = -a(i);
i = i+1
}
val t1 = System.currentTimeMillis();
quicksort (a)
val t2 = System.currentTimeMillis();
println(t2-t1)
}
}


Groovy

def swap(a, i, j) {
temp = a[i]
a[i] = a[j]
a[j] = temp
}

def quicksort(a, L, R) {
m = a[((L+R)/2).intValue()]
i=L
j=R
while (i<=j) {
while (a[i]<m) i++
while (a[j]>m) j--
if (i<=j) {
swap(a, i, j)
i++
j--
}
}
if (L<j) quicksort(a,L,j)
if (R>i) quicksort(a,i,R)
}

def quicksort(a) {
quicksort(a, 0, a.length-1)
}

// Sample data
a = new int[100000]
a.eachWithIndex {x, i ->
a[i] = i*3/2+1
if (i%3==0) a[i] = -a[i]
}

t1 = System.currentTimeMillis()
quicksort(a)
t2 = System.currentTimeMillis()
println t2-t1


Note: There was a bug in Scala code of the first version of this post, which is now fixed. Scala now goes slightly better.

42 comments:

Eishay Smith said...

would be interesting to see it with Scala 2.8 @specialized annotation

Georg Wächter said...

and it would be interesting to see how scala compares if you write real functional code. I guesss it will be even slower ... but on ther other side it will be much much smaller and thus more readable

Nick Wiedenbrück said...

Follow up from Eishay: http://www.eishay.com/2009/07/microbenchmarking-scala-vs-java.html

Anonymous said...

what an idiotic comparison

Eishay Smith said...

I think you have a bug at the Scala code line #25, it doesn't match the Java code (thanks to Ismael who found it).
Fixed it and now my Scala's code performance matches Java's.

Unknown said...

I made little optimalization in Groovy script:

t1 = System.currentTimeMillis()
//quicksort(a)
Arrays.sort(a);
t2 = System.currentTimeMillis()
println t2-t1

Anonymous said...

using gdk sort as Jan Kotek says:

perform like scala and java


changing 4 lines:

+ def temp = a[i]

+ int m = a[((L+R)*0.5).intValue()]
+ int i=L
+ int j=R

4-5x boost over proposed groovy code

Groovy let you write scripts as you did, and also offer you the possibility to have nearly the same performance if you really need it (and it is not difficult at all)

Artur Biesiadowski said...

Mr Jan, I can see this reaction every time speed of groovy comes into picture...

Going further this route, if somebody will compare speed of java and C for ray tracing, I'll answer System.exec("povray abc.pov")

Point of this benchmark is exactly to prove that you HAVE to escape from groovy to java for any performance-sensitive code. It also proves that in scala you can in many cases write your algorithms fully in scala, without having to switch language in hot spots.

Said that, benchmark is completely horrible. There is no time for jvm to warm up in case of java and scala - array is too small and there is no repetition of test.

I have done some slightly more scientific tests for array of 20000000 ints and results are (jdk 1.6.0_11 server, scala 2.8.0)

java 1.5 s
scala 2.2 s
groovy still running after 40 minutes, I'll send update later

Anonymous said...

Why is the groovy code the only one not using strong typing? It's possible in groovy as well you know. Surelly that should help. After, I know scala optimize the compilation of the primitive operations, so no surprise there. All this benchmark shows is that to run programs on Integer fast, stick to primitive type, in java for example. but it's not groovy playground.

Anonymous said...

So, you created slow groovy code and you are surprised it is slow... wow!

Nick Wiedenbrück said...

@Eishay Fixing the bug helped a bit. The Scala code is now slightly better than before.

Anonymous said...

Write the Java code with BigDecimal instead of int and it will be as slow as the groovy code!

Artur Biesiadowski said...

To all the people commenting about bad groovy code, lack of static types and using BigDecimal.

Do you really think it with those changes groovy will be anything close to java? Or just 300 times slower instead of 3000 times slower? (I'm taking about test after warming up).

I gave up after 1 hour of running groovy code, so lower bound is 3000 times slower with original code.

For smaller test with 1M elements, java is around 65ms, old groovy 20 seconds, groovy with explicit types (I have put them everywhere possible) 10 seconds.

Which means that groovy is still around 300 times slower - exactly the number I was getting in my previous tests few years ago.

So, now when primitive types are out of the picture, what will be the next excuse from groovy crowd? I'm still waiting for "Groovy cannot be slow because it is used in Fortune 500 companies" and "All applications are database bound anyway", which I got last time.

BTW, java is taking 145ms for sorting 1M Integer entries (2.5x slowdown), so primitive versus object difference is not explaining anything.

Unknown said...

Arthur: You are right, no one argues that. With Groovy you HAVE TO sometimes fallback into Java. Java is great for number crunching, Groovy (onlike Scala) is good for boiler code around.

I have big application in Groovy and around 5% have to be in Java. Profiling my app is part of development (write code, write tests, profile tests, profile app).

If you want challenge, lets start writing file parsers and Swing forms. You would not stand a chance with Scala.

BTW: with my patch it runs 3x faster then your Java implementation :-)))

Unknown said...

Copied & pasted the java code into groovy console and it ran in 1.4 sec. Added the three changes proposed by Anonymous and it ran in 0.67 seconds.

What is wrong with that? Groovy is a superset of Java; both are available always.

Artur Biesiadowski said...

@Jan: You have to be careful with parsers, scala has very powerful capabilities here. For swing forms, I have to agree, groovy is currently state of the art ;)
And to be honest, people ARE arguing with the fact that number crunching in pure groovy is slow. I have heard argument that as groovy is compiled to bytecode it is same fast as java, for example...


@Arouse: Groovy is superset of java from point of syntax, but same code runs considerably slower because of various extensions which have to be supported. If by superset you mean possibility of easily escaping into normal java, then it doesn't matter, because the point of the test was to see if you have to escape for performance critical code.

Nick Wiedenbrück said...

James Strachan refers to this post in his blog

Anonymous said...

I think the discussions above illustrate a great point about scala and groovy compared to java.

The guy is sorting integers!?

So you intend to have these sorts of discussions about more complex code...all 200,000++ lines of it in a real enterprise app?

Please...

Artur Biesiadowski said...

Sorting integers is easy to proof (and people have found few errors here). But when you will start writing graph processing algorithms and you end up with O(n^3) algorithm somewhere, you may want to know if C is 1 or 1000.

On top of that, it is better to talk about platform performance BEFORE writing 200k lines. I have seen systems written with "we will fix performance when it is ready" mindset, which were so broken in design performance-wise that they had to be rewritten from scratch.

You can avoid some mistakes if you are aware of basic facts from very start. Do not use groovy for scripting if performance matters. If you need to compute 1000 exotic option prices per second, your scripting language will be better faster than slower (and java things like Janino, thanks to the Hotspot, are way faster than any interpreted C++ stuff). And when you have to keep below 10kb of garbage for end-to-end processing, you better choose language where you can easily control amount of fluff.

Scott Hickey said...

I'm living a code base right now that could be 1/10th to 1/20th of the size if it was written in Groovy. Performance is a big issue - but guess what? They are a startup and the client constantly wants enhancements. The enhancements take forever because the code base is so large and complex. They nailed the premature optimization anti-pattern. They are barely into production and afraid to make changes.

If I'm looking at 200k code base, the first thing I'm thinking is how can make it smaller so its easier to manage change. I've never been on a project where changing requirements wasn't the dominant challenge for developers working a project.

There are things that you can do in Groovy and other languages that support a meta-object protocol that you simply can't do with languages like Java. Just because Strachen doesn't see the value in a MOP doesn't mean its value isn't real.

Finally, I am amazed how people who have never used Groovy on a real project (but are experts because they wrote 50 lines of benchmark code) dismiss "dropping into Java" as somehow not valid for Groovy code. It is not a knock on Groovy - it is a feature.

Those of us who have used Groovy on real projects have seen that a developer gets a great combination of features. I get a really expressive and compact syntax, dynamic behavior if I want it and its trivial to drop down into Java when necessary.

Nick Wiedenbrück said...

This post raised quite more of a discussion than I thought it would. I mean, it doesn't really tell anything new and it basically says "Look, I'm using the wrong tool for the job ... and it breaks!". I said that in the post! So I don't understand some of the comments, where people are getting really serious about this.

Groovy really shines in some areas, and it does not in some other areas. That's what also Guillaume Laforge says, but some people really want to use Groovy for everything. To those people the numbers above must look like a bug report.

Unknown said...

Hello, I replace the quicksort algorithm of groovy by this

def quickSort(list) {
if (list.size() < 2) return list
def pivot = list[list.size().intdiv(2)]
def left = list.findAll {item -> item < pivot }
def middle = list.findAll {item -> item == pivot }
def right = list.findAll {item -> item > pivot }
return (quickSort(left) + middle + quickSort(right))
}

From the book Manning Groovy in action, page 109. The results was:

Blog Algorithm 25650 ms

Book Algorithm 9490 ms

This is the complete code:

def quickSort(list) {
if (list.size() < 2) return list
def pivot = list[list.size().intdiv(2)]
def left = list.findAll {item -> item < pivot }
def middle = list.findAll {item -> item == pivot }
def right = list.findAll {item -> item > pivot }
return (quickSort(left) + middle + quickSort(right))
}


// Sample data
a = new int[100000]
a.eachWithIndex {int x, int i ->
a[i] = i*3/2+1
if (i%3==0) a[i] = -a[i]
}

long t1 = System.currentTimeMillis()
quickSort(a)
long t2 = System.currentTimeMillis()
println t2-t1

Anonymous said...

how about using Java inside Groovy when needed?

import QuicksortJava

def quicksort(a) {
QuicksortJava.quicksort(a, 0, a.length-1)
}

// Sample data
a = new Integer[100000]
a.eachWithIndex {x, i ->
a[i] = i*3/2+1
if (i%3==0) a[i] = -a[i]
}


t1 = System.currentTimeMillis()
quicksort(a)
t2 = System.currentTimeMillis()
println t2-t1

Anonymous said...

Forgot to mention I had to change signature to:

swap(Integer[] a, int i, int j)
quicksort(Integer[] a, int L, int R)

and works in about 80 ms.

Ben said...

Thanks for very useful information.
Best web design services

Илья said...

I used Groovy 1.7 - my result is 3243ms, on Java - 101ms.

The next step. I modified the groovy program, used typing (int) - 1479ms

Artur Biesiadowski said...

@Scott

While changing requirements ARE the major hurdle in almost every project, there are project where it is not a biggest issue.

- writing optimizing compiler
- embedded code (including telecom related stuff, where embedded doesn't mean small)
- writing performance sensitive libraries/frameworks

and probably many, many more.

I'm not saying that groovy is bad choice in above cases, just that ease of reaction to changing requirements is not always the highest priority.

As far as 'experience' is concerned, my commercial adventure with groovy started when I had to choose scripting language for exotic derivatives pricing. While hardcore computations were done in optimized C++, there was considerable amount of glue code around (with some less trivial stuff like modifications of volatility surfaces on the fly during pricing etc). I have considered some kind of DSL based on groovy for that - and guess what, groovy WAS slowest part of entire chain. Basically, it was too slow to be useful as a scripting language. Calling java is not going to help here - because it is exactly java/C++ which is calling 'scripting language', sometimes even inside a tight loops.

Wrong tool for the job ? Yes. We have ended up with using variation based on janino (some preprocessing done for easier usage), which gave us full java speed, with hotspot being able to really optimize the stuff.

There are cases where speed of the language ITSELF is important and saying 'jump to java in case of trouble' is not going to cut it. This is why it is nice to know how fast is the language on it's own. This is why I'm so angry when I see people comparing groovy to java speed by doing 20 seconds blocking query to database and saying it is taking 20.0001s in java and 20.01s seconds in groovy and saying that groovy is just fraction of percent slower than java in "all interesting use cases".

I AM using groovy for scripting (especially for things which are touching both filesystem and database) - and performance is for sure not any issue. I would have no problems with using it for any 'normal' web-based application or some DB-based reporting thingy. Still, when you have to process tens of thousands of updates per second on single machine and have to fight for every few bytes of garbage so gc won't become a pain, it is good to know that groovy is not a way to go... at least before Groovy++ came into picture - I'm looking at it closely. On the other hand, I know about scala now, so next similar try will be probably scala-based...

Adam Crain said...

Wow, I actually did the same thing for my first blog post:

http://agile-automata.net/

I think that is the natural way to handle logging in Scala. I will widen my trait now to incorporate the standard slf4j interface as yours does. Check out the use of by-name params and closures in my post.

Trevor Tippins said...

I've been doing Project Euler problems in Groovy and have found that Java is around 6-8x faster than Groovy when using built-ins i.e. when Groovy is mostly used to glue Java object together, 40-60x faster when doing "general" programming and several hundred times faster when doing computationally intensive work (as Java allows native ints without all the autoboxing that goes on in Groovy). The "syntactical sugar" features of Groovy are quite expensive in runtime performance.

Obviously these are coarse generalizations, and the usage in PE problems is somewhat specialized, but you can find some more details on my blog: http://keyzero.wordpress.com

Note though that Groovy quite scores well when compared to other dynamic scripting languages such as Ruby, JRuby and Python.

Anonymous said...

Tried with scala 2.8 vs java 1.6.

Java 110 ms
Scala 34 ms

The optimization of scala 2.8 paid of i guess :)

Anonymous said...

Tried Groovy++

100K elements:
java 9 ms
groovy++ 18 ms

1M elements:
java 68 ms
groovy++ 117 ms

Anonymous said...

interesting fact for groovy++:
if I change
def int m = a[((L+R)/2)]
to
def int m = a[((L+R)*0.5)]
time goes from 117 ms to 1010 ms

rakeback said...

Indeed java is the fastest for the moment and groovy just can compete for now
Adrian frorakebackm

The Author said...

Interesting test but the best tests are, IMHO, "real world". For instance, I changed the Groovy code to:

def quicksort(a) {
Arrays.sort(a)
}

And the time on my box went from 2406 to 12 ms!

In the real world I'd use Arrays.sort() before I attempted to code my own sort :-)

Artur Biesiadowski said...

@E
Thats the entire point, groovy is not a language in which you will write a sorting algorithm, you need to use something else.

Going further this way, I can write state of art 3d game in groovy.

System.exec("crysis.exe")

PokerEx said...

What I want to know is why I should care? I mean, not to say that what youve got to say isnt important, but I mean, its so generic. Everyones talking about this man. Give us something more, something that we can get behind so we can feel as passionately about it as you do.

Ocular Concepts said...

Lovely post. The scala 2.8 reacts to your code snippets.

Unknown said...

http://stronglytypedblog.blogspot.com/2009/07/java-vs-scala-vs-groovy-performance.html

Anonymous said...

This change

int m = a[(L+R) >> 1]

produces 3 times faster execution
than

int m = a[((L+R)*0.5).intValue()]

Using groovy++, performance is equivalent than Java (with the >> 1 change... of course!!!)

Anonymous said...

i wonder whether those who actually fund the software projects which use the languages mentioned would be concerned about the run time performance or the dev time performance.

they want the projects to be completed on time.

Anonymous said...

Where is the groovy code? You have two java version and a scala version dude

orubel said...

This is a WEAK groovy test! First no one would use 'def' and second, you aren't using groovy functions. You don't even understand how to write Groovy; you are deliberately writing your groovy test to FAIL!