Wednesday, June 30, 2010

Lambdas in Java Preview - Part 1: The Basics

As announced at Devoxx last year, closures (or, better, lambda expressions) will probably be added to JDK 7. The Project Lambda team has checked initial parts of the implementation into the OpenJDK repositories. This is the first part (see part 2) in a series of blog posts giving practical examples of lambdas, showing what functional programming in Java could look like and how lambdas could affect some of the well-known libraries in Java land. Although most of the examples work with the current prototype implementation, keep in mind that this is just a preview, based on the straw-man proposal, the specification draft, the discussions on the Project Lambda mailing list and the current state of the prototype. There may well be both semantic and syntactic differences in the final version of lambdas in Java. Some details are also left out; exception handling, for example, will probably be out of scope.

The focus of this series is on the practical usage of lambdas, not primarily on their fundamental concepts. You don't need any theoretical knowledge of lambda calculus or a strong background in functional programming. And don't rack your brain over what closures are; just think of them as anonymous functions or code blocks that can be passed around. That said, even though you don't need to know any of this, I don't want to discourage you from researching these topics.

Note: If you want to try out some examples there's a little trick to get the lambda prototype working, see here.

Lambda expressions
To get started, let's have a look at a very simple example of an anonymous function or "lambda expression". The following is a lambda expression that takes an integer x and returns 2*x:
    #(int x)(2*x)
The hash symbol introduces the lambda expression, or "function literal". The first pair of parentheses holds a comma-separated argument list and the second pair encloses the expression that is evaluated on invocation. Here is another example of a lambda expression that takes two arguments x and y and returns their sum:
    #(int x, int y)(x+y)
What we would naturally do with a lambda expression is evaluate it. We could invoke one of the function literals above directly with #(int x)(2*x).(7), but this will probably be a bit uncommon.

Function types
Instead, invocation will mostly happen on variables of a function type. So we first bind a function to a variable and then invoke it through that variable. The syntax for function types is very similar to the syntax of function literals:
    #int(int) doubler = #(int x)(2*x);
This declares a variable doubler of type #int(int), i.e. a function that takes an int (in parentheses) and returns an int (after the #). Now, to invoke doubler we can write
    int n = doubler.(3);
    assert n == 6;
Notice the dot before the parentheses. For another example, let's look at the sum lambda again:
    #int(int,int) sum = #(int x, int y)(x+y);
    int x = sum.(3,7);
    assert x == 10;
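If you don't have the prototype build at hand, the same two functions can be sketched on a released JDK (Java 8 or later). This is only a rough stand-in, not the prototype syntax: it assumes the standard `java.util.function` interfaces `IntUnaryOperator` and `IntBinaryOperator` in place of the function types `#int(int)` and `#int(int,int)`, and the class name `FunctionTypesSketch` is made up for illustration.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

// Hypothetical class name, used only for this sketch.
class FunctionTypesSketch {

    // Stand-in for: #int(int) doubler = #(int x)(2*x);
    static final IntUnaryOperator DOUBLER = x -> 2 * x;

    // Stand-in for: #int(int,int) sum = #(int x, int y)(x+y);
    static final IntBinaryOperator SUM = (x, y) -> x + y;

    public static void main(String[] args) {
        // Invocation goes through the interface method rather than ".()".
        System.out.println(DOUBLER.applyAsInt(3));
        System.out.println(SUM.applyAsInt(3, 7));
    }
}
```

Here the variable's type is an ordinary interface, so the call is a plain method call on that interface instead of the dot-parentheses invocation shown above.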
More complex expressions
So far, the body of each lambda expression was just a single expression. In this case it can be enclosed in parentheses and the return keyword can be omitted. In the more complex case, the body can be a block in curly braces, which must explicitly return a value (unless the function is void):
    #int(int,int) max = #(int x, int y) {
        if (x >= y) return x;
        else return y;
    };
    int z = max.(3,4);
    assert z == 4;
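The distinction between an expression body and a block body can be tried out on a released JDK (Java 8 or later) as well. The following is a minimal sketch under that assumption; `IntBinaryOperator` stands in for `#int(int,int)`, and the names `BlockBodySketch`, `MAX` and `MAX_EXPR` are invented for this example.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical class name, used only for this sketch.
class BlockBodySketch {

    // Block body: curly braces and an explicit return on every path.
    static final IntBinaryOperator MAX = (x, y) -> {
        if (x >= y) return x;
        else return y;
    };

    // Expression body: a single expression, no return keyword.
    static final IntBinaryOperator MAX_EXPR = (x, y) -> x >= y ? x : y;

    public static void main(String[] args) {
        System.out.println(MAX.applyAsInt(3, 4));
        System.out.println(MAX_EXPR.applyAsInt(3, 4));
    }
}
```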
Higher-order functions
Of course, functions can also be passed as arguments to methods and other functions. This will be the most common usage. A method that takes an integer n and a function f and executes f n times could look like this:
    public static void times(int n, #void(int) f) {
        for (int i=0;i<n;i++) {
            f.(i);
        }
    }
The following will print the squares of 0 to 4 on the console.
    times(5, #(int index)(System.out.println(index*index)));
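A rough equivalent of times that compiles on a released JDK (Java 8 or later) might look like the sketch below. It assumes `java.util.function.IntConsumer` as a stand-in for `#void(int)`; the class `TimesSketch` and the helper `squares`, which collects the values instead of printing them, are hypothetical and exist only to make the behaviour easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical class name, used only for this sketch.
class TimesSketch {

    // Calls f once for each index 0, 1, ..., n-1.
    static void times(int n, IntConsumer f) {
        for (int i = 0; i < n; i++) {
            f.accept(i);
        }
    }

    // Hypothetical helper: collects the squares of 0..n-1 via times.
    static List<Integer> squares(int n) {
        List<Integer> out = new ArrayList<>();
        times(n, index -> out.add(index * index));
        return out;
    }

    public static void main(String[] args) {
        // Prints the squares of 0 to 4, one per line.
        times(5, index -> System.out.println(index * index));
    }
}
```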
Functions and methods can also have a function as their return value. The following method takes an integer x and returns a function that takes another integer y and returns x*y.
    public static #int(int) multiplier(final int x) {
        return #(int y)(x*y);
    }
The invocation looks like this:
    #int(int) mult5 = multiplier(5);
    assert 20 == mult5.(4);
This example also shows that it is possible to capture variables from the enclosing scope (this is what makes it a closure, by the way). x is a free variable, and its definition is copied into the body of the lambda expression at runtime. For this to work, x must be effectively final or declared shared (see the straw-man proposal for details).
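Variable capture can be sketched on a released JDK (Java 8 or later) too. The example assumes `IntUnaryOperator` in place of `#int(int)`, and the class name `MultiplierSketch` is made up. Released Java only has the effectively-final rule, so the parameter x needs no final keyword as long as it is never reassigned.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical class name, used only for this sketch.
class MultiplierSketch {

    // Returns a function that multiplies its argument by the captured x.
    // x is never reassigned, so it is effectively final and may be captured.
    static IntUnaryOperator multiplier(int x) {
        return y -> x * y;
    }

    public static void main(String[] args) {
        IntUnaryOperator mult5 = multiplier(5);
        System.out.println(mult5.applyAsInt(4));
    }
}
```

Each call to multiplier produces an independent function with its own captured x, so multiplier(5) and multiplier(3) do not interfere with each other.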

That's it
That's basically it for the fundamentals of lambda expressions in Java. You will probably have noticed by now that they will have a huge impact on Java, both the language and the libraries.

Side note: Some people don't like the lambda syntax shown above. I don't want to start that discussion again here, so just two points. First, the syntax can indeed be awkward in some cases, but most of the time it's just a matter of passing functions around as literals or variables, which doesn't look awkward. Second, because Java is a statically typed language with features such as checked exceptions, closures won't look the way they do in dynamically typed languages, in languages with strong type inference or in languages without checked exceptions.

Function conversion
This last section is about function conversion, which isn't essential to lambda expressions but will also have a huge impact. Many Java libraries use so-called SAM types (single abstract method): interfaces with only a single method and abstract classes with only one abstract method. Function conversion means that a function of appropriate type can be converted into an anonymous instance of a SAM type as needed. For example, the Collections.sort method takes a List and a Comparator, which has a single abstract method int compare(T x, T y). Until now, the call would look like this:
    // This would be just '= [4,2,1,3]'
    // with collection literals
    List<Integer> list =
        Arrays.asList(new Integer[]{4,2,1,3});

    Collections.sort(list, new Comparator<Integer>() {
        public int compare(Integer x, Integer y) {
            return -x.compareTo(y);
        }
    });
But with function conversion we can substitute a function of appropriate type:
    Collections.sort(list,
        #(Integer x, Integer y)(-x.compareTo(y)));
This will probably be used very often, because SAM types are so common in the Java world. As a side note, an open question for me is whether API designers should have their methods/functions take functions or SAM types as arguments, the latter of which may be more flexible and more expressive.
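Conversion to a SAM type can be tried on a released JDK (Java 8 or later), where a lambda expression is converted to a Comparator in the same way. The sketch below is only an illustration under that assumption; the class `SamConversionSketch` and the method `sortedDescending` are hypothetical names, and the method sorts a copy so that the input list is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class SamConversionSketch {

    // Returns a copy of the input sorted in descending order.
    static List<Integer> sortedDescending(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        // The lambda is converted to a Comparator<Integer>; negating the
        // result of compareTo reverses the natural ordering.
        Collections.sort(copy, (x, y) -> -x.compareTo(y));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortedDescending(Arrays.asList(4, 2, 1, 3)));
    }
}
```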

Final example
For a final example we implement the Fibonacci function and call it several times in parallel threads.
    class ParallelFib {

        final static #void(int,int) fib =
            #(int c, int n) {
                int result = fib(n);
                System.out.println(c + ") " +
                    "fib(" + n + ") = " + result);
            };

        public static int fib(int n) {
            if (n == 0 || n == 1) return 1;
            else return fib(n-1) + fib(n-2);
        }

        public static void main(String[] args) {

            for (int i=0;i<10;i++) {
                final int i2 = i;
                new Thread(#()(fib.(i2, 32))).start();
            }

        }

    }
fib first occurs as a class variable of function type. This lambda expression takes a counter c and an input n, calls the method fib (I don't know whether recursive lambda calls are possible at the moment) and then prints the counter and the result. The main method creates 10 threads, each taking a lambda expression that invokes fib and is implicitly converted into a Runnable. The output is something like this:
    1) fib(32) = 3524578
    3) fib(32) = 3524578
    2) fib(32) = 3524578
    0) fib(32) = 3524578
    7) fib(32) = 3524578
    4) fib(32) = 3524578
    9) fib(32) = 3524578
    6) fib(32) = 3524578
    8) fib(32) = 3524578
    5) fib(32) = 3524578
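For readers without the prototype build, the same experiment can be sketched on a released JDK (Java 8 or later), where a zero-argument lambda is converted to a Runnable just as described. This is a variation rather than a translation: the class `ParallelFibSketch` and the method `run` are hypothetical, and instead of printing from each thread the sketch collects the results in a `ConcurrentHashMap` and joins the threads before returning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class name, used only for this sketch.
class ParallelFibSketch {

    // Same definition as in the post: fib(0) and fib(1) are both 1.
    static int fib(int n) {
        if (n == 0 || n == 1) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    // Computes fib(n) on `count` threads; results are keyed by the counter.
    static Map<Integer, Integer> run(int count, int n) {
        Map<Integer, Integer> results = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int c = i; // the loop variable itself cannot be captured
            Thread t = new Thread(() -> results.put(c, fib(n)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until every computation has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        run(10, 32).forEach((c, result) ->
            System.out.println(c + ") fib(32) = " + result));
    }
}
```

Collecting the results before printing keeps the computation parallel while making the outcome independent of thread scheduling, which the interleaved output above is not.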
Feel free to post comments on what you think about lambdas right here, or give feedback directly to the members of project lambda, which I think is always welcome.

12 comments:

tbee said...

Very insightful read!

This lambda notation is not really easy to read. It might just take some getting used to, but they're introducing more and more symbols as syntax, which is not "Java"-like.

And it seems to suffer from the same introduction problem generics did; duplication.

#int(int, int) sum = #(int x, int y)

vs

Map<Integer,Integer> x = new HashMap<Integer,Integer>();

In Java 7 they are going to try to remove the duplication from generics; maybe they could do that for lambdas right away?

#int sum = #(int x, int y)

Nick Wiedenbrück said...

I had the same feeling that some more type inference would be nice. Especially for higher-order functions it can become quite hard to read, e.g. the type of a function that takes an int and returns another function that takes an int and returns an int would be:

##int(int)(int)

Although, here the code is even more readable than natural language. I'll come up with more examples of this type in the next post.

Generally I think, that the introduction of lambda expressions is similar to that of generics in this respect, because this kind of complexity occurs mostly in the implementation of APIs, but is hidden from clients of these APIs.

Olivier said...

Excellent summary!

It's great to see some real-life examples after all the fuss going on lately around the JDK 7 unit tests.

Thanks for taking the time to write this article, this goes to my shared reading list.

steve said...

After reading the available material on the mailing list and talking to people, I got the strong impression that closures are nowhere near working (e.g. they break once you push things a bit further), well specified or ready for inclusion in Java 7 (unless Java 7 is postponed until 2013).

I just wonder what Oracle is up to?

Did they see Sun's disaster regarding Generics in Java 5 and thought "us too!"?

Will we have to live with yet another half-baked feature?

In my opinion, Java 7 with ARM, Time & Date API and Project Coins improvements are quite acceptable.

I just wonder what sense it makes to add something to the Java language that basically changes the way people think and work if there is no interest in adapting existing APIs.
(Sure I heard about Defender methods, but really ... are these people joking?)

Will Oracle just, e.g., deprecate the existing Collection framework when Java 7 arrives and add a new one with foreach, map, forall, flatten, etc. methods?
If not, why even bother about closures in Java?

If people want to use closures, they should just take a different VM language, which had them since day one and has APIs which actually use them.

Maybe these things are the reason why there is no JSR for Java 7 yet.
Because closures wouldn't get very far if other community members had a say about it, Oracle puts closures and all these other improvements together and finally says "here it is - Java 7: Take it or leave it".

That reminds me how laws are made, but not how software engineering is done.

But thanks for your information.

Nick Wiedenbrück said...

The primary reason to introduce closures was to make it easier to write parallel programs in Java with the new fork/join framework, e.g. with parallel arrays. I found that a bit strange, though.

Gili said...

Well said Steve!

steve said...

@Nick Wiedenbrück:

Not anymore. They removed function types from closures this month.

Without function types it will not be possible to reduce the 132 SAM types in Parallel Array's Ops.

So the reason for closures Oracle gave us is null and void.

Nick Wiedenbrück said...

The latest I've heard (and that is still only a proposal as well) is that function types could indeed be removed, but that there will still be lambda expressions which can be converted to SAM types.

Justin N. said...

In the multiplier example of the section "Higher-order functions", the instantiation of mult5 variable is done as multiplier(5).

Is this correct or should it be multiplier.(5)?

Thanks in advance.

Nick Wiedenbrück said...

@JNau No, that's okay. multiplier is a method, so it's called as multiplier(5). But it returns a value of a function type, which is invoked with a dot and parentheses: mult5.(3)

Anonymous said...

Thanks for the post, really informative.

Just wondering what the reason was for having a "." before the argument list in the invocation syntax. Is this based on the "instance.field" syntax?

This seems somewhat different from C/C++ function pointers, to which this is probably most closely related, where invoking a function pointer is conveniently seamless and indistinguishable from a regular function call.

Nick Wiedenbrück said...

Regarding the dot before the parameter list: this is because in Java the variables and methods of a class have distinct namespaces. You can have both a variable foo of function type and a method called foo within the same class. So if you called foo(), it would not be clear whether to invoke the function-typed variable foo or the method foo.