Strongly Typed, Loosely Coupled

Wednesday, June 30, 2010

Lambdas in Java Preview - Part 1: The Basics

As announced at Devoxx last year, closures (or better lambda expressions) will (probably) be added to JDK7. The team of project lambda has checked in initial parts of the implementation into the OpenJDK repositories. This is the first part (see part 2) in a series of blog posts giving some practical examples of lambdas, how functional programming in Java could look like and how lambdas could affect some of the well known libraries in Java land. Although most of the examples will work with the current prototype implementation, keep in mind, that this is just a preview, which is based on the straw-man proposal, the specification draft, the discussions on the project lambda mailing list and the current state of the prototype. There might/will be both semantical and syntactical differences to the final version of lambdas in Java. Also some details are left out, e.g. exception handling will probably be out of scope.

The focus of this series is on practical usage of lambdas, not in principal on the fundamental concepts of lambdas. You don't need to have any theoretical knowledge of lambda calculus or a strong background in functional programming. And don't break your mind on what closures are, just think of them as anonymous functions or code blocks that can be passed around. But despite you don't need to know about all this, I don't want to discourage you from researching these topics.

Note: If you want to try out some examples there's a little trick to get the lambda prototype working, see here.

Lambda expressions
To get started, let's have a look at a very simple example of an anonymous function or "lambda expression". The following is a lambda expression that takes an integer x and returns 2*x:

#(int x)(2*x)

The hash symbol introduces the lambda expression, or "function literal". The first pair of parentheses is a comma-separated argument list and the second pair of parentheses encloses the expression, which is evaluated on invocation. Here is another example of a lambda expression, that takes two arguments x and y and returns their sum:

#(int x, int y)(x+y)

What we would naturally do with a lambda expression is to evaluate it. We could directly invoke one of the function literals above directly with #(int x)(2*x).(7), but this will probably be a bit uncommon.

Function types
Instead invocation will mostly happen on variables of a function type. So we first bind a function to variable and do the invocation on that. The syntax for function types is very similar to the syntax of function literals:

#int(int) doubler = #(int x)(2*x);

This declares a variable doubler of type #int(int), i.e. of type function that takes an int (in parentheses) and returns an int (after the #). Now, to invoke doubler we can write

int n = doubler.(3);
assert n == 6;

Notice the dot before the parentheses. For another example, let's look at the sum lambda again:

#int(int, int) sum = #(int x, int y)(x+y);
int x = sum.(3, 7);
assert x == 10;

More complex expressions
So far the body of the lambda expressions was just a single expression. In this case it can be included in parentheses and the return keyword can be omitted. In the more complex case the body can be a block with curly braces and must explicitly return a value (if it's not void):

#int(int, int) max = #(int x, int y) {             
    if (x >= y) return x;
    else return y;
};
int z = max.(3,4);
assert z == 4;

Higher-order functions
Of course, functions can also be passed as arguments to methods and other functions. This will be the most common usage. A function, that takes an integer n and function f, that executes f n times could look like this:

public static void times(int n, #void(int) f) {
    for (int i=0;i<n;i++) {
        f.(i);
    }
}

The following will print the squares of 0 to 4 on the console.

times(5, #(int index)(System.out.println(index*index)));

Functions and methods can also have a function as their return value. The following method takes an integer x and returns a function that takes another integer y and returns x*y.

public static #int(int) multiplier(final int x) {
    return #(int y)(x*y);
}

The invocation looks like this:

#int(int) mult5 = multiplier(5);
assert 20 == mult5.(4);

This case also shows, that it is possible to capture variables from the enclosing scope (this is what makes it a closure by the way). x is a free variable and its definition is copied over into the body of the lambda expression at runtime. For this to work x must be declared effectively-final or shared (see straw-man proposal for details).

That's it
That's basically it for the fundamentals of lambda expression in Java. But you will probably have noticed by now, that it will have a huge impact on Java, both the language and the libraries.

Side note: Some people don't like the syntax of lambdas as above. I don't want to start the discussion here again, just two points. First, the syntax can actually be awkward in some cases, but most of the time it's just passing around functions as literals or variables, which doesn't look awkward. And second, because Java is a statically typed language and has features like checked exceptions and others, closures won't look like in dynamically typed languages or languages with strong type inferencing or languages without checked exceptions.

Function conversion
This last section is about function conversion, which isn't something that is essential to lambda expressions, but will also have a huge impact. Many Java libraries use so called SAM types (single abstract method) - interfaces with only a single method and abstract classes with only one abstract method. Function conversion means that a function of appropriate type can be converted into an anonymous instance of a SAM type as needed. For example, the Collections.sort method takes a List and a Comparator, which has a single abstract method int compare(T x, T y). Up to now this would look like this:

// This would be just '= [4,2,1,3]' 
// with collection literals
List<Integer> list = 
    Arrays.asList(new Integer[]{4,2,1,3});

Collections.sort(list, new Comparator<Integer>() {
    public int compare(Integer x, Integer y) {
        return -x.compareTo(y);
    }
});

But with function conversion we can substitute a function of appropriate type:

Collections.sort(list, 
    #(Integer x, Integer y)(-x.compareTo(y)));

This will probably be used very often, because SAM types are so common in the Java world. As a side note, an open questions to me is, if API designers should choose to take functions as arguments to their methods/functions or SAM types, which may be more flexible and more expressive.

Final example
For a final example we implement the Fibonacci function and call it several times in parallel threads.

class ParallelFib {

    final static #void(int,int) fib = 
        #(int c, int n) {
            int result = fib(n);
            System.out.println(c + ") " + 
                "fib(" + n + ") = " + result);
        };

    public static int fib(int n) {
        if (n == 0 || n == 1) return 1;
        else return fib(n-1) + fib(n-2);
    }

    public static void main(String[] args) {  

        for (int i=0;i<10;i++) {
            final int i2 = i;
            new Thread(#()(fib.(i2, 32))).start();
        }    

    }

}

fib first occurs as a class variable of function type. This lambda expression takes a counter c and input n, calls the method fib (I don't know, if recursive lambda calls are possible at the moment) and then prints the counter and the result. The main method creates 10 threads each taking the fib lambda expression, which is converted implicitly into a Runnable. The output is something like this:

1) fib(32) = 3524578
3) fib(32) = 3524578
2) fib(32) = 3524578
0) fib(32) = 3524578
7) fib(32) = 3524578
4) fib(32) = 3524578
9) fib(32) = 3524578
6) fib(32) = 3524578
8) fib(32) = 3524578
5) fib(32) = 3524578

Feel free to post comments on what you think about lambdas right here, or give feedback directly to the members of project lambda, which I think is always welcome.

Monday, June 14, 2010

Eclipse Helios Preview for Java Developers

The final release of the new Eclipse 3.6 - Helios is coming close on June 23. Here is a short overview of some of the new features for Java developers.

Open Implementation
Open Implementation, which was available by holding Ctrl and hovering over a method in the editor, is now also available in the navigate menu and can be bound to a keyboard shortcut (e.g. F4?).

Expressions View
When debugging, it's now easier to add expressions in the expressions view and the expressions view has now columns

Breakpoints View
The breakpoint view now provides the possibility to quickly edit some properties of the breakpoint, e.g. breakpoint condition.

Dragging and linking files from operating system
You can now drag files from the operating system into the package explorer and instead of copying them there you can create just links. This feature is definitely useful in a lot of cases.

Move Type to New File Refactoring
Convert Member to Top Level Type has been renamed to Move Type to New File.

Javadoc includes annotations
This is a feature I've been waiting for quite long. The JavaDoc hovers now include information about annotations.

Tuesday, June 1, 2010

So you find Java 7 closure syntax nasty?

Two days ago Baptiste Wicht blogged about the first version of closures pushed by Oracle. He says "This syntax is a little bit shocking, but I think we’ll get used." and the comments to his post go like "Really ugly" and "terrible syntax". Same over at Java Posse Google Group: "closures are looking worse by the day", "I realllllly don't like the look of them", "Wow that's ugly. Source code obfuscaters are going to have a blast".

But let me ask one question: Why has it taken so long for the Java community to realize that? It's just a few month until JDK7 is going to be released, but the closure debate has been around for quite a long time now (not to say years). There were several proposals, there has been the straw-man proposal and the spec draft and the announcement last Devoxx. But there hasn't been much of a discussion in the community.

What is the reason for that? I have some assumptions. Has project lambda been too quiet? There weren't many blog posts or examples. Or is there something broken with the community? Were have the community leaders been? Are they too busy learning Scala? Or is it just that closures in a statically typed language (without type inferencing) are intrinsically complex?

And by the way, what about Java modularity?

Sunday, February 7, 2010

Groovy++ Performance - Now we're talkin'

This is a follow-up of my last post, where I compared the runtimes of the quicksort algorithm implemented in Java, Scala, Groovy and Groovy++. Groovy++ came up with a significant improvement over Groovy. But there was a comment by ait (Thanks ait) on how to improve the performance of the Groovy++ implementation even further, by changing a single line. In fact, this improved the performance dramatically.

In the previous post I showed the de-compiled byte code of the Groovy++ implementation of the algorithm. There were two things that were different from the Java byte code: array access via ArraysMethods.getAt(index) and the integer boxing und calculation. Ait recommended to change the following line

into

Obviously, this tackles the integer handling problem. After changing this line I ran the code again and had a look at the timings. The results are amazing. Here they are for sorting 100.000 and 1.000.000 integers:

n	100.000	1.000.000
Java	12	95
Scala	12	95
Groovy	3750	41700
Groovy++ (old version)	515	4900
Groovy++ (new version)	19	130

This is marvellous! Now Groovy++ is really on a par with Java and Scala!

De-compiling the byte code again shows the difference. Here's the old byte code:

And here is the new version:

Still, there is no native array access, but the integer boxing is gone, it's primitive integer calculations.

Ait (Is that you, Alex?) will come up with a post with more details. I'll provide a link here then.

Update: These are the numbers for Java, Scala and Groovy with shifting instead of division. Only Groovy can take advantage of this:

n	100.000	1.000.000
Java	11	90
Scala	12	90
Groovy	1900	22000
Groovy++ (new version)	19	130

Saturday, February 6, 2010

Groovy++ vs. Groovy vs. Java vs. Scala - Performance Update

I've been writing about a performance comparison between Java, Scala and Groovy in this and this post, where I compared the runtimes of these languages for a straight-forward (and far from idiomatic) implementation of the quicksort algorithm. As you might have heard, there's a new kid on the block: Groovy++ a.k.a. static Groovy. I was eager to see, how this would improve Groovy's results.

So, (without any further justification of this comparison) I took the same source code for Java, Scala and Groovy as previously and took this modified version of the Groovy code for Groovy++:

So, I basically took the Groovy code, and added the @Typed annotation and some more type information. I ran this on my machine with Java 1.6.0 Update 17, Scala 2.7.7, Groovy 1.7 and Groovy++ which is based on Groovy 1.8.0-SNAPSHOT. Here are the results (in ms) for sorting 100.000 and 1.000.000 integers:

n	100.000	1.000.000
Java	12	95
Scala	12	95
Groovy	3750	41700
Groovy++	515	4900

Obviously, while still Java and Scala are by far the fastest, Groovy++ makes a huge difference. Groovy takes about 312x the time of Java/Scala, Groovy++ just takes 42x.

While there isn't much difference in the source code of Groovy and Groovy++, there is of course a big difference in the generated byte code. I had a look at it with the Java Decompiler. The Groovy byte code contains lots of dynamic dispatch code, while there's not much in the Groovy++ byte code that indicates that it was compiled from Groovy. Here's a short excerpt:


  public Object quicksort(int[] a, int L, int R) {
    while (true) {
      int m = ArraysMethods.getAt(a, DefaultGroovyPPMethods.div(DefaultGroovyPPMethods.box(L + R), DefaultGroovyPPMethods.box(2)).intValue());
      int i = L;
      int j = R;
      while (i <= j) {
        for (; ArraysMethods.getAt(a, i) < m; (i++));
        for (; ArraysMethods.getAt(a, j) > m; (j--));
        if (i <= j) {
          swap(a, i, j);
          (i++);
          (j--);
        }
      }
      if (L < j) quicksort(a, L, j);
      if (R <= i) break; R = R; L = i; a = a; this = this; } return null;
  }

This isn't even far from the byte code generated by java. There is just that autoboxing of integers, integer division and the array access with getAt(). Considering the early state of Groovy++ this looks very promising.

Update: There's is a follow-up post that shows how to tune the Groovy++ code to make it run as fast as Java and Scala (Thanks to ait, see comments).

Wednesday, December 16, 2009

Lift from a Wicket Developer's Perspective

I've been messing around with Scala again lately. After learning the basics of the Scala language, I decided to have a look at the lift web framework - "the simply functional web framework". In this highly subjective post I will outline, why I'll stick with Wicket.

My Lift Experience

To be clear, I'm not an experienced Scala developer. I did my first steps in Scala a year ago or so, and I just started learning Lift by having a look at the getting started guide. But this short first impression was enough for me to decide, that I'm not going any further with lift. Without going into much detail, here's why:

Lift is a template driven framework, whereas Wicket is component based. In Lift you compose a page of (X)HTML templates, called snippets. In Wicket you compose a page of components, i.e. not through (X)HTML but through an object graph/hierarchy. Lift's template approach allows for reusable templates, but I like the Wicket's object oriented, hierarchical approach much more.

There's not much documentation on Lift. There is the getting started guide, but the examples did not work immediately, because the API had changed. This troubled me a lot. Also, there's not much documentation in the source code.

A positive aspect of Lift is that it comes with an integrated OR-Mapping API, which looks quite interesting. Also there's a nice query API that can be compared to Hibernate's criteria API, and at first glance it looks even nicer. I heard that lift also runs on Google App Engine (?). I wonder, though, how hard it was to get this integrated OR-Mapping/Query API to run on GAE.

The model/entity classes have too many responsibilities. They contain validation logic and OR-mapping logic. This is okay so far. But there's also conversion logic in there, e.g. from String to the specific data type, and there's also code for HTML and even JavaScript generation. The MappedBoolean class for example, which is used to represent a boolean property of an entity, also has methods like toForm, which yields a (X)HTML checkbox element, and asJsExp, which yields the corresponding JavaScript expression of the value. I'd rather seperate these responsibilities into more than one class.

In general, the seperation of presentation content and logic is not quite as good as stated in the getting started guide. Here is an example snippet from the getting started guide:
 private def header(td: ToDo, reDraw: () => JsCmd) = swappable({td.header}, {ajaxText(td.header, v => {td.header(v).save;reDraw()})}) 
Without getting into the details, this method contains presentation logic in the form of HTML span elements and a checkbox input element generated by the ajaxText call. Actually, these aren't really HTML element strings in the code but this is Scala's XML support. But that does not make it better, right? You'll never - well, almost never - see such things in Wicket.

Lift is a Scala web framework, Wicket is Java. And although Scala in general is a great language and superior to Java in many aspects, I'm still not quite sure whether it will become my first language of choice ever. I still have some issues with it.

Related to this is the tooling aspect. The IDEs (Eclipse, IDEA) still lack full support for the Scala language. Maven support for Scala/Lift is quite good, though.

So, this is was my first impression of Lift. Don't get this wrong, Lift has some interesting aspects and in some ways it's much better than many other Java frameworks. But if I had to choose a web framework for a green field project right now, I'd certainly go with Wicket.

Friday, November 13, 2009

Concurrency is not such a big Deal ...

... when it comes to web development.

In the current debate about the future of Java (Java the language, of course) and which alternative language would be an appropriate successor of Java (Scala, Clojure, ...), much of the discussion focuses on the question, which language provides the better concurrency model in order to utilize multi-core processors in the future. Actually, I'm not an expert in concurrency, and concurrency clearly is really hard to get right (at least in Java), but usually in web applications - and this is quite a huge part of the Java applications out there - concurrency is not such a big deal, even despite the fact, that web applications are one of the most concurrently running kind of applications.

Concurrency in typical web applications
Surely, most web developers know exactly how to tackle concurrency in web applications. It's not that hard. But for the fun it, let's have a closer look. In a web application client requests are handled by the app server concurrently, each in its own thread, possibly distributed among cpu cores. So generally, there are two situations, where threads run simultaneously:

Two or more requests of different clients

Two or more requests of the same client

These threads may possibly access (read and write) shared state. And concurrency is hard mainly because of mutable state, accessed by different threads simultaneously. But let's put it another way then:

Concurrency isn't hard for immutable state and

Concurrency isn't hard for state that is not accessed by different threads simultaneously (mutable or not)

Now let's have a look at what types of state there are in a typical web application and how it is accessed by different threads:

State of the current request: This is for example data needed to generate the response of the request, status of the request, and so on. In most cases, this kind of state is only available to a single thread, i.e. not accessed simultaneously. This is guaranteed by all application servers for request state held in the app server. And this is guaranteed by all web frameworks (also with front controller pattern) for request state held there. The developer has to take care not to store request state where it can be accessed by other request threads. This is neither hard to understand nor hard to accomplish.
Application state held in the application server or in the web framework: Examples are ServletContext or web framework settings. This is mostly static state that will be initialized at startup but won't be mutated in the lifetime of the application. Otherwise, the application server/web framework will take care that this kind of state is accessed in a thread-safe way.
Conversational state: HTTP is stateless. There is no conversational state. There is not even a conversation. Unfortunately, this is not exactly true in most web applications, as most applications will store some kind of conversational state in the HTTP session. This type of state might be accessed simultaneously by multiple threads, but only by threads spawned by the same client. In most applications concurrent access to this state is rare and might be well supported by the web framework. Nonetheless, this is state the developer has to take care of.
Application state persisted to the database (business state): In fact, this state is read and mutated highly concurrently by different threads. Fortunately, most persistence frameworks support the unit of work pattern: State is replicated from the database for each thread accessing it (as a copy). The thread modifies it and writes it back in a database transaction. Modifying state in this way (in a transaction) is an appropriate concurrency strategy. See also software transactional memory (STM), which is supported e.g. by Clojure.
"Classical" concurrently accessed state: For example heavy calculations done explicitly concurrently. In web applications, a request might spawn several other threads to do an expensive calculation. This is the "classical" case, where thread programming might surely become hard. This is definitively a situation where Scala or Closure or any other language with a better concurrency model might be appropriate.

Summary
In summary, if there is no state, there is no problem. Immutable state is also not a problem. State, that is only accessed by a single thread is no problem. And state that is modified in a unit of work/transaction is not a problem. And in web applications most state is of one of these kinds (and by the way, this is mostly because of the stateless, request based nature of HTTP). In this sense, Java is quite well prepared for the multi-core future in the field of web applications, although other JVM languages might have the better concurrency model. Nonetheless, there are some even better arguments than concurrency for some alternative languages, for example expressiveness, conciseness or productivity in general.

Update: I did a "prezi" of this post. What a fun tool! See here.