Tuesday, July 13, 2010

Lambdas in Java Preview - Part 3: Collections API

This is the third part in a series of blog posts (read part 1, part 2 and part 4) giving some practical examples of lambdas, how functional programming in Java could look like and how lambdas could affect some of the well known libraries in Java land. In this part I'll focus on how the addition of lambdas could affect one of the most used standard APIs - the Collections API.

Note: While writing this post, a new proposal has been published, that uses a different syntax for lambdas, removes function types and uses more type inferencing. I'll post a short description of the changes soon. The new proposal would change some of the examples in this post, but the examples are still essentially valid and useful.


The Collections API is one of the most used APIs around in Java. Amongst its mostly used collection types are lists, sets and maps. Also the Collections class, which provides utility methods for operating on these collections is commonly used. The reason why I chose the Collections API as an example on how lambdas might influence existing APIs is that certain kinds of operations on collections can be written much more natural in a functional programming style than in an imperative style. This is also one of the reasons for the growing number of projects that aim to bring functional elements to Java in the form of libraries, e.g. Google Collections or lambdaj.

Status quo
One of the common operations on lists is filtering a list for elements matching a certain condition, say for example filtering a list of integers for the even ones. We could implement a method findEven like this:
List<Integer> findEven(List<Integer> list) {
  List<Integer> result = new ArrayList<Integer>();
  for (Integer n : list) {
    if (n % 2 == 0) result.add(n);
  }
  return result;
}
We can call it with a list of integers and it will return a list containing the even integers. Works just fine, but what if we'd also want a method that returns only the odd elements? There is no straight way of building an abstraction for that, so we'd write a method findOdd which would look 99.3865% the same as findEven.

Google Collections to the rescue we can use the Iterables.filter method which takes a list and filters it by a given Predicate.
Iterable<Integer> even = Iterables.filter(list,
  new Predicate<Integer>() {
    @Override
    public boolean apply(final Integer n) {
        return n % 2 == 0;
    }
  });
This filter method is way more flexible as it can take any predicate you like to filter by. But actually it really doesn't look nice, at least unless you build your own library of reusable predicates.

Simplification through lambdas
Now, if we have a look at it, the essential part of the predicate is just the expression n % 2 == 0. The rest is just boilerplate. With lambdas we can actually remove all(!) of this boilerplate. To do that, we implement a filter method quite similar to that of the Google Collections, but instead of taking a predicate of type Predicate as the second argument, it'll take the predicate as a lambda:
public static <T> List<T> filter(List<T> list, 
    #boolean(T) predicate) {
  List<T> result = new ArrayList<T>();
  for (T t : list) {
    if (predicate.(t)) result.add(t);
  }    
  return result;
}
Say we have implemented this filter method in a class called FCollections, calling it looks like this then:
List<Integer> list = Arrays.asList(4,3,2,6,4);
List<Integer> even = FCollections.filter(list, #(Integer n)(n % 2 == 0));
Compare that to calling the Google Collections filter method. It's much more concise, readable and expressive.

Extension methods
Before we have a look at more operations on collections of this sort, I want to mention a feature that will probably also be introduced in JDK7 - extension methods.

Notice that we have put our filter method in a utility class FCollections. But filter should probably rather be a method of List or Collection itself. Unfortunately, adding a new method to these interfaces would break existing sub classes of List or Collection, something we definitively wouldn't want. Now, extension methods allow for adding a new method to an interface and specifying a default implementation of this method:
public interface List<E> extends Collection<E> {
    
    List<E> filter(#Boolean(E) predicate) import static Collections<E>.filter;
    ...
Now, if a sub class of List doesn't implement a filter method itself Collections.filter is called by default. Collections.filter would look like our FCollections.filter method - it takes a list as the first argument followed by the predicate lambda and it must return a List.

So with extensions methods it would be possible to add new operations on collection types like those showed in this post without breaking existing sub classes (at least as long as these subclasses didn't implement an incompatible filter method before). But extension methods are not the primary topic of this post (search the web for extension methods or public defender methods for more details), let's go back and see some more useful operation on collections.

More operations
There's a bunch of operations that can be implemented in a more functional style, e.g. iterating a list and applying a function to each element.
public class MyList<T> extends ArrayList<T>
  
  public void each(#void(T) f) {
    for (T e : this) {
      f.(e);        
    }     
  }
Instead of iterating with a for loop, we can call the each method like this:
list.each(#(Integer n)(System.out.println(n)));
Another example is the map function that takes a function, applies it to each element of the list and returns the result as a list:
public <S> List<S> map(#S(T) f) {
  List<S> result = new List<S>();
    for (T e : this) {
        result.add(f.(e));        
    }    
    return result;
}
We could use it to create a list of squares from a list of integers:
List<Integer> squares = list.map(#(Integer n)(n*n));
foldLeft is a another method that combines the elements of a list together using a function f from left to right (foldRight does the same from right to left) starting with a value z, i.e. the result is f(f(f(f(f(z,a1),a2),a3),a4),a5)
public <S> S foldLeft(S z, #S(S, T) f) {
    S acc = z;
    for (T e : this) {
        acc = f.(acc, e);
    }    
    return acc;
}
A simple way to use it would be to calculate the sum of the elements:
Integer sum = list.foldLeft(new Integer(0), #(Integer a, Integer b)(a+b));
reduce does the same, but without an initial value.
public T reduceLeft(#T(T, T) f) {
    if (isEmpty()) throw new UnsupportedOperationException("reduce on empty list"); 
    if (size() == 1) return get(0);

    T acc = get(0);
    for (T e : subList(1, size())) {
        acc = f.(acc, e);
    }    
    return acc;
}
Integer sum = list.reduceLeft(#(Integer a, Integer b)(a+b));
The find method searches for the first element matching a given predicate and returns it:
public T find(#Boolean(T) predicate) {
    for (T e : this) {
        if (predicate.(e)) return e;
    }    
    return null;
}
Integer found = list.find(#(Integer n)(n>5));
There are many more possibilities to use predicates in this way, here are some short examples:
boolean hasPrime = list.exists(#(Integer n)(isPrime(n));
boolean allPrimes = list.forall(#(Integer n)(isPrime(n));
boolean countPrimes = list.count(#(Integer n)(isPrime(n));
boolean withoutPrimes = list.remove(#(Integer n)(isPrime(n));
Pair<List<Integer>> oddAndEven = list.partition(#(Integer n)(n%2==0));

To summarize, as you can see the Collections API could benefit in a lot of ways from the addition of lambdas. For some sort of operations, lambdas would increase readability, conciseness and expressiveness. Extension methods could make it possible to have some of the new operations directly in the collection classes instead of in utility classes like Collections.




3 comments:

cde said...

Why use the predicates directly ? Why would you want to re-implement all that when this would work :

List list = ...
Iterables.filter(list, #(Integer)(n%2==0))

This will work because it will result in the automatic conversion of the lambda into a predicate (since predicate has only a single "pure virtual" method (sorry I come from C++), and it will fill that in)

Nick Wiedenbrück said...

Yes, re-implementing any existing interfaces wouldn't make much sense. Same with the FileFilter example in part 2 (as mentioned). The Predicate interface is part of the Google Collections library though. I'd rather see it as part of the Collections API itself.

Currently I'm working on the 4th part, where this will also be a topic, again. So, stay tuned.

Ran Biron said...

The sheer amount of extra method calls is overwhelming.
It looks like Lambdas rely heavily on the JVM ability to inline code. I hope this is taken into account and that Lambda expressions optimization is in place.