Baby Steps in Scala

This post is copied over from my internal blog inside IBM, just for posterity after I’ve left. It was first published in January 2011.

As a New Year’s resolution I have started to learn Scala, a statically typed, type-inferred, object-oriented yet functional programming language that runs on the JVM. After ten years of concentrating mainly on server-side Java programming I figured it was time to broaden my skill set a bit, as well as stretch the brain cells. I chose Scala because at heart I am a middleware/server guy, and whilst I’ve worked on a product and with a team that does a lot with JavaScript, it has never really grabbed me.

I’ve started reading Beginning Scala by David Pollak via Books 24×7, as well as various web articles, including a series on developerWorks. However, the best way to learn new things is always to apply them to real problems. Therefore I took one particular, fairly self-contained coding problem from Lotus Connections 3.0 and re-implemented it in Scala. Here’s how.

Group Synchronization in Scala

In various places in Connections we support access control via LDAP groups. We have no way of knowing when group membership changes, so we typically retrieve the list of groups a user is in when they log in. This information is then used a lot during their session, so it is beneficial to cache it (typically for ten minutes before we refresh the cache). It may also be beneficial to cache this data persistently so that any part of the system can use it. Ideally we would simply use a distributed in-memory cache such as WebSphere eXtreme Scale, but we don’t yet, so where appropriate we cache some things in the database.

So, there is a fairly straightforward bit of code that needs to synchronize the groups the user is in with the cached set of groups we hold for that user in the DB. Most of the time the two will simply be the same, but sometimes the user will have been added to new groups or removed from old ones. So what this code basically does is:

  1. Retrieve the list of groups the user is in from the authoritative source (e.g. LDAP)
  2. If they are in no groups, ensure the cached entries (if any) are removed
  3. If they are in one or more groups, compare that with the cached version and add any new groups to the cache, and remove any groups from the cache they are no longer in.

For my Scala investigation I started out with the following assumptions:

  • I need to call the implementation of this sync service from existing Java code
  • I have two existing Java services for interacting with the authoritative source and the persistent cache.
  • I have an existing Java interface for the sync service that I need to ensure is implemented

One of the great things about Scala is that because it runs on the JVM and is compiled down to bytecode, it is fully able to interact with existing Java code. Java can invoke Scala and vice versa. Scala classes can implement Java interfaces, and all sorts of other goodness. You can run Scala code in WebSphere.

So, first off, here is the Java Interface of the sync service itself. It is very simple:

/**
 * Provides methods to synchronize data stored locally in a 
 * persistent cache with the authoritative source.
 * 
 * @author aspender
 *
 */
public interface GroupSyncService {

	/**
	 * Retrieve the groups that the given person ID is in
	 * from the definitive resource, and update the 
	 * persistent cache that we use ourselves.
	 * 
	 * @param personId the id of the person to sync
	 */
	public void syncGroups(String personId);
	
}

I also have two existing services defined by interfaces called AuthoritativeSource and PersistentCache. The former provides a single method to retrieve the groups the user is in. The latter retrieves the cached copy of the groups, as well as providing methods to update the cache.

So, as a starting point, here is an example Java implementation of the GroupSyncService:

package com.adrianspender.groupsync.java.impl;

import java.util.HashSet;
import java.util.Set;

import com.adrianspender.groupsync.java.*;

/**
 * The Java version of our group sync service.
 */
public class GroupSyncServiceImpl implements GroupSyncService {
	
	private AuthoritativeSource authoritativeSource;
	private PersistentCache persistentCache;

	public GroupSyncServiceImpl(AuthoritativeSource authoritativeSource, PersistentCache persistentCache) {
		this.authoritativeSource = authoritativeSource;
		this.persistentCache = persistentCache;
	}

	public void syncGroups(String personId) {

		Set<String> currentGroups = authoritativeSource.getGroups(personId);
		
		if(currentGroups.size() == 0 ) {
			persistentCache.removeAllGroups(personId);
		} else {
			Set<String> cachedGroups = persistentCache.retrieveAllGroups(personId);
			
			Set<String> addedGroups = new HashSet<String>();			
			for(String groupId : currentGroups) {
				if(!cachedGroups.contains(groupId)) {
					addedGroups.add(groupId);
				}
			}
			persistentCache.addNewGroups(personId, addedGroups);

			Set<String> removedGroups = new HashSet<String>();
			for(String groupId : cachedGroups) {
				if(!currentGroups.contains(groupId)) {
					removedGroups.add(groupId);
				}
			}
			persistentCache.removeGroups(personId, removedGroups);
		}
	}
}

I’ve deliberately left out comments, and the code also isn’t as defensive as I’d normally write it (e.g. there are no null checks, something Scala code tends to need less often because idiomatic Scala uses Option rather than null). The code is easy to follow, but note that there are two for loops calculating the difference between the two sets of groups: the first works out the groups that are not yet in the cache, the second works out the groups that should no longer be in the cache.

So, here’s my first go at a Scala version:

package com.adrianspender.groupsync.scala.impl

import com.adrianspender.groupsync.java._
import scala.collection.JavaConversions._

/*
 * The Scala version of our group sync service.
 */
class GroupSyncServiceImpl(authoritativeSource: AuthoritativeSource, persistentCache: PersistentCache) extends GroupSyncService {
	
	def syncGroups(personId: String) : Unit = {

		val currentGroups = authoritativeSource.getGroups(personId).toSet
		
		if (currentGroups.isEmpty) {
			persistentCache.removeAllGroups(personId)
		} else {
			val cachedGroups = persistentCache.retrieveAllGroups(personId).toSet
			
			persistentCache.addNewGroups(personId, currentGroups diff cachedGroups)
			persistentCache.removeGroups(personId, cachedGroups diff currentGroups)
		}
	}
}

This does exactly the same thing. The unit tests I wrote behave identically regardless of whether you use the Java or the Scala version: the same Java JUnit test runs against both. Also note that the Scala version uses exactly the same Java implementations of AuthoritativeSource and PersistentCache, and that the Scala class implements the same GroupSyncService Java interface.
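To make that behaviour concrete, here is a minimal, self-contained sketch of the same sync logic exercised against in-memory stand-ins. It is not the JUnit test mentioned above. The traits mirror the method names used in the listings, but their exact signatures are assumptions, and FakeSource, FakeCache, GroupSync and GroupSyncDemo are hypothetical names invented for the illustration. It also uses the explicit asScala/asJava converters from scala.jdk.CollectionConverters, because scala.collection.JavaConversions was removed in Scala 2.13.

```scala
import java.util.{Set => JSet}
import scala.jdk.CollectionConverters._

// Stand-ins for the Java interfaces. Method names come from the listings
// above; the exact signatures are assumptions.
trait AuthoritativeSource {
  def getGroups(personId: String): JSet[String]
}

trait PersistentCache {
  def retrieveAllGroups(personId: String): JSet[String]
  def addNewGroups(personId: String, groups: JSet[String]): Unit
  def removeGroups(personId: String, groups: JSet[String]): Unit
  def removeAllGroups(personId: String): Unit
}

// Hypothetical in-memory fakes, for illustration only.
class FakeSource(groups: Map[String, Set[String]]) extends AuthoritativeSource {
  def getGroups(personId: String): JSet[String] =
    groups.getOrElse(personId, Set.empty[String]).asJava
}

class FakeCache extends PersistentCache {
  private var store = Map.empty[String, Set[String]]

  private def current(personId: String): Set[String] =
    store.getOrElse(personId, Set.empty[String])

  def retrieveAllGroups(personId: String): JSet[String] = current(personId).asJava

  def addNewGroups(personId: String, groups: JSet[String]): Unit = {
    store = store.updated(personId, current(personId) ++ groups.asScala)
  }

  def removeGroups(personId: String, groups: JSet[String]): Unit = {
    store = store.updated(personId, current(personId) -- groups.asScala)
  }

  def removeAllGroups(personId: String): Unit = {
    store = store - personId
  }
}

// Same logic as the Scala class above, with explicit conversions.
class GroupSync(source: AuthoritativeSource, cache: PersistentCache) {
  def syncGroups(personId: String): Unit = {
    val currentGroups = source.getGroups(personId).asScala.toSet
    if (currentGroups.isEmpty) {
      cache.removeAllGroups(personId)
    } else {
      val cachedGroups = cache.retrieveAllGroups(personId).asScala.toSet
      cache.addNewGroups(personId, (currentGroups diff cachedGroups).asJava)
      cache.removeGroups(personId, (cachedGroups diff currentGroups).asJava)
    }
  }
}

object GroupSyncDemo {
  def main(args: Array[String]): Unit = {
    val cache = new FakeCache
    cache.addNewGroups("alice", Set("admins", "old-team").asJava)

    val source = new FakeSource(Map("alice" -> Set("admins", "new-team")))
    new GroupSync(source, cache).syncGroups("alice")

    // The cache should now match the authoritative source.
    println(cache.retrieveAllGroups("alice").asScala.toSet == Set("admins", "new-team")) // true
  }
}
```

Keeping the fakes behind the same two interfaces is what lets one test drive either implementation of the sync service.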

Of course, the other major thing to notice is the syntactic difference. There is no explicit constructor: the variables of the class are defined on the class signature. There are no semicolons. There are strange-looking statements like ‘currentGroups diff cachedGroups’, which to a Java programmer may look more familiar if written as currentGroups.diff(cachedGroups), and you could write it that way too. However, the syntax is still inherently familiar and easy to understand. This is certainly a big benefit of Scala. There’s much less need for nested blocks of code thanks to some of the syntactic sugar, as well as some of the capabilities of the language and its libraries. The Java class is 47 lines to Scala’s 24 lines.
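A small sketch of those two points, constructor parameters on the class signature and infix versus dot notation, might look like the following. GroupDiff and SyntaxDemo are hypothetical names made up for this illustration and are not part of the sync service.

```scala
// Hypothetical helper class, not part of the original code. The parameters on
// the class signature are the constructor; no separate constructor body or
// field assignments are needed.
class GroupDiff(current: Set[String], cached: Set[String]) {
  def added: Set[String] = current diff cached     // infix notation
  def removed: Set[String] = cached.diff(current)  // dot notation, same method
}

object SyntaxDemo {
  def main(args: Array[String]): Unit = {
    val d = new GroupDiff(Set("admins", "writers"), Set("admins", "readers"))
    println(d.added)   // Set(writers)
    println(d.removed) // Set(readers)
  }
}
```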

I’m not going to go into a line-by-line analysis of the Scala version; this blog post isn’t meant to be a tutorial. However, one other point of note is how easy the collection libraries in Scala make working with arrays, lists, sets and maps. In this case I use the diff member of the scala.collection.immutable.Set class to extract the values from one set that are not present in the other. It is simple, and it is something the Java version above needs two explicit loops to do. The collection capabilities really come into their own when combined with the functional aspects of Scala. For instance, being able to pass a block of code as a parameter to a method allows you to do complex things in a single line. There isn’t an example of functional programming in this particular exercise, but it is definitely something I need to get my head around further.
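As a small illustration of passing code as a parameter, and not something taken from the original exercise, the sketch below rebuilds the set difference with a function argument and then reuses the same idea with other predicates. The object name, the helper missingFrom and the group names are all made up for the example.

```scala
object CollectionsDemo {
  // Groups present in `current` but missing from `cached`, written with a
  // function argument instead of diff.
  def missingFrom(current: Set[String], cached: Set[String]): Set[String] =
    current.filterNot(group => cached.contains(group))

  def main(args: Array[String]): Unit = {
    val current = Set("admins", "writers", "team-blue")
    val cached = Set("admins", "readers")

    // Same result as the built-in diff.
    println(missingFrom(current, cached) == (current diff cached)) // true

    // Any predicate can be passed the same way; the "team-" prefix is made up.
    println(current.filter(_.startsWith("team-"))) // Set(team-blue)

    // partition splits a set in one call, driven by a function.
    val (alreadyCached, toAdd) = current.partition(cached.contains)
    println(alreadyCached) // Set(admins)
    println(toAdd.size)    // 2
  }
}
```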
