Wednesday, 16 February 2011

Cleaner Code with the Scala Type System

Last week I happened to be sitting with a colleague looking at some prototype Java code. We came across a method that neither of us was particularly happy with, but which represented the typical messy code that occurs when wanting to do anything involving types and classes in Java. My initial comment was that it "would be so much cleaner to do in Scala". I therefore thought I'd better have play with the code to prove (if only to myself) that I was right. This post discusses the result of my effort and my conclusions from having done this work.

The Problem

In order to understand the code, let's first look at the problem that it is tying to solve:

The domain is one related to content that can appear on a website. We have a persistent domain model representing this content. However, a number of the content items are sourced from and can be updated in external systems (e.g. a CMS system or a remote content provider). We therefore need to synchronise the state of these external systems into our domain model from time to time. The java code that we were looking at was inside this synchronisation component.

The content is retrieved from the external systems, wrapped in a subtype of ForeignObject. The subtype is based on where the content was obtained from. The ForeignObject implementations can examine the content they contain and return information about its type and its unique business identifier. The synchronisation code then uses this information to lookup the associated object in the persisted domain (which may or may not exist). The ForeignObject and domain object are then passed to a mapper that updates the domain object from data that it extracts from the ForeignObject instance.

The full source code and tests described below can be obtained from my Github account:

The Java Code

Let's first take a look at the Java code that we were less than happy about. Firstly, the ForeignObject implementation that returns the detail about the type of content that it contains:

public class CMSForeignObject implements ForeignObject {
     // Other code removed

    public Class getLocalClass() {
        if ("Story".equals(type)) return Story.class;
        else if ("Image".equals(type)) return Image.class;
        else if ("BinaryObject".equals(type)) return BinaryObject.class;
        else if ("RelatedLinks".equals(type)) return RelatedLinks.class;
        else throw new IllegalStateException("Unknown type: " + type);

As we can see it looks at some string based type field that it extracts from the wrapped content and returns the associated domain class that represents that content. Perhaps not the nicest code and returning class instances always feels a bit wrong. However, I don't have too many problems with this approach. Where the messy code comes in is in the synchronisation logic that uses this class:

public class SynchronisationService {
     // Setup code remove for clarity

    @SuppressWarnings( { "unchecked" })
    public Content sync(final ForeignObject foreignObject) {
        return findById(foreignObject.getLocalClass(), foreignObject);

    private  T findById(final Class clazz, final ForeignObject foreignObject) {
        if(Story.class.equals(clazz)) {
            CMSForeignObject cmsForeignObject = (CMSForeignObject)foreignObject;
            return (T)repository.findStoryById(cmsForeignObject.getId());
        else if(Image.class.equals(clazz)) {
            CMSForeignObject cmsForeignObject = (CMSForeignObject)foreignObject;
            return (T)repository.findImageById(cmsForeignObject.getId());
        else if(BinaryObject.class.equals(clazz)) {
            CMSForeignObject cmsForeignObject = (CMSForeignObject)foreignObject;
            return (T)repository.findBinaryObjectById(cmsForeignObject.getId());
        else {
            throw new IllegalStateException(
                "We have managed to try and synchronise an unsupported class. This is a bug.");

How's that for a nice bit of logic? Imagine when we have many different types! I know this is only prototype code and there's certainly plenty we can do to improve it. However, it's quite typical to see this sort of stuff in production code whenever you have to go down into type based logic in Java.

The First Scala Version

To come up with a better solution, the first thing I did was to just convert the Java code directly into Scala. Nothing special here, other than using Scala syntax and modifying to return Option[Story] rather than null. So, first the ForeignObject implementation:

class CMSForeignObject(typeName: String, val id: String) extends ForeignObject {
  def localClass[C <: Content]() = typeName match {
    case "Story" => classOf[Story]
    case "Image" => classOf[Image]
    case "BinaryObject" => classOf[BinaryObject]
    case "RelatedLinks" => classOf[RelatedLinks]
    case _ => throw new IllegalStateException("Unknown type: " + typeName)

And the synchronisation code:

class SynchronisationService(repository: ContentRepository) {

  def sync(foreignObject: ForeignObject): Option[Content] = findById(foreignObject.localClass, foreignObject)

  private def findById[T <: Content](clazz: Class[_], foreignObject: ForeignObject) = {
    if ( clazz == classOf[Story] )
    else if ( clazz == classOf[Image] )
    else if ( clazz == classOf[BinaryObject] )
    else throw new IllegalStateException("We have managed to try and synchronise an unsupported class. This is a bug.");

Works fine, but not really much better than the Java code.

Adding Scala Type Goodness

Once I had an initial Scala version I then spent a couple of hours playing around with the code and some Scala type goodness. The resulting synchronisation code is:

class SynchronisationService(repository: ContentRepository) extends ContentFinder {

  def sync(foreignObject: ForeignObject): Option[Content] = findById(foreignObject)

  private def findById[C <: Content, FO <: ForeignObject](foreignObject: FO) =
    Option(finder(findStory) orElse finder(findImage) orElse 
           finder(findBinaryObject) apply foreignObject)

  private def findStory(foreignObject: CMSForeignObject) = repository findStoryById (foreignObject id)
  private def findImage(foreignObject: CMSForeignObject) = repository findImageById (foreignObject id)
  private def findBinaryObject(foreignObject: CMSForeignObject) = repository findBinaryObjectById (foreignObject id)

No nasty if/else block. No need to deal with Class instances. No nasty type casting. Much more readable. Hopefully this is much more friendly for developers and will be far easier to maintain in the future as we come to add more types of content to be synchronised. So, how did I manage to achieve this much more readable code?

The first piece of the puzzle was to eliminate the returning of Class instances from the ForeignObject. Instead we now return a new ContentType instance:

class CMSForeignObject(typeName: String, val id: String) extends ForeignObject {

  def contentType[C <: Content, FO <: ForeignObject] = {
    val contentType = typeName match {
      case "Story" => new ContentType[Story, CMSForeignObject]
      case "Image" => new ContentType[Image, CMSForeignObject]
      case "BinaryObject" => new ContentType[BinaryObject, CMSForeignObject]
      case "RelatedLinks" => new ContentType[RelatedLinks, CMSForeignObject]
      case _ => throw new IllegalStateException("Unknown type: " + typeName)
    contentType.asInstanceOf[ContentType[C, FO]]

The ContentType instance supports generics on both the type of content and the type of foreign object. It captures the actual erased types used via implicit manifests on its constructor:

class ContentType[C <: Content, FO <: ForeignObject]
        (implicit val mC: Manifest[C], val mFO: Manifest[FO]) {

  def compatibleWith(cClass: Class[_], foClass: Class[_]) = (cClass == mC.erasure) && (foClass == mFO.erasure)

It also provides a compatibleWith method that is used internally within the finder code that we will look at below.

This final piece of the puzzle is the implementation of theContentFinder trait that is mixed in to the synchronisation code. This is where most of the type magic occurs in order to simplify the actual synchronisation code:

trait ContentFinder {
  implicit protected def finderToPartialFunction[FO <: ForeignObject, C]
        (handler: LookupHandler[FO, C])(implicit mFO: Manifest[FO], mC: Manifest[C]) =
    new PartialFunction[AnyRef, C] {
      def apply(foreignObject: AnyRef) =
        if ( isDefinedAt(foreignObject) ) handler(foreignObject.asInstanceOf[FO])
        else throw new IllegalStateException("We have managed to try and synchronise an unsupported class. This is a bug.")

      def isDefinedAt(foreignObject: AnyRef) =
        foreignObject.asInstanceOf[FO].contentType.compatibleWith(mC.erasure, mFO.erasure)

  protected def finder[FO <: ForeignObject, C](lookup: FO => Option[C]) = new LookupHandler(lookup)

  protected class LookupHandler[FO <: ForeignObject, C](lookup: FO => Option[C]) {
    def apply(foreignObject: FO) = lookup (foreignObject).getOrElse(null).asInstanceOf[C]

The first insight that I had when working with this code is that the if/else condition across classes is actually a partial function across content types. We have a ForeignObject that can return a ContentType instance that contains the type information about the type of content and the type of foreign object. We therefore just have to see if there is a partial function defined at that particular combination and if there is then we can apply it.

The resulting implementation therefore defines a finder method that takes a function containing the actual code that should be applied to the ForeignObject/Content pair. It returns this wrapped in a LookupHandler instance. The synchronisation code can then just define find functions for each type that it can deal with and pass them to the finder method. Finally we define an implicit conversion for the LookupHandler instances that convert them to partial functions so that we can compose them with the orElse method. The partial function captures the erased types from the handler using implicit manifests and compares these with those captured by the content type to see if the function can be applied.


We can clearly see that the resulting synchronisation code is significantly cleaner, more easily understood and more maintainable than the original Java code. However, this comes at the expense of slightly more lines of Scala code and some fancy logic around finders and implicit partial functions. Is the benefit really worth the extra?

I can see both sides of the argument. I can certainly understand that for the Scala novice that the ContentFinder is somewhat impenetrable. However, its the synchronisation code that is likely to be updated a number of times as more types of content are bought into the domain. The code to deal with all of the type goodness is unlikely to change that much and can be easily maintained by more experienced Scala developer. Therefore the readability and maintainability of the business code wins out in my mind over some extra, well abstracted complexity.

This tends to be the typical approach to the Scala language and one of the reasons that it can be so incredibly powerful. Use all the power of the language and its incredible type system in the framework and support code so that the actual application logic can be as minimal and simple as possible.

1 comment:

  1. Although the ContentFinder looks rather complicated from someone who doesn't know Scala's point of view, the resulting code is definitely a lot simpler. This kind of example will surely assist in the introduction of languages such as scala into java teams.