Wednesday 28 September 2011

Type Classes and Bounds in Scala

I've been doing some initial experiments with the Scalaz library (https://github.com/scalaz/). This is a great extension to the Scala standard library that adds a wide range of functional programming concepts to the language. Many of its features are based around the concept of Type Classes.

The type class concept originally comes from Haskell, and provides a way to define that a particular type supports a certain behavioural concept without the type having to explicitly know about it. For example, a type class may define a generic 'Ordered' contract and implementations can be defined for different types separate from the definition of those types. Any type that has an implementation of the 'Ordered' type class can then be used in any functions that work with ordering.
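
To make this concrete, here is a minimal sketch of the idea (the names are illustrative, not from any particular library), passing the implementation around explicitly for now:

trait Ordered[T] {
  def lessThan(a: T, b: T): Boolean
}

// An implementation for Int, defined separately from Int itself
object IntIsOrdered extends Ordered[Int] {
  def lessThan(a: Int, b: Int) = a < b
}

// A function that works with any type that has an Ordered implementation
def smallest[T](xs: Seq[T], ord: Ordered[T]): T =
  xs reduceLeft ((a, b) => if (ord.lessThan(a, b)) a else b)

smallest(Seq(3, 1, 2), IntIsOrdered)  // == 1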

The Scala language provides a way to support type classes via its implicit mechanism. In order to get my head around this fully I created some examples to experiment with how it works. Now that I fully grasp the mechanisms, the Scalaz library makes much more sense. I therefore thought I'd share my experiment in case it proves useful to anyone and as an aide-mémoire to myself for the future.

In addition, my examples also make use of the different type bounds mechanisms provided by Scala, so I will demonstrate these in my examples as well.

The full source code can be found in the following gist: https://gist.github.com/1247752.

Without Type Classes

In the first example, I built a simple solution that doesn't make any use of the type class concept. This solution is much like one you would implement in Java, with an interface defining the behaviour and classes that implement this interface:

trait Publishable {  
  def asWebMarkup: String
}

case class BlogPost(title: String, text: String) extends Publishable {
  def asWebMarkup =
    """|

%s

|
%s
""".stripMargin format(title, text) }

Here we have a trait Publishable that indicates that something can represent itself as a web markup string. Then we have a case class that implements the trait and provides the method to return the markup. Next, we declare a class that can do the publishing:

class WebPublisher[T <: Publishable] {
  def publish(p: T) = println(p.asWebMarkup)
}

Note that we have defined this as a generic class and that we have used the <: notation (an upper type bound) to require that our type T is only valid if it extends the Publishable trait. All we need to do now is use it:

val post = new BlogPost("Test", "This is a test post")
val web = new WebPublisher[BlogPost]()
web publish(post)

Ok, so that works fine. However, what happens when we either can't or don't want BlogPost to implement the Publishable trait? There can be many reasons for this: perhaps we don't have the source for the BlogPost domain object; perhaps the object is already quite complex and we don't want to pollute it with publishing knowledge; perhaps it's shared by multiple teams or projects and only ours needs publishing knowledge. So, what do we do?

Without type classes there are some options: we might extend the domain class to implement the trait; we might create a wrapper class; or we might create a helper utility. However, all of these result in a level of indirection and complication in our code. Let me explain...

If we internalise the knowledge of the wrapper/helper/sub-class in our publishing code we end up with some horrific type matching:

class WebPublisher {
  def publish(p: AnyRef) = p match {
    case blogPost: BlogPost => BlogPostPublishHelper.publish(blogPost)
    case _ => …
  }
}

However, if we externalise the knowledge of publishing then our client code has to do the conversion:

class WebPublisher {
  def publish(p: Publishable) = ...
}

web publish(new PublishableBlogPostWrapper(blogPost))

Clearly, both solutions lead to fragile boilerplate code that pollutes our main application logic. Type classes provide a mechanism for isolating and reducing this boilerplate so that it is largely invisible to both sides of the contract.

Type Classes with Implicit Views

Fortunately, Scala provides a mechanism whereby we can declare an implicit conversion between our BlogPost and a Publishable version of our instance. So, let's start with the simple stuff:

trait Publishable {
  def asWebMarkup: String
}

case class BlogPost(title: String, text: String) 

So, we now have a blog post that doesn't implement the Publishable trait. Let's now define the conversion that can implicitly turn our blog post into something that supports publishing (we would typically add this into our publishing code rather than the domain object in order to isolate all knowledge of Publishable to just the area that needs it):

implicit def BlogPostToPublishable(blogPost: BlogPost) = new Publishable {
  def asWebMarkup =
    """|

%s

|
%s
""".stripMargin format(blogPost title, blogPost text) }

Our conversion just creates a new Publishable instance that wraps our blog post and implements the required methods. But, how do we make use of this? We have to change our web publisher very slightly:

class WebPublisher[T <% Publishable] {  
  def publish(p: T) = println(p.asWebMarkup)
}

All we have in fact changed is the <: bound to a <% bound. This new one is called a view bound and specifies that we can instantiate a WebPublisher with type T only if there is a view (in this case an implicit conversion) in scope from T to Publishable.
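
Under the hood, the view bound desugars to an implicit conversion parameter, so the class above is roughly equivalent to:

class WebPublisher[T](implicit evidence$1: T => Publishable) {
  def publish(p: T) = println(p.asWebMarkup)  // the conversion is applied here
}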

Our code to call this remains the same:

val post = new BlogPost("Test", "This is a test post")
val web = new WebPublisher[BlogPost]()
web publish(post)

Fantastic, our publish code gets objects that it knows are publishable, while our client code can just pass domain objects. We have all the boilerplate for conversion separated out from the main code.

However, while this all seems good, this approach does have its downsides. As you can see we are using the wrapper approach: creating a new instance of a Publishable class that wraps the original object. In a high volume system, the additional object allocations of new Publishable wrapper instances on each call to the publish method may have some less than ideal memory and garbage collection impacts.

The other problem with this approach is that it reduces our ability to compose methods in interesting ways. The reason for this is that once the publishing code has actually invoked the implicit conversion it has the Publishable wrapper rather than the original object. If it passes this Publishable on to other methods or classes then these are not aware of the original wrapped type or instance.

We can overcome this problem by modifying the Publishable trait to take a generic type parameter and support a get method to extract the wrapped value - but then it should really be called PublishableWrapper and some of the simplicity starts to break down.
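
A sketch of that workaround might look like this (the wrapper name and get method are hypothetical):

trait PublishableWrapper[T] {
  def get: T              // recover the original wrapped instance
  def asWebMarkup: String
}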

I think the root of the problem here is that type classes are a very functional concept and implicit views try to coerce them into a hybrid object/functional world. Fortunately, as of Scala 2.8 there is an additional way to implement the type class approach that leads to a more functional style of coding...

Type Classes with Implicit Contexts

An alternative to implicitly converting one class to a wrapper version of that class that adds additional behaviour is to follow more of a helper-like approach. In this model we provide an implicit evidence parameter that implements the type-class-specific behaviour. This is a much more functional approach in that we don't alter the type we are working on. So, on with the code...

trait Publishable[T] {     
  def asWebMarkup(p: T): String
}

case class BlogPost(title: String, text: String)

This is our new Publishable trait and BlogPost class. Note that our Publishable is no longer intended to be implemented or used as a wrapper. Instead, it is now a contract definition of functional behaviour that takes a parameter of type T and transforms it into a String. A much cleaner abstraction. In fact, we can even create a single object instance that implements this behaviour for blog posts:

object BlogPostPublisher extends Publishable[BlogPost] {     
  def asWebMarkup(p: BlogPost) =
     """|

%s

|
%s
""".stripMargin format(p title, p text) }

We are also going to need our implicit. This time however, rather than being a conversion, it becomes evidence that we have something that implements Publishable for the type BlogPost:

implicit def blogPostIsPublishable: Publishable[BlogPost] = BlogPostPublisher

We also need to update our WebPublisher a bit:

class WebPublisher[T: Publishable] {  
  def publish(p: T) = println(implicitly[Publishable[T]] asWebMarkup(p))
}

There are two interesting things about this class. First, the bound has now switched from a view bound (<%) to a context bound (:). The context bound effectively modifies the class declaration to be:

class WebPublisher[T](implicit evidence$1: Publishable[T]) {
  ...
}

The second change is the use of implicitly[Publishable[T]] which is a convenience for getting the implicit evidence of the correct type so that you can call methods on it.
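
There is nothing magical about implicitly; it is defined in scala.Predef simply as a function that returns whatever implicit value of the requested type is in scope:

def implicitly[T](implicit e: T): T = e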

Our code to call this remains exactly the same:

val post = new BlogPost("Test", "This is a test post")
val web = new WebPublisher[BlogPost]()
web publish(post)

One advantage that should be immediately obvious is that we are no longer creating new instances for each implicit use. Instead, we are using the implicit evidence parameter (which in this case is an object) and passing our instance to it. This is more efficient in terms of allocations.

Also, we explicitly show where we make use of the implicit evidence, which is clearer. This also means that we are never creating a new wrapped type of T that we pass on to other code. We always pass on instances of type T that have implicit evidence parameters that allow T to behave as a particular type class instance.
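
For example, we can write further generic functions that simply pass the evidence along (publishAll is a hypothetical helper, not part of the code above):

// The context bound forwards the Publishable[T] evidence to implicitly
def publishAll[T: Publishable](items: Seq[T]) =
  items foreach (item => println(implicitly[Publishable[T]] asWebMarkup(item)))

publishAll(Seq(BlogPost("One", "First post"), BlogPost("Two", "Second post")))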

Conclusion

We have looked at two different approaches to solving the problem of adding behaviour to an existing class without requiring it to explicitly implement a particular contract or extend a specific base class. Both of these make use of Scala implicits and type classes.

Implicit conversions with view bounds provide an approach where a type T can be converted into the type required by the type class. This works, but there are issues associated with the wrapping or transformation aspects of the conversion.

Implicit evidence parameters within context bounds overcome these problems and provide a far more functional approach to solving the same problem.

Thursday 8 September 2011

Learnings From A Scala Project

I’ve recently been working as an associate for Equal Experts. I’ve been part of a successful project to build a digital marketing back-end solution. The project started out life as a JVM project with no particular language choice. We started out with Java and some Groovy. However, the nature of the domain was one that lent itself to functional transformations of data, so we started pulling in some Scala code. In the end we were implementing fully in Scala. This post describes the lessons learned on this project.

The Project

The project team consisted of five experienced, full-time developers and one part-time developer/scrum master. One of the developers (me) was proficient in Scala development (18 months, part-time) while the others were all new to the language.

The initial project configuration (sprint zero) was based on a previous Java/Groovy project undertaken by Equal Experts. The initial tooling, build and library stack was taken directly from this project with some examples moved from Groovy to Scala. The initial project stack included:

  • Gradle (Groovy-based build system) with the Scala plugin bolted in
  • IntelliJ IDEA with the Scala plugin
  • Jersey for RESTful web service support
  • Jackson for JSON/domain object mapping
  • Spring Framework
  • MongoDB with the Java driver

Tooling

The biggest hurdle faced by the project was tooling, specifically tool support for the Scala language. While Scala tooling has come on a huge amount over the last year, it is still far from perfect. The project faced two specific challenges: build tool support and IDE support.

Build Tool Support

The Gradle build tool is primarily a build tool for Java and Groovy. It supports Scala via a plugin. Similar plugins exist for other build tools like Maven and Buildr.

What we experienced on this project was that this was not ideal for building Scala projects. Build times for loading the Scala compiler were rather slow, and support for incremental compilation was not always as good as we would have liked. More often than not we just had to do a full clean and build each time. This really slowed down test-driven development.

Currently (in my opinion) the only really workable build solution for Scala is the Simple Build Tool, also known as SBT (https://github.com/harrah/xsbt/wiki). This is a Scala specific build tool and has great support for fast and incremental Scala compilation plus incremental testing.
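
As a flavour of how little configuration is needed, a minimal build.sbt might look something like this (the project name and version numbers are purely illustrative):

// build.sbt - each setting separated by a blank line
name := "digital-marketing-backend"

version := "1.0"

scalaVersion := "2.9.1"

libraryDependencies += "org.scalatest" %% "scalatest" % "1.6.1" % "test"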

With hindsight it would probably have been better to switch over from Gradle to SBT as soon as the decision was made to use Scala for the primary implementation language.

IDE Support

The Scala plugin for IntelliJ IDEA is one of the best editors for Scala. In particular, its code analysis and highlighting have come on in leaps and bounds over the last few months. However, the support for compilation and running tests within the IDE is still less than ideal. Even on Core i7 laptops we were experiencing long build times due to slow compiler startup. Using the Fast Scala Compiler (FSC) made a difference, but we found it somewhat unstable.

What was more annoying was that after failed compilations in IDEA, we often found that incremental compilation would no longer work and that a full rebuild of the project was required. This was VERY slow and really hampered development.

One proven way of doing Scala development within IDEA is to use the SBT plugin and run the compilation and tests within SBT inside an IDEA window. This has proven very successful on other Scala projects I have been involved with and again with hindsight we should have investigated this further.

Some developers on the team also found the lack of refactoring options in IDEA when working with Scala somewhat limiting. This is a major challenge for the IDE vendors given the nature of the language and its ability to nest function and method definitions and support implicits. Personally this doesn’t bother me quite as much as my early days were spent writing C and C++ code in vi!

Libraries

Scala has excellent integration with the JVM and can make good use of the extensive range of Java libraries and frameworks. However, what we experienced in a number of cases was that integrating with Java libraries required us to implement more code than we would have liked to bridge between the Java and Scala world. Also, much of this code was less than idiomatic Scala.

What we found as the project progressed was that it became increasingly useful to switch out the Java libraries and replace them with Scala-written alternatives. For example, very early on we stopped using the MongoDB Java driver and introduced the Scala driver, Casbah. Then we added the Salat library for mapping Scala case classes to and from MongoDB. This greatly simplified our persistence layer and allowed us to write far less, and more idiomatic, Scala code.
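
To give a feel for the style, here is a sketch of the Casbah/Salat approach (the collection and field names are invented for illustration, not taken from the actual project):

import com.mongodb.casbah.Imports._
import com.novus.salat._
import com.novus.salat.global._

case class Post(_id: ObjectId = new ObjectId, title: String, text: String)

val posts = MongoConnection()("blog")("posts")

// Salat's grater converts case classes to DBObjects and back
posts += grater[Post].asDBObject(Post(title = "Hello", text = "First post"))
val loaded: Option[Post] = posts.findOne().map(grater[Post].asObject(_))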

Aside: If you are working with MongoDB then I strongly recommend Scala. The excellent work by Brendan McAdams of 10gen and the team at Novus has created a set of Scala libraries for MongoDB that I think are unrivalled in any other language. The power of these libraries, while maintaining simplicity and ease of use, is amazing.

We never got a chance to swap out some of the other major libraries that the original project structure was built on. However, with hindsight, we should perhaps have made the effort to make these swaps, as they would have resulted in much less and much cleaner code. Some specific swaps that we could have made:

  • Jersey for Unfiltered or Bowler
  • Jackson for sjson or lift-json
  • Spring for just plain Scala code

Scala projects work well with Java libraries. However, when a project is being fully implemented in Scala, it’s much better to swap these Java libraries for their Scala-implemented alternatives. This results in much less, and more idiomatic, Scala code being required.

Team Members

Having highly experienced team members makes a big difference on any project and this one was no exception. The other big advantage on this project was that all of the developers were skilled in multiple languages, even though most had not known Scala before. One thing I have found over many years is that developers who know multiple languages can learn new ones much more easily than those who only know one. I’d certainly recommend recruiting polyglot developers for new Scala projects.

It also helped having at least one experienced Scala developer on the team. The knowledge of the Scala eco-system was helpful as was the ability to more quickly solve any challenging Scala type errors. However, the main benefit was knowledge transfer. All of the team members learned the Scala language, libraries and techniques much more quickly by pairing with someone experienced in the language.

Additionally, as the experienced Scala developer I found it very valuable working with good developers who were still learning the language. I found that explaining concepts really clarified my understanding of the language and how to pass this on to others. This will be of great benefit for future projects.

Any new project that is planning on using Scala should always bring on at least one developer who already has experience in the language. It creates a better product and helps the rest of the team become proficient in the language much more quickly.

Object-Oriented/Functional Mix

We found Scala’s ability to mix both object-oriented and functional approaches very valuable. All team members came from an object-oriented background and some also had experience with functional approaches. Initially development started mainly using object-oriented techniques, but leaning towards immutable data wherever possible.

Functional approaches were then introduced gradually where this was really obvious, such as implementing transformations across diverse data sets using map and filter functions. As the team got more familiar with functional programming techniques, more functionality was refactored into functional-style code. We found this made the code smaller, easier to test and more reliable.
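
As a flavour of the kind of refactoring this involved (the domain names here are invented for illustration):

case class Reading(sensor: String, value: Double)

// The imperative style much of the code started out in
def highValuesImperative(readings: Seq[Reading]): Seq[Double] = {
  var results = List.empty[Double]
  for (reading <- readings)
    if (reading.value > 10.0) results = reading.value :: results
  results.reverse
}

// The functional style it was refactored towards
def highValues(readings: Seq[Reading]): Seq[Double] =
  readings filter (_.value > 10.0) map (_.value)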

Perhaps the only real disadvantage of moving to more functionally leaning Scala code is that some code became less readable for external developers with less exposure to the functional style. It became necessary to explain some of the approaches taken to external developers: something that would probably not have been necessary with more procedural-style code. This is worth bearing in mind for projects where there will be a handover to a team not skilled in Scala and functional programming techniques.

Conclusions

The project was very successful. We delivered on time and with more features than the customer originally expected. I don’t think we would have achieved this if the implementation had been in Java, so from that point of view the smaller, more functional code base produced by developing in Scala was a real winner. Using Scala with MongoDB was a major benefit. Tooling was the real let-down on the project. Some problems could be alleviated by moving from Java libraries and solutions to their Scala equivalents. However, there is still some way to go before Scala tools (especially IDEs) match the speed and power of their Java equivalents.