Thursday, 30 September 2010

Simple and Elegant Property Model

In this post I talked about the need to create simple and elegant solutions to software problems rather than over-engineer. I used as an example a property mechanism that I have been working on for my new product. Having now got this to a state where I believe it is both simple and elegant, I thought I'd share the evolution of the Scala code as a way of exploring this topic in more detail.

The goal of this particular set of classes is to provide a mechanism whereby a (super)user can create definitions for the properties that can be set against an item. The user creating the item then fills in values for these properties. Both users will be utilising a web interface for defining and setting properties. In addition, properties may be mandatory or optional and can support default values.

Please note that all the code below was developed iteratively using a Test-Driven Development (TDD) approach. I'm not going to show the tests here, so please be aware that every iteration through the model was first backed by a test to define the required behaviour.

So, after my initial attempt at a solution that became far too complex and over engineered I rolled back to the simplest thing that I could possibly write:

case class PropertyDefinition(name: String)

case class Property(name: String, value: String)

object Property {
  def apply(definition: PropertyDefinition, value: String) = Property(definition.name, value)
}

This simple starting point meets the minimum requirements of a definition of a property and then a property that maps a name to a string value. While this is indeed simple, it's too simple in that it doesn't meet all the requirements and it's also not particularly elegant in the way it restricts values to be strings.
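
For illustration, creating a property at this stage might look like the following (nameDefinition is just an example definition, and the same name is reused in the snippets further down):

val nameDefinition = PropertyDefinition("name")
val property = Property(nameDefinition, "Fred") // Property("name", "Fred")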

The next step was to add support for mandatory/optional functionality. The property definition was easy to change:

case class PropertyDefinition(name: String, mandatory: Boolean = false)

However, the property concept is more challenging as it is now possible to have properties with no value associated. In keeping with the general Scala goal of avoiding null to represent no value, I modified Property to hold an Option[String]:

case class Property(name: String, value: Option[String])

object Property {
  def apply(definition: PropertyDefinition, value: Option[String]) = value match {
    case None if !definition.mandatory => new Property(definition.name, None)
    case Some(_) => new Property(definition.name, value)
    case _ => throw new IllegalArgumentException("Value must be supplied for property")
  }
}
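
For example, a quick sketch of how the mandatory flag now behaves (definitions invented for illustration):

val nicknameDef = PropertyDefinition("nickname")
val surnameDef = PropertyDefinition("surname", mandatory = true)

Property(nicknameDef, None)         // fine: Property("nickname", None)
Property(surnameDef, Some("Smith")) // fine: Property("surname", Some("Smith"))
Property(surnameDef, None)          // throws IllegalArgumentException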


This is still a pretty simple solution. However, it's still missing the required default value support, and it now seems less elegant because instead of creating a property with

Property(nameDefinition, "value")

I have to use

Property(nameDefinition, Some("value"))

Also, I'm still linked tightly to String value types. First I decided to deal with the default value requirement:

case class PropertyDefinition(name: String, mandatory: Boolean = false, defaultValue: Option[String] = None)

case class Property(name: String, value: Option[String])

object Property {
  def apply(definition: PropertyDefinition, value: Option[String]) = value match {
    case None if definition.defaultValue != None => new Property(definition.name, definition.defaultValue)
    case None if !definition.mandatory => new Property(definition.name, None)
    case Some(_) => new Property(definition.name, value)
    case _ => throw new IllegalArgumentException("Value must be supplied for property")
  }
}
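
A quick sketch of the default value behaviour at this stage (sizeDef is invented for illustration):

val sizeDef = PropertyDefinition("size", defaultValue = Some("medium"))

Property(sizeDef, None)          // Property("size", Some("medium")) - default applied
Property(sizeDef, Some("large")) // Property("size", Some("large"))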


I now have the simplest solution that meets all my requirements. Many developers would stop at this point, but I still feel uncomfortable that the code is not particularly elegant. I don't want to over engineer additional complexity, but I'd like to have a cleaner interface to the classes that I have created so far and I'd also like to leave more scope for future changes where possible.

My first improvement was to pull out the tight coupling to a String and instead introduce a PropertyValue class as a wrapper around a string:

case class PropertyValue(content: String) 

case class PropertyDefinition(name: String, mandatory: Boolean = false, defaultValue: Option[PropertyValue] = None)

case class Property(name: String, value: Option[PropertyValue])

object Property {
  def apply(definition: PropertyDefinition, value: Option[PropertyValue]) = value match {
    case None if definition.defaultValue != None => new Property(definition.name, definition.defaultValue)
    case None if !definition.mandatory => new Property(definition.name, None)
    case Some(_) => new Property(definition.name, value)
    case _ => throw new IllegalArgumentException("Value must be supplied for property")
  }
}


Now my property code is not fixed to a string, although my initial implementation of property value is still string based. This has the benefit of keeping the initial version simple but providing scope for someone to refactor the implementation of property value in the future to support different types or other requirements.

So, one element of elegance of the code is improved. Unfortunately it comes at the expense of another. To create a property you now have to type:

Property(nameDefinition, Some(PropertyValue("value")))

Yuk! I need a way to easily build an Option[PropertyValue] from a String in order to avoid all the boilerplate code. This is where Scala implicit conversions come to the rescue. I think their overuse can create code which is very difficult to debug, as you are never certain which implicit conversions are being applied. However, in this case they are perfect for bringing the final level of elegance to the code:

object PropertyValue {
  // Implicitly wrap a String in an Option[PropertyValue]; an empty string becomes None
  implicit def wrapString(content: String): Option[PropertyValue] = content match {
    case "" => None
    case _ => Some(new PropertyValue(content))
  }
  // Implicitly unwrap an Option[PropertyValue] back to a String; None becomes the empty string
  implicit def unwrapString(value: Option[PropertyValue]): String = value match {
    case Some(v) => v.content
    case None => ""
  }
}

Given these implicit conversions I can now type:

val property = Property(nameDefinition, "value")

val content: String = property.value
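
Putting it all together, here is a rough end-to-end sketch of the finished model in use (the definitions are invented for illustration):

val colourDef = PropertyDefinition("colour", mandatory = true)
val sizeDef = PropertyDefinition("size", defaultValue = Some(PropertyValue("medium")))

val colour = Property(colourDef, "red") // implicitly wrapped to Some(PropertyValue("red"))
val size = Property(sizeDef, None)      // no value supplied, so the default is applied

val display: String = size.value        // implicitly unwrapped to "medium"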

This ticks all my boxes for simplicity and elegance. The code is clean and simple; all the requirements are met. The interface to the classes is also clean and intuitive. However, the solution is also elegant in that it provides ways for future extension without adding excessive additional complexity. Future developers should be able to enhance this code to support more features (such as validation or type safe values) without having to rewrite it from scratch. Time to move on to the rest of my domain model...

Wednesday, 29 September 2010

Simple and Elegant

One of the most common traits that I come across in us software developers is the tendency to over-engineer our code. We try to future-proof our code and we try to make it deal with scenarios that may never even be a requirement for the system we are building. Recognising when this is happening and being able to create a simpler and more elegant solution is something we should all be striving towards.

As an example, I'm currently using some of my personal time to build a solution for classifying large amounts of information (more about this in a future post). I'm currently undertaking a TDD exploration of the core domain model. One of the requirements that I have identified is the ability for a user to define a set of properties that can be held against each piece of information. Each property has a name that maps to a value - a fairly simple concept.

As I was writing the test cases and domain classes for this requirement I drifted down a route of allowing typed property values. Thus, rather than just String values, they could be Int, Boolean, ThingaMaBob or whatever you want them to be. However, definitions of properties and assignment of values are managed via a user interface rather than in compiled code so this solution started to grow very complex as it dropped into a world of runtime type validations and conversions and so forth. 

As I wrote more and more code the solution began to smell more and more like over engineering (when you start needing Scala Manifests in a domain model you know something isn't quite right). I could see myself getting bogged down in getting this working, so I decided to take a step back and look at the requirement again. Was I sure that I would actually need typed property values? No, maybe in future but not for any specific cases I could currently identify. Was having typed property values allowing me to get a greater understanding of my domain model? No. Was the requirement for typed property values a valid one, or just an excuse for over engineering? The ultimate conclusion was that I could actually do everything I wanted for the foreseeable future with simple String values and some basic validation rules.

With my new found simplification in hand I dumped a whole load of complex code and ended up with a very simple, very elegant solution that works perfectly for my current needs.

Sadly, I all too often see developers go down this same route but fail to realise that they are over engineering. They then spend an excessive amount of time building a solution that is fragile, difficult to test and unresponsive to changes in requirements that might be different to those they originally imagined when building their solution. These over engineered solutions just contribute to the technical debt of projects that allow them to exist.

Always aim for the simplest and most elegant solution that meets the current set of requirements - nothing more, nothing less. Don't over engineer and don't try to predict future requirements. Note the emphasis on elegant: just because something is very simple doesn't mean it is a good solution. A simple solution can still be built from messy, poorly structured code that is just as difficult to maintain and enhance as its complex counterpart.

If we all focus on this goal of simple, elegant code then software becomes simpler, easier to understand and much easier to maintain and enhance. This then benefits us all through increased productivity and improved ability to deliver.

Tuesday, 21 September 2010

The Elusive Inspiration

I'm currently working on a software idea that I have had running around in my head for a number of years. It's based around concepts for classification of information, exploration of relationships between this information and how this changes over time. The domain model is fairly complex as I want to keep it fairly generic and configurable so that individual users can customise it for their own types of information and classifications.

Unfortunately I have come up against a mental block when it comes to this domain model. I think that the ideas have been going round in my head for so long and I have so many different variations and alternatives that I  just can't find a way to move forward with something concrete.

This has got me thinking about the mental process and how to unblock my creative self on this project. To this end I've been considering the following ideas...

Go Away and Do Something Else
Often I find that if I can't get my head around something then going off and doing a different project for a while allows my brain time to unwind and process my ideas in the background. After a while I then come back to my original project, and more often than not this allows me to make good progress. However, on this current project I've been going off and doing something else for ages and it's time I made some progress.

Go For a Thinking Walk
I love walking - especially going somewhere I haven't been before and just exploring. Often I find that during a walk of this type I am able to focus and concentrate on a single topic until I come to some form of resolution. I think I'll try this one at lunchtime today.

Whiteboard Session
I am a very visual person. I find exploring concepts on a whiteboard a great solution for fleshing out designs and examining alternatives. I love colours as well and often find a visual with different element types in different colours really clarifies my understanding. This will probably be my next approach to explore the results of my thinking walk.

Exploratory Coding
Sometimes even the best thinking and most perfect whiteboard drawings just don't capture the essence of a problem. At times like these a couple of hours of hacking out an exploratory solution in code can clarify all sorts of thoughts and ideas. When doing this sort of coding I like to use a very dynamic language (Ruby is my general choice) as I find these allow me to capture lots of thinking in a short amount of time. Increasingly I'm also doing exploratory coding in a Functional language (usually Haskell) as I find this a great way to structure my thinking.

Test Driven Development
A completely alternative approach is to avoid all the up front thinking of how the domain model will work. Instead, define some test cases that describe what the domain model should do and then derive the model out of these tests. My final implementation will certainly be carried out in this way, but for the moment I'm still thinking at a higher level. I feel that I need to better understand exactly what I'm trying to achieve before writing production code.

How do you resolve mental block issues such as these? What other techniques are available?

Additions:


Build a Mind Map
Another thought that I had was to build a mind map to explore all of the thoughts and alternatives. By restricting this mind map to just the domain model that I am interested in I might be able to look at the problem in a new light. I use ConceptDraw MINDMAP on my Mac for this purpose.

Monday, 13 September 2010

How much new technology?

As a software architect I regularly have to battle with the question of which and how many new technologies to introduce on any new projects that I'm working on. New projects (or ones offering a full architecture overhaul) are sufficiently rare that there are always numerous language, product or library developments since the last project. So, how do you decide what to include and what to exclude?

The 50% Rule
Over a number of years and many projects I have come to rely on a 50% rule for new technologies. Basically put, any new project can contain up to 50% new technologies, with the remainder being technologies that are well understood and that have been previously used by the majority of team members.

As an example, a number of years ago I started work on a greenfield project for a company that wanted to build some web-based administration tools for their existing (config file driven) applications. This was their first foray into Java and web development, and I was given a team consisting of one senior developer and a number of junior devs with basic Java/JSP training plus a bit of Struts 1.x.

After some careful consideration we decided to introduce three major new technologies: Maven for builds, Hibernate for persistence and the Spring Framework for the core application framework. We then decided to stick with proven technologies that the team had existing experience of for the remainder of the stack: MySQL as the database and Struts, JSP and JSTL for the presentation layer. Even though there was considerable motivation for using an alternative presentation stack (Tapestry, which I was using at the time, or the early release of what would later become Apache Wicket, to which I was a contributor), the risk of a completely new technology stack, top to bottom, was just too large.

The advantage of taking this mixed approach to technology was that there was a lot less uncertainty on the project and the team was able to be immediately productive using the technologies they already knew. The business was therefore more comfortable that progress was being made (rather than the team just playing with 'cool' stuff), and the team had the leeway to learn how to get the most out of the new technologies and to correct any initial mistakes in how they were being used. I just don't feel that this would have been the case if every technology being utilised was a new one.

However, sometimes a project is so radically different from anything that has gone before that you just can't find 50% existing technology to bring with you. Thus...

The Incremental Rule
Occasionally, a project departs so much from what has gone before that almost all the technology must be new. By definition this usually means technology that has not been used in a previous project, may not be familiar to the team and/or may never before have been put into production. For these projects some additional 'up-front' work is essential in order to de-risk the new technologies being used.

By way of an example, I am currently doing some initial explorations into an idea for a product for subject classification, relationship navigation and searching. This project will need to deal with very large data sets (> 100 million subjects) and need to support a high degree of concurrency and scalability. For this reason I'm looking into some particular technology choices for this application: Scala, Akka and Cassandra. Given the move to Scala as the main programming language I'm also planning on using the Simple Build Tool (SBT) and investigating the Lift web framework as well. I'm moving my version control from Subversion to Git. Finally, the application will be cloud based so I'm looking to automate deployment and scalability on AWS.

All of the above is a massive amount of new technology to take on board. Were I to just jump in and start the project, I'm guaranteed to get into difficulty and end up with either major architecture problems or a need to rewrite large parts of the application as I learn more about each technology. Instead, I've broken the technologies down into a number of smaller 'starter' projects so that I can get familiar with each in isolation and then build on more technologies gradually.

Starting out, I've built a couple of simple projects in Scala, using SBT, to get familiar with the language and to improve my Functional Programming knowledge. Next, I've started to add a web-based UI using the Lift framework. Once I'm happy with these I'm then going to build something using Scala, Akka and Cassandra. Thus, by the time I actually start building the final solution, the only unknown technology will be automated deployment to AWS (and even this is not a total unknown as I've manually deployed to AWS before). 

Building something with so much new technology is always a big risk. But, by taking an incremental approach at de-risking each technology I can manage the risk within acceptable levels and ensure that my final solution is not crippled by lack of knowledge of my chosen technologies.

This then follows on to my final rule....

The Build a 'Strawman' Rule
Regardless of how you select the technologies that will be included in the project, the first iteration should always be to build an end-to-end 'strawman' architecture. This strawman should include all of the selected technologies playing nicely together. Builds should be automated and initial automated test suites (at all levels) should be up and running. Finally, any automated deployment should be in place. The strawman doesn't need to do much functionally, but if it contains enough processing to also allow some basic scalability and performance tests as well then even better.


By selecting just the right blend of new and existing technologies and spending time de-risking when there are many new technologies we can ensure that we always start a project with confidence that we can make the technologies work for us. Then, by starting with an architectural 'strawman' we further ensure that the technologies work together and eliminate the huge integration risk that we would otherwise hit late in the project when it might be too late to resolve it.

Thursday, 9 September 2010

Rich vs Anaemic Domain Models

Is your domain model a rich one, or is it one of those ultra anaemic ones? Where should business logic live - encapsulated inside the domain or in a number of higher-level services/controllers? Over and over I seem to have had the same debate with different companies and developers. So, once and for all here is my definitive view on the subject...

Anaemic Domain Models
An anaemic domain model is one typified by the JavaBeans™ style of programming: simple Java domain classes containing just fields and setter/getter methods for those fields; logic for manipulating the domain objects is contained in higher level classes (typically a service layer).

For example, consider the following simple anaemic domain model consisting of a Person and a list of Addresses associated with that person:

public class Person {

  private Long id;
  private String forename;
  private String surname;
  private Date dob;
  private List<Address> addresses;

  public Person() {
  }

  public Long getId() {
    return id;
  }

  public void setId(Long id) {
    this.id = id;
  }

  public String getForename() {
    return forename;
  }

  public void setForename(String forename) {
    this.forename = forename;
  }

  public String getSurname() {
    return surname;
  }

  public void setSurname(String surname) {
    this.surname = surname;
  }

  public Date getDob() {
    return dob;
  }

  public void setDob(Date dob) {
    this.dob = dob;
  }

  public List<Address> getAddresses() {
    return addresses;
  }

  public void setAddresses(List<Address> addresses) {
    this.addresses = addresses;
  }

  @Override
  public String toString() {
    ...
  }

  @Override
  public boolean equals(Object o) {
    ...
  }

  @Override
  public int hashCode() {
    ...
  }
}


public class Address {

  private String line1;
  private String line2;
  private String line3;
  private String town;
  private String county;
  private String postcode;
  private String countryCode;

  public Address() {
  }

  public String getLine1() {
    return line1;
  }

  public void setLine1(String line1) {
    this.line1 = line1;
  }

  public String getLine2() {
    return line2;
  }

  public void setLine2(String line2) {
    this.line2 = line2;
  }

  public String getLine3() {
    return line3;
  }

  public void setLine3(String line3) {
    this.line3 = line3;
  }

  public String getTown() {
    return town;
  }

  public void setTown(String town) {
    this.town = town;
  }

  public String getCounty() {
    return county;
  }

  public void setCounty(String county) {
    this.county = county;
  }

  public String getPostcode() {
    return postcode;
  }

  public void setPostcode(String postcode) {
    this.postcode = postcode;
  }

  public String getCountryCode() {
    return countryCode;
  }

  public void setCountryCode(String countryCode) {
    this.countryCode = countryCode;
  }

  @Override
  public String toString() {
    ...
  }

  @Override
  public boolean equals(Object o) {
    ...
  }

  @Override
  public int hashCode() {
    ...
  }
}

Then, we define some services on top of the domain model that obtain, use and update the domain objects to implement the functionality required by the business:

public class PersonService {

  private final PersonRepository repository;

  public PersonService() {
    repository = new PersonRepository();
  }

  public Long getPersonId(String surname, String forename) {
    return repository.findPerson(surname, forename);
  }

  public Person getPerson(Long personId) {
    return repository.getPerson(personId);
  }

  public void addAddress(Long personId, Address newAddress) {
    Person person = repository.getPerson(personId);

    List<Address> addresses = person.getAddresses();
    if ( addresses == null ) {
      addresses = new ArrayList<Address>();
      person.setAddresses(addresses);
    }
    addresses.add(newAddress);
  }

  public void makeDefaultAddress(Long personId, Address defaultAddress) {
    Person person = repository.getPerson(personId);

    List<Address> addresses = person.getAddresses();
    if ( addresses == null || !addresses.contains(defaultAddress) ) {
      throw new IllegalArgumentException();
    }

    // Default address is always the first address in the list
    addresses.remove(defaultAddress);
    addresses.add(0, defaultAddress);
  }
}


public class MailShotService {

  public void sendMailShot(Person person, Long mailShotId) {
    List<Address> addresses = person.getAddresses();
    if ( addresses == null || addresses.isEmpty() ) {
      // No mailshot can be sent
      return;
    }

    Address sendTo = addresses.get(0);

    // Code here to locate the mailshot and call the printing routine!
  }
}

The above code is overly simplistic (and somewhat contrived), but it demonstrates a key problem with this approach, namely that the encapsulation of the addresses property is broken. Specifically:

1) The fact that addresses are stored as a list is exposed to the service layer (and beyond). In fact, the PersonService is even responsible for creating the list instance. Changing the way addresses are stored in Person would mandate changing all the services (and perhaps controllers, pages and so on) that work with Person objects.

2) The knowledge that the first address in the list is the default address has escaped the domain model into the service layer. In particular there are two different services that both contain this knowledge. Should we want to change this approach we have to change and test code in two places (or more likely we change it in one place, forget the other and then wonder why our app behaves inconsistently).

Now, many proponents of the anaemic domain approach will tell you that the above problems can be avoided by correctly implementing your service layers. For example, only one service class is ever used to deal with Person. Any other services, controllers or whatever that need to access Person must use this service to do so. For example, the PersonService could have a new method: getDefaultAddress which would be called by the MailShotService. However, in my experience this never works for the following reasons:

1) Unless your developers are INCREDIBLY disciplined then this approach will always be violated. It's right before a deadline and a developer needs to access the default address from some controller in the system. Will they do all the work to inject the PersonService or will they just pull the first element off the address list? Most likely the second, and as soon as it's been done once then you can guarantee that that code will at some time be reused as a template for other code and the problem just proliferates from there. In 15 years I have never seen an anaemic domain pattern where this hasn't happened.

2) You end up with the higher level services and the controllers all having to inject large numbers of other services in order to get anything done. This results in tighter coupling of the system and significantly increases the complexity of unit and integration testing (unit: you need to define many mocks; integration: you need to pull in almost the whole system to test just one component). In every case I've seen, the anaemic domain pattern done in this way results in a small handful of controllers or services that pull in almost every other service in the system, which makes them really difficult to test and even more difficult to modify.

In my humble opinion, the anaemic domain model should be considered one of the most destructive anti-patterns of our time. It breaks the concept of good object oriented design and encapsulation and leads to service layers (and above) that become difficult to maintain and overly complex.

Rich Domain Models
An alternative is the rich domain model, where we attempt to encapsulate as much information about the domain inside the actual domain classes. We then expose these rich objects to higher levels, which can utilise the domain objects directly with less need for services containing arbitrary domain and business logic.

Looking at our RICH Address and Person objects:

public class Person {

  private Long id;
  private String forename;
  private String surname;
  private Date dob;
  private final List<Address> addresses;

  public Person(final String forename, final String surname, final Date dob) {
    this.forename = forename;
    this.surname = surname;
    this.dob = new Date(dob.getTime());
    this.addresses = new ArrayList<Address>();
  }

  public Long getId() {
    return id;
  }

  public void setId(final Long id) {
    this.id = id;
  }

  public String getForename() {
    return forename;
  }

  public void setForename(String forename) {
    this.forename = forename;
  }

  public String getSurname() {
    return surname;
  }

  public void setSurname(String surname) {
    this.surname = surname;
  }

  public Date getDob() {
    return new Date(dob.getTime());
  }

  public void setDob(Date dob) {
    this.dob = new Date(dob.getTime());
  }

  public void addAddress(final Address address) {
    addresses.add(address);
  }

  public void removeAddress(final Address address) {
    addresses.remove(address);
  }

  public Collection<Address> getAllAddresses() {
    return Collections.unmodifiableCollection(addresses);
  }

  public void makeDefaultAddress(final Address address) {
    if ( !addresses.contains(address) ) {
      throw new IllegalArgumentException();
    }

    addresses.remove(address);
    addresses.add(0, address);
  }

  public Address getDefaultAddress() {
    if ( addresses.isEmpty() ) throw new IllegalStateException();
    else return addresses.get(0);
  }

  @Override
  public String toString() {
    ...
  }

  @Override
  public boolean equals(Object o) {
    ...
  }

  @Override
  public int hashCode() {
    ...
  }
}


public class Address {

  private final String line1;
  private final String line2;
  private final String line3;
  private final String town;
  private final String county;
  private final Postcode postcode;
  private final Country country;

  public Address(final String line1, final String line2, final String line3,
          final String town, final String county, final Postcode postcode,
          final Country country) {
    this.line1 = line1;
    this.line2 = line2;
    this.line3 = line3;
    this.town = town;
    this.county = county;
    this.postcode = postcode;
    this.country = country;
  }

  public String getLine1() {
    return line1;
  }

  public String getLine2() {
    return line2;
  }

  public String getLine3() {
    return line3;
  }

  public String getTown() {
    return town;
  }

  public String getCounty() {
    return county;
  }

  public Postcode getPostcode() {
    return postcode;
  }

  public Country getCountry() {
    return country;
  }

  @Override
  public String toString() {
    ...
  }

  @Override
  public boolean equals(final Object o) {
    ...
  }

  @Override
  public int hashCode() {
    ...
  }
}
From the above you can see a couple of significant changes. Firstly, I've made as much of the data as possible immutable. In particular I've made addresses immutable and thus to change an address you have to remove the old one and insert a new one.  This stops any users of the domain model from making changes to objects that should always be managed within the domain. Secondly, the addresses field is fully encapsulated. Users of the domain model know none of its implementation detail and they cannot manipulate the contents of the underlying collection as this is never exposed in a mutable form.

Additionally, I've added specific types for Postcode and Country which will encapsulate all the conversion and validation logic for converting between user entered Strings and their actual meaning - logic that would normally be in a controller or service in an anaemic domain model.

This approach greatly simplifies the services layer. The PersonService needs only to provide methods to get the Person and the MailShotService can just call a simple get method on the Person object:

public class PersonService {

  private final PersonRepository repository;

  public PersonService() {
    repository = new PersonRepository();
  }

  public Long getPersonId(final String surname, final String forename) {
    return repository.findPerson(surname, forename);
  }

  public Person getPerson(final Long personId) {
    return repository.getPerson(personId);
  }
}


public class MailShotService {

  public void sendMailShot(final Person person, final Long mailShotId) {
    Address sendTo = person.getDefaultAddress();

    // Code here to locate the mailshot and call the printing routine!
  }
}

However, this is not the end of the story, as there are still improvements to be made. In particular, exposing the ability to change a domain object in the layers above the services is still problematic: it requires patterns such as Open Session in View, and it may result in multiple places needing modification if a new cross-cutting concern (e.g. auditing changes) is required. So, it still has some of the weaknesses of the anaemic model.

We can therefore refine the domain objects even further by introducing an interface to represent the public face of our main domain entities, exposing only the accessor functionality:

public interface Person {

  Long getId();

  String getForename();

  String getSurname();

  Date getDob();

  Collection<Address> getAllAddresses();

  Address getDefaultAddress();
}


public class BasicPerson implements Person {

  ...
}

Now we can return a Person instance that clients of the domain object can use to access the domain details, but they cannot mutate them via this interface. Then, I can update the PersonService to contain all the methods for modifying Person instances:

public class PersonService {

  private final PersonRepository repository;

  public PersonService() {
    repository = new PersonRepository();
  }

  public Long getPersonId(final String surname, final String forename) {
    return repository.findPerson(surname, forename);
  }

  public Person getPerson(final Long personId) {
    return repository.getPerson(personId);
  }

  public void modifyPerson(final Long personId, final String forename,
               final String surname, final Date dob) {
    BasicPerson person = repository.getBasicPerson(personId);
    person.setForename(forename);
    person.setSurname(surname);
    person.setDob(dob);
  }

  public void addAddress(final Long personId, final Address newAddress) {
    BasicPerson person = repository.getBasicPerson(personId);
    person.addAddress(newAddress);
  }

  public void removeAddress(final Long personId, final Address newAddress) {
    BasicPerson person = repository.getBasicPerson(personId);
    person.removeAddress(newAddress);
  }

  public void makeDefaultAddress(final Long personId, final Address defaultAddress) {
    BasicPerson person = repository.getBasicPerson(personId);
    person.makeDefaultAddress(defaultAddress);
  }
}

Finally, I make the constructor of BasicPerson protected and add a factory so that there is now only one place where Person instances can be created (I also did the same for Address, but it's pretty simple so I haven't shown it here):

public class PersonFactory {

  private final PersonRepository repository;

  public PersonFactory() {
    repository = new PersonRepository();
  }

  public Person create(final String forename, final String surname, final Date dob) {
    BasicPerson person = new BasicPerson(forename, surname, dob);
    repository.add(person);
    return person;
  }
}

Thus, I have clean, well encapsulated domain objects that don't leak details of their internal implementation to the outside world. Wherever possible data has been made immutable to avoid accidental change. Any clients of the domain model can access its state via the exposed interface, but only the service can modify this state - thus making a single location for adding cross-cutting concerns (such as audit). I can therefore be certain that any code using my domain model will not be dependent on implementation details and will not be impacted by changes (provided I ensure the Person interface contract doesn't change).

You just don't get these benefits from an anaemic model without taking incredibly great care and ensuring that every developer who uses your code in the future also takes the same level of care. With a rich, well encapsulated domain model you protect yourself and those who use your code in the future by preventing the bad usage patterns from ever happening (even accidentally).

Wednesday, 8 September 2010

Programming Challenge: BINGO Part 3

In previous posts here and here I started to look at generating Bingo cards using a Functional Programming approach. Firstly I built a generator for individual rows. Then  I created a recursive generator for building single Bingo cards. In this post I look at building multiple cards to create a Bingo book page.

(Unfortunately the code for this post is on my poorly MacBook Pro which I haven't got back yet. I'm therefore posting descriptions/lessons learnt only - sorry!)

UK Bingo pages are made up of 6 separate cards arranged in one column. As you may recall, each card has 3 rows of 9 cells each, with 5 of the cells in each row containing numbers and 4 containing blanks. That makes 90 numbers in total across the page.

Each number 1 to 90 appears only once on the page. Numbers 1 to 9 can only be in the first column of any card, 10 to 19 only in the second column and so on.

To generate the pages I adopted the same approach as for cards: generate templates recursively; validate that they meet the rules; then insert the numbers. If the page is not valid then the generated cards are gradually rolled back and regenerated until a valid page is produced.
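
Since the actual code is stranded on the poorly MacBook, here is a rough Scala sketch of the shape of that generate/validate/rollback loop - not the real implementation; Card, generateCard and isValidSoFar are all stand-ins:

object PageGenerator {
  // A page is 6 cards; a card is 3 rows of 9 cells, with None representing a blank
  type Card = Vector[Vector[Option[Int]]]

  // generateCard produces a random card given the cards already on the page;
  // isValidSoFar checks a partial page against the page rules
  def generatePage(generateCard: List[Card] => Card,
                   isValidSoFar: List[Card] => Boolean): List[Card] = {
    def build(cards: List[Card]): List[Card] =
      if (cards.size == 6) cards.reverse
      else {
        val candidate = generateCard(cards) :: cards
        if (isValidSoFar(candidate)) build(candidate)
        // rollback: discard the most recent card and regenerate from one step back
        else if (cards.nonEmpty) build(cards.tail)
        else build(Nil)
      }
    build(Nil)
  }
}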

This approach worked fine for cards, where only a small number of rollbacks were required before a valid card was created. However, the rules for pages are somewhat more complex. After writing the recursive code I was finding that in some cases over 300 rollbacks were required to generate a valid page. Page generation was averaging 15ms per page on my very fast MBP.

This made me consider re-evaluating my approach. In the end I decided to stick with it as it is. Were my goal to generate vast numbers of pages in large batch runs then the performance would be insufficient. However, I'm looking at just being able to generate a small number of pages for a family game, so even a second or so to do so would not be a problem. A pragmatic approach to performance!

Incidentally, I did go back to revisit this and try to write a generator that always produces totally valid pages first time (without any rollbacks). I implemented this in Haskell, but I never managed to come up with a set of rules/patterns that worked in every case. I'll leave that as an exercise for the reader. A jam donut and credit on my blog to the contributor of the first working Functional solution!

Now all that remains is hacking together a simple web front end for displaying the generated cards so that they can be printed out. Time to play with the Lift framework I think.

Thursday, 2 September 2010

When Points Become Hours

I was recently working on a Story Pointing exercise for one of my customers who is running a Scrum based project using User Stories (for those not familiar with User Stories, Scrum and Story Pointing, please take a look here and here). This exercise took the form of a Planning Poker session (for info on Planning Poker look here). This all works great in theory, but this project had fallen into one particular trap.....

The notion of a Story Point is that it is supposed to be a measure of story size and complexity relative to other stories in the same backlog. So, if I start with a story that I estimate as 2 points then a story that seems twice as big and complex would be 5 points (the nearest value on the 1, 2, 3, 5, 8, 13, 20, 40, 100 scale), while a story half the size and complexity would be 1 point. It shouldn't be necessary to know exactly how the story will be implemented in order to gauge its size and complexity relative to other stories.

However, this particular project had over time reached a stage whereby a Story Point had become associated with a particular number of hours of development and test effort. This is a dangerous situation to get into, but very easy to achieve - especially on long running projects like this one, where a large amount of historic data had been collected supporting this association.

Two specific problems occur when Story Points become measures of hours of implementation effort:

Solutionising During Story Pointing
People by nature are generally much more comfortable thinking in terms of tasks and how long they will take, as opposed to relative size and complexity. Therefore, when there is a link between the two they tend to switch to the more comfortable approach of asking "how long would it take me to do this piece of work?". Once they have this figured out they then convert this value back to Story Points.

The problem with this is that to answer the 'how long' question you have to know what the solution will be. Therefore you have to work out how you would implement the story, break it down into tasks and then estimate these tasks - usually all done in your head in the space of a few minutes. Chances are some tasks will be missed - meaning the estimate is wrong. Also likely is that when a team later comes to implement the story they will pick a different solution (or the best solution might have changed) and therefore come up with a completely different estimate.

Any Story Points derived in this way are therefore meaningless unless the people estimating the story immediately go out and implement it using the exact solution used during the pointing process.

Unwillingness to Commit
The other problem with linking Story Points and implementation hours is that people then feel that giving a Story Point estimate is a commitment to complete the story in the associated number of hours. If people feel that this commitment will be enforced then they become unwilling to commit to a Story Point estimate until they have every last detail nailed down - hence nothing gets sized, or sizing takes forever. Alternatively, they inflate their point estimates by a significant (usually random) amount to account for uncertainty and unknowns, and thus the sizes are no longer meaningful.

So, when a project gets into this situation, how does it get out of it? Unfortunately, the only real way to break the association between Story Points and hours is to repoint the entire backlog. Start with a medium size/complexity story that is pretty well understood (but which has not been implemented or estimated) and pick an arbitrary size somewhere in the middle of the point range (say 5 or 8). This ensures there is no starting link between hours and size. Then estimate all other stories relative to this starting story. This could be a major undertaking for a project with a huge backlog - but it's essential in order to move forward.

And how do you avoid getting into the same situation again? That is fortunately easier: don't track hours against stories. A Story Point is a measure of size relative to other stories. Hours are a way of measuring the progress of a team within a sprint. There's no link between the two, and trying to put one in place is always a bad idea.

Wednesday, 1 September 2010

Why Every Developer Should Learn Functional Programming

Nineteen years ago I started my degree course. Our first term of programming was spent learning functional programming using the Miranda language (from which Haskell is derived). At the time I remember most people on the course, myself included, being puzzled by this approach. The general consensus was that it must have been done to create a level playing field by invalidating any previous programming experience. When the term was complete we moved on to other languages: first Ada, then C and finally C++. Functional Programming was all but a distant memory.

Many years on (and a number of languages later) I'm now finding those early forays into the Functional world to be of more and more value.

A couple of years ago I learnt Ruby. With support for closures and being able to pass blocks as parameters to methods there were already some functional concepts in play. Now my language of choice is Scala (a hybrid object-functional language) and I'm finding myself developing more and more in the purely functional style, writing much less imperative code.

In fact I've recently dug out my old college notes on Miranda and read some books on Haskell. I've also worked through some examples and prototypes in Haskell in order to bring my Functional Programming skills back up to scratch.

I was amazed at how quickly I dropped back into the Functional style. It got me thinking and I looked back over many years of code. I was surprised to find that it had never really gone away! Even after many years of Java and C++ programming I could still see a heavy influence of Functional Programming in the code I had written and am still writing.

For example, I've always been a great user of immutable data and a far greater user of recursion than my peers. I've also always made regular use of passing functional objects as parameters to methods as a way of separating application specific behaviour from generic algorithms. Those tendrils of my early programming education were all over many years of code!

Then another thought came into my head. Project metrics have always shown that my defect rate is significantly lower than my peers'. I'd always put this down to my attention to detail and thoroughness of testing. Could it in fact be the influence of the Functional style on my code? Code written using Functional Programming languages is known to be easier to reason about, and thus less bug prone, mainly due to its lack of side effects. By following elements of this style in my imperative language programming, have I brought some of this benefit along with me? After thinking back over the years and looking at more code, I think that maybe I have.

Additionally, the approaches taught to us for Functional Programming were heavily based on a discipline of reasoning about code. Think about the inputs to the function; think about its outputs; think about all the cases to be covered; think about non-success cases; avoid side effects; create an uncomplicated solution and so on. Looking at how I write code today I can see this reasoning discipline present in my thought process. Comparing my thinking to that of other programmers I know who don't have a Functional Programming background, I can clearly see the difference between the way I construct code and the way that they do. Could it be more than just Functional techniques that make the difference to my productivity and quality? I think perhaps it is. I think that that early discipline for reasoning about code and attacking problems with a Functional mindset really has made a difference to my ability to code.

So, there we have it. Every developer should have a thorough grounding in Functional Programming. Not just because it's going to be vitally important in the future multi-core, multi-threaded, cloud based landscape, but because the influence of Functional Programming makes for better developers who are able to create higher quality code that is easier to reason about. Who would have thought that those University lecturers of 19 years ago would have got it so right? Thanks guys!