Coincidental Relationships

Posted in software

I bought a few technical books for work recently. The plan is to pass them around, have at least two different people review each, and then compare notes. I’m trying to set a good example by going first - “lead by doing” and all that. I’m reading Better, Faster, Lighter Java. There’s a bit where they’re talking about inheritance, using Customers and Employees as two kinds of People. A Customer “is-a” Person, and an Employee “is-a” Person. This brings up a pet peeve/programming pitfall.

Inheritance is a cool tool, and as such, there’s the temptation to use it when it isn’t really appropriate. In particular, problems arise when you assume that similarities between real-world objects translate into inheritance relationships between the data objects you’re using to represent them. Yes, I have a particular case in mind.

I write software for law enforcement agencies. In the real world, suspects, victims, witnesses, and officers are all people. Even the application users are people. It seems obvious to come up with a generalized Person data type. Human beings have a lot of descriptive information you can associate with them: Name, age, height, eye color, address, and so on. By defining all people in these terms, we can build generic data displays that are simple, elegant, and wrong.

The problem is that in practice, the information associated with each type of person is very different. In particular, officers and suspects have almost nothing in common. Officers are generally only identified by last name and badge number. Suspects will not have a badge number (don’t go there), and may not have a last name. They will have anything else you could use to identify them, which depending on the circumstances could be a lot or very little. They may just have a first name, maybe a nickname. They probably have a detailed physical description, including descriptions of tattoos and scars. If they were apprehended, they might even have ID numbers, fingerprints, and so on. For witnesses, you’ll just have name and contact info - probably address and phone number. Victims will have that, plus information about injury severity and location, treatment, and hospitalization. (I’m not even going to get into application users, who will have passwords, roles, and so on.)

Furthermore, anytime you want to display this information, you have to treat these people differently. You can’t just display a list of people. The officer has to be clearly identified. The suspect has the aforementioned wealth or dearth of information, and also needs to be singled out. Victim, also probably important. Maybe witnesses, associates, and so on can be lumped in together, but that’s about it.

The problem is that we have a generic Person, but anytime we do anything with that data, we have all sorts of conditional checking to do. We can’t really treat them the same way - we have to see what they really are: If the Person is an Officer, then there’s an implication (and perhaps only an implication) that they’ll have a badge number and last name. By the time you’re done, there’s no generic Person code that isn’t overridden in one of its children. You’ve just created another layer of code that someone else will have to dig through to figure out what’s going on.
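To make that concrete, here is a rough sketch of what the "generic" display code tends to look like. Every class, field, and method name below is made up for illustration; it is not from the book or from any real system, just the general shape of the problem.

```java
// Hypothetical sketch: all names here are invented for illustration.
class PersonHierarchyDemo {
    // The shared parent. Most fields are only meaningful for some children.
    static class Person {
        String firstName; // usually unknown or unused for officers
        String lastName;  // may be null for suspects

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    static class Officer extends Person {
        String badgeNumber;

        Officer(String lastName, String badgeNumber) {
            super(null, lastName);
            this.badgeNumber = badgeNumber;
        }
    }

    static class Suspect extends Person {
        String nickname;

        Suspect(String lastName, String nickname) {
            super(null, lastName);
            this.nickname = nickname;
        }
    }

    // Supposedly generic display code. It has to ask what each Person
    // "really is" before it can show anything useful.
    static String describe(Person p) {
        if (p instanceof Officer) {
            Officer o = (Officer) p;
            return "Officer " + o.lastName + " #" + o.badgeNumber;
        } else if (p instanceof Suspect) {
            Suspect s = (Suspect) p;
            // A suspect may have no last name, so fall back to the nickname.
            String label = (s.lastName != null) ? s.lastName : s.nickname;
            return "Suspect " + (label != null ? label : "(unidentified)");
        } else {
            return p.firstName + " " + p.lastName;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Officer("Ramirez", "4471")));
        System.out.println(describe(new Suspect(null, "Slim")));
        System.out.println(describe(new Person("Ann", "Lee")));
    }
}
```

Notice that the only branch using Person as-is is the fallback at the bottom. Everything interesting happens after a type check and a cast.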

So how do you avoid this? Don’t try to be so goddamn clever. Don’t start out with some sophisticated relational hierarchy. Write the classes as distinct and separate. You may notice some duplication as you’re writing it, but don’t refactor it right away. The amount of typing you do is not the limiting factor in writing software. Wait until the functionality is in place and you’ve got it ironed out. Then you can see whether creating a parent class for common functionality actually simplifies or clarifies anything.
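As a minimal sketch of what "distinct and separate" might look like, here are two classes with no shared parent. The names and the summary() method are assumptions of mine, chosen only to show the idea; each class carries exactly the fields it needs and knows how to present itself.

```java
// Hypothetical sketch: class and field names are invented for illustration.
class SeparateClassesDemo {
    // An officer is identified by last name and badge number, nothing more.
    static final class Officer {
        final String lastName;
        final String badgeNumber;

        Officer(String lastName, String badgeNumber) {
            this.lastName = lastName;
            this.badgeNumber = badgeNumber;
        }

        String summary() {
            return "Officer " + lastName + " (badge " + badgeNumber + ")";
        }
    }

    // A witness is a name plus contact details.
    static final class Witness {
        final String name;
        final String phone;

        Witness(String name, String phone) {
            this.name = name;
            this.phone = phone;
        }

        String summary() {
            return "Witness " + name + ", phone " + phone;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Officer("Ramirez", "4471").summary());
        System.out.println(new Witness("Ann Lee", "555-0100").summary());
    }
}
```

Yes, both classes store a name of some kind. That duplication is cheap, and there are no null fields and no type checks anywhere in sight.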

The key question is whether there’s a causal link, a real correlation, between your classes, or whether it’s just coincidence. In this case, any commonalities between suspects, victims, and officers are coincidental. If you start requiring first name for officers, or age for witnesses, do you want it to impact the others? Probably not. Does your class hierarchy mean you write less code, or more? That’s the bottom line: Does it simplify things?
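Here is a small, hypothetical illustration of that "impact the others" question. Suppose first name becomes mandatory for officers and someone enforces the rule in a shared Person constructor (the classes and the canCreateSuspect helper are invented for this sketch). Suspects known only by a nickname can no longer be created at all.

```java
// Hypothetical sketch: all names here are invented for illustration.
class CoupledValidationDemo {
    static class Person {
        final String firstName;

        Person(String firstName) {
            // Rule added for officers, but enforced in the shared parent,
            // so it silently applies to every subclass.
            if (firstName == null || firstName.isEmpty()) {
                throw new IllegalArgumentException("first name is required");
            }
            this.firstName = firstName;
        }
    }

    static class Officer extends Person {
        final String badgeNumber;

        Officer(String firstName, String badgeNumber) {
            super(firstName);
            this.badgeNumber = badgeNumber;
        }
    }

    static class Suspect extends Person {
        final String nickname;

        Suspect(String firstName, String nickname) {
            super(firstName);
            this.nickname = nickname;
        }
    }

    // Reports whether a suspect with these details can still be recorded.
    static boolean canCreateSuspect(String firstName, String nickname) {
        try {
            new Suspect(firstName, nickname);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A suspect known only as "Slim" is now rejected by the officer rule.
        System.out.println(canCreateSuspect(null, "Slim"));
        System.out.println(canCreateSuspect("Joe", "Slim"));
    }
}
```

If the two classes only looked alike by coincidence, a change to one should never have been able to break the other.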
