Design Patterns: DRY

A while ago, I was working with a younger man and as I was doing a code review, I came across some code that looked kind of like this:

readColumnA () {
      columnA = columnReader("columnA");
      processColumn(columnA);
}

readColumnB () {
      columnB = columnReader("columnB");
      processColumn(columnB);
}

It isn’t really important what was in the file, or what the processing being done was.  These two methods are basically identical, the only difference is the value passed to columnReader.

I suggested that he change this code to look more like this:

columnProcessor (String columnName) {
      column = columnReader(columnName);
      processColumn(column);
}

Then, you could call it for each column you need to process.

When you remove duplication like this you are making your code DRY.  DRY stands for don’t repeat yourself.  In the first example you can see that we repeated the algorithm for processing the columns for each column we needed to process.  We can see that the only thing that changed was the column name, and extract that part out, and then have a generic method to process columns.

My colleague had heard of DRY, but thought that he shouldn’t use it here since he was pretty sure that we would change and want to process columnA differently than columnB in the future.  While this argument makes a certain amount of sense, we need to remember that we are not psychic, and while we can speculate on how a software project might evolve, we are never really sure.  The requirements and priorities change under our feet, and our guesses are often wrong.

In my opinion, and in my experience it is always better to keep your code clean because you want to be able to make changes easily.  If, at some point, in the example above, column A and column B processing diverge, then (and only then) you should change the code to accommodate this.

 

Code Reviews

The idea of code reviews is fairly well known.  Many businesses practice them regularly, many others agree that it is a good idea.  But, there aren’t a lot of resources out there to help a business understand how to do good reviews.

Why do them?

Let’s start off with a quick review seeing as most of us understand why.

The most common reason people state for doing code reviews is to catch bugs.  When you have a second set of eyes look at a piece of code, they may see some edge cases that you did not account for, or see other silly mistakes that we all make.  This is important, but I think there are other very important benefits of doing code reviews.

Code reviews are an excellent learning platform for both the developer and the reviewer. It is pretty clear why code reviews are a learning opportunity for the developer. The developer gets a chance to gain feedback from his/her colleagues on the code they produced, which sometimes reveals new techniques and new “best practices” that they may not have known.

However, perhaps less clear is why this can be a learning opportunity for the reviewer.  All of the benefits that the original developer gets from having someone review their code can be applied to the reviewer as well.  They may see new things they have not encountered before.  Additionally, the reviewer gets an opportunity to learn about the new feature that was added.  This leads to shared understanding of the code, and greater collective code ownership.

Code standardization can also be a benefit of code reviews, if you decide to have people review on this.  There are a lot of tools that can help make sure you are applying coding standards as well.

Catching defects at this stage will allow us to fix them with minimal effort.  Sometimes a design defect will be built upon and then it could impact a much wider part of the system so when we go back to fix it, it may cause ripple effects throughout the code-base.

What should we review?

There are a lot of discussions about what should be reviewed in a code review.  One that often comes up is coding standards.  As I said above, there are a lot of automated tools that can help ensure that standards are being applied across the board, but they won’t catch things like bad method names or bad variable names to name a few.  I think this is valuable input.  Since many of these “violations” are easily fixed, I think it is reasonable to ask that they are fixed before the task is considered complete.

An obvious think to look for in a code review is bugs.  If you see a bit of code that may cause an error in certain circumstances, please call it out, even if you aren’t sure.  It is better to have the conversation about the piece of code and decide that it is fine than it is to let it slide and later find out that it was an issue all along.

Lastly, if you see some code that could be improved, I think it is acceptable to mention this in your review.  Those changes may not need to happen at this time, but it is a learning opportunity.  If there isn’t time to address the refactor (and people agree that this is a good change), make sure to note it down somewhere (preferrably your ticket tracking software).

Who should review?

I hear this one quite often. I think any developer that might touch the software in question should be involved in the review.  I often hear pushback that this will take too long for the indivdual developers, or that the review might sit out there too long and we will not be able to merge the code.

Doing the code review before the code goes into production may take a little longer upfront, but there are a lot of benefits (as I described above) to doing so.  We can potentially catch errors before we start building on top of them and relying on the faulty code for other parts of the system.

As for the review sitting out too long, there are lots of things that you can put in place to ensure that this doesn’t happen.  Daily stand-ups (or at the very least team check-ins) allow the team to mention that a code review is waiting for reviewers.  You can set up rules for how many people need to review and approve before the code is merged so that you don’t have to wait for the entire team.  Generally speaking, this is usually not an issue because the team wants to get work done.

What about pairing?

In many cases pairing can take the place of the traditional code review.  You get two sets of eyes on the code as it is being written, which can avoid many of the issues we are trying to catch in a code review.

However, I would recommend that code reviews are good practice when pairing especially in a few key circumstances:

a) When the pair is two junior developers.   A more experienced set of eyes might be able to shed some light on potential issues the junior pair would not have known to look for.

b) The new feature is very complex.  If a new feature is very complex having additional eyes on it is a good thing.  Doing this could catch potential bugs the developers did not see, but it also helps the entire team to understand the new code.

How should we review?

This is probably the most important part of this whole article.  We are all a little scared of the code review.  We don’t want to look dumb or bad in front of our colleagues and showing our work can be a little intimidating.

I have heard a couple of really good pieces of advice on this front.

  1. Always assume best intentions.  This goes for the reviewer and the reviewee.  The person who wrote the code is trying to do the right thing…perhaps there is a good reason it is written the way it is.  The person doing the reviewing is also trying to do the right thing, tone doesn’t carry well in written language.
  2. When you are commenting on a piece of code, be nice.  Use “we should” instead of “you should”.
  3. Ask questions about the code rather than accusing them of doing something wrong. “Why did you do this?  Would this other thing have been better?”
  4. Talk to the person face to face.  Sometimes a written discussion leads you in circles, or inflames one or both parties.  Have a face to face if there is an issue that you are unable to resolve.

 

High Cohesion

When I first started learning about object oriented programming, and really digging into the best practices, loose coupling and high cohesion kept cropping up.  I had a hard time trying to keep them straight and remember which one I wanted more of and which one I wanted to reduce.  My biggest difficulty was I couldn’t keep the definitions of coupling vs cohesion straight!

Coupling refers to how much a set of classes rely on one another.  If we have two classes, A and B that each use methods from the other, these classes would be tightly coupled.  If we have those same two classes and only one needs methods from the other, then those are more loosely coupled.

The little metal ends on a garden hose that you can twist together are called couplings.  You use them to join two pieces of hose together.  I like to use this to help me remember that coupling is about how two (or more) classes interact with one another.

Cohesion refers to how much the methods in a particular class belong together.  For example, let’s say we have a class representing a soup can.  The soup can should know about how big it is, what it is made out of, what shape it is and what color it is.  If we start adding information about the soup that is stored inside the can, then we are breaking the cohesion.  Information about the soup is not important to the actual can.  The information about the soup should be contained in a separate class that the can could know about.

Thinking about a cohesive group of people helps me remember what this means.  A cohesive group of people are people that work very well together and really seem to belong together, much the same way as a cohesive class design has methods that really belong together.

I have another post about coupling here.

High Cohesion

When we talk about cohesion, we are really talking about how well the ideas in a class or a data structure belong together.  I mention data structure  here because this principle can and should be used when designing database tables, or any data storage scheme, as well as when you are designing a class.  We are gong to focus on cohesion in code in this article.

Now that we have a better understanding of what cohesion means, why is it important?  Why should we worry about creating things that are really cohesive?  The program will work even if we throw a bunch of different concepts into one pile, right?

Technically, yes.  It is possible to write an entire application in one file.  And…you might even be able to hold all of the context of the application in your head at once while you are writing it.  Just because you can , doesn’t mean you should.

Why not put it all into one class?

There is a term for a class that ends up knowing too much about too many things…”god class”.  This is not the good kind of god, it is the spiteful, vengeful type of god, and you really don’t want to go creating them.

When you have a class that knows too much about too many things, it makes changing that class REALLY dangerous.

When a class knows too much, and does too many things, making a small change can impact many parts of the application, since it is likely used there too!  When we break things out into small classes, those classes are often used in fewer places, and their methods are very well defined.

Also, when everything is in one class, there is often a fear of touching that class.  We cannot understand everything that it is trying to do, so we don’t want to make a change that will cause unknown issues.  When we are afraid to touch classes, we don’t try to improve them, we just try to patch the hole and get out as quickly as possible.  This is not good for the overall health of our application, and is not good for our overall mental health.

 

Single Responsibility

High cohesion and the SRP (Single Responsibility Principle) go hand in hand.  When you design your classes, they should have one main purpose, one main reason to change.

Breaking your classes down this way not only means that each individual class will be changed less often, but it also helps us humans reason about the system as a whole.  Often systems are very complex, and contain a lot of concepts.  If we can break down the ideas into smaller and smaller parts, we can more easily understand these tiny parts, and can then build up our mental model much more easily.

 

Should we store generated files?

Unfortunately, the answer isn’t as simple as yes or no.

I suggest asking yourselves a few questions to help you determine whether it makes sense to store the generated files/resources.

Does storing these files/resources hurt you or your team in any way?

Are you running low on storage space because of these files?  Often the sheer number of these resources can become unmanageable, especially when you start looking at the space they are consuming in source control.

Does it take extra effort to actually store these resources?  If your team needs to go through extra steps to ensure that these items are being stored, they may be a candidate for dropping.

Does the existence of these files confuse people?  When I first started working with React.js, we stored both the source and the generated files.  I didn’t know what the difference was, or where I should be making changes.

Does it take longer to store or retrieve files because of the number/size of these extra files?  This may be hard to discern.  One way to get a feel for this is to simply look at the amount of storage dedicated to these temporary/generated files compared to the rest of the files.  If it seems a large portion is generated files, perhaps you could try removing them from the storage (maybe in a test location).

Are they easily reproduced?

Do you have an automated way to regenerate these files for individual developers? Ideally, the answer is yes (or at least nearly automated).  This could include a step in the build process (that you execute locally when you make changes, or is kicked off auto-magically), or a process that is waiting in the background for someone to make

Does reproducing them happen quickly? If the answer is no, then it might make sense to store these files.  If the files need to be regenerated often, though, even though the process takes time, it might make sense to not store them, as the stored files will be of little to no value.  If this is the case, then I would invest some time in ways to speed up the generation process, or remove the need for the files all together.

Can an individual reproduce them on his/her machine?  Sometimes files need to be generated on a machine with a certain setup (OS, applications, etc) and there are only a few of those places available.  If this is the case, then it might make sense to store the files, especially if they don’t need to be reproduced often.

Do the files need to be reproduced often?

If you need to reproduce the files often, because you are changing inputs, or changing environments, then you probably don’t want to store them.  Storing them is buying you very little in this case, since you will likely reproduce them and overwrite what was previously there.

 

Is it bigger than a breadbox?

In the waterfall method of project management, we are asked to estimate how long a large piece of functionality will take to complete.  Then, all of those parts are added up to get a timeline of when things will be done, and we’re off.

As humans we can estimate how long a small, simple task will take relatively accurately.  When that task becomes larger, and/or more complex, we rapidly get worse at estimation.

Agile methodologies ask us to break the tasks down into smaller bite-sized pieces, which allows us to more accurately estimate how long a task will take.  If this is true, then why do people still do relative sizing in agile projects where the tasks are broken down appropriately?

Experience

Where both of these systems fail is when you take into account who is doing the work.  We all know that each developer works at a different speed.  There is often a strong correlation between the speed at which a developer can complete a task and their experience level (though, there are certainly times when this is not true).

So, now let’s imagine that we have a team of 5 developers and a rather large project we want to complete.  When we go to estimate, who’s estimate do we take?  Do we go with the lowest estimate, the highest, an average?  Do we divide the work up among the developers and get them to estimate the work they are assigned and hope we divided fairly evenly?  What about vacations, sick time, or someone leaving the company?

Relative sizing to the rescue!

Ok.  So, like anything this is not a panacea.  It works, but it takes effort and commitment to make it work.  Let’s start with a definition of relative sizing.

Relative sizing is simply the process of looking at the stories and determining how big a task is in comparison to other tasks.  The nice thing is most people will agree that task a is smaller than task b most of the time.

When doing relative sizing, most teams pick some series of numbers to assign to the sizes (we used powers of 2, others do prime numbers).  When picking numbers, I suggest not using 1,2,3,4… since this is too fine grained (you want some wiggle room).

Now that you have a scale, each team member can give estimates on tasks using these numbers.  When there is a discrepancy between team members, use that as an opportunity to discuss why there are differences.

Sometimes the difference is because the task is unclear, and this gives you an opportunity to clear up any possible misunderstandings.  Sometimes one developer knows something the others don’t and he/she gives a very different estimate.  This is a good learning opportunity.  Sometimes you will have very small discrepancies, like one developer estimates a 2 and the other a 4, there may not be any differences in understanding, just one thought it was bigger than the other, in that case we typically went with the larger estimate, though you can make your own rules.

You may be wondering how assigning tasks points is going to help you determine when a project will get done. Good! Because I haven’t shared that yet.

After you have points assigned to your cards you will need to plan out a couple of iterations (a defined period of time to do work, often 2 weeks).  Ask the developers how much they think they can do in this iteration and only put in the tasks they think they can accomplish.  After a couple of iterations, you can add up the total points the team completed and take the average.  That will be their velocity.

You will notice when a team first starts out their velocity will probably fluctuate as they get used to estimating and working on the new project.  Be sure to keep an up to date feel for their velocity.  I would recommend not using all of the previous iterations for the average since you may have some anomalies especially at the beginning.

With their velocity and the stories estimated for the project, you should be able to project when a project will be complete.  And, you don’t have to go to all of the trouble of assigning out tasks to individuals!

 

Learning in the real world

How many times have you heard a kid say, “when am I going to use this in the real world”? How many times have you said it?

My niece is taking some programming courses in college and she told me she was having difficulty with one of the classes. When I asked, she said she just didn’t understand how what she was learning would be applied.  The professor had forgotten to tell the class the why of what they were learning.

This happens throughout our learning lifetime.  I wrote a post a while ago (Searching for the why) where I talked about making sure to include the why when talking about processes as well as coding techniques.  Knowing the why helps us to apply the how better.

Does knowing the why help us learn too?

I think understanding why we are learning a new skill helps motivate us to learn it.  Let’s take for example, learning a new language.  I have tried many times to learn a new language, and I have only been partially successful.  I was most successful learning some Spanish when I was planning on going to a Spanish speaking country.  Since then, it has been on and off.

I don’t know anyone who speaks Spanish.  I live in the midwest and we don’t have a lot of native speakers in the area.  I have little to no reason to learn it, other than I have this fantasy in my head of me being able to speak multiple languages fluently.

Sometimes the skill that we are learning is simply a stepping stone to something more complex, and having a concrete why for that may be difficult at best.  This happens a lot throughout our school years.  But, that is the why right there.  Because it is crucial to understanding something greater.

I encourage our teachers, our mentors and our parents to help the learners of the world to understand the whys of what they are learning as well as the whats and the hows.  I encourage our learners to ask the teachers they whys as well.

Design Patterns: Facade

What are Design Patterns?

Most of you already know what a design pattern is, but I want to make sure this blog is accessible to everyone.  A design pattern is a set of “best practices” for how to structure your code to solve certain problems.  It is not a completed library that you an plug in to your code base, but more like a template helping you determine how to structure the code you are writing.

There are a couple of good books out there describing Design Patterns.  The one most people will point you to is the Gang of Four (Design Patterns: Elements of Reusable Object-oriented Software) book.  I know it is probably blasphemy to admit, but I have actually not read that one…I have, however, read the Head First Design Patterns book and I highly recommend it.

Today we are going to dive into the Facade pattern.

Facade Pattern

The definition of the word facade helps me remember what this pattern is used for:

  • an outward appearance that is maintained to conceal a less pleasant or creditable reality.

We often use the facade pattern to hide either a complex part of the system (or subsystem) or to simplify the use of some subsystem. And, sometimes we use it to simply hide a subsystem, third-party library, etc.

It is obvious why we want to create a class that understands the complexity of a subsystem and keep that complexity out of the other parts of your code.  However, it is probably less obvious why you would want to simply hide a subsystem or third-party library behind a facade.

If you hide your subsystem, or third-party library behind a facade, it will allow you to more easily switch this system out for a new one, or even a new version without having to do major refactoring.  If there is only one place that is dealing with this system, there is only one place where you need to make changes.

Secondly, using a facade can help you to mock out systems that you interact with for testing.  Sometimes it can be difficult to fake out a system you are interacting with, but if every part of your system uses the facade, then you simply have to mock out the facade in the rest of the application, and only worry about figuring out how to mock the real system out for the tests around the facade.

 

Lastly, when you hide things behind a facade it allows you to delay making major architectural decisions.  You can determine what you want the API to this system to look like, and even fake out the interactions until you have had a chance to decide how you want to implement the subsystem.

* Photo by Zacharie Gaudrillot-Roy