Commit Early, Commit Often - The Sane Way To Work

A recent post by Ben Collins-Sussman talks about the benefits of frequent commits. It is a must-read for all developers who work as part of a team.

Ben mentions some potential benefits of distributed version control (DVCS), but doesn’t go into much detail. All of my source code repositories use Subversion, and I often use a DVCS alongside it so I can work offline. With a DVCS I can make very small commits and take advantage of version control when I’m on a plane, in a park or anywhere else without a connection. Later I sync those commits into the Subversion repository, where each one lands as its own individual commit. The trick is to not go more than a day or two between syncs.

Using Simple, Useful Class Names

In a previous post I complained about developers using worthless names for Classes. A couple of people called me on the fact that I never really defined what a good name is or how I come up with them.

During the application design process I’ll typically do a little domain modeling. Each problem space usually already has its own terminology and concepts. This is where I get the bulk of the names I use for Classes. For example, at a bank you may find terms like Customer, Account and Interest.

There are also common names I use when implementing a pattern. These common names are almost always used in conjunction with a domain term. A couple of quick examples:

  1. CustomerFactory
  2. HttpAdapter
  3. SmtpAdapter
  4. AccountAdapter

This is not rocket science. As an object-oriented developer I want to model a solution in terms of real-world objects or at least concepts from the problem domain. Why would I call an Account a PiggyBank or a Customer a John? I wouldn’t. Would you? Try asking the project stakeholder to define withdrawal rules for a John getting some spending money from the PiggyBank.
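As a rough illustration of the naming approach, here is a minimal Python sketch built on the bank terms mentioned earlier. The fields, the withdraw method and the create method are assumptions made up for this example; only the names Customer, Account and CustomerFactory come from the post.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Domain term taken straight from the bank's vocabulary."""
    name: str


@dataclass
class Account:
    """Another domain term; the fields here are hypothetical."""
    owner: Customer
    balance: int = 0

    def withdraw(self, amount: int) -> None:
        # A stakeholder can discuss "withdrawal rules for an Account"
        # without needing a translation step.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class CustomerFactory:
    """Pattern name (Factory) combined with a domain term (Customer)."""

    def create(self, name: str) -> Customer:
        return Customer(name=name.strip())


customer = CustomerFactory().create(" Ada ")
account = Account(owner=customer, balance=100)
account.withdraw(30)
print(customer.name, account.balance)
```

Reading only the class names and the last four lines is enough to tell what this code is about, which is the whole point.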

Class Naming Really Does Matter

Time and time again I see programmers making up nutball names for their domain objects. This makes it really hard to understand the code without diving in. I deal with a significant amount of code on a daily basis and if I had to read every line I’d be in big trouble.

Instead I prefer to look at the class names and imports to get a good feel for what the code does and what the relationships are. Sure, sometimes I have to dig into the initializer (init) of a class and maybe a method or two. But in general I really don’t need or want to read everything all of the time.

Package names are different. I don’t mind when it is not immediately obvious what a package does when reading the name. FefiFofum is a fine name for a package. But what if that were the class name? Is it a FIFO queue? Is it a simple data object? What a pain!

For the sanity of other programmers please use good names for your classes!

Update: I have another post that explains how I come up with good names.

Measuring Cyclomatic Complexity Of Python Code

Complex code is hard to manage, hard to isolate and hard to unit test. For these reasons it is more difficult and costly to modify. In other words you should try to avoid complex code.

Many software metrics exist to measure the complexity of code. One such metric is cyclomatic complexity. Cyclomatic complexity (CC) is a count of the linearly independent paths through a program. The metric was developed by Thomas McCabe back in the 1970s.

CC really measures the amount of branching in a suite of code. Here, suites with a complexity number above seven are considered suboptimal and should be looked at for refactoring. The number seven was chosen because it is often cited as the average number of things a human being can hold in their head at once. CC is well covered on the internet, so if you want to know more, Google it.
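To make the counting concrete, here is a small hand-worked example. The function is made up for illustration, and the count uses the common shortcut of one for the straight-line path plus one for each decision point.

```python
def classify(n):
    """Toy function used only to illustrate counting branches."""
    # Start at 1 for the single path with no branching.
    if n < 0:                   # +1
        return "negative"
    elif n == 0:                # +1
        return "zero"
    for d in range(2, n):       # +1
        if n % d == 0:          # +1
            return "composite"
    return "prime or one"
    # Total: 1 + 4 decision points = 5


print(classify(9))
```

A score of 5 is comfortably under the threshold of seven discussed above.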

I implemented the CC algorithm using a very simple AST visitor. A CC number is calculated for each Module, Class, Method and Function in a file. The program currently calculates the results for nested classes and nested functions, but does not print them.
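The general idea of an AST visitor that counts branches can be sketched as follows. This is not the code from pygenie.py or cc.py; it is a simplified sketch that assumes the standard library ast module, and the names ComplexityVisitor and function_complexities are hypothetical. It only scores top-level functions, and anything nested is folded into its parent's score.

```python
import ast


class ComplexityVisitor(ast.NodeVisitor):
    """Counts branch points in one suite, starting from 1."""

    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

    def __init__(self):
        self.score = 1

    def generic_visit(self, node):
        # No visit_* methods are defined, so every node passes through here.
        if isinstance(node, self.BRANCH_NODES):
            self.score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths.
            self.score += len(node.values) - 1
        super().generic_visit(node)


def function_complexities(source):
    """Return {function name: score} for the top-level functions in source."""
    tree = ast.parse(source)
    results = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            visitor = ComplexityVisitor()
            visitor.visit(node)
            results[node.name] = visitor.score
    return results


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(x, y):
    if x and y:
        return 1
    for i in range(x):
        while y:
            y -= 1
    return 0
'''

scores = function_complexities(SAMPLE)
print(scores)
```

Which node types count as a branch is a design choice. This sketch counts if, for, while, except handlers, conditional expressions and boolean operators; other tools draw the line differently, so their numbers may not match.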

Metric Frenzy
Use metrics as a guide to show where there may be a need for refactoring. Don’t take them too seriously. Just because the complexity number is slightly above optimal doesn’t mean the code sucks. Metrics are not the definitive answer on code quality. So take them with a grain of salt.

Getting The Code
The program and unit tests are available in my Subversion repository. Just download the files into any directory on your system. You will need at least pygenie.py and cc.py.

I am probably going to create a new home on Google Code for this stuff. It will be announced in a follow-up post.

Running The Program
The program expects one or more Python filenames or fully qualified module names to be passed in on the command line. For example:
./pygenie.py complexity mycode.py
- or -
./pygenie.py complexity mycode.py dir0/dir1/mod.py
- or -
./pygenie.py complexity dir0.dir1.mod

Running the program will print the results to standard output. This is a proof of concept and not a polished application so don’t expect real fancy output.

Interpreting The Results
The output is a table of three columns: suite type, suite name and the complexity number. The suite type can have the following values: X for a module, F for a function, C for a class and M for a method. The suite name is the fully qualified name of a suite. The complexity number is a simple integer representing the suite’s complexity. The rows are sorted by the complexity number in descending order.

Only things that have a high complexity number are shown by default. If you want to see all of the complexity values you can use the --verbose option. For example:
./pygenie.py complexity --verbose dir0.dir1.mod
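The filtering and sorting described above can be sketched in a few lines of Python. This is an illustration and not the program's actual reporting code: format_report, THRESHOLD and the column widths are assumptions, and the tiny_helper row is invented to show what gets hidden.

```python
THRESHOLD = 7  # assumed cutoff: scores of seven or less are hidden by default


def format_report(rows, verbose=False):
    """Format (suite_type, name, complexity) tuples as a three-column table."""
    shown = [row for row in rows if verbose or row[2] > THRESHOLD]
    # Highest complexity first.
    shown.sort(key=lambda row: row[2], reverse=True)
    lines = ["{:<4} {:<32} {}".format("type", "name", "complexity")]
    for suite_type, name, score in shown:
        lines.append("{:<4} {:<32} {}".format(suite_type, name, score))
    return "\n".join(lines)


ROWS = [
    ("F", "fall_down", 10),
    ("F", "tiny_helper", 2),  # hypothetical low-complexity suite
    ("M", "AClass.runtests", 28),
]

print(format_report(ROWS))
print(format_report(ROWS, verbose=True))
```

The first call leaves out tiny_helper, and the second call lists all three rows.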

Here is an example of running the cc.py code through itself:

dstanek% ./pygenie.py complexity example.py
Module: example
Complexity Chart:
type name                             complexity
M    AClass.runtests                  28
F    fall_down                        10
F    run_away                         9
X    cc                               8
M    BClass.dosomething               8
F    duck_and_cover                   8

Code that is not shown because its complexity number is seven or less is not proven to be good. The design may be faulty, the variable names may be obfuscated, or it may have any number of other problems.

Closing Thoughts
It is good practice to try to keep the complexity of code to a minimum. Code with a low complexity number is less risky to change and easier to test. This should not be the only way to judge your code, just a supplement.
