Measuring Cyclomatic Complexity Of Python Code

Complex code is hard to manage, hard to isolate and hard to unit test. For these reasons it is more difficult and costly to modify. In other words you should try to avoid complex code.

Many software metrics exist to measure the complexity of code. One such metric is cyclomatic complexity. Cyclomatic complexity (CC) counts the number of linearly independent paths through a program. The metric was developed by Thomas McCabe back in the 1970s.
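
As a rough illustration of that definition, McCabe's formula for a control-flow graph is M = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. The sketch below is not part of pygenie; the function name and the toy graphs are made up for this example.

```python
def cyclomatic_complexity(edges, components=1):
    """Return McCabe's M = E - N + 2P for a graph given as (src, dst) edges."""
    # Collect every node that appears at either end of an edge.
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Hypothetical control-flow graphs, written out by hand.
straight_line = [("start", "end")]
if_else = [("cond", "then"), ("cond", "else"), ("then", "exit"), ("else", "exit")]
while_loop = [("cond", "body"), ("body", "cond"), ("cond", "exit")]

for name, graph in [("straight line", straight_line),
                    ("if/else", if_else),
                    ("while loop", while_loop)]:
    print(name, cyclomatic_complexity(graph))
```

Straight-line code has one path, and each decision point in these toy graphs adds one more.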

In practice, CC measures the amount of branching in a suite of code. Suites with a complexity number above seven are considered suboptimal and should be looked at for refactoring. The number seven was chosen because it is believed to be the average number of things a human being can hold in their head at once. CC is well covered on the internet, so if you want to know more, Google it.

I implemented the CC algorithm using a very simple AST visitor. A CC number is calculated for each Module, Class, Method and Function in a file. The program currently calculates, but does not print, the results for nested classes or nested functions.
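
To give a feel for the visitor approach, here is a minimal sketch using the standard library ast module. It is not the actual cc.py code. The names ComplexityVisitor, complexity and report are invented for this example, and the choice of which node types count as a decision point is an assumption; other tools count slightly different things.

```python
import ast

# Node types treated as one decision point each (an assumption of this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


class ComplexityVisitor(ast.NodeVisitor):
    """Count decision points in one suite without descending into nested suites."""

    def __init__(self):
        self.score = 1  # a suite with no branches has exactly one path

    def generic_visit(self, node):
        if isinstance(node, BRANCH_NODES):
            self.score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit paths.
            self.score += len(node.values) - 1
        super().generic_visit(node)

    def _skip(self, node):
        """Nested functions and classes are measured on their own."""

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _skip


def complexity(suite):
    """Return the complexity of a module, class or function node."""
    visitor = ComplexityVisitor()
    for child in ast.iter_child_nodes(suite):
        visitor.visit(child)
    return visitor.score


def report(source, module_name):
    """Return (type, name, complexity) rows for top-level suites and methods."""
    tree = ast.parse(source)
    functions = (ast.FunctionDef, ast.AsyncFunctionDef)
    rows = [("X", module_name, complexity(tree))]
    for node in tree.body:
        if isinstance(node, functions):
            rows.append(("F", node.name, complexity(node)))
        elif isinstance(node, ast.ClassDef):
            rows.append(("C", node.name, complexity(node)))
            for item in node.body:
                if isinstance(item, functions):
                    rows.append(("M", f"{node.name}.{item.name}", complexity(item)))
    return rows


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"

class Greeter:
    def greet(self, name):
        return "hi " + name if name else "hi"
'''

for row in report(SAMPLE, "sample"):
    print(row)
```

In this sketch classify scores 5 (two ifs, one for, one extra path from the and, plus one) and Greeter.greet scores 2. Like the program described above, it skips nested functions and nested classes when building the rows.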

Metric Frenzy
Use metrics as a guide to show where there may be a need for refactoring. Don’t take them too seriously. Just because the complexity number is slightly above optimal doesn’t mean the code sucks. Metrics are not the definitive answer on code quality. So take them with a grain of salt.

Getting The Code
The program and unit tests are available in my Subversion repository. Just download the files into any directory on your system. You will need at least pygenie.py and cc.py.

I am probably going to create a new home on Google Code for this stuff. It will be announced in a follow-up post.

Running The Program
The program expects one or more Python filenames or fully qualified module names to be passed in on the command line. For example:
./pygenie.py complexity mycode.py
- or -
./pygenie.py complexity mycode.py dir0/dir1/mod.py
- or -
./pygenie.py complexity dir0.dir1.mod

Running the program will print the results to standard output. This is a proof of concept and not a polished application so don’t expect real fancy output.

Interpreting The Results
The output is a table of three columns: suite type, suite name and the complexity number. The suite type can have the following values: X for a module, F for a function, C for a class and M for a method. The suite name is the fully qualified name of a suite. The complexity number is just a simple integer representing the suite’s complexity. The rows are sorted by the complexity number in descending order.

Only things that have a high complexity number are shown by default. If you want to see all of the complexity values you can use the --verbose option. For example:
./pygenie.py complexity --verbose dir0.dir1.mod
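
The filtering and sorting described above could look roughly like the sketch below. This is not pygenie's own code. The function name format_chart, the column widths and the sample rows are assumptions made for illustration; the threshold of seven comes from the text.

```python
THRESHOLD = 7  # suites at or below this score are hidden unless verbose


def format_chart(rows, verbose=False, threshold=THRESHOLD):
    """Format (type, name, complexity) rows, highest complexity first."""
    shown = [row for row in rows if verbose or row[2] > threshold]
    shown.sort(key=lambda row: row[2], reverse=True)
    lines = [f"{'type':<5}{'name':<33}complexity"]
    lines += [f"{kind:<5}{name:<33}{score}" for kind, name, score in shown]
    return "\n".join(lines)


# Hypothetical rows, as a visitor might produce them.
rows = [
    ("F", "run_away", 9),
    ("X", "cc", 8),
    ("F", "tiny", 2),
    ("M", "AClass.runtests", 28),
]

print(format_chart(rows))                # hides "tiny"
print(format_chart(rows, verbose=True))  # shows every row
```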

Here is an example of running the cc.py code through itself:

dstanek% ./pygenie.py complexity example.py
Module: example
Complexity Chart:
type name                             complexity
M    AClass.runtests                  28
F    fall_down                        10
F    run_away                         9
X    cc                               8
M    BClass.dosomething               8
F    duck_and_cover                   8

Code that is not shown because its complexity number is seven or less is not thereby proven to be good. The design may still be faulty, the variable names obfuscated, or any number of other things wrong.

Closing Thoughts
It is good practice to try to keep the complexity of code to a minimum. Code with a low complexity number is less risky to change and easier to test. This should not be the only way to judge your code, just a supplement.

Comments

  • Nice work! I just gave it a go in a casual project of mine and it didn't point out any issues. (It's a simple project. :) )


    You should add pygenie to PyPI.

  • Thanks, handy code, worked first time.

  • That would make a nice extension to pylint's existing recommendations about code complexity and refactoring.

  • There's some recent work that suggests that the sweet spot for cyclomatic complexity is closer to 24 than it is to 7. See http://www.sdtimes.com/content/article.aspx?Art... (which, if you track down the study, is based on open source Java projects, so mileage may vary vs. Python).
  • dstanek
    @Josh

    I initially wrote the code a few weeks before PyCon. Since then I have been using it to keep my code simple. I feel that it's worked, but that's really best left to the poor programmer that has to deal with my code :-)
  • Excellent work. Thanks for sharing it with the rest of us. Has it helped your code yet or is it too early to use for production work?
  • very cool. As an aside, you may want to get familiar with the new _ast module, http://docs.python.org/dev/library/_ast since it replaces the deprecated compiler module in python 2.6 and 3.0
  • This is a really interesting post and would make a great clepy presentation.