Measuring Cyclomatic Complexity Of Python Code

Complex code is hard to manage, hard to isolate and hard to unit test. For these reasons it is more difficult and costly to modify. In other words, you should try to avoid complex code.

Many software metrics exist to measure the complexity of code. One such metric is cyclomatic complexity. Cyclomatic complexity (CC) is the number of linearly independent paths through a program. The algorithm was developed by Thomas McCabe back in the 1970s. CC really measures the amount of branching in a suite of code. Suites with more than seven branches are considered suboptimal and should be looked at for refactoring. The number seven was chosen because it is believed to be the average number of things a human being can hold in their head at once. CC is well covered on the internet, so if you want to know more, Google it.

I implemented the CC algorithm using a very simple AST visitor. A CC number is calculated for each module, class, method and function in a file. The program currently calculates, but does not print, the results for nested classes and nested functions.

Metric Frenzy

Use metrics as a guide to show where there may be a need for refactoring. Don't take them too seriously. Just because the complexity number is slightly above optimal doesn't mean the code sucks. Metrics are not the definitive answer on code quality, so take them with a grain of salt.

Getting The Code

The program and unit tests are available in my Subversion repository. Just download the files into any directory on your system. You will need at least pygenie.py and cc.py. I am probably going to create a new home on Google Code for this stuff; it will be announced in a follow-up post.

Running The Program

The program expects one or more Python filenames or fully qualified module names to be passed in on the command line.
For example:

./pygenie.py complexity mycode.py

or

./pygenie.py complexity mycode.py dir0/dir1/mod.py

or

./pygenie.py complexity dir0.dir1.mod

Running the program prints the results to standard output. This is a proof of concept, not a polished application, so don't expect really fancy output.

Interpreting The Results

The output is a table of three columns: suite type, suite name and complexity number. The suite type is one of X for a module, F for a function, C for a class and M for a method. The suite name is the fully qualified name of the suite. The complexity number is a simple integer representing the suite's complexity. Rows are sorted by complexity number in descending order.

By default, only suites with a complexity number above seven are shown. To see all of the complexity values, use the --verbose option. For example:

./pygenie.py complexity --verbose dir0.dir1.mod

Here is an example of running the cc.py code through itself:

dstanek% ./pygenie.py complexity example.py
Module: example
Complexity Chart:
type name                             complexity 
M    AClass.runtests                  28          
F    fall_down                        10          
F    run_away                         9          
X    cc                               8          
M    BClass.dosomething               8          
F    duck_and_cover                   8
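The chart's behavior (a descending sort, a cutoff at seven, and a --verbose switch that shows everything) can be sketched as a small formatting function. This is a minimal sketch, not PyGenie's code: format_chart, THRESHOLD and the (type, name, complexity) row layout are hypothetical names chosen for illustration. The scores reuse the chart above, and tiny_helper is a made-up low-scoring function added only to show the filtering.

```python
THRESHOLD = 7  # hypothetical: suites scoring 7 or less are hidden unless verbose


def format_chart(rows, verbose=False, threshold=THRESHOLD):
    """Render (type, name, complexity) rows as a three-column chart.

    Rows are sorted by complexity, highest first. Unless verbose is True,
    only rows whose complexity is above the threshold are kept.
    """
    shown = sorted(rows, key=lambda row: row[2], reverse=True)
    if not verbose:
        shown = [row for row in shown if row[2] > threshold]
    # Size the name column to fit the longest name (or the header).
    width = max([len("name")] + [len(name) for _, name, _ in shown])
    lines = [f"{'type':<4} {'name':<{width}} complexity"]
    for kind, name, score in shown:
        lines.append(f"{kind:<4} {name:<{width}} {score}")
    return "\n".join(lines)


# Scores taken from the chart above; tiny_helper is a hypothetical low scorer.
ROWS = [
    ("F", "run_away", 9),
    ("X", "cc", 8),
    ("F", "tiny_helper", 3),
    ("M", "AClass.runtests", 28),
    ("M", "BClass.dosomething", 8),
    ("F", "duck_and_cover", 8),
    ("F", "fall_down", 10),
]

print(format_chart(ROWS))                # tiny_helper is filtered out
print(format_chart(ROWS, verbose=True))  # every row, still sorted
```

Passing verbose=True plays the role of the --verbose option. Because Python's sort stays stable even with reverse=True, the three suites tied at 8 keep the order they arrived in.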

Code that is not shown because its complexity number is seven or less is not necessarily good. The design may be faulty, variables may be obfuscated, or any number of other things may be wrong.

Closing Thoughts

It is good practice to keep the complexity of code to a minimum. Code with a low complexity number is less risky to change and easier to test. This should not be the only way to judge your code, just a supplement.
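The AST visitor itself is not listed in this post. As a rough illustration of the idea, here is a minimal sketch built on Python's standard ast module, assuming the common rule that a suite's complexity is its number of decision points plus one. The set of counted constructs and the names BranchCounter, suite_complexity and complexity_rows are assumptions for this sketch, not PyGenie's actual rules. Classes are not scored, and nested functions and classes are neither reported nor counted toward their parent.

```python
import ast


class BranchCounter(ast.NodeVisitor):
    """Count decision points inside one suite (assumed counting rules)."""

    def __init__(self):
        self.decisions = 0

    def _branch(self, node, weight=1):
        self.decisions += weight
        self.generic_visit(node)  # keep looking for branches inside this node

    # Each of these constructs opens one extra path; `elif` is a nested If.
    def visit_If(self, node):
        self._branch(node)

    visit_IfExp = visit_If
    visit_For = visit_If
    visit_AsyncFor = visit_If
    visit_While = visit_If
    visit_ExceptHandler = visit_If

    def visit_BoolOp(self, node):
        # `a and b and c` can short-circuit twice: len(values) - 1 extra paths.
        self._branch(node, len(node.values) - 1)

    def visit_comprehension(self, node):
        # One path for the loop plus one per `if` filter.
        self._branch(node, 1 + len(node.ifs))

    def visit_FunctionDef(self, node):
        pass  # nested suite: not counted toward the enclosing one

    visit_AsyncFunctionDef = visit_FunctionDef
    visit_ClassDef = visit_FunctionDef


def suite_complexity(node):
    """Complexity of one suite: decision points + 1."""
    counter = BranchCounter()
    counter.generic_visit(node)  # generic_visit so the root def itself is entered
    return counter.decisions + 1


def complexity_rows(source, module_name):
    """(type, name, complexity) rows for module-level code (X), top-level
    functions (F) and methods of top-level classes (M)."""
    tree = ast.parse(source)
    rows = [("X", module_name, suite_complexity(tree))]
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            rows.append(("F", node.name, suite_complexity(node)))
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    rows.append(("M", f"{node.name}.{item.name}", suite_complexity(item)))
    return rows


SAMPLE = '''
def run_away(x):
    if x > 0 and x < 10:
        return "near"
    elif x >= 10:
        return "far"
    return [i for i in range(x) if i % 2]

class AClass:
    def runtests(self, items):
        for item in items:
            try:
                item()
            except ValueError:
                pass
'''

print(complexity_rows(SAMPLE, "example"))
# → [('X', 'example', 1), ('F', 'run_away', 6), ('M', 'AClass.runtests', 3)]
```

In run_away the if, the and, the elif and the filtered comprehension (worth two) add five decision points, giving 6. Tools disagree on details such as whether boolean operators count, so numbers from this sketch will not necessarily match PyGenie's.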

March 31, 2008 at 10:30 AM | categories: python, coding

david stanek's digressions