Tuesday, June 22, 2010

Counting lines of code

A mate of mine asked me how he could count the lines of code in a project. I told him that an easy way is to use Unix command-line utilities, and proposed this one-liner:

find . -name "*.java" -exec grep -v -P \
    '^[\s]*(\*|//|/\*|import|$)' \
    '{}' ';' | wc -l

But what does it do?
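- First, find locates every .java file under the current directory.
- Then, for each file, grep -v prints only the lines that are not empty, not import statements and not comments (lines starting with //, /* or *).
- Finally, wc -l counts the lines that remain.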
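To see what the grep filter actually keeps, here is a quick check on a throwaway file. The file name Sample.java and its contents are made up for illustration, and grep -P assumes GNU grep built with PCRE support, just like the original one-liner:

```shell
# Throwaway directory holding one hypothetical Java file.
tmp=$(mktemp -d)
cat > "$tmp/Sample.java" <<'EOF'
package demo;

import java.util.List;

/**
 * A Javadoc comment.
 */
public class Sample {
    // a line comment
    int x = 1;
}
EOF

# Same filter as the one-liner: drop blank, import and comment lines.
# The glob is quoted so the shell cannot expand it before find sees it.
kept=$(find "$tmp" -name '*.java' -exec grep -v -P \
    '^[\s]*(\*|//|/\*|import|$)' '{}' ';')
printf '%s\n' "$kept"

# tr strips the padding some wc implementations add.
count=$(printf '%s\n' "$kept" | wc -l | tr -d ' ')
echo "counted lines: $count"

rm -r "$tmp"
```

It prints the four surviving lines (package demo;, the class declaration, int x = 1; and the closing brace) followed by "counted lines: 4". Note that package lines are still counted, since the pattern only drops imports.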

After a while, I thought that Perl could be used to discard comments more accurately, since it can remember whether it is inside a /* ... */ block. So I produced this command line, replacing grep with perl:

find . -name "*.java" -exec perl -n -e \
   '$i=1 if(/^\s*\/\*/);print $_ if(!$i && !/^\s*(\/\/|import|$)/);$i=0 if($i && /\*\//)' \
   '{}' ';' | wc -l

Finally, someone pointed me to SLOCCount, a command-line tool that not only counts lines but also estimates the development effort and the cost in dollars. Take a look at the results for the gwtupload library:

$ sloccount gwtupload-project
[...] 
SLOC    Directory       SLOC-by-Language (Sorted)
4346    samples-gae     java=3966,xml=380
3048    core            java=2956,xml=92
1360    jsupload        java=608,perl=551,xml=201
704     gae             java=614,xml=81,sh=9
615     samples         java=403,xml=212
569     tomcat          xml=569
512     top_dir         xml=482,sh=30

Totals grouped by language (dominant language first):
java:          8547 (76.63%)
xml:           2017 (18.08%)
perl:           551 (4.94%)
sh:              39 (0.35%)

Total Physical Source Lines of Code (SLOC)                = 11,154
Development Effort Estimate, Person-Years (Person-Months) = 2.52 (30.20)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.76 (9.13)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 3.31
Total Estimated Cost to Develop                           = $ 339,973
 (average salary = $56,286/year, overhead = 2.40).
[...]
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."
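As a sanity check, plugging the 11,154 SLOC total (11.154 KSLOC) into the Basic COCOMO formulas that sloccount prints reproduces its estimates. The cost line assumes, as the output suggests, that cost is the effort in person-years times the average salary times the overhead factor:

```latex
\begin{aligned}
E &= 2.4 \times 11.154^{1.05} \approx 30.20 \ \text{person-months} \approx 2.52 \ \text{person-years} \\
S &= 2.5 \times E^{0.38} = 2.5 \times 30.20^{0.38} \approx 9.13 \ \text{months} \\
N &= E / S \approx 30.20 / 9.13 \approx 3.31 \ \text{developers} \\
C &= \frac{30.20}{12} \times 56{,}286 \times 2.40 \approx 339{,}970 \ \text{dollars}
\end{aligned}
```

The cost agrees with the reported $339,973 up to rounding of the effort figure.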