Commentary:

What’s being profiled:
In my simple profiling Perl scripts I count the number of iterations I see a given header tag, along with count the time reported by clock_gettime(3). I’ve placed simple printf’s around the following subroutines/sections of code, as I believe they are critical (high number of iterations, large amount of disk access time) pkg_install sections:

add/perform.c:

  • setenv(_TOP)

  • +CONTENTS fopen(3)

lib/file.c:

  • copy_file(..)

  • move_file(..)

  • copy_hierarchy(..)

  • unpack(..)

lib/plist.c:

  • real_plist(..)

  • write_plist(..)

  • delete_package(..)

  • delete_hierarchy(..)

  • plist_cmd(..)

  • add_plist(..)

  • add_plist_top(..)

Changes made over last couple weeks: A couple of weeks ago I tried to put a 10 line buffer in read_plist(..). This is what I refer to as the “old results” in the graph. Over this last week I took the read_plist(..) function and converted it from a function like this (“old”):
while(!feof(file_p)) {
    /** fgets(file_p) **/
    /** process line in plist_cmd(..) **/
}
to a function like this (“new”):
/** malloc buffer large enough to hold +CONTENTS **/
while(!feof(file_p)) {
    /** add content via fgetc **/
}

/** split lines via strtok("n") and process in plist_cmd(..) **/
I also took the liberty of converting the plist_cmd(..) to what I consider a faster branch selection process. Instead of doing something like this (“old”):
if(!strcmp({command string directive <em>1</em>}, command_str_p)) {
    return PLIST_{command string directive <em>1</em>};/** command string directives <em>2</em> through <em>n-1</em> are here **/
} else if(!strcmp({command string directive <em>n</em>}, command_str_p)) { return PLIST_{command string directive <em>n</em>};
} else return PLIST_FAIL;

I tried to unroll strcmp a bit by checking for the first character of the command, then run strcmp on a shortened version of the string, decreasing the search radix of the string accordingly. This results in code similar to this (I’m using adapted bits of the real thing because it’s easier) (“new”):
if(command_str_p[0]  'c') {
    if(command_str_p[1]  ‘o’) {
        if( !strcmp(“mment”, &command_str_p[2]) ) { /* comment */
            /** In the real code it checks for DEPORIGIN and ORIGIN data here. **/
            return PLIST_COMMENT;
        } else if(!strcmp(“nflicts”, &command_str_p[2])) /* conflicts */
            return PLIST_CONFLICTS;
    }
}

/** process the rest of the command string directives here **/
return PLIST_FAIL;


I also modified the heuristics so that what I thought the higher probability directives were processed first, then the lower probability heuristics last.

Results interpretation:
I’ve taken the rounds from my old modifications (10 line buffer, then process line by line), and compared them against my new modifications (buffer all lines, then process line by line; modified directive processing heuristics).

Most of the values appear to be constant and the mins and maxes are remaining in the same ballpark as one another (*) (which I would expect). However, the amount of time for read_plist has been decreased a large amount 1-0.000957/0.000793 = ~20%, and the other raw amounts of time are consistent with this value (see throughput ratios off to right side of the HTML file).

So, while my overall results appear to have benefited little in the sections that I’ve modified (mostly read_plist), I appear to have positively affected all other sections of code that I’m profiling by the same ratio (20% decrease, or Old/New = ~1.2 in the Throughput Ratio column). Also, although my debug statements have blown up the size of plist_cmd iterations and time spent in that section, the Time Spent Throughput Ratio hasn’t increased significantly (~17%), and I believe once I solve the bug I’m seeing (*), the numbers encountered should drop as well (as long as gcc isn’t doing some smart loop unrolling or other optimizations with strcmp when compiled with -O2 -march=pentium4; if true, I’ll need to revise my heuristics).

Summary:
My results, although small appear to have positively affected all profiled instructions apart from plist_cmd (*). Once that issue is solved there should be a decrease across the board for all profiled functions.

Similar optimizations may be possible in other sections of code. I’ll need to research whether or not that’s possible.

Notes:
(*) Unfortunately I’ve come across some sort of bug where my debug/profiling statements are making it into plist_cmd(..) and I’ve estimated that it’s tripled the amount of iterations. I think that the bug is being caused by piped commands read in via STDIN to pkg_add. I plan to solve this by using libarchive and specifying packages via arguments on the command line, instead of via STDIN like it’s done currently.

All results are available at http://students.washington.edu/youshi10/posted/atk-results.tgz, and more results will be forthcoming (in particular after I come up with a more streamlined means of converting data from CSV to Excel / HTML formatted files).

Add to del.icio.us - Digg this article