Undocumented cProfile Features

Created On:

The cProfile module is currently the best profiler for Python, it is bundled in the standard library and gives accurate results. The documentation, however leaves out a lot of functionality that is useful.

The documentation explains that the main entry point to the module is cProfile.run() where you pass as a string which is to be passed to the exec statement. There is an optional second argument which is a filename where the profile data will be serialized in a binary format. If you do not provide a filename, a report will be printed to the console. The report1 looks like this:

      2706 function calls (2004 primitive calls) in 4.504 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
  43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
 ...

If you do serialize the data to a file you can use the pstats module to load the data which lets you sort the statistics along a value and then print the report to the console. The module also lets you print the callers/callees of a function as well.

This is the documented use of the module, however there are plenty of undocumented features that allow you to gain more control over the profiler.

Redirect pstats output to a StringIO object

First, the Stats class constructor in pstats takes an undocumented keyword argument stream that lets you capture the output of the printing commands. This way you can display the output somewhere else, like a HTML report or parse it for data.

Example:

import cProfile
import pstats
import StringIO

def foo():
    print "Hello World"

cProfile.run('foo()', 'statsfile')

stream = StringIO.StringIO()
stats = pstats.Stats('statsfile', stream=stream)
stats.print_stats()

# Stream now contains the report text.
# Can be accessed with stream.getvalue()

print stream.getvalue()

This should give the following output:

Hello World
Fri Jun 29 09:17:49 2012    statsfile

         3 function calls in 0.000 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 test.py:5(foo)

Load the statistics file

The file that cProfile dumps the data to is binary and the module documentation does not explain what format it is in. According to the module source code , the file is a marshaled dictionary that maps a tuple describing functions to a tuple of statistics.

Example:

import cProfile
import marshal
import tempfile
import pprint

def foo():
    print "Hello World"

statsfile = tempfile.NamedTemporaryFile()

cProfile.run('foo()', statsfile.name)

stats = marshal.load(statsfile.file)

pprint.pprint(stats)

This should give the following output:

{('<string>', 1, '<module>'): (1, 1, 3e-06, 2.4e-05, {}),
 ('test.py', 6, 'foo'): (1,
                         1,
                         2.1e-05,
                         2.1e-05,
                         {('<string>', 1, '<module>'): (1,
                                                        1,
                                                        2.1e-05,
                                                        2.1e-05)}),
 ('~', 0, "<method 'disable' of '_lsprof.Profiler' objects>"): (1,
                                                                1,
                                                                1e-06,
                                                                1e-06,
                                                                {})}

The output dictionary maps a tuple describing a function (filename, line number, name) to a tuple of statistics. The tuple’s data is (number of calls, number of non-recursive calls, total time, cumulative time, subcall statistics). You can easily take this dictionary and do further processing.

Profile a function

Instead of giving cProfile a string to pass on to exec you can actually profile a function by passing in the function and arguments. This is better than using the module level run function because you do not need to construct a string for an exec statement. In addition you can now easily access the profiler’s data directly meaning we don’t need to marshal data from the disk.

Example:

import cProfile
import pprint

def foo():
    print "Hello World"

profiler = cProfile.Profile()

# If `foo` had other arguments they could be passed as the next arguments
# to `runcall`
profiler.runcall(foo)

pprint.pprint(profiler.getstats())

This produces the following output:

Hello World
[_lsprof.profiler_entry(code="<method 'disable' of '_lsprof.Profiler' objects>", callcount=1, reccallcount=0, totaltime=0.0, inlinetime=0.0, calls=None),
 _lsprof.profiler_entry(code=<code object foo at 0x107767db0, file "test.py", line 4>, callcount=1, reccallcount=0, totaltime=1.3e-05, inlinetime=1.3e-05, calls=None)]

This returns an array of lsprof entry objects. This array contains the raw data from the profiler and gives you the most amount of data possible. Each object contains callcount. recallcount, totaltime, inlinetime attributes which are the number of times called, number of times recursively called, total time spent in the function and time spent in the function but not in subcalls respectively. The code attribute is either a python code object or a string describing functions that cannot have code objects. The calls attribute will contain profiler entries for subcalls within that function. With this data you can perform even more sophisticated analysis and filter out entries that are not useful to you.

Conclusion

There is a lot of undocumented functionality in the cProfile module. Leveraging some of this undocumented functionality can let you construct better profiling tools for your Python application.


  1. This was copied from the Python 2.7.3 documentation.↩︎