Undocumented cProfile Features
The cProfile module is currently the best profiler for Python, it is bundled in the standard library and gives accurate results. The documentation, however leaves out a lot of functionality that is useful.
The documentation explains that the main entry point to the module is cProfile.run()
where you pass as a string which is to be passed to the exec
statement. There is an optional second argument which is a filename where the profile data will be serialized in a binary format. If you do not provide a filename, a report will be printed to the console. The report1 looks like this:
If you do serialize the data to a file you can use the pstats module to load the data which lets you sort the statistics along a value and then print the report to the console. The module also lets you print the callers/callees of a function as well.
This is the documented use of the module, however there are plenty of undocumented features that allow you to gain more control over the profiler.
Redirect pstats
output to a StringIO
object
First, the Stats
class constructor in pstats
takes an undocumented keyword argument stream
that lets you capture the output of the printing commands. This way you can display the output somewhere else, like a HTML report or parse it for data.
Example:
This should give the following output:
Load the statistics file
The file that cProfile dumps the data to is binary and the module documentation does not explain what format it is in. According to the module source code , the file is a marshaled dictionary that maps a tuple describing functions to a tuple of statistics.
Example:
This should give the following output:
The output dictionary maps a tuple describing a function (filename, line number, name) to a tuple of statistics. The tuple’s data is (number of calls, number of non-recursive calls, total time, cumulative time, subcall statistics). You can easily take this dictionary and do further processing.
Profile a function
Instead of giving cProfile
a string to pass on to exec
you can actually profile a function by passing in the function and arguments. This is better than using the module level run
function because you do not need to construct a string for an exec
statement. In addition you can now easily access the profiler’s data directly meaning we don’t need to marshal
data from the disk.
Example:
This produces the following output:
This returns an array of lsprof entry objects. This array contains the raw data from the profiler and gives you the most amount of data possible. Each object contains callcount
. recallcount
, totaltime
, inlinetime
attributes which are the number of times called, number of times recursively called, total time spent in the function and time spent in the function but not in subcalls respectively. The code
attribute is either a python code object or a string describing functions that cannot have code objects. The calls
attribute will contain profiler entries for subcalls within that function. With this data you can perform even more sophisticated analysis and filter out entries that are not useful to you.
Conclusion
There is a lot of undocumented functionality in the cProfile
module. Leveraging some of this undocumented functionality can let you construct better profiling tools for your Python application.
This was copied from the Python 2.7.3 documentation.↩︎