A trace capture tool based on interposing malloc API calls.
Runs over the trace file and produces rudimentary statistics about the allocations e.g. minimum request size in bytes, maximum request size in bytes, and average request size in bytes.
Print all the allocation requests for a given trace. This program generates one of the data files plotted by trace_plot.m, namely the scatter plot of allocation size requests. The input is the trace file.
Converts the list of allocations generated by trace_allocs.c into a binned list with unique allocation sizes and counts as output. This generates one of the data files plotted by plot.m, namely the scatter plot of allocation counts by size.
An N-th sample down-sampling program that down-samples a single column of numerical data into N-th samples with min, mean, max, and value at N-th sample. Down-sampling data is the only way to visualize large datasets. Sampling supports starting at a given line offset, and ending at a given ending line offset.
An Octave compatible script to turn load down-sampled plot data into useful graphs for analysis of the workload.
Compute the "block size" RSS from the raw trace. This does not require any simulation and uses existing information in the trace to provide a rough estimate of the RSS in use by the allocator (not the actual application under test).
Convert a malloc trace format file into a workload file that can be loaded by the malloc simulator.
The malloc simulator. It can run a workload file.
Dump a malloc trace in a human readable form.
The trace capture interposer is built as a shared object you preload into your application. Here's a simple example:
$ MTRACE_CTL_VERBOSE=1 LD_PRELOAD=./libmtrace.so ls mtrace: writing to /tmp/mtrace.mtr.706 . . . ls output . . . mtrace: 3 entries recorded in /tmp/mtrace.mtr.706
The interposer tells you where the trace is stored (/tmp/mtrace.mtr.706 in this example). Since the interposer appends the application's PID to the filename, it's (mostly) safe to export LD_PRELOAD in general, to apply to many applications at once.
If you set the environment variable MTRACE_CTL_FILE it will override the default of /tmp/mtrace.mtr.XXX, where XXX will be replaced by the PID of the process. The PID will still be appended to whatever you specify (see MTRACE_CTL_PID). If the program you're tracing calls exec(), a second trace with the same pid will have a sequence number appended (like mtrace.mtr.706.1).
If you set the environment variable MTRACE_CTL_PID it will override the default suffix that is used to replace XXX, including the sequence number. The use of this environment variable is largely for testing to get a consistent trace file name.
If you set the environment variable MTRACE_CTL_VERBOSE to 1 it will cause some additional information to be printed to stderr during the run of the tracer. The information printed includes the name of the trace file (when tracing starts), and the number of records written (when tracing ends). The data is written to stderr to avoid writing this to useful program output in stdout.
If you set the environment variable MTRACE_CTL_BACKTRACE to 1 then
the tracer will additionally capture the address of the caller of
the API thus allowing you to identify the location of an invalid
allocation or deallocation. This is a very costly operation because
the unwinder is invoked at each API trace to determine the caller's
address. This is disabled by default, but it you need to find the
caller of certain functions then you can use this environment variable
to record more information (the size of the recorded data does not
increase). You must additionally capture
/proc/PID/maps of the
process in order to correlate the caller's address to a loaded object.
If you set the environment variable MTRACE_CTL_CHILDREN to 0 then the tracer will not trace children processes created by the traced process. Only the process loading the preloaded tracer will be traced when MTRACE_CTL_CHILDREN is set to 0. All children processes will see MTRACE_CTL_CHILDREN set to 2 to indicate they should not trace, but the trace infrastructure will still be initialized such that programmatic tracing could continue with the right API calls.
There are some caveats to using this interposer:
First, if your application calls exec() (example: you're tracing /bin/sh) the tracer will not have an opportunity to do any cleanup work. If you see trace files that are 64MiB but only contain a few entries, this is what happened.
Second, calls to realloc may suffer what we call "trace inversion" in the trace file. This is because there's a time between our recording the pointers, and realloc doing a malloc/free pair, where other threads might reuse those pointers and record that use. Example:
THREAD-A THREAD-B ptr2=realloc ptr1 ptr2=malloc copy data free ptr1 ptr1=malloc -- trace ptr1 -- trace ptr1,ptr2
In this example, realloc gets called first, but the trace sees the malloc first - and, apparently, sees malloc returning ptr1 before seeing realloc free ptr1. This is the opposite order of what happens inside the malloc code in libc.so, but we have no way of recording our trace between realloc's malloc and free.
Note that trace2wl has code to detect and properly handle trace inversion.
cp ~/my.trace.bz2 . bunzip2 my.trace.bz2
trace2wl my.wl ./my.trace 5: invalid pointer 0x21be010 passed to free: 0:not seen count is 2 vs 2 vals is 2 ls -alt *.wl -rw-rw-r--. 1 carlos carlos 310751804 Mar 12 11:34 my.wl
trace_statistics ./my.trace Min allocation size: 1 Max allocation size: 1638399 Mean allocation size: 399
trace_allocs ./my.trace > myw1_allocs.log
trace_hist.sh ./myw1_allocs.log > myw1_binned_allocs.log
trace_block_size_rss ./my.trace > myw1_block_size_rss.txt
trace_run ./my.wl myw1_VmSize.log myw1_VmRSS.log myw1_ideal_RSS.log 911,490,514,189 cycles 260,418,362 usec wall time 371,781,997 usec across 2 threads 58,863,616 bytes Max RSS (15,556,608 -> 74,420,224) Starting VmRSS 15556608 (bytes) Starting VmSize 16633856 (bytes) Starting MaxRSS 15556608 (bytes) Ending VmRSS 71335936 (bytes) Ending VmSize 172900352 (bytes) Ending MaxRSS 74420224 (bytes) 49,322 Kb Max Ideal RSS sizeof ticks_t is 8 Avg malloc time: 336 in 32,070,748 calls Avg calloc time: 9,945 in 1,468,151 calls Avg realloc time: 655 in 765,824 calls Avg free time: 199 in 33,497,886 calls Total call time: 32,576,066,690 cycles
trace_sample myw1_VmSize.log myw1_ds100_VmSize.log 100 trace_sample myw1_VmRSS.log myw1_ds100_VmRSS.log 100 trace_sample myw1_ideal_RSS.log myw1_ds100_ideal_RSS.log 100
The GNU C Library malloc tracing utilities are free software. See the file COPYING.LIB for copying conditions, and LICENSES for notices about a few contributions that require these additional notices to be distributed. License copyright years may be listed using range notation, e.g., 1996-2015, indicating that every year in the range, inclusive, is a copyrightable year that would otherwise be listed individually.