A trace capture tool based on interposing malloc API calls.
Runs over the trace file and produces rudimentary statistics about the allocations e.g. minimum request size in bytes, maximum request size in bytes, and average request size in bytes.
Print all the allocation requests for a given trace. This program generates one of the data files plotted by trace_plot.m, namely the scatter plot of allocation size requests. The input is the trace file.
Converts the list of allocations generated by trace_allocs.c into a binned list with unique allocation sizes and counts as output. This generates one of the data files plotted by plot.m, namely the scatter plot of allocation counts by size.
An N-th sample down-sampling program that down-samples a single column of numerical data into N-th samples with min, mean, max, and value at N-th sample. Down-sampling data is the only way to visualize large datasets. Sampling supports starting at a given line offset, and ending at a given ending line offset.
An Octave compatible script to turn load down-sampled plot data into useful graphs for analysis of the workload.
Compute the "block size" RSS from the raw trace. This does not require any simulation and uses existing information in the trace to provide a rough estimate of the RSS in use by the allocator (not the actual application under test).
Convert a malloc trace format file into a workload file that can be loaded by the malloc simulator.
The malloc simulator. It can run a workload file.
Dump a malloc trace in a human readable form.
The trace capture interposer is built as a shared object you preload into your application. Here's a simple example:
$ LD_PRELOAD=./libmtrace.so ls mtrace-ctl: writing to /tmp/mtrace.out.706 . . . ls output . . .
The interposer tells you where the trace is stored (/tmp/mtrace.out.706 in this example). Since the interposer appends the application's PID to the filename, it's (mostly) safe to export LD_PRELOAD in general, to apply to many applications at once.
If you set the environment variable MTRACE_CTL_FILE it will override the default of /tmp/mtrace.out. Note that the PID will still be appended to whatever you specify. If the program you're tracing calls exec(), a second trace with the same pid will have a sequence number appended (like mtrace.out.706.1).
There are some caveats to using this interposer:
First, if your application calls exec() (example: you're tracing /bin/sh) the tracer will not have an opportunity to do any cleanup work. If you see trace files that are 64MiB but only contain a few entries, this is what happened.
Second, calls to realloc may suffer what we call "trace inversion" in the trace file. This is because there's a time between our recording the pointers, and realloc doing a malloc/free pair, where other threads might reuse those pointers and record that use. Example:
THREAD-A THREAD-B ptr2=realloc ptr1 ptr2=malloc copy data free ptr1 ptr1=malloc -- trace ptr1 -- trace ptr1,ptr2
In this example, realloc gets called first, but the trace sees the malloc first - and, apparently, sees malloc returning ptr1 before seeing realloc free ptr1. This is the opposite order of what happens inside the malloc code in libc.so, but we have no way of recording our trace between realloc's malloc and free.
Note that trace2wl has code to detect and properly handle trace inversion.
cp ~/my.trace.bz2 . bunzip2 my.trace.bz2
trace2wl my.wl ./my.trace 5: invalid pointer 0x21be010 passed to free: 0:not seen count is 2 vs 2 vals is 2 ls -alt *.wl -rw-rw-r--. 1 carlos carlos 310751804 Mar 12 11:34 my.wl
statistics ./my.trace Min allocation size: 1 Max allocation size: 1638399 Mean allocation size: 399
trace_allocs ./my.trace > myw1_allocs.log
hist.sh ./myw1_allocs.log > myw1_binned_allocs.log
block_size_rss ./my.trace > myw1_block_size_rss.txt
trace_run ./my.wl myw1_VmSize.log myw1_VmRSS.log myw1_ideal_RSS.log 911,490,514,189 cycles 260,418,362 usec wall time 371,781,997 usec across 2 threads 58,863,616 bytes Max RSS (15,556,608 -> 74,420,224) Starting VmRSS 15556608 (bytes) Starting VmSize 16633856 (bytes) Starting MaxRSS 15556608 (bytes) Ending VmRSS 71335936 (bytes) Ending VmSize 172900352 (bytes) Ending MaxRSS 74420224 (bytes) 49,322 Kb Max Ideal RSS sizeof ticks_t is 8 Avg malloc time: 336 in 32,070,748 calls Avg calloc time: 9,945 in 1,468,151 calls Avg realloc time: 655 in 765,824 calls Avg free time: 199 in 33,497,886 calls Total call time: 32,576,066,690 cycles
sample myw1_VmSize.log myw1_ds100_VmSize.log 100 sample myw1_VmRSS.log myw1_ds100_VmRSS.log 100 sample myw1_ideal_RSS.log myw1_ds100_ideal_RSS.log 100