glibc-malloc-trace-utils

Created 3 years ago
Maintained by codonell
glibc malloc trace post-processing utilities
Members 3
Naveen N. Rao committed 6 months ago

glibc-malloc-trace-utils

glibc malloc trace post-processing utilities

mtrace.c:

A trace capture tool based on interposing malloc API calls.

statistics.c:

Runs over the trace file and produces rudimentary statistics about the allocations e.g. minimum request size in bytes, maximum request size in bytes, and average request size in bytes.

trace_allocs.c:

Print all the allocation requests for a given trace. This program generates one of the data files plotted by trace_plot.m, namely the scatter plot of allocation size requests. The input is the trace file.

hist.sh:

Converts the list of allocations generated by trace_allocs.c into a binned list with unique allocation sizes and counts as output. This generates one of the data files plotted by plot.m, namely the scatter plot of allocation counts by size.

sample.c:

An N-th sample down-sampling program that down-samples a single column of numerical data into N-th samples with min, mean, max, and value at N-th sample. Down-sampling data is the only way to visualize large datasets. Sampling supports starting at a given line offset, and ending at a given ending line offset.

trace_plot.m:

An Octave compatible script to turn load down-sampled plot data into useful graphs for analysis of the workload.

block_size_rss.c:

Compute the "block size" RSS from the raw trace. This does not require any simulation and uses existing information in the trace to provide a rough estimate of the RSS in use by the allocator (not the actual application under test).

trace2wl.c:

Convert a malloc trace format file into a workload file that can be loaded by the malloc simulator.

trace_run.c:

The malloc simulator. It can run a workload file.

trace_dump.c:

Dump a malloc trace in a human readable form.

Capture Tutorial

The trace capture interposer is built as a shared object you preload into your application. Here's a simple example:

    $ LD_PRELOAD=./libmtrace.so ls
    mtrace-ctl: writing to /tmp/mtrace.out.706
    . . . ls output . . .

The interposer tells you where the trace is stored (/tmp/mtrace.out.706 in this example). Since the interposer appends the application's PID to the filename, it's (mostly) safe to export LD_PRELOAD in general, to apply to many applications at once.

If you set the environment variable MTRACE_CTL_FILE it will override the default of /tmp/mtrace.out. Note that the PID will still be appended to whatever you specify. If the program you're tracing calls exec(), a second trace with the same pid will have a sequence number appended (like mtrace.out.706.1).

There are some caveats to using this interposer:

First, if your application calls exec() (example: you're tracing /bin/sh) the tracer will not have an opportunity to do any cleanup work. If you see trace files that are 64MiB but only contain a few entries, this is what happened.

Second, calls to realloc may suffer what we call "trace inversion" in the trace file. This is because there's a time between our recording the pointers, and realloc doing a malloc/free pair, where other threads might reuse those pointers and record that use. Example:

    THREAD-A           THREAD-B
    ptr2=realloc ptr1
    ptr2=malloc
    copy data
    free ptr1
               ptr1=malloc
               -- trace ptr1
    -- trace ptr1,ptr2

In this example, realloc gets called first, but the trace sees the malloc first - and, apparently, sees malloc returning ptr1 before seeing realloc free ptr1. This is the opposite order of what happens inside the malloc code in libc.so, but we have no way of recording our trace between realloc's malloc and free.

Note that trace2wl has code to detect and properly handle trace inversion.

Simulation Tutorial

  • Get the trace.
    cp ~/my.trace.bz2 .
    bunzip2 my.trace.bz2
  • Convert trace to workload.
    trace2wl my.wl ./my.trace
    5: invalid pointer 0x21be010 passed to free: 0:not seen
    count is 2 vs 2
    vals is 2
    ls -alt *.wl
    -rw-rw-r--. 1 carlos carlos 310751804 Mar 12 11:34 my.wl
  • Get simple trace statistics, just to sanity check (trace_plot.m does this too).
    statistics ./my.trace
    Min allocation size: 1
    Max allocation size: 1638399
    Mean allocation size: 399
  • Extract rough allocation trace (for histogram).
    trace_allocs ./my.trace > myw1_allocs.log
  • Extract binned allocation histogram (for plotting).
    hist.sh ./myw1_allocs.log > myw1_binned_allocs.log
  • Pre-simulation rough block size trace (sanity check).
    block_size_rss ./my.trace > myw1_block_size_rss.txt
  • Run simulation with memory log recording.
    trace_run ./my.wl myw1_VmSize.log myw1_VmRSS.log myw1_ideal_RSS.log

    911,490,514,189 cycles
    260,418,362 usec wall time
    371,781,997 usec across 2 threads
    58,863,616 bytes Max RSS (15,556,608 -> 74,420,224)
    Starting VmRSS 15556608 (bytes)
    Starting VmSize 16633856 (bytes)
    Starting MaxRSS 15556608 (bytes)
    Ending VmRSS 71335936 (bytes)
    Ending VmSize 172900352 (bytes)
    Ending MaxRSS 74420224 (bytes)
    49,322 Kb Max Ideal RSS

    sizeof ticks_t is 8
    Avg malloc time:    336 in 32,070,748 calls
    Avg calloc time:  9,945 in  1,468,151 calls
    Avg realloc time:   655 in    765,824 calls
    Avg free time:      199 in 33,497,886 calls
    Total call time: 32,576,066,690 cycles
  • Downsample the data.
    sample myw1_VmSize.log myw1_ds100_VmSize.log 100
    sample myw1_VmRSS.log myw1_ds100_VmRSS.log 100
    sample myw1_ideal_RSS.log myw1_ds100_ideal_RSS.log 100
  • Plot the data.
    • Start up Octave.
    • Open trace_plot.m.
    • Fill in "bias=" with simulator starting RSS bias value.
    • Set down sampling ratio "down_sample_ratio = 100" to match your down sample.
    • Set "path=" and "xxx_file" variables to point to your data files (path is prefix).
    • Tweak block_start/block_end for linear regression lines, this is in event# start/end.
    • Run trace plotting.
    >> trace_plot