
Revealing Hidden Latency Patterns


[Figure: latency heat map]

Response time – or latency – is crucial to understand in detail, but many of the common presentations of this data hide important details and patterns. Latency heat maps are an effective way to reveal these. I often use tools that provide heat maps directly, but sometimes I have separate trace output that I’d like to convert into a heat map. To answer this need, I just wrote trace2heatmap.pl, which generates interactive SVGs.

I explained how latency heat maps work in the 2010 article “Visualizing System Latency” (ACMQ, CACM). I’ve previously shared interesting examples in: Rainbow Pterodactyl, Icy Lake, ZFS L2ARC.

Problem

I whipped up a simple example to explain this, using disk I/O latency (I have plenty of real-world examples, but explaining them can get sidetracked). This is a single disk system, with a single process performing a sequential synchronous write workload.

Using iostat(1M) to examine average latency (asvc_t):

$ iostat -xnz 1
[...]
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  220.0    0.0 9635.8  0.0  1.0    0.0    4.6   0  99 c1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  203.0    0.0 8976.2  0.0  1.0    0.0    5.1   0  99 c1d0

I could plot average latency (as many monitoring products do), but the average is seriously misleading, and doesn’t explain what’s really happening. And since latency is so important for performance, I want to know exactly what is happening.

I had “iosnoop -Dots” running, which collected two minutes of per-I/O latency and other details:

# iosnoop -Dots > out.iosnoop
^C
# more out.iosnoop
STIME(us)   TIME(us)    DELTA(us) DTIME(us) UID   PID D    BLOCK   SIZE      COMM PATHNAME
9339835115  9339835827  712       730       100 23885 W 253757952 131072    odsync <none>
9339835157  9339836008  850       180       100 23885 W 252168272   4096    odsync <none>
9339926948  9339927672  723       731       100 23885 W 251696640 131072    odsync <none>
[...15,000 lines truncated...]

I/O latency is the “DELTA(us)” column. This file was about 15,000 lines long – far too much to read.

Latency Histogram: With Outliers

The latency distribution can be examined as a histogram (using R, and a subset of the trace file):

This shows that the average has been dragged up by latency outliers: I/O with very high latency.

This is a fairly common occurrence, and it’s very useful to know when it has occurred. Those outliers may be individually causing problems, and can easily be plucked from the trace file for further analysis; eg:

# awk '$3 > 50000' out.iosnoop_marssync01
STIME(us)   TIME(us)    DELTA(us) DTIME(us) UID   PID D    BLOCK   SIZE      COMM PATHNAME
9343218579  9343276502  57922     57398      0      0 W 142876112  4096      sched <none>
9343218595  9343276605  58010     103        0      0 W 195581749  5632      sched <none>
9343278072  9343328860  50788     50091      0      0 W 195581794  4608      sched <none>
[...]

Most of the I/O in the histogram was in a single column on the left.

Latency Histogram: Zoomed

Zooming in, by generating a histogram of the 0 – 2 ms range:

The I/O distribution is bimodal. This also commonly occurs for latency or response time in different subsystems. Eg, the application has a “fast path” and a “slow path”, or a resource has cache hits vs cache misses, etc.
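
If you don’t have R handy, a rough version of the same zoomed histogram can be produced with awk (just a sketch; the 50 us bucket size is my choice, and DELTA(us) is column 3 of the iosnoop output):

$ awk '$3 < 2000 { bucket[int($3 / 50) * 50]++ }
    END { for (b in bucket) print b, bucket[b] }' out.iosnoop | sort -n

Each output line is a bucket’s starting latency (us) and its I/O count.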

But there is still more hidden here. The average latency reported by iostat hinted that there was per-second variance. This histogram is reporting the entire two minutes of iosnoop output.

Latency Histogram: Animation

I rendered the iosnoop output as per-second histograms, and generated the following animation (a subset of the frames):

Not only is this bimodal, but the modes move over time. This had been obscured by rendering all data as a single histogram.
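
To reproduce those per-second slices from the trace file, one approach (again a sketch, reusing my 50 us buckets and the 2 ms cap) is to also group by the second of the completion time, TIME(us) in column 2:

$ awk '$3 < 2000 { count[int($2 / 1000000), int($3 / 50) * 50]++ }
    END { for (k in count) { split(k, p, SUBSEP); print p[1], p[2], count[k] } }' out.iosnoop |
    sort -k1,1n -k2,2n > counts.txt

Each line of counts.txt (a filename I’ve made up for this example) is a second, a latency bucket, and an I/O count; each distinct second is one frame of the animation.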

Heat Map

Using trace2heatmap.pl to generate a heat map from the iosnoop output.

Click for an interactive SVG version, and compare to the animation above.

The command used was:

$ awk '{ print $2, $3 }' out.iosnoop | ./trace2heatmap.pl --unitstime=us \
    --unitslatency=us --maxlat=2000 --grid > heatmap.svg

Without “--grid”, the grid lines are not drawn (making it more Tufte-friendly); see the example.

trace2heatmap.pl gets the job done, but it’s probably a bit buggy – I spent three hours writing it (and more than three hours writing this post about it!), really just for the trace files I don’t already have heat maps for.

Heat Maps Explained

It may already be obvious how these work. Each frame of the histogram animation becomes a column in a latency heat map, with the histogram bar height represented as a color:

Click for higher resolution.
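
As a crude sketch of that mapping, here is the counts.txt file from the earlier per-second example run through a simple linear scale (trace2heatmap.pl does this properly in Perl and emits the SVG):

$ awk '{ if ($3 > max) max = $3; row[NR] = $0 }
    END { for (i = 1; i <= NR; i++) { split(row[i], f, " ");
        printf "x=%s y=%s shade=%d\n", f[1], f[2], int(255 * f[3] / max) } }' counts.txt

x is the second (column), y is the latency bucket (row), and shade is the bucket’s I/O count scaled to 0–255, so the busiest buckets get the strongest color.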

Production Use

If you want to add heat maps to your monitoring solution, then great! However, note that tracing per-event latency can be expensive to perform. DTrace minimizes the overheads as much as possible using per-CPU buffers and asynchronous kernel-user transfers; other tools (eg, strace, tcpdump) are expected to have higher overhead. This can cause problems for production use: you want to understand the overhead, including when using DTrace, before tracing events.

Heat maps have been used successfully in production – and recorded at a one-second granularity 24x7x365 – by some products built upon DTrace. These use the DTrace aggregation feature to pass a quantized summary of latency, instead of every event, to user-level, reducing the data transfer by a large factor (eg, 1000x). This summary may consist of a per-second array with about 200 elements for different latency ranges, each containing the count of events, and is produced by the DTrace aggregating functions quantize(), lquantize(), or llquantize() (the last being best). This array is then resampled (downsampled) to the resolution desired for the heat map (usually down to 30 or so levels).
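
To illustrate the technique (this is a sketch of my own, not code from any of those products), the following DTrace one-liner keys on the buffer address (arg0) to time each disk I/O, keeps an lquantize() summary in 50 us steps up to 10 ms (about 200 elements), and prints and resets it every second; the downsampling to heat map levels would then happen in user-land:

# dtrace -n '
    io:::start { ts[arg0] = timestamp; }
    io:::done /ts[arg0]/ {
        /* 50 us buckets from 0 to 10 ms: ~200 elements per summary */
        @["disk I/O latency (us)"] = lquantize((timestamp - ts[arg0]) / 1000, 0, 10000, 50);
        ts[arg0] = 0;
    }
    tick-1s { printa(@); trunc(@); }'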

Examples of products that can generate heat maps in a production-efficient way are the Oracle ZFS Storage Appliance, Joyent Cloud Analytics, and Circonus.

There is also Tracelytics, although I can’t comment on its efficiency, as I don’t know its internals yet.

Other Uses

Heat maps (and trace2heatmap.pl) can be used to examine metrics other than latency, such as offset, I/O size, and utilization. For examples, see the heat maps in Visualizing Device Utilization, and Subsecond Offset Heat Maps.
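
For example, the same iosnoop file can be turned into an I/O size heat map by feeding the SIZE column ($9) instead of DELTA. This is just a sketch reusing the options shown earlier; since trace2heatmap.pl treats the second column as latency, the y-axis will still be labeled as such unless the tool has an option to rename it (I haven’t checked):

$ awk '{ print $2, $9 }' out.iosnoop | ./trace2heatmap.pl --unitstime=us \
    --grid > sizemap.svg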

Background

Bryan and I developed latency heat maps in 2008 for the ZFS Storage Appliance. For more background, see Visualizing System Latency.

Thanks to Deirdre for helping with another post!

