Using CDF-Plots for performance evaluation

Cumulative distribution functions (CDFs) of response times are a handy way to evaluate the performance of software functions. A CDF lets you quickly make statements of the form "X% of all measured calls return in less than Y seconds". This can be used to formulate or support Service Level Agreements (SLAs), e.g. "90% of all requests must complete in less than 1 second".
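As a minimal sketch of how such a statement can be checked against raw measurements, the following computes the fraction of samples below a limit and compares it with a required fraction. The function names `fraction_within` and `meets_sla` and the sample values are made up for illustration; they are not part of the original script.

```python
def fraction_within(response_times, limit):
    """Return the fraction of samples strictly below `limit` (same unit as the samples)."""
    if not response_times:
        raise ValueError("no samples")
    return sum(1 for t in response_times if t < limit) / len(response_times)


def meets_sla(response_times, limit, required_fraction):
    """True if at least `required_fraction` of all samples finish in less than `limit`."""
    return fraction_within(response_times, limit) >= required_fraction


# Hypothetical measurements in seconds (illustrative values only)
samples = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 0.3, 0.6, 0.1, 1.5]
print(fraction_within(samples, 1.0))
print(meets_sla(samples, 1.0, 0.9))
```

The CDF plot answers the same question graphically for every limit at once, which is why it is convenient when the threshold is still being negotiated.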

Measure the response times of your program, module, or function. Save them to a plain text file with one floating-point value (in seconds) per line.
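One possible way to collect such a file is sketched below. It assumes the code under test can be called as a Python function; `measure`, `save_response_times`, and `workload` are hypothetical names, and `workload` only stands in for the real code being measured.

```python
import time


def measure(func, runs):
    """Call func() `runs` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def save_response_times(durations, path):
    """Write one float per line to `path`."""
    with open(path, "w") as fd:
        for d in durations:
            fd.write("%.6f\n" % d)


if __name__ == "__main__":
    # Hypothetical workload standing in for the code under test
    def workload():
        sum(i * i for i in range(10000))

    save_response_times(measure(workload, 100), "response-times.txt")
```

For a network service the timing loop would wrap a request instead of a local call, but the output format stays the same.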

Simple Python script "create-cdf-data.py" to generate CDF-data from a text file with response times:

#!/usr/bin/env python3

import sys

filename = sys.argv[1]

# Read one float per line, skipping blank lines
responseTimes = []
with open(filename) as fd:
    for line in fd:
        if line.strip():
            responseTimes.append(float(line.strip()))

responseTimes.sort()

# Print "cumulative probability <TAB> response time" for each sorted sample
count = len(responseTimes)
for i, time in enumerate(responseTimes, start=1):
    print("%.4f\t%.4f" % (i / count, time))
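
What the script computes is the empirical CDF of the samples. As a sketch in notation that is not in the original text (n is the number of samples, t_i a measured response time, and t_(k) the k-th smallest one):

```latex
\[
  \hat{F}_n(t) \;=\; \frac{1}{n}\,\bigl|\{\, i : t_i \le t \,\}\bigr| ,
  \qquad
  \hat{F}_n\bigl(t_{(k)}\bigr) \;=\; \frac{k}{n}
  \quad \text{(when all samples are distinct)}
\]
```

Each output line corresponds to one pair (k/n, t_(k)). If several samples are equal, the script prints several lines with the same time; the largest probability among them is the CDF value at that time.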

Run the Python script:

./create-cdf-data.py response-times.txt > cdf-data.txt

Gnuplot commands to plot the CDF-data:

set terminal svg size 500,375 enhanced
set key right bottom
set xlabel "Response Time [s]"
set ylabel "Probability"

set output "cdf-plot.svg"
plot "cdf-data.txt" using 2:1 title "My Prog" with linespoints

You can use Inkscape to open the SVG and save it as a PDF. The result will look something like this:

Example CDF plot
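Values can also be read from the CDF data directly instead of off the plot. The sketch below assumes a file in the two-column "probability, tab, time" format written by create-cdf-data.py, sorted by time; `read_cdf` and `time_at` are hypothetical helper names.

```python
def read_cdf(path):
    """Parse 'probability<TAB>time' lines into a list of (probability, time) tuples."""
    points = []
    with open(path) as fd:
        for line in fd:
            if line.strip():
                prob, time = line.split()
                points.append((float(prob), float(time)))
    return points


def time_at(points, probability):
    """Return the smallest response time whose cumulative probability
    reaches `probability`, or None if no point does.
    Assumes `points` is sorted by time, as the CDF data file is."""
    for prob, time in points:
        if prob >= probability:
            return time
    return None
```

For example, `time_at(read_cdf("cdf-data.txt"), 0.9)` gives the response time that 90% of the measured calls stayed within. Because the file stores probabilities rounded to four decimals, results right at a rounding boundary may be off by one sample.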

Created by Henning Jacobs