Flavio Poletti

RRDtool is a wonderful tool for collecting and graphing data.

RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, perl, python, ruby, lua or tcl applications.

Take a look at the website for additional information… and read on for some things that I find useful.

Data collection

Data are collected into the database and fetched from it. The collection is split into two parts: how they are read, and how they are stored.

Reading of data is specified through the description of a Data Source, or DS. See the docs about rrdtool create for the details, but it’s useful to know that:

  • GAUGE is for inputs that can go up and down, like a temperature, the voltage at some pin or the amount of money in a bank account.
  • COUNTER is for meters that can only increase, e.g. the number of times that a light turns on, the quantity of bits that enter an interface or the number of times that the sun rises in the morning.
  • DERIVE can be used for the same kind of data that a GAUGE is for, but focuses on the difference with respect to the previous read instead of the absolute value. This can be useful e.g. if you want to track an increase or decrease rate for a quantity. The docs page about rrdtool create also has additional remarks about the relation between DERIVE and COUNTER, so give it a read if you’re having trouble with your COUNTERs.
  • ABSOLUTE is for counters that get reset upon reading, i.e. each time you read the value the counter restarts from zero.

RRDtool is mostly interested in rates, so all of the above are translated into a rate, except for GAUGEs, which are stored as-is (so that you can track things that have little to do with rates). If you want to graph the stock market, use GAUGE.
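
To make the difference between the four types concrete, here is a rough sketch of how a reading might be turned into the stored value. The function name to_rate and the sample numbers are made up for illustration, and the model is deliberately simplified: it uses integer arithmetic and ignores counter wrap-around and the heartbeat, both of which RRDtool does handle.

```shell
#!/bin/bash
# Simplified model only: integer values, no counter wrap-around, no heartbeat.
# to_rate TYPE PREVIOUS CURRENT ELAPSED_SECONDS
to_rate() {
   local type=$1 prev=$2 cur=$3 dt=$4
   case "$type" in
      GAUGE)          echo "$cur" ;;                    # stored as-is
      COUNTER|DERIVE) echo $(( (cur - prev) / dt )) ;;  # change per second
      ABSOLUTE)       echo $(( cur / dt )) ;;           # counter restarted from 0 at last read
      *)              echo "unknown type: $type" >&2 ; return 1 ;;
   esac
}

to_rate GAUGE    20   21   300   # a temperature: keeps 21
to_rate COUNTER  1000 1600 300   # 600 more units in 300 seconds
to_rate DERIVE   1600 1000 300   # a quantity that went down
to_rate ABSOLUTE 0    600  300   # 600 events since the last read
```

In this sketch COUNTER and DERIVE share the same formula; the difference in RRDtool lies in how a decreasing value is interpreted, which is not modelled here.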

Times

Time handling in RRDtool is quite interesting. It is assumed that you will feed a new set of values every step, where the step is specified in seconds. The default is 300, so you’re supposed to feed a new set of values every 5 minutes, but of course you can set whatever you see fit.

The relevant concepts for times in RRDtool are:

  • step, i.e. the length of the time range
  • start, i.e. when a time range starts
  • end, i.e. when a time range ends

You can set a start time when you create a database, but the real start time will be set depending on the step - in particular, as an integer multiple of the step.

It’s useful to think of the timeline as a sequence of time intervals: interval 1, interval 2, …, interval N. The origin is 0, corresponding to the start of the epoch (January 1st, 1970), but time is treated as a sequence of intervals rather than of points.

Values stored in the database always refer to an interval, not to a point in time.

So, what do start and end actually mean? They are ways to specify the intervals we are interested in. Each is first framed into one interval, then the sequence of intervals from the start’s to the end’s (inclusive) is considered.

When we specify a point in time that separates two intervals, it is assigned to the following one. So, if the step is equal to 60 and start is 600 (separating the two intervals 540-600 and 600-660), the interval considered is 600-660. This is the same as saying that intervals are closed on the left and open on the right.

Intervals are represented by their end time. So, in the example above, if you specify start as 600, the first interval that you get is the one marked with 660.
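
The framing rule can be written down as plain integer arithmetic. The following is only a sketch of the rule as described here, not something taken from RRDtool itself; the function names interval_start and interval_marker are invented for the example.

```shell
#!/bin/bash
# interval_start TIME STEP: left border of the interval that contains TIME
interval_start() {
   local t=$1 step=$2
   echo $(( t - t % step ))
}

# interval_marker TIME STEP: right border, i.e. the label of that interval
interval_marker() {
   local t=$1 step=$2
   echo $(( t - t % step + step ))
}

step=60
for t in 599 600 659 660 ; do
   from=$(interval_start $t $step)
   to=$(interval_marker $t $step)
   echo "$t falls in [$from, $to) and is marked $to"
done
```

A time of 600 sits exactly on a border, so it lands in the following interval and gets the marker 660, while 599 still belongs to the interval marked 600.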

Example: consider a database with a step of 60 seconds and capable of holding up to three values. The start time has to be “quite high” to avoid triggering some do-what-I-mean behaviour of RRDtool.

start=600000000
items=3
step=60

rrdtool create test.rrd \
   --step $step \
   --start $start \
   DS:testdata:GAUGE:120:U:U \
   RRA:MAX:0.5:1:$items

for i in 1 2 3 ; do
   time=$(($start + $i * $step))
   rrdtool update test.rrd $time:$i
done

end=$(rrdtool last test.rrd)

rrdtool fetch test.rrd MAX --start=$start --end=start+180

The output is:

 600000060: 1.0000000000e+00
 600000120: 2.0000000000e+00
 600000180: 3.0000000000e+00
 600000240: -nan

which shows how start, end and the marker for an interval are chosen according to what is described above.

As an additional note, consider that real intervals might span several configured steps. For example, if you have a round robin archive (RRA) that aggregates 5 values with a step of 60, each data point actually refers to 300 seconds (5 minutes). When this RRA is accessed, the relevant start and end will yield time intervals that align to a 300-second chunking of the timeline starting from the origin of the epoch.
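
As a quick check of the alignment rule, the sketch below lists the markers one would expect for a given start, end and interval length. It models the description in this section rather than RRDtool's internals, and list_markers is a name made up for the example.

```shell
#!/bin/bash
# list_markers START END STEP: markers of the intervals from START's to END's
list_markers() {
   local start=$1 end=$2 step=$3
   local first=$(( start - start % step + step ))
   local last=$(( end - end % step + step ))
   local marker
   for (( marker = first; marker <= last; marker += step )) ; do
      echo "$marker"
   done
}

echo "step 60:"
list_markers 600000000 600000180 60     # one row per configured step
echo "step 300:"
list_markers 600000000 600000180 300    # RRA aggregating 5 values
```

With a step of 60 this yields the same four markers as the fetch output shown earlier; with 300 seconds per row the same start and end collapse into the single interval marked 600000300.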

Getting the right data

If you want to be sure to get the right data out of an RRD database, you have to ensure some things:

  • you know which round robin archive you’re looking at
  • you know how many data points to ask for

RRDtool will try to give you the best available data, but e.g. if you have fine grained data for the last week and you ask for data in the last ten days, you’ll hit a different RRA (if available).
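
To get a feeling for which RRA can serve a request, it helps to compute how much history each one holds. The sketch below does only that comparison; the two RRA layouts are hypothetical, the name rra_coverage is invented, and RRDtool's real selection also weighs the resolution you ask for.

```shell
#!/bin/bash
# rra_coverage STEP PDP_PER_ROW ROWS: seconds of history held by one RRA
rra_coverage() {
   local step=$1 pdp_per_row=$2 rows=$3
   echo $(( step * pdp_per_row * rows ))
}

week=$(( 7 * 24 * 3600 ))
ten_days=$(( 10 * 24 * 3600 ))
fine=$(rra_coverage 60 1 10080)    # hypothetical: one week of 1-minute rows
coarse=$(rra_coverage 60 60 720)   # hypothetical: 30 days of 1-hour rows

for span in $week $ten_days ; do
   if (( span <= fine )) ; then
      echo "$span seconds fit in the fine-grained RRA ($fine seconds)"
   else
      echo "$span seconds need the coarser RRA ($coarse seconds)"
   fi
done
```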

To get exactly all the data in an RRA you can do as follows (assuming the database file is test.rrd):

  1. run rrdtool info test.rrd to get the relevant data. You will find something like this:

    filename = "test.rrd"
    rrd_version = "0003"
    step = 60
    last_update = 600018000
    header_size = 736
    ds[testdata].index = 0
    ds[testdata].type = "GAUGE"
    ds[testdata].minimal_heartbeat = 120
    ds[testdata].min = NaN
    ds[testdata].max = NaN
    ds[testdata].last_ds = "300"
    ds[testdata].value = 0.0000000000e+00
    ds[testdata].unknown_sec = 0
    rra[0].cf = "MAX"
    rra[0].rows = 300
    rra[0].cur_row = 157
    rra[0].pdp_per_row = 1
    rra[0].xff = 5.0000000000e-01
    rra[0].cdp_prep[0].value = NaN
    rra[0].cdp_prep[0].unknown_datapoints = 0
    rra[1].cf = "MAX"
    rra[1].rows = 300
    rra[1].cur_row = 66
    rra[1].pdp_per_row = 20
    rra[1].xff = 5.0000000000e-01
    rra[1].cdp_prep[0].value = -inf
    rra[1].cdp_prep[0].unknown_datapoints = 0
    
  2. detect the RRA - there might be many in a database, so pick your favourite. We will assume that you want to focus on rra[1] in the example above;
  3. identify the following basic variables:

    • step
    • last_update
    • pdp_per_row (rra[1].pdp_per_row in the example)
    • rows (rra[1].rows in the example)
  4. calculate the RRA interval length as superstep = step * pdp_per_row
  5. calculate the end time of the last interval with meaningful data as real_end = last_update - (last_update % superstep) (% representing the modulus function), i.e. last_update rounded down to a multiple of superstep
  6. consider start = real_end - superstep * rows + 1 and end = real_end - 1. The addition/subtraction of one second is to be sure to fall inside an interval instead of being at one border, just to avoid surprises (this is actually needed for end only)

You can then consider start, end and superstep for usage in rrdtool fetch (respectively for --start, --end and --resolution) and in rrdtool graph (respectively for --start, --end and --step).
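
As a worked example, here is the same arithmetic applied to rra[1] from the rrdtool info output above, with real_end taken as last_update rounded down to a multiple of superstep. The values are copied by hand, and the fetch command is only printed, not run.

```shell
#!/bin/bash
# Values copied from the rrdtool info output above, for rra[1]
step=60
last_update=600018000
pdp_per_row=20
rows=300

superstep=$(( step * pdp_per_row ))                     # 1200 seconds per row
real_end=$(( last_update - last_update % superstep ))   # round down to a border
start=$(( real_end - superstep * rows + 1 ))            # just inside the first interval
end=$(( real_end - 1 ))                                 # just inside the last interval

echo "superstep=$superstep real_end=$real_end"
echo "rrdtool fetch test.rrd MAX --start $start --end $end --resolution $superstep"
```

Here last_update already sits on a 1200-second border, so real_end equals last_update; the resulting range goes from 599658001 to 600017999 and covers the 300 rows of the RRA.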

The above is implemented in the following Perl program get-full-interval.pl:

#!/usr/bin/env perl
use strict;
use warnings;
use English qw< -no_match_vars >;
use List::Util qw< reduce >;
use Data::Dumper;

use RRDs;

$OUTPUT_AUTOFLUSH = 1;
my ($db, $rra_id) = @ARGV;
die "usage: $PROGRAM_NAME <rrd-file> <rra-index>\n"
   unless defined $db && defined $rra_id;
my $info = rrd_info($db);

my $step = $info->{step};
my $last = $info->{last_update};

my $rra       = $info->{rra}[$rra_id];
my $superstep = $step * $rra->{pdp_per_row};

my $real_end = $last - ($last % $superstep);
my $end      = $real_end - 1;
my $start    = $real_end - ($superstep * $rra->{rows}) + 1;

print "$start $end $superstep $rra->{rows}\n";

sub rrd_info {
   my ($db) = @_;
   my $raw = RRDs::info($db);
   my %retval;

   while (my ($key, $value) = each %$raw) {
      my $ref = path_to_pointer(\%retval, name_to_path($key));
      $$ref = $value;
   }

   return \%retval;
} ## end sub rrd_info

sub name_to_path {
   my ($name) = @_;
   return map {
      if (my ($name, $id) = m{^(.+?)\[(.+)\]$}mxs) {
         ($name, ($name =~ m{^(?:rra|cdp_prep)$}mxs) ? [$id] : $id);
      }
      else {
         $_;
      }
   } split /\./, $name;
} ## end sub name_to_path

sub path_to_pointer {    # see http://www.perlmonks.org/?node_id=443584
   return reduce(sub { ref($b) ? \($$a->[$b->[0]]) : \($$a->{$b}) },
      \shift, @_);
}

Call this program as:

$ get-full-interval.pl test.rrd 1

where the first parameter is the name of the RRD database and the second parameter is the identifier of the RRA you are interested in. The program prints, in order, the following values:

  • value for --start
  • value for --end
  • the length of the interval (to be used as --step or --resolution where these parameters make sense)
  • the number of data points you will get (useful for setting the right --width if you want to produce a graph)

Graphing a whole database

The following program produces a graph for each variable and each RRA you have in your database, according to the hints provided in the previous section:

#!/bin/bash

db=$1
root=$(basename "$db" .rrd)
variables=$(rrdtool info "$db" | sed -n 's/^ds\[\(.*\)\]\.index.*/\1/p')
rrdtool info "$db" \
| sed -n 's/^rra\[\(.*\)\]\.cf.*"\(.*\)"$/\1 \2/p' \
| while read rra cf ; do
      ./get-full-interval.pl "$db" $rra | (
         read start end step rows
         for variable in $variables ; do
            rrdtool graph "$root-$variable-$rra-$cf.png" \
               --start $start \
               --end $end \
               --step $step \
               --width $rows \
               --disable-rrdtool-tag \
               "DEF:v=$db:$variable:$cf" \
               LINE1:v#000
         done
      )
   done

Of course this is one graph per variable without any fancy bells or whistles… start from rrdtool graph to learn all the masters’ tricks!