Faster way to serialize a Perl hash

As anyone who reads my blog knows, we've been profiling a large amount of Perl code recently. A daemon process receives jobs to run and, in most cases (a few are run immediately), places them on a queue in the database. For a queued job we are mainly interested in the turnaround time, i.e., the time from seeing an incoming request, decoding it (it arrives as JSON), checking it, inserting it into the database, and returning a unique job ID; this determines how quickly we can queue jobs.

Part of the data structure inserted into the database is a serialized copy of a hash containing information about the requested job. At the moment it is only used as a human-readable copy of the job and does not need to be converted back into a Perl hash, although that may become a requirement in the future, e.g., for replaying jobs. There is nothing special about the stored hash: there are no blessed references, just a plain hash that may contain array references, XML data in scalars, etc., and it is UTF-8 encoded. The current code uses Data::Dumper to serialize this hash, and although serialization does not take a long time, it looked like something we could improve upon.

As we are already using JSON::XS, JSON is readable, and JSON::XS handles UTF-8 encoded data fine, I decided to pit Data::Dumper against JSON::XS.

use strict;
use warnings;

use Benchmark::Timer;
use Data::Dumper;
use JSON::XS;

my $bm = Benchmark::Timer->new(skip => 0);

my $data = {
    priority  => 2,
    clientref => 'A huge long string aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa',
    jobid     => '366107',
    name      => 'XXXXXXXXXX',
    args      => {
        yyyyy      => {},
        user_id    => '1000',
        zzzzzzz    => {'3146' => {aaaaa => '5.04', bbbbbb => '2'}},
        account_id => '4',
    },
    type      => 'aaaaaaa',
    sessionid => '758477502DC65F69E040007F01000850',
    jobflags  => 0,
};

my $iters = 100000;
my $s;

$Data::Dumper::Indent = 0;
$Data::Dumper::Terse = 1;
$Data::Dumper::Quotekeys = 0;
$Data::Dumper::Useperl = 0; # use the XS implementation (the default, depending on what your data contains)
$Data::Dumper::Sortkeys = 0; # in fairness this is the default
$bm->start('Dumper');
for (my $i = 0; $i < $iters; $i++) {
    $s = Dumper($data);
}
$bm->stop('Dumper');
print $bm->report('Dumper');

$bm->start('JSON::XS');
for (my $i = 0; $i < $iters; $i++) {
    $s = encode_json($data);
}
$bm->stop('JSON::XS');
print $bm->report('JSON::XS');

The $data hash is fairly typical job data. The first thing to note is that Data::Dumper's defaults are not geared towards producing the most compact output with the least work (which is fair enough). The second thing to note is that there does not seem to be any way to alter Data::Dumper's output other than through global settings (please correct me if you know otherwise). When run we get:

1 trial of Dumper (16.649s total)
1 trial of JSON::XS (704.014ms total)

OK, that is for 100,000 serializations, but by those numbers JSON::XS is roughly 23 times faster, and the change will save us quite a bit of CPU over time. Small savings like this add up. We have not compromised anything, since a) we were already using JSON::XS, b) the serialization is still readable and of a similar size, and c) we can convert the JSON back into a Perl hash later if we need to.
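
To show what that last point looks like in practice, here is a minimal round-trip sketch (the cut-down $job hash and the printed message are illustrative, not our production code):

use strict;
use warnings;
use JSON::XS;

# A cut-down job hash, just for illustration.
my $job = { jobid => '366107', priority => 2, name => 'XXXXXXXXXX' };

# Serialize for storage; this is the UTF-8 encoded string we would insert
# into the database alongside the rest of the job record.
my $json = encode_json($job);

# Later, e.g. when replaying a job, decode it straight back into a hash.
my $copy = decode_json($json);
print "replaying job $copy->{jobid} at priority $copy->{priority}\n";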

Comments

Data::Dumper configuration

Re: "there does not seem to be any way to alter Data::Dumper output other than through global settings" If you use the OO interface to Data::Dumper, you can configure the output by calling methods on the object before calling Dump. (There's a method with the same name as each global config variable.)