Why is taint mode so slow?

In the last few days on and off I've been trying to locate a problem with a fairly complicated Perl daemon using POE which was taking a huge amount of time to return a large file via LWP::UserAgent. It's taken a while since in reality we were not doing a simple HTTP GET and the circumstances under which the problem occurs are difficult to set up. However, I've at last tracked it down to taint mode.

We have a 30Mb file on an Apache web server on our local 1Gb network and I've reduced the code down to:

use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new();
# file.dat is approx 30Mb and Apache is configured to return .dat files in UTF-8 encoding
$get = HTTP::Request->new('GET', 'http://localhost/file.dat');
my $r = $ua->request($get);
print length($r->decoded_content);

The time taken for this script is around 3 seconds normally but add taint mode (-t) and it goes up to over 5 minutes.

We were using Perl 5.10.0 but I tried 5.10.1 and 5.12.1 also.

I used Devel::NYTProf to examine what was happening (see Long time reported for ref($x->{key})) and this is what I got:

150                                    
# spent 314s (314+141ms) within HTTP::Message::add_content which was called 7576 times, avg 41.5ms/call: # 7576 times (314s+141ms) by LWP::Protocol::__ANON__[/usr/local/share/perl/5.10.0/LWP/Protocol.pm:139] at line 137 of LWP/Protocol.pm, avg 41.5ms/call
{
151     7576    13.4ms                  my $self = shift;
152     7576    20.5ms                  $self->_content unless exists $self->{_content};
153     7576    14.4ms                  my $chunkref = \$_[0];
154     7576    14.2ms                  $chunkref = $$chunkref if ref($$chunkref); # legacy
155                                    
156     7576    60.9ms  7576    141ms   _utf8_downgrade($$chunkref);
# spent 141ms making 7576 calls to HTTP::Message::__ANON__[HTTP/Message.pm:18], avg 19µs/call
157                                    
158     7576    313s                    my $ref = ref($self->{_content});
159     7576    54.1ms                  if (!$ref) {
160     7576    335ms                   $self->{_content} .= $$chunkref;
161                                     }
162                                     elsif ($ref eq "SCALAR") {
163                                     ${$self->{_content}} .= $$chunkref;
164                                     }
165                                     else {
166                                     Carp::croak("Can't append to $ref content");
167                                     }
168     7576    235ms                   delete $self->{_parts};
169                                     }
170                                    
171                                     sub add_content_utf8 {
172                                     my($self, $buf) = @_;
173                                     utf8::upgrade($buf);
174                                     utf8::encode($buf);
175                                     $self->add_content($buf);
176                                     }

313 seconds doing my $ref = ref($self->{_content});! It seems much more likely the 313s was spent in $self->{_content} .= $$chunkref because as far as I can see this builds up the HTTP response 4K at a time concatenating onto an ever growing string. Tim Bunce took a look at my nytprof.out file and said the clicks spent in the offending line started low and kept on increasing to ridiculous levels so perhaps something is going on in taint mode which occurs on the result of the concatenation. As the string grows whatever taint mode causes has to work on an ever increasing string and hence takes longer each iteration.

For now I've had to disable taint mode as we cannot cope with this. I'd appreciate any ideas.

Comments

An even more straight forward example

In Perl 5.12.1 the following code takes less than 1s to run without taint mode and in taint mode 8 minutes!

use strict;
use warnings;
use Scalar::Util qw(tainted);

my $fd;
open($fd, ">", "file.dat");
print $fd 'x' x 4096;
close $fd;

my $data;
{
    local $/;
    open ($fd, "<", "file.dat");
    $data = <$fd>;
    close $fd;
}
print "data is tainted: ", tainted($data) ? 'yes' : 'no', "\n";

my %hash;
$hash{content} = '';

foreach (1..10000) {
    my $ref = ref($hash{content});
    #if (!$ref) {
        $hash{content} .= $data;
    #}
}

print length($hash{content}), "\n";

Use an array of strings instead of a simple string

Not a direct answer to your question, but a solution to your problem... The following, which replaces string concatenation with pushing onto an array, runs 'tainted' in less than 0.2 seconds on my laptop.
use strict;
use warnings;
use Scalar::Util qw(tainted);

my $fd;
open($fd, ">", "file.dat");
print $fd 'x' x 4096;
close $fd;

my $data;
{
    local $/;
    open ($fd, "<", "file.dat");
    $data = <$fd>;
    close $fd;
}
print "data is tainted: ", tainted($data) ? 'yes' : 'no', "\n";

my %hash;
$hash{content} = [];        # change #1

foreach (1..10000) {
    my $ref = ref($hash{content});
    #if (!$ref) {
        push @{$hash{content}}, $data;    # change #2
    #}
}

print length(join "", @{$hash{content}}), "\n";    # change #3

Interesting solution

Thanks for that hbaragar. Unfortunately the code which was causing me the problem is actually in LWP. The issue in Perl is now fixed but it was not back ported however I've been able to move up to 5.14 now so it does not matter to me any more.

reported to perlbug

See rt 75954.