It is often said don't optimize too early. I try very hard to keep to that, and I do mean very hard. In the project I am working on now we have a basically working implementation (although some functionality is still to code, it is on the edges - this probably counts as too early - oh dear). We have a rather complicated setup which I don't have the time to go into here but at its simplest it is the age old client-server relationship and involves a lot of database (Oracle via DBD::Oracle) access. The client in this case is really a daemon process receiving work to process and either processing it or passing it on to another server to be queued and worked on later.
The queueing server has had some benchmark code in it almost since day one using Benchmark::Timer and it also accepts a job which can be used to report the benchmark statistics for each type of job it queues. This works using B::T's tags like this:
To output a report of the jobs run, their total time and average time per call we simply call the reports method and get an array of this information. B::T uses Timer::HiRes and so on most systems it has very fine granularity. B::T can also be given error and confidence values, to determine, statistically, whether you need to collect more data (we never used those).
As the timing we were interested in was per job it was easy to wrap start/stop method calls around call to the job but if you want to time individual subs it can get a little annoying if there are multiple return points since each must be preceded with a stop call. In steps DashProfiler which we've only used a little recently but it rather neatly gets around the multiple return points in subs because it times until a scalar goes out of scope:
We only used DashProfiler in it's "Auto" mode as that is simple to use and I believe all we needed. At exit of your process a full dump of all the profiles is output and you can use the all_profiles_as_text method to get the profile data programatically. Like B::T it uses a hires timer, produces total and average times (and unlike B::T, the max time) and has the very definite advantage that you do not need to add code to all your exit points; as soon as $a above goes out of scope that ends the time for "mysub".
So far so good and B::T and DashProfiler have proved very useful to us in benchmarking our code and identifying long running subs. However, much of our code calls functions and procedures in Oracle packages and it is often these which slow things down. The cost of these calls can be calculated with B::T or DashProfiler by inserting the benchmark code around calls to DBI's execute method but we have a lot of calls to execute. DBI has an inbuilt profiler which can collect statistics on SQL executed; the number of times a unique piece of SQL is executed and the total and average time for each execution - very useful but a) Oracle's enterprise manager can tell us which SQL is taking a long time and even offer to analyse and optimize it and b) we also use DBIx::Log4perl to log DBI method calls, SQL and bind parameters and sometimes we want to know if DBIx::Log4perl or even Log::Log4perl is slowing us down too much. Here again, we don't want to edit all this code and add benchmark calls so in steps Devel::NYTProf.
D::N can be run on existing Perl code without modification; only adding "-d:NYTProf" to your perl command line. At exit, a nytprof.out file is created and you can use nytprofhtml to produce a fantastically detailed, browsable, linked view of all the modules, methods and even statements used with timings. This does mean your program has to run to completion or be interrupted as the processing of the output file is separate but there are controls to start/stop profiling programatically (so you could use these to stop the profiling and run the analysis separately) but the detail and navigation are so good you won't care. Every executed statement (by default) is shown from your source code with count of executions, inclusive and exclusive time and average. Wherever you see a sub/method called you can drill down or up and each sub call is annotated with timing information. You can even see the time spent in XS code. Absolutely invaluable and a must for anyone profiling Perl.
Like B::T and DashProfiler, D::N uses a fine grained timer and real time so you have to watch out for calls that can block (e.g., select() or in our case Oracle's wait_any) but we know that. Our only very small reservation at this time (and I'll update when I know more) is that I suspect D::N is slowing our Perl down and this may be impacting on our results since the time to get to calls which read sockets seems to be greater so they spend less time waiting on a blocking socket for data and so the results are slightly different than a run without D::N. However, for profiling Perl, D::N is the tool and we have found many optimizations over a very large code base and continue to use it. Many thanks to Adam Kaplan, Tim Bunce and Steve Peters.