It all started when a daemon I am developing hit a job which took too long to run and so it ignored any connecting clients whilst running the job. A long term plan exists to sort this out but it is getting in the way of some test code so I decided to write a small workaround until I am ready to implement the full solution. The workaround seemed so simple I thought it would only take half an hour to do - how wrong I was.
The daemon accepts connections from clients that describe a job that needs doing. The daemon queues the job, returns a unique job ID then considers work on its queues to do. Whilst executing a job on the queue it is not accepting new incoming connections. Each requested job is logged in a database and updated once it is finished. In the long term, the queue will be maintained in the database (instead of the daemon) and another daemon will queue jobs into the database and the existing one will pick work to do from the database and then we can run many daemons simultaneously.
The workaround was simply to create a pipe (IO::Pipe), fork (Proc::Fork), the child does the work and the parent monitors the child to see when it is finished (in it's main loop) - I've done it so many times before in C in the past I never imagined it would cause the trouble it did.
The problems started with our use of DBI as the parent has 3 open database connections and continued with a) our use of DBD::Oracle b) Oracle and c) DBIx::Log4perl. None of these bits of software like forking. Strangely, I've never written any Perl code which forks after opening a database connection.
Having implemented the fork with Proc::Fork the first problem was that when the client exited, my database connections in the parent were invalidated (Oracle said the session was closed). A quick look through the DBI docs and I'd identified the InactiveDestroy attribute which needs to be set on the DBI connections inherited by the child to stop DBI from disconnecting them when they go out of scope (at exit). However, this did not work and after a bit of head scratching I discovered it was because we wrap DBI in another package (which sets NLS_LANG, datetime format etc on connection) which calls disconnect in its DESTROY (someone, perhaps me, trying to be clean and tidy but it needed commenting out to let DBI handle the destruction and test InactiveDestroy). Having fixed this, I still had problems so I decided to go back to basics, ignore all our wrappers and DBIx modules and produce a test case with Perl and DBI:
This works fine. So next I changed this to introduce DBIx::Log4perl which we use to log all the method calls to DBI, their arguments and results. The only changes are to change the DBI->connect call to DBIx::Log4perl->connect and to use Log::Log4perl to initialize a log configuration file:
This worked and I almost left it at that except I'd seen an example on perlmonks that used the DBI clone method (I should have stopped there). The clone method clones a (connected or disconnected) DBI connection handle returning a new DBI connection handle created for the same DSN and with the same username, password and attributes. Instead of calling DBI->connect again in the child you call clone on the original DBI connection handle and get a new one back. This seemed ideal and the way to go so I changed the child connect call to clone and all hell broke lose in DBIx::Log4perl (as I later discovered was due to DBIx::Log4perl::connect not being called)
As usual Tim Bunce provided the answer and pointed me at the "connected" method in DBI which is called at the end of connect to signal a subclassed DBI we have a new connection. It seemed that all I needed to do was move the DBIx::Log4perl setup code in the overriden connect method to the new connected method. I did this and it worked but now I got a load of warnings when DBIx::Log4perl->connect was called, one for each DBIx::Log4perl attribute I was passing in the connect call - why? Well it turns out that DBI warns about attributes set in the connect it does not know about and previously, I was capturing these attributes in my overriden connect, acting upon them and then deleting them before calling connect in DBI. Now I'd moved the code down into the connected method I could not delete them or they would not get through to the connected method called from connect but that meant DBI sees them and warns.
This is where I've left it for today. My existing daemon works now with DBI, DBIx::Log4perl (with many changes), DBI::clone, Proc::Fork and IO::Pipe but I get warnings out on the terminal for every non-DBI attribute set in the connect call. Lessons learnt - well I could have stopped before spotting the clone method and all would have been ok. There were other issues a long the way too specific to detail here and this made the changes take even longer. The original plan which I was loathe to implement now seems even more attractive now than it did at the start of the day - oh well.