character|binary lob reading in DBD::ODBC

I currently maintain DBD::ODBC and needed to obtain a lob (large object) in chunks. It doesn't really matter why, but in general the lob is very large and I can process it in chunks in my Perl code, so it makes sense to read it in chunks instead of reading it all in one go (which would require loads of memory). DBD::ODBC does not currently have this functionality. My first step was to see if DBI defines a lob read method I could implement. It does, but although it is defined in DBI as blob_read there is no pod for it, and my first pass at finding a DBD which implemented it came up with nothing (it has since been pointed out that DBD::Oracle and DBD::Pg implement blob_read). The definition of blob_read in DBI is currently:
blob_read => { U =>[4,5,'$field, $offset, $len [, \\$buf [,$bufoffset]]'] }
which I presumed to mean: read the lob in column $field from offset $offset for $len bytes into $buf, and if $bufoffset is specified, write into $buf at that offset. I'm presuming $bufoffset is there to support reading the lob in chunks and appending to $buf. I have a number of problems with this:
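For reference, a driver which did implement blob_read per that definition would presumably be used something like the following. This is a hypothetical sketch only: the DSN, table and column names are made up, the chunking loop reflects my reading of the definition above, and DBD::ODBC does not currently support blob_read.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical sketch: assumes a driver implementing blob_read and an
# illustrative table "docs" with a lob column "body" in column 2.
my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'password');
my $sth = $dbh->prepare('select id, body from docs');
$sth->execute;
while (my ($id) = $sth->fetchrow_array) {
    my ($buf, $offset) = ('', 0);
    # read column index 1 (the 2nd column) in 64K chunks
    while (defined(my $chunk = $sth->blob_read(1, $offset, 65536))) {
        last unless length $chunk;   # empty chunk - lob exhausted
        $buf    .= $chunk;
        $offset += length $chunk;
    }
    print "lob for row $id is ", length($buf), " bytes\n";
}
```

Note this relies on exactly the offset-based semantics I am querying below, which ODBC's SQLGetData cannot directly provide.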
  1. The name blob_read implies binary large objects but I want to read character large objects.
  2. You can only read lobs in ODBC via SQLGetData from the start to the end (or part way through) and you cannot jump in at an offset - of course, a DBD could throw away the unwanted bits.
  3. Some databases do not allow the reading of columns via SQLGetData out of order when other columns are bound (e.g., MS SQL Server, unless you use a cursor, which slows everything down); i.e., you can bind columns 1-5 and then use SQLGetData for column 6, but you cannot bind columns 1, 2, 4 and 5 and use SQLGetData to read column 3, as that would be out of order.
  4. You cannot partially read column N with SQLGetData, then switch to read part or all of column N+1, and then return to column N - you have to read all or part of column N, then column N+1 (and so on), and cannot go back to column N.
  5. There is no way to specify the type you want the column bound as. Of course, DBD::ODBC can default the column to its actual type but if it is a binary column you might still want to bind it as a string (where ODBC will return each byte in the binary column as 0xNN).
  6. I anticipated some issues with unicode data in clob columns. clobs (large columns containing character data) may be retrieved in ODBC as bytes, in which case you may lose the meaning of the characters if they have a code point over 255, or as unicode characters encoded in so-called (by ODBC) wide characters (usually encoded in UCS-2 on Windows). However, Perl wants unicode characters encoded in UTF-8. The length you pass to blob_read only defines the size of the buffer used to read the next chunk, but this buffer will be full of 2-byte UCS-2 encoded characters. Once you have re-encoded them in UTF-8, the buffer required could be a lot larger.
  7. DBD::ODBC currently binds all columns in a select with SQLBindCol. I need to stop this happening for columns which you want to read as lobs.
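The buffer-growth problem in point 6 is easy to demonstrate in pure Perl with the core Encode module: a character such as the euro sign occupies 2 bytes in UCS-2 but 3 bytes in UTF-8, so re-encoding a full buffer of such characters needs more space than the buffer itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

# U+20AC (euro sign) is 2 bytes in UCS-2 but 3 bytes in UTF-8, so a
# chunk buffer of UCS-2 data can grow when re-encoded for Perl.
my $ucs2_bytes = "\xAC\x20" x 10;               # 10 euro signs, UCS-2LE
my $chars      = decode('UCS-2LE', $ucs2_bytes); # Perl character string
my $utf8_bytes = encode_utf8($chars);            # UTF-8 byte string

printf "UCS-2 bytes: %d, characters: %d, UTF-8 bytes: %d\n",
    length($ucs2_bytes), length($chars), length($utf8_bytes);
# UCS-2 bytes: 20, characters: 10, UTF-8 bytes: 30
```

This is why a chunk read into a $length-byte buffer may come back as a scalar larger than $length once re-encoded.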
Given all that, I've put together an experimental odbc_lob_read method in DBD::ODBC which is currently checked in to the subversion repository in trunk and which I will release over the next few days as a development release. I'd have liked to implement DBI's blob_read on top of it, but time and the concerns above have prevented me from doing so. As it stands, to read a lob you do something like this:
 $sth->bind_col($column, undef, {TreatAsLOB => 1});
 while (my $retrieved = $sth->odbc_lob_read($column, \my $data, $length)) {
     print "retrieved=$retrieved lob_data=$data\n";
 }
Optionally, you can specify a type to retrieve the lob as, in a hashref passed as the 4th argument to odbc_lob_read (not shown above); e.g., to retrieve a binary lob as a character string you'd add "{Type => SQL_CHAR}" after $length above. For a unicode-enabled build of DBD::ODBC, character lobs will be bound by default as SQL_C_WCHAR and hence will be returned into the allocated buffer of length $length as UCS-2 encoded characters. DBD::ODBC will then re-encode them in UTF-8 and grow the $data scalar to fit; in this case the returned length is in characters, not bytes. I'd love some testing of this with various different ODBC drivers, so if you can spare a few minutes to do so, get it from the subversion tree or wait a day or so and I'll put up a 1.24_2 development release.
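For example, with the optional 4th argument included, the loop above might look like this. This is a sketch reusing $sth, $column and $length from the example above; the SQL_CHAR constant comes from DBI's :sql_types exports.

```perl
use DBI qw(:sql_types);   # exports SQL_CHAR, SQL_BINARY, etc.

# Bind a binary lob as SQL_CHAR so ODBC returns each byte as
# two hex characters (0xNN) instead of raw binary data.
$sth->bind_col($column, undef, {TreatAsLOB => 1});
while (my $retrieved = $sth->odbc_lob_read($column, \my $data, $length,
                                           {Type => SQL_CHAR})) {
    print "retrieved=$retrieved lob_data=$data\n";
}
```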