DBD::Oracle 1.23 does not seem to support unicode in error strings

Spent some time today trying to work out why the system I am working on was displaying unicode strings in errors double utf-8 encoded. Looks like a bug in DBD::Oracle. I posted the following example to dbi-users list: ========== Hi, We have a procedure which calls RAISE_APPLICATION_ERROR with a unicode string but when the error is retrieved from DBD::Oracle, the string is valid UTF-8 but not marked as such. This means when it is printed to a file handle with utf-8 encoding it is double encoded (unlike unicode data retrieved from a result-set due to a select). I've not looked into DBD::Oracle code as yet but this suggests to me data retrieved from selects is marked utf-8 but errors are not. Here is a small self contained example (with output):
#!/usr/bin/perl
use warnings;
use strict;
use Encode;
use DBI;
# Note: NLS_LANG=AMERICAN_AMERICA.AL32UTF8
binmode(STDOUT, ":utf8");

my $dbh = DBI->connect(
                       'dbi:Oracle:host=xxx.yyy.local;sid=devel',
                       'xxx',
                       'xxx');

eval {$dbh->do(q/drop procedure p_martin/)};
my $proc = <<'EOT';
create procedure p_martin(arg IN VARCHAR2) IS
BEGIN
      RAISE_APPLICATION_ERROR(-20001, 'Error ' || arg);
END;
EOT

$dbh->do($proc);
$dbh->{RaiseError} = 0;
$dbh->{PrintError} = 0;
my $arg = "\x{20ac}";
my $sth = $dbh->prepare(q{BEGIN p_martin(?); END;});
$sth->bind_param(1, $arg);
if (!$sth->execute) {
    my $error = $dbh->errstr;
    print "Is utf8: " .
        (Encode::is_utf8($error, Encode::FB_CROAK) ? 'yes' : 'no') . "\n";
    print "error: $error, \n";
    print "Should be $arg\n";
    my $utf8 = Encode::decode_utf8($error);
    print "Error decoded: $utf8\n";
}
which outputs:
$ perl dbd_oracle_errors_not_unicode_example.pl
Is utf8: no
error: ORA-20001: Error €
ORA-06512: at "BET.P_MARTIN", line 3
ORA-06512: at line 1 (DBD ERROR: OCIStmtExecute),
Should be €
Error decoded: ORA-20001: Error €
ORA-06512: at "BET.P_MARTIN", line 3
ORA-06512: at line 1 (DBD ERROR: OCIStmtExecute)
I would have expected "Error €" but the euro was double utf-8 encoded as it was not marked as utf-8 internally in perl. I should have time to look at this more internally in DBD::Oracle tomorrow but I thought I'd just throw this out now as a probable bug. Of course, any help would be appreciated. Martin ========== For now I've I to wrap the call to get DBI errstr in a Encode::decode_utf8() as I haven't got time to look at fixing it in DBD::Oracle.

Comments

Patch to fix rt 46438

The following replacement for oci_error_err in oci8.c fixes this issue. The problem is simply that DBD::Oracle was not decoding the UTF-8 encoded data and using the char* DBIh_SET_ERR_CHAR instead of DBIh_SET_ERR_SV.
int
oci_error_err(SV *h, OCIError *errhp, sword status, char *what, sb4
force_err)
{
        dTHX;
        D_imp_xxh(h);
        sb4 errcode;
        SV *errstr_sv = sv_newmortal();
        SV *errcode_sv = sv_newmortal();
        errcode = oci_error_get(errhp, status, what, errstr_sv, DBIS->debug);

        /* The following should really know if Oracle is returning us
           UTF8 encoded errors or not. I do not know how you know this but
           I imagine DBD::Oracle must know this for result-set data. */

#ifdef sv_utf8_decode
        sv_utf8_decode(errstr_sv);
#else
        SvUTF8_on(errstr_sv);
#endif

        /* DBIc_ERR *must* be SvTRUE (for RaiseError etc), some */
        /* errors, like OCI_INVALID_HANDLE, don't set errcode.  */
        if (force_err)
        errcode = force_err;
        if (status == OCI_SUCCESS_WITH_INFO)
        errcode = 0;    /* record as a "warning" for DBI>=1.43 */
        else if (errcode == 0)
        errcode = (status != 0) ? status : -10000;

        sv_setiv(errcode_sv, errcode);
        DBIh_SET_ERR_SV(h, imp_xxh, errcode_sv, errstr_sv, Nullsv, Nullsv);
        return 0;       /* always returns 0 */
}
NOTE: only using SvUTF8_on as a last resort - the proper way is to use sv_utf8_decode because just turning on the UTF8 flag on a scalar is dangerous if the scalar is not really UTF8 encoded (see UTF8 related proof of concept exploit released at T-DOSE and warning from Tim last year). If the perl is new enough (I think 5.8.1) it has a nice safe sv_utf8_decode else we fall back on dangerous svUTF8_on. NOTE: see comment in code above. I don't know how to know if Oracle is supposed to be returning UTF8 encoded error strings. I imagine it is like whatever you do with result-sets to know if the data is UTF8 encoded. It should really be in a test of: if (oracle_is_returning_utf8_encoded_data)

Bug in above patch

I think the Nullsv's in the call to DBIh_SET_ERR_SV should be &sv_undef or it will segfault in DBI. John Scoles has included this patch in the rs_array branch of DBD::Oracle in subversion. Perhaps it will make it's way into the next release (1.24?) of DBD::Oracle.

I've reported this at #46438:

Update: Errors returned by procedures are not unicode strings

Tim Bunce has identified the criteria under which my patch should be applied. See the RT for the final fix.

Update: Errors returned by procedures are not unicode strings

This change is included in DBD::Oracle 1.24 RC4 so will likely make it to the full release.

Changes made it to 1.24a release

These changes made to the 1.24a final release.