DBD::Oracle and collections and a surprising speed up with InstantClient 11.2

Recently at $work I've been battling with some Perl code which retrieves data from Oracle via DBD::Oracle and a package function which returns a reference cursor. As I've mentioned before in this blog, the user has no select privilege on the database but can call package procedures/functions which return reference cursors and hence data from the database.

The query we have a problem with attempts to return multiple rows but one column is actually a list of primary keys from another table:

table1
table1_id (primary key)
table1_name

table2
table2_id (primary key)
table2_name

table3
table3_id
table1_id (foreign key on table1)
table2_id (foreign key on table2)

and table3 has many rows with table1_id the same but different table2_id values. What we want in the result from querying table3 is something like this:

table1_id, table1_name,list_of_table2_ids_having_table1_id_from_table3

Ultimately, the result-set is turned into JSON and the format should be:

[table1_id, table1_name, [table2_id, table2_id, table2_id]]

The first rather niave attempt created a function which when passed a table1_id returned a comma seperated string of concatenated table2_ids. However, it suffers from 2 problems:

  1. this function in the SQL stops Oracle optimizing it
  2. we have to split the column on the comma to get an array of table2_ids.

This solution kept popping up in the top SQL by cpu.

The second attempt was using function again but with a "SYS_CONNECT_BY_PATH" and a "DENSE_RANK" but this was also slow.

The third solution was using "WM_CONCAT" which was blindingly fast and we were happy. Something like:

SELECT
table1_id,
table1_name,
WM_CONCAT(table2_id) AS table2s
FROM (
SELECT DISTINCT
table1.table1_id,
table1.name),
table3.table2_id
FROM
table1
JOIN table3 ON table3.table1_id = table1.table1_id
JOIN table2 ON table2.table2_id = table3.table2_id AND
ORDER BY
table1.table1_id,
table3.table2_id
)
GROUP BY
table1_id,
table1_name
ORDER BY 1;

All was well (other than having to split the returned string in Perl) until the number of table2_ids for each table1_id in table3 increased beyond 4K of data and then we realised all the solutions were wrong. Although 32K would have held all possible table2_ids 4K would not and although our first solution used a varchar2(32K) in pl/sql as a reference cursor when it was returned in the result-set it had a maximum size of 4K as a varchar2 column :-(

So we turned to collections in Oracle and the SQL became:

SELECT
table1.table1_id,
table1.table1_name,
CAST(COLLECT(table2_id) AS int_array_t) AS table2s
FROM
(
SELECT DISTINCT
table1.table1_id,
table1.table1_name,
table2.table2_id
FROM
table1
JOIN table3 ON table3.table1_id = table1.table1_id
JOIN table2 ON table2.table2_id = table3.table2_id AND
ORDER BY
1,
3
)
GROUP BY
table1_id,
table1_name
ORDER BY
1;

The extra benefit of using an Oracle collection is that the list of table2_ids is returned as a Perl array reference so no more splitting in Perl when converting to JSON - it is just right. Also, we no longer care about any 4K limit as the collection bypasses that. It looked like a big win until we tested it and it was very slow at retrieving the result-set (4s for 4000 rows). Looking into this it appears collections are like lobs in that it requires DBD::Oracle to do a round trip for each collection to retrieve it and so prefetch_rows is disabled. This might have explained it except the same Perl on a different machine took only 0.6s. After much fiddling about to rule out differences it turns out the slow machine was using InstantClient 11.1 and the fast machine was using InstantClient 11.2 and after upgrading the slow machine to 11.2 it was a lot quicker. I've not identified the change in InstantClient 11.2 which is responsible but I've tested this on a few machines and it is a lot quicker with InstantClient 11.2.