The Advisory Boar

By Abhijit Menon-Sen <ams@toroid.org>
2011-03-17

Reading table data into an array

Suppose you have a table where each row represents a widget, and another table in which zero or more rows may refer to each widget, representing a set of values for some property of that widget. For example:

create table widgets (
    widget_id serial primary key,
    widget_name text unique
);

create table widget_flags (
    widget_id integer not null references widgets,
    flag_name text unique not null,
    flag_value integer not null,
    flag_colour text not null
);

Given these tables, it's easy to fetch all of a widget's flags given its id. Joining the two tables using widget_id would allow the widget's own data to be fetched with its flags, but it would result in as many rows for a widget as it had flags. I've often wished I could fetch all the flag data within a single row.

For various (well-known) reasons, it's a bad idea to store data in array columns, but using an array aggregate to read from the widget_flags table would be a neat way to stuff many rows' worth of data into a single column. For example, one could do something like:

> select widget_id,array_agg(flag_colour) from widget_flags
> group by widget_id;
 widget_id |    array_agg     
-----------+------------------
         1 | {blue,red,green}
(1 row)

But that's just one column, and what I really wanted was an array of arrays representing the values in each row. Because of various quirks of PostgreSQL's array implementation, I haven't found a truly satisfactory solution, but the definition of array_accum in the documentation semeed a reasonable starting point:

CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

To create an array of arrays, I changed anyelement argument to anyarray, and changed the sfunc from array_append to array_cat. The stype remained unaltered. Unfortunately, that wasn't quite enough. A query like "select aaa(array[a,b,c])" would return an array of values, not arrays. The only way to fix that was to change the initcond to '{{0,0,0}}', which meant that the first dummy element had to be removed afterwards.

CREATE FUNCTION allbutdummy(anyarray) RETURNS anyarray
AS $$select $1[2:array_upper($1,1)]$$ LANGUAGE 'SQL';

CREATE AGGREGATE aaa (anyarray)
(
    sfunc = array_cat,
    stype = anyarray,
    finalfunc = allbutdummy,
    initcond = '{{0,0,0}}'
);

With these definitions, I can do something like this:

> select widget_id,widget_name,
> (select aa(array[flag_name,flag_colour,flag_value::text])
>  from widget_flags where widget_id=w.widget_id) as flags
> from widgets w;
 widget_id | widget_name |               flags                
-----------+-------------+------------------------------------
         1 | foo         | {{a,blue,3},{b,red,4},{c,green,5}}
(1 row)

Unfortunately, this method has two major limitations. First, arrays in Postgres must contain values of the same type, so one is forced to cast some values to maintain compatibility (flag_value has been cast to text in the example above). A more serious problem is that the three-element array specified in the initcond forces every accumulated array to have exactly three elements. I have found no better way to work around this than to create a family of aggregate functions: aaa4, aaa5, ….

Thanks to RhodiumToad on IRC for helping me work out the details of this hack.

Tags: postgres • Link: etc/reading-tables-into-arrays

2010-08-16

Returning records from PL/PgSQL

I needed to write a plpgsql function that inserted rows into some tables and returned the generated primary key values to the caller. I had never needed to return multiple values from a function before, and I ran into a couple of new problems along the way.

This was my first attempt:

create function create_widget(name text, colour int) returns record as $$
declare
    wid int;
    cid int;
    ids record;
begin
    insert into widgets (name) values (name)
        returning widget_id into wid;
    insert into widget_colours (widget_id, colour)
        values (wid, colour)
        returning colour_id into cid;
    select wid::int, cid::int into ids;
    return ids;
end;
$$ language plpgsql;

When I tried this, I got a syntax error which I eventually understood to mean that my parameters could not have the same names as any columns in the table I was inserting into. Not very surprising, since the inserts look a bit ambiguous even when reading the code. Since parameter names occur only in the function definition and aren't exposed to the caller, I didn't mind working around this by renaming my parameters "p_name" and "p_colour".

What did surprise me was that I had to call the function like this:

select * from create_widget('x', 3) as cw(widget_id int, colour_id int);

In other words, the caller always had to supply a column definition list, and I couldn't (see any way to) specify what I was going to return in the function definition.

I was given two suggestions about how to work around this problem. The first was to use OUT parameters:

create function create_widget(name text, colour int,
                              widget_id OUT int, colour_id OUT int)

But this approach, predictably enough, had the same problem with the names of parameters conflicting with the names of columns, and I was much less willing to force the caller to retrieve and have to use columns named "r_widget_id" or something similar.

With Postgres 8.4 and above, I learned that I could also declare the function as returning a table, which looked exactly like what I had hoped for:

create function create_widget(name text, colour int)
    returns table (widget_id int, colour_id int)

But to my great disappointment, the conflict with the column names persists even in this form.

In the end, I decided to use "p_" prefixed parameter names, return a record, and name the columns in the caller.

Tags: postgres • Link: etc/plpgsql-returning-records