Reading table data into an array

March 17, 2011

Suppose you have a table where each row represents a widget, and another table in which zero or more rows may refer to each widget, representing a set of values for some property of that widget. For example:

create table widgets (
    widget_id serial primary key,
    widget_name text unique
);

create table widget_flags (
    widget_id integer not null references widgets,
    flag_name text unique not null,
    flag_value integer not null,
    flag_colour text not null
);

Given these tables, it's easy to fetch all of a widget's flags given its id. Joining the two tables using widget_id would allow the widget's own data to be fetched with its flags, but it would result in as many rows for a widget as it had flags. I've often wished I could fetch all the flag data within a single row.

For various (well-known) reasons, it's a bad idea to store data in array columns, but using an array aggregate to read from the widget_flags table would be a neat way to stuff many rows' worth of data into a single column. For example, one could do something like:

> select widget_id,array_agg(flag_colour) from widget_flags
> group by widget_id;
 widget_id |    array_agg     
-----------+------------------
         1 | {blue,red,green}
(1 row)

But that's just one column, and what I really wanted was an array of arrays representing the values in each row. Because of various quirks of PostgreSQL's array implementation, I haven't found a truly satisfactory solution, but the definition of array_accum in the documentation semeed a reasonable starting point:

CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

To create an array of arrays, I changed anyelement argument to anyarray, and changed the sfunc from array_append to array_cat. The stype remained unaltered. Unfortunately, that wasn't quite enough. A query like "select aaa(array[a,b,c])" would return an array of values, not arrays. The only way to fix that was to change the initcond to '{{0,0,0}}', which meant that the first dummy element had to be removed afterwards.

CREATE FUNCTION allbutdummy(anyarray) RETURNS anyarray
AS $$select $1[2:array_upper($1,1)]$$ LANGUAGE 'SQL';

CREATE AGGREGATE aaa (anyarray)
(
    sfunc = array_cat,
    stype = anyarray,
    finalfunc = allbutdummy,
    initcond = '{{0,0,0}}'
);

With these definitions, I can do something like this:

> select widget_id,widget_name,
> (select aa(array[flag_name,flag_colour,flag_value::text])
>  from widget_flags where widget_id=w.widget_id) as flags
> from widgets w;
 widget_id | widget_name |               flags                
-----------+-------------+------------------------------------
         1 | foo         | {{a,blue,3},{b,red,4},{c,green,5}}
(1 row)

Unfortunately, this method has two major limitations. First, arrays in Postgres must contain values of the same type, so one is forced to cast some values to maintain compatibility (flag_value has been cast to text in the example above). A more serious problem is that the three-element array specified in the initcond forces every accumulated array to have exactly three elements. I have found no better way to work around this than to create a family of aggregate functions: aaa4, aaa5, ….

Thanks to RhodiumToad on IRC for helping me work out the details of this hack.