Suppose you have a table where each row represents a widget, and another
table in which zero or more rows may refer to each widget, representing
a set of values for some property of that widget. For example:
create table widgets (
widget_id serial primary key,
widget_name text unique
);
create table widget_flags (
widget_id integer not null references widgets,
flag_name text unique not null,
flag_value integer not null,
flag_colour text not null
);
Given these tables, it's easy to fetch all of a widget's flags given its
id. Joining the two tables using widget_id would allow the widget's own
data to be fetched with its flags, but it would result in as many rows
for a widget as it had flags. I've often wished I could fetch all the
flag data within a single row.
For various (well-known) reasons, it's a bad idea to store data
in array columns, but using an array aggregate to read from the
widget_flags table would be a neat way to stuff many rows' worth of data
into a single column. For example, one could do something like:
> select widget_id,array_agg(flag_colour) from widget_flags
> group by widget_id;
widget_id | array_agg
-----------+------------------
1 | {blue,red,green}
(1 row)
But that's just one column, and what I really wanted was an array of
arrays representing the values in each row. Because of various quirks of
PostgreSQL's array implementation, I haven't found a truly satisfactory
solution, but the definition of array_accum in the documentation semeed
a reasonable starting point:
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
To create an array of arrays, I changed anyelement argument to anyarray,
and changed the sfunc from array_append to array_cat. The stype remained
unaltered. Unfortunately, that wasn't quite enough. A query like "select
aaa(array[a,b,c])" would return an array of values, not arrays. The only
way to fix that was to change the initcond to '{{0,0,0}}', which meant
that the first dummy element had to be removed afterwards.
CREATE FUNCTION allbutdummy(anyarray) RETURNS anyarray
AS $$select $1[2:array_upper($1,1)]$$ LANGUAGE 'SQL';
CREATE AGGREGATE aaa (anyarray)
(
sfunc = array_cat,
stype = anyarray,
finalfunc = allbutdummy,
initcond = '{{0,0,0}}'
);
With these definitions, I can do something like this:
> select widget_id,widget_name,
> (select aa(array[flag_name,flag_colour,flag_value::text])
> from widget_flags where widget_id=w.widget_id) as flags
> from widgets w;
widget_id | widget_name | flags
-----------+-------------+------------------------------------
1 | foo | {{a,blue,3},{b,red,4},{c,green,5}}
(1 row)
Unfortunately, this method has two major limitations. First, arrays in
Postgres must contain values of the same type, so one is forced to cast
some values to maintain compatibility (flag_value has been cast to text
in the example above). A more serious problem is that the three-element
array specified in the initcond forces every accumulated array to have
exactly three elements. I have found no better way to work around this
than to create a family of aggregate functions: aaa4, aaa5, ….
Thanks to RhodiumToad on IRC for helping me work out the details of this
hack.