Suppose you have a table where each row represents a widget, and another table in which zero or more rows may refer to each widget, representing a set of values for some property of that widget. For example:
create table widgets (
widget_id serial primary key,
widget_name text unique
);
create table widget_flags (
widget_id integer not null references widgets,
flag_name text unique not null,
flag_value integer not null,
flag_colour text not null
);
Given these tables, it's easy to fetch all of a widget's flags given its id. Joining the two tables using widget_id would allow the widget's own data to be fetched with its flags, but it would result in as many rows for a widget as it had flags. I've often wished I could fetch all the flag data within a single row.
For various (well-known) reasons, it's a bad idea to store data in array columns, but using an array aggregate to read from the widget_flags table would be a neat way to stuff many rows' worth of data into a single column. For example, one could do something like:
> select widget_id,array_agg(flag_colour) from widget_flags
> group by widget_id;
widget_id | array_agg
-----------+------------------
1 | {blue,red,green}
(1 row)
But that's just one column, and what I really wanted was an array of arrays representing the values in each row. Because of various quirks of PostgreSQL's array implementation, I haven't found a truly satisfactory solution, but the definition of array_accum in the documentation semeed a reasonable starting point:
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
To create an array of arrays, I changed anyelement argument to anyarray, and changed the sfunc from array_append to array_cat. The stype remained unaltered. Unfortunately, that wasn't quite enough. A query like "select aaa(array[a,b,c])" would return an array of values, not arrays. The only way to fix that was to change the initcond to '{{0,0,0}}', which meant that the first dummy element had to be removed afterwards.
CREATE FUNCTION allbutdummy(anyarray) RETURNS anyarray
AS $$select $1[2:array_upper($1,1)]$$ LANGUAGE 'SQL';
CREATE AGGREGATE aaa (anyarray)
(
sfunc = array_cat,
stype = anyarray,
finalfunc = allbutdummy,
initcond = '{{0,0,0}}'
);
With these definitions, I can do something like this:
> select widget_id,widget_name,
> (select aa(array[flag_name,flag_colour,flag_value::text])
> from widget_flags where widget_id=w.widget_id) as flags
> from widgets w;
widget_id | widget_name | flags
-----------+-------------+------------------------------------
1 | foo | {{a,blue,3},{b,red,4},{c,green,5}}
(1 row)
Unfortunately, this method has two major limitations. First, arrays in Postgres must contain values of the same type, so one is forced to cast some values to maintain compatibility (flag_value has been cast to text in the example above). A more serious problem is that the three-element array specified in the initcond forces every accumulated array to have exactly three elements. I have found no better way to work around this than to create a family of aggregate functions: aaa4, aaa5, ….
Thanks to RhodiumToad on IRC for helping me work out the details of this hack.
I needed to write a plpgsql function that inserted rows into some tables and returned the generated primary key values to the caller. I had never needed to return multiple values from a function before, and I ran into a couple of new problems along the way.
This was my first attempt:
create function create_widget(name text, colour int) returns record as $$
declare
wid int;
cid int;
ids record;
begin
insert into widgets (name) values (name)
returning widget_id into wid;
insert into widget_colours (widget_id, colour)
values (wid, colour)
returning colour_id into cid;
select wid::int, cid::int into ids;
return ids;
end;
$$ language plpgsql;
When I tried this, I got a syntax error which I eventually understood to mean that my parameters could not have the same names as any columns in the table I was inserting into. Not very surprising, since the inserts look a bit ambiguous even when reading the code. Since parameter names occur only in the function definition and aren't exposed to the caller, I didn't mind working around this by renaming my parameters "p_name" and "p_colour".
What did surprise me was that I had to call the function like this:
select * from create_widget('x', 3) as cw(widget_id int, colour_id int);
In other words, the caller always had to supply a column definition list, and I couldn't (see any way to) specify what I was going to return in the function definition.
I was given two suggestions about how to work around this problem. The first was to use OUT parameters:
create function create_widget(name text, colour int,
widget_id OUT int, colour_id OUT int)
But this approach, predictably enough, had the same problem with the names of parameters conflicting with the names of columns, and I was much less willing to force the caller to retrieve and have to use columns named "r_widget_id" or something similar.
With Postgres 8.4 and above, I learned that I could also declare the function as returning a table, which looked exactly like what I had hoped for:
create function create_widget(name text, colour int)
returns table (widget_id int, colour_id int)
But to my great disappointment, the conflict with the column names persists even in this form.
In the end, I decided to use "p_" prefixed parameter names, return a record, and name the columns in the caller.