query patterns for multipass aggregating

From: Rob Nikander <rob(dot)nikander(at)gmail(dot)com>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: query patterns for multipass aggregating
Date: 2018-11-11 16:20:23
Message-ID: 11335409-A083-4951-BC7A-3C4FD118AC04@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I’ve run into this pattern a few times, and I usually get a little confused. I’m wondering if there are some common solutions or techniques.

Simplified example:

I have tables `items`, `colors`, and `images`. Items have many colors, and many images.

I want a query to list items, with their colors and images. Each result row is an item, so the colors and images must be aggregated into arrays or json.

If there were only one other table, it’s a simple join and group…

select items.*, array_agg(color_name)
from items join colors on items.id = colors.item_id
group by items.id

Now to get images too, my first try uses a CTE…

// Tables:
// items (id, name, foo)
// colors (item_id, color_name, color_foo)
// images (item_id, image_file, image_foo)

with items_with_colors as (
// This is the same query as above
select items.*, array_agg(colors.color_name) as color_names
from items
join colors on items.id = colors.item_id
group by items.id
)
// Same idea repeated but now joining to images table
select items.*, array_agg(images.image_file) as image_files
from items_with_colors items
join images on items.id = images.item_id
group by items.id, items.name, items.foo, items.color_names; // mmm :(

One first problem is on the last line. I have to list out all the columns. In my real situation there are many more. Listing them is only an inconvenience, but I’m more worried that it is internally comparing all the columns, when really it could compare only `items.id`.

So… are there some other techniques that usually (or might) work better? It seems like the kind of thing that could run in parallel, if I wrote it right.

Rob

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter J. Holzer 2018-11-11 16:24:11 Re: Fwd: Log file
Previous Message Adrien Nayrat 2018-11-11 10:14:44 Re: pg_dump out of memory for large table with LOB