From: | Dimitrios Apostolou <jimis(at)gmx(dot)net> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | array_agg() does not stop aggregating according to HAVING clause |
Date: | 2024-08-17 14:37:25 |
Message-ID: | 215d6efa-bb4b-a76e-6066-7a83ccdb55e3@gmx.net |
Lists: | pgsql-general |
Hello list,
I have a query that goes through *billions* of rows, and for each
"datatag" that occurs infrequently (HAVING count(datatag_n) < 10) it
collects the IDs of all matching entries (array_agg(run_n)). Here is the
full query:
    INSERT INTO infrequent_datatags_in_this_chunk
      SELECT datatag, datatags.datatag_n, array_agg(run_n)
        FROM runs_raw
        JOIN datatags USING(datatag_n)
        WHERE workitem_n >= 295
          AND workitem_n <  714218
          AND datatag IS NOT NULL
        GROUP BY datatags.datatag_n
        HAVING count(datatag_n) < 10
           AND count(datatag_n) > 0  -- Not really needed because of the JOIN above
    ;
The runs_raw table has run_n as the primary key id, and an index on
workitem_n. The datatags table is a key value store with datatag_n as
primary key.
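For reference, the layout described above could look roughly like the sketch below. The table names, column names, primary keys, and the workitem_n index come from the description; the column types, the index name, and the name of the array column in the target table are only assumptions for illustration, and the real tables have more columns.

```sql
-- Sketch only: column types are assumed, not taken from the real schema.
CREATE TABLE datatags (
    datatag_n  integer PRIMARY KEY,   -- key of the key/value store
    datatag    text                   -- the value (assumed to be text)
);

CREATE TABLE runs_raw (
    run_n       bigint PRIMARY KEY,   -- primary key id
    workitem_n  integer,              -- range-filtered in the query
    datatag_n   integer               -- joins to datatags.datatag_n
);

-- Index on workitem_n (the index name here is hypothetical).
CREATE INDEX runs_raw_workitem_n_idx ON runs_raw (workitem_n);

-- Target of the INSERT; "run_n_arr" is a hypothetical column name.
CREATE TABLE infrequent_datatags_in_this_chunk (
    datatag    text,
    datatag_n  integer,
    run_n_arr  bigint[]               -- receives array_agg(run_n)
);
```

Because datatag_n is the primary key of datatags, selecting datatag while grouping only by datatags.datatag_n is accepted by PostgreSQL.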
The problem is that this is extremely slow (5 hours), most likely because
it creates tens of gigabytes of temporary files as I see in the logs. I
suspect that it is writing to disk the array_agg(run_n) of all entries and
not only those HAVING count(datatag_n)<10. (I might be wrong though, as
this is only an assumption based on the amount of data written; I don't
know of any way to examine the temporary files written). While this query
is going through billions of rows, the ones with infrequent datatags are
maybe 10M.
How do I tell postgres to stop aggregating when count>=10?
Thank you in advance,
Dimitris