Re: Multiple Aggregations Order

From: João Haas <joao(dot)ca(dot)haas(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Multiple Aggregations Order
Date: 2020-01-14 21:03:29
Message-ID: CAEi5ktLYdJ0_5NTQn__7GPOi1THAM0_gRspGX5er6_G2xq_ubQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm aiming to serialize some graph data into a JSON format, and some of the
data needed for the serialization is in the relation tables, like, "this
node connects to this other node in this way". These are served to IOT
devices and the data changes a lot, so there's a ton of requests and
caching is not that efficient. Due to that, it would be ideal if I could
fetch everything in a single query, with the data needed aggregated in
arrays, so that I can denormalize them later in code (eg.:
zip(data['child_id_set'], data['child_kind_set']) in python).

Each query should have from 100~1000 items from the 'tb' table. The
amount of child relations each item has vary a lot depending on the node,
so it can be from 1~10000.

The tables themselves have some millions of rows, but I don't have access
to the production database to check how many exactly. Due to this, although
I can share the query plan, it wouldn't be so meaningful, since it is
considering my dev env, which only have ~100 rows, and it's throwing seq
scans for everything.

The query is the following:
WITH RECURSIVE tree(tree_id, tree_path) AS (
SELECT "conn"."child_id",
ARRAY["conn"."parent_id"]::integer[]
FROM "conn"
WHERE "parent_id" IN (643) -- Starting point of graph
UNION
SELECT DISTINCT ON ("conn"."child_id")
"conn"."child_id",
"tree"."tree_path" || "conn"."parent_id"
FROM "tree", "conn"
WHERE "conn"."parent_id" = "tree"."tree_id"
AND NOT "conn"."child_id" = ANY("tree"."tree_path")
) SELECT "tb".*,
array_length("tree"."tree_path", 1) AS "depth",
array_agg("conn"."child_id" ORDER BY ("conn"."order",
"conn"."kind")) FILTER (WHERE "conn"."child_id" IS NOT NULL) AS
"child_id_set",
array_agg("conn"."kind" ORDER BY ("conn"."order", "conn"."kind"))
FILTER (WHERE "conn"."child_id" IS NOT NULL) AS "child_kind_set",
array_agg("conn"."restrictions" ORDER BY ("conn"."order",
"conn"."kind")) FILTER (WHERE "conn"."child_id" IS NOT NULL) AS
"child_restrictions_set",
array_agg("conn"."meta" ORDER BY ("conn"."order", "conn"."kind"))
FILTER (WHERE "conn"."child_id" IS NOT NULL) AS "child_meta_set"
FROM "tb"
LEFT OUTER JOIN "conn"
ON "tb"."id" = "conn"."parent_id"
INNER JOIN (SELECT DISTINCT ON ("tree_id") * FROM "tree") AS "tree"
ON "tree"."tree_id" = "tb"."id"
GROUP BY "tb"."id", "tree"."tree_path";

I'm currently using Postgres 11

On Tue, Jan 14, 2020 at 4:49 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> "handle this aggregated data later in code"
>
> What is your end goal though? Also, approx how many rows in these tables?
> Can you share an example query and plan? What version are you using?
>
>>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message João Haas 2020-01-14 21:17:22 Re: Multiple Aggregations Order
Previous Message Justin 2020-01-14 20:51:58 Re: Is it possible to replicate through an http proxy?