Re: Very slow 101-feeling design/query..

From: Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Very slow 101-feeling design/query..
Date: 2021-12-11 19:22:23
Message-ID: e8c3a8cf-09be-542f-7a8f-f6194eef7b83@gmail.com
Lists: pgsql-admin

On 12/11/21 13:32, Wells Oliver wrote:
> We have hundreds of millions of joint locations across hundreds of
> thousands of plays across thousands of games. We also have a few
> hundred games with plays in them, but no joint measurements. The joint
> data is a measurement of body joints through time (up to 300x a second).
>
> So my idea was the plays table contains the game identifier (there is
> no root game table), and the joints table refers to the plays table by
> a play UUID.
>
> As an example: One game contains 231 plays. There are 396,144 rows of
> joint data.
>
> I partitioned the joints data by month because of the volume.
>
> The end result is that using the joints view (which joins to plays)
> is just dead, grinding slow, and I'm trying to think of alternatives
> that stay within Postgres. Maybe BigQuery and/or Redshift is more of
> an appropriate solution, but I don't want to go there quite yet.
>
> Just trying to optimize this design and open to ideas. Thanks.
>
> On Sat, Dec 11, 2021 at 7:58 AM Michel SALAIS <msalais(at)msym(dot)fr> wrote:
>
> Hi
>
> Using a view that joins two tables to access data from only one of
> them is not a good idea: you pay for the join even when it is not
> necessary. So if you just want distinct game_id values and you
> don't care whether a play_uuid appears in joints, you can speed
> things up by querying the plays table directly.
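>
> For instance (a sketch, given that game_id is a column on plays as
> described above), this avoids the join entirely:
>
> SELECT DISTINCT game_id FROM plays;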
>
> /Michel SALAIS/
>
> *De :* Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
> *Envoyé :* vendredi 10 décembre 2021 23:53
> *À :* pgsql-admin <pgsql-admin(at)postgresql(dot)org>
> *Objet :* Very slow 101-feeling design/query..
>
> This feels very 101 but I feel like it should be much faster:
>
> A table "joints" with a PK of play_uuid, target_id, joint_seq,
> joint_timestamp.
>
> "joints" is partitioned using RANGE on joint_timestamp for monthly
> partitions 1/1 - 2/1, 2-1 - 3/1, etc.
>
> "joints" has an FK where play_uuid refers to table "plays" and
> column "play_uuid" where "play_uuid" is the PK.
>
> "plays" additionally has an indexed column game_id.
>
> "joints" has 1133,932,391 rows across 12 monthly partitions for
> 2021, and "plays has 585,627 rows. We made a view called
> "vw_joints" which just does:
>
> SELECT * FROM joints JOIN plays USING (play_uuid);
>
> Then doing:
>
> SELECT DISTINCT game_id FROM vw_joints
>
> Takes 35-45 minutes, which seems nuts. We use this kind of design
> in a few different places to normalize things, but it comes at the
> cost of these agonizingly slow (and seemingly dead simple) queries.
>
> Is there any optimization to do here beyond flattening the table
> and de-normalizing the data? Is the partitioning causing the
> slowness here? I feel like partitioning is creating some difficulty...
>
> --
>
>
Well, you can create a trigger which fires whenever a row is inserted
or deleted and maintains the game_id and its row count in a separate
table. That is the usual solution for the problem you described. Also,
if there is a small number of games, you can use a hash index instead
of a B-tree index.
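
A minimal sketch of that approach (the summary table, function, and
trigger names here are hypothetical, and this assumes Postgres 11 or
later, where row-level triggers on partitioned tables work):

CREATE TABLE game_joint_counts (
    game_id     bigint PRIMARY KEY,
    joint_count bigint NOT NULL DEFAULT 0
);

CREATE FUNCTION maintain_game_joint_counts() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- Look up the game for the new joint row and bump its count.
        INSERT INTO game_joint_counts (game_id, joint_count)
        SELECT p.game_id, 1
        FROM plays p
        WHERE p.play_uuid = NEW.play_uuid
        ON CONFLICT (game_id)
        DO UPDATE SET joint_count = game_joint_counts.joint_count + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE game_joint_counts c
        SET joint_count = c.joint_count - 1
        FROM plays p
        WHERE p.play_uuid = OLD.play_uuid
          AND c.game_id = p.game_id;
    END IF;
    RETURN NULL;  -- AFTER trigger; the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER joints_maintain_counts
AFTER INSERT OR DELETE ON joints
FOR EACH ROW EXECUTE FUNCTION maintain_game_joint_counts();

Then "SELECT game_id FROM game_joint_counts WHERE joint_count > 0"
replaces the 35-45 minute scan over the view. The hash index idea
would look like:

CREATE INDEX plays_game_id_hash ON plays USING hash (game_id);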

--

Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
