Re: Very slow 101-feeling design/query..

From: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
To: Michel SALAIS <msalais(at)msym(dot)fr>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Very slow 101-feeling design/query..
Date: 2021-12-11 18:32:01
Message-ID: CAOC+FBVV92QZ+hpCrtKDUXWe3ALB1UkdJtzoe5K=g3YDNctS8w@mail.gmail.com
Lists: pgsql-admin

We have hundreds of millions of joint locations across hundreds of
thousands of plays across thousands of games. We also have a few hundred
games with plays in them but no joint measurements. The joint data is a
measurement of body joints through time (captured up to 300 times a second).

So my idea was the plays table contains the game identifier (there is no
root game table), and the joints table refers to the plays table by a play
UUID.
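Concretely, the layout described above might look like this (a sketch only; the column types, type of game_id, and index names are my assumptions, not confirmed by the thread):

```sql
-- Hypothetical DDL matching the description in this thread.
CREATE TABLE plays (
    play_uuid uuid PRIMARY KEY,
    game_id   text NOT NULL   -- type assumed
);
CREATE INDEX plays_game_id_idx ON plays (game_id);

CREATE TABLE joints (
    play_uuid       uuid        NOT NULL REFERENCES plays (play_uuid),
    target_id       bigint      NOT NULL,  -- type assumed
    joint_seq       integer     NOT NULL,
    joint_timestamp timestamptz NOT NULL,
    PRIMARY KEY (play_uuid, target_id, joint_seq, joint_timestamp)
) PARTITION BY RANGE (joint_timestamp);
```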

As an example: One game contains 231 plays. There are 396,144 rows of joint
data.

I partitioned the joints data by month because of the volume.
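The monthly partitions would then be declared roughly like this (a sketch; the partition names and exact bounds are assumptions):

```sql
-- One partition per month of 2021; upper bound is exclusive.
CREATE TABLE joints_2021_01 PARTITION OF joints
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
CREATE TABLE joints_2021_02 PARTITION OF joints
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');
-- ...and so on through 2021-12.
```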

The end result is that using the joints view (which joins to plays) is just
dead, grinding slow, and I'm trying to think of alternatives that stay
within Postgres. Maybe BigQuery and/or Redshift would be a more appropriate
solution, but I don't want to go there quite yet.

Just trying to optimize this design and open to ideas. Thanks.
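One concrete rewrite, assuming the goal really is just the distinct game_ids that have joint data: skip the view and use a semi-join, so the billion-row joints table is only probed for existence rather than fully joined (a sketch, using the column names from this thread):

```sql
-- Distinct games that actually have joint measurements.
-- EXISTS stops probing joints after the first matching row per play.
SELECT DISTINCT p.game_id
FROM plays p
WHERE EXISTS (
    SELECT 1
    FROM joints j
    WHERE j.play_uuid = p.play_uuid
);
```

And if the few hundred games that have plays but no joint data don't matter, `SELECT DISTINCT game_id FROM plays` avoids touching joints entirely.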

On Sat, Dec 11, 2021 at 7:58 AM Michel SALAIS <msalais(at)msym(dot)fr> wrote:

> Hi
>
> Using a view that joins two tables in order to read data from only one of
> them is not a good idea: you pay for the join even when it isn't
> necessary. So if you just want distinct game_id values, and you don't care
> whether each play_uuid actually appears in joints, you can speed this up
> by querying the plays table directly.
>
>
>
> *Michel SALAIS*
>
> *De :* Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
> *Envoyé :* vendredi 10 décembre 2021 23:53
> *À :* pgsql-admin <pgsql-admin(at)postgresql(dot)org>
> *Objet :* Very slow 101-feeling design/query..
>
>
>
> This feels very 101 but I feel like it should be much faster:
>
>
>
> A table "joints" with a PK of play_uuid, target_id, joint_seq,
> joint_timestamp.
>
>
>
> "joints" is partitioned using RANGE on joint_timestamp for monthly
> partitions 1/1 - 2/1, 2-1 - 3/1, etc.
>
>
>
> "joints" has an FK where play_uuid refers to table "plays" and
> column "play_uuid" where "play_uuid" is the PK.
>
>
>
> "plays" additionally has an indexed column game_id.
>
>
>
> "joints" has 1133,932,391 rows across 12 monthly partitions for 2021, and
> "plays has 585,627 rows. We made a view called "vw_joints" which just does:
>
>
>
> SELECT * FROM joints JOIN plays USING (play_uuid);
>
>
>
> Then doing:
>
>
>
> SELECT DISTINCT game_id FROM vw_joints
>
>
>
> Takes 35-45 minutes, which seems nuts. We use this kind of design in a few
> different places to normalize things, but it comes at the cost of these
> agonizingly slow (and seemingly dead-simple) queries.
>
>
>
> Is there any optimization to do here beyond flattening the tables and
> de-normalizing the data? Is the partitioning causing the slowness here? I
> feel like partitioning is creating some difficulty...
>
>
>
> Appreciate it.
>
>
>
> --
>
> Wells Oliver
> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>

--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
