Re: Very slow 101-feeling design/query..

From: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
To: Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Very slow 101-feeling design/query..
Date: 2021-12-11 19:37:20
Message-ID: CAOC+FBW4nJzdYxksaHTONbmnkOGgoMk_7xm6D5_Lo=WkeTrNHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Yeah, I can take a look at that. The other issue is simply doing SELECT *
FROM vw_joints WHERE game_id = 123 is slow because joints -> plays on
play_uuid, and the index is on game_id in plays, but that query is slower
than I want it to be.

On Sat, Dec 11, 2021 at 11:22 AM Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>
wrote:

> On 12/11/21 13:32, Wells Oliver wrote:
>
> We have hundreds of millions of joint locations across hundreds of
> thousands of plays across thousands of games. We also have a few hundred
> games with plays in them, but no joint measurements. The joint data is a
> measurement of body joints through time (up to 300x a second)
>
> So my idea was the plays table contains the game identifier (there is no
> root game table), and the joints table refers to the plays table by a play
> UUID.
>
> As an example: One game contains 231 plays. There are 396,144 rows of
> joint data.
>
> I partitioned the joints data by month because of the volume.
>
> The end result is using the joints view (which joins to plays) is just
> dead, grinding slow, and I'm trying to think of alternatives that stay
> within Postgres, i.e. maybe BigQuery and/or RedShift is more of an
> appropriate solution, but I don't want to go there quite yet.
>
> Just trying to optimize this design and open to ideas. Thanks.
>
> On Sat, Dec 11, 2021 at 7:58 AM Michel SALAIS <msalais(at)msym(dot)fr> wrote:
>
>> Hi
>>
>> Using view which does a join in two tables to access things in one of
>> them is not a good idea. You pay for a join even though it is not
>> necessary. So if you just want distinct game_id and you don’t care if its
>> play_uuid is in joints, then you can accelerate using the table plays
>> directly.
>>
>>
>>
>> *Michel SALAIS*
>>
>> *De :* Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
>> *Envoyé :* vendredi 10 décembre 2021 23:53
>> *À :* pgsql-admin <pgsql-admin(at)postgresql(dot)org>
>> *Objet :* Very slow 101-feeling design/query..
>>
>>
>>
>> This feels very 101 but I feel like it should be much faster:
>>
>>
>>
>> A table "joints" with a PK of play_uuid, target_id, joint_seq,
>> joint_timestamp.
>>
>>
>>
>> "joints" is partitioned using RANGE on joint_timestamp for monthly
>> partitions 1/1 - 2/1, 2-1 - 3/1, etc.
>>
>>
>>
>> "joints" has an FK where play_uuid refers to table "plays" and
>> column "play_uuid" where "play_uuid" is the PK.
>>
>>
>>
>> "plays" additionally has an indexed column game_id.
>>
>>
>>
>> "joints" has 1133,932,391 rows across 12 monthly partitions for 2021, and
>> "plays has 585,627 rows. We made a view called "vw_joints" which just does:
>>
>>
>>
>> SELECT * FROM joints JOIN plays USING (play_id);
>>
>>
>>
>> Then doing:
>>
>>
>>
>> SELECT DISTINCT game_id FROM vw_joints
>>
>>
>>
>> Takes 35-45 minutes. Which seems nuts. We do this kind of design in a few
>> different plays to normalize things, but it comes at the cost of these
>> agonizingly slow (and seemingly dead simple) qeuries.
>>
>>
>>
>> Is there any optimization to do here beyond flattening table and
>> de-normalizing data? Is the partitioning causing a slowness here? I feel
>> like partitioning is creating some difficulty...
>> --
>>
>
> Well, you can create a trigger which would fire whenever row is inserted
> or deleted and would update joint, game_id and the count in a separate
> table. That is the usual solution for the problem you described. Also, if
> there is a small amount of games, you can use hash index instead of B-tree
> indexes.
>
>
> --
>
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217https://dbwhisperer.wordpress.com
>
>

--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Doug Reynolds 2021-12-11 22:50:03 Re: Very slow 101-feeling design/query..
Previous Message Mladen Gogala 2021-12-11 19:22:23 Re: Very slow 101-feeling design/query..