Re: How to properly query lots of rows based on timestamps?

From: Thorsten Schöning <tschoening(at)am-soft(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: How to properly query lots of rows based on timestamps?
Date: 2020-08-30 13:54:11
Message-ID: 1683596875.20200830155411@am-soft.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Guten Tag Alban Hertroys,
am Samstag, 29. August 2020 um 11:29 schrieben Sie:

> [...]Next, it loops over those to match the other condition of
> your query (id = clt_rec.oms_rec). You didn’t tell whether there’s
> an index on that column.

Which column, oms_rec.id or clt_rec.oms_rec? The former has one
because it's the unique key and the plan says so as well:

> Index Scan using pk_oms_rec on oms_rec

clt_rec.oms_rec OTOH is only marked as unique, without explicitly
creating an index.

> CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)

But doesn't that mean an index is available behind the scenes,
maintained by Postgres? So it should have been used if it would
provide any benefit?

> Adding a unique constraint will automatically create a unique B-tree
> index on the column or group of columns listed in the constraint.

https://www.postgresql.org/docs/9.4/ddl-constraints.html

> You’d probably see a performance improvement were you to create an
> index on (captured_at, id). If your Postgres version is somewhat
> recent, that could even lead to an Index Only Scan.

Tried that with your statement at the bottom and it didn't seem to
change anything even when using Postgres 11:

> -> Nested Loop (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.228..95.554 rows=34266 loops=3)
> -> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.119..16.895 rows=34266 loops=3)
> Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
> -> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
> Index Cond: (id = clt_rec.oms_rec)

vs. with your suggested change:

> -> Nested Loop (cost=1.14..513397.11 rows=43543 width=20) (actual time=0.236..97.044 rows=34266 loops=3)
> -> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..173665.52 rows=43543 width=24) (actual time=0.183..17.464 rows=34266 loops=3)
> Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
> -> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
> Index Cond: (id = clt_rec.oms_rec)

Good news is that having one of those indexes in place at all makes a
huge difference compared to having neither of both. :-D

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning E-Mail: Thorsten(dot)Schoening(at)AM-SoFT(dot)de
AM-SoFT IT-Systeme http://www.AM-SoFT.de/

Telefon...........05151- 9468- 55
Fax...............05151- 9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thorsten Schöning 2020-08-30 14:00:21 Re: How to properly query lots of rows based on timestamps?
Previous Message Thorsten Schöning 2020-08-30 13:36:22 Re: How to properly query lots of rows based on timestamps?