From: | Owen Nelson <onelson(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Understanding partial index selection |
Date: | 2023-11-29 01:12:51 |
Message-ID: | CAG-u7zggmz5sGhQoib1+0H-dTeGYGZ_DVuyZ=Czko-AULr-HNA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Embarrassed to say that it's been so long since I participated in a mailing
list I neglected to ensure my replies were directed back at the list rather
than select individuals.
I'll recap what I shared for the list here, for posterity.
@Boris
This is where the iteration aspect is tricky. I don't have much in the way
of access to run ad hoc queries against the database, so right now I can't
just run it.
I can see that pg_stat_user_tables.n_mod_since_analyze is 9834 which seems
reasonable to me given the long life of the table so far, and I assume that
the autovacuum/autoanalyze are doing the right thing here.
@Tom
> Are your ANALYZE stats up to date on this table?
It's a very good question! Right now, I'm taking it on faith that
autovacuum and autoanalyze are keeping things up to date, but if I'm honest
I've been getting some conflicting information from pg_stat_user_tables and
I'm starting to suspect this might be an AWS Aurora quirk. The timestamps
for last autovacuum seem recent enough to not be concerning, but sometimes
all the stats come back with zeros and nulls. Surely that can't be right :(
I also speculated about using extended statistics. I would/will experiment
with this if I can reproduce this degraded state outside of the production
system. TBD.
@David
I'm glad you mentioned pgstattuple. This is new to me, and I'm anxious to
find some signal that could indicate the problem. I'll work to get the
extension enabled and see what it shows.
I'm not currently able to `SET seqscan_enabled TO off`. The access I have
right now only permits single statements, executed separately. Comparing
the cost is an interesting idea though, so I'll see what I can do.
On Tue, Nov 28, 2023 at 2:23 PM Owen Nelson <onelson(at)gmail(dot)com> wrote:
> Hi!
>
> I've got a query running periodically which has been degrading in
> performance as time goes on. I'm hoping to better understand what the
> contributing factors are.
>
> Given a table with:
> ```
> postgres=# \d message
> Table "public.message"
> Column | Type | Collation | Nullable |
> Default
>
> ------------+--------------------------+-----------+----------+-----------------------------
> id | character varying | C | not null |
> created_at | timestamp with time zone | | not null |
> org_id | character varying | C | not null |
> app_id | character varying | C | not null |
> event_type | character varying | | not null |
> uid | character varying | | |
> payload | jsonb | | |
> channels | jsonb | | |
> expiration | timestamp with time zone | | not null | now() +
> '90 days'::interval
> Indexes:
> "pk_message" PRIMARY KEY, btree (id)
> "ix_message_per_app" btree (app_id, id DESC)
> "ix_message_uid_unique_app_cond" UNIQUE, btree (app_id, uid) WHERE uid
> IS NOT NULL
> "message_payload_not_null_pidx" btree (expiration) WHERE payload IS
> NOT NULL
> Foreign-key constraints:
> "fk_message_app_id_application" FOREIGN KEY (app_id) REFERENCES
> application(id) ON DELETE CASCADE
> Referenced by:
> TABLE "messagedestination" CONSTRAINT
> "fk_messagedestination_msg_id_message" FOREIGN KEY (msg_id) REFERENCES
> message(id) ON DELETE CASCADE
> ```
>
> I periodically run a query like this:
> ```
> UPDATE message SET payload = NULL WHERE id IN (
> SELECT id FROM message
> WHERE
> payload IS NOT NULL
> AND expiration <= now()
> LIMIT 5000
> FOR UPDATE SKIP LOCKED
> );
> ```
>
> The aim is to cull payload values after the retention period has expired.
> The hope is the sub-select would leverage the index
> "message_payload_not_null_pidx" but when I `EXPLAIN ANALYZE` the query, I
> see a seq scan instead.
>
> ```
> Update on message (cost=1773.41..44611.36 rows=5000 width=283) (actual
> time=20913.192..20913.194 rows=0 loops=1)
> -> Nested Loop (cost=1773.41..44611.36 rows=5000 width=283) (actual
> time=20881.320..20886.541 rows=51 loops=1)
> -> HashAggregate (cost=1772.85..1822.85 rows=5000 width=88)
> (actual time=20881.286..20882.052 rows=51 loops=1)
> Group Key: ("ANY_subquery".id)::text
> -> Subquery Scan on "ANY_subquery" (cost=0.00..1760.35
> rows=5000 width=88) (actual time=8425.022..20881.244 rows=51 loops=1)
> -> Limit (cost=0.00..1710.35 rows=5000 width=38)
> (actual time=8425.017..20881.219 rows=51 loops=1)
> -> LockRows (cost=0.00..2112304.92
> rows=6175068 width=38) (actual time=8425.016..20881.212 rows=51 loops=1)
> -> Seq Scan on message message_1
> (cost=0.00..2050554.24 rows=6175068 width=38) (actual
> time=8424.977..20880.945 rows=65 loops=1)
> Filter: ((payload IS NOT NULL) AND
> (expiration <= now()))
> Rows Removed by Filter: 37772897
> -> Index Scan using pk_message on message (cost=0.56..8.56
> rows=1 width=191) (actual time=0.073..0.073 rows=1 loops=51)
> Index Cond: ((id)::text = ("ANY_subquery".id)::text)
> Planning Time: 0.237 ms
> Execution Time: 20913.310 ms
> ```
>
> I have read that the planner's index preference is driven by statistics. I
> looked at the correlation of both the expiration and payload columns, ~0.8
> and ~0.4 respectively. I understand the planner prefers indices where the
> correlation is farther from 0 (closer to the min/max of the -1, 1 range),
> but I'm not sure where the threshold might be. Still, 0.4-ish feels low to
> me.
>
> I wonder if the fact the index is partial using `WHERE payload IS NOT
> NULL` combined with the fact we're changing the value to NULL is sort of
> "punching holes" in the index, reducing correlation over time. I'm not sure
> how to test or prove this is the case.
>
> I also wonder if the index type (btree) is inappropriate for some reason,
> or what other factors I should be thinking about here.
>
> Separately, I saw this earlier thread which describes a somewhat similar
> problem, though the details are somewhat different.
> https://www.postgresql.org/message-id/20230119090425.GA8608%40arp.lijzij.de
>
> Perhaps all I'd need is to `CREATE STATISTICS` on these two columns to
> make the index viable again.
>
> I hope to just get a better understanding of what's going on. There's an
> upcoming change to the schema which will likely sidestep this situation,
> but there's still a lesson to learn here.
>
> Thanks,
> Owen
>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-11-29 01:44:28 | Re: Understanding partial index selection |
Previous Message | David Rowley | 2023-11-28 23:08:27 | Re: Understanding partial index selection |