Unfortunate Nested Loop + Missing Autovacuum

From: Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Unfortunate Nested Loop + Missing Autovacuum
Date: 2025-02-21 23:46:15
Message-ID: CABcidkJT9FLu4hDT5VRKD+KGu0rLkJqhyqtAA42-P6QoVfh+kA@mail.gmail.com
Lists: pgsql-performance

Hi all--

Writing in with a two-parter performance issue.

I've got two tables roughly shaped like this:

                                       Table "public.rawdata"
  Column   |            Type             | Collation | Nullable |               Default
-----------+-----------------------------+-----------+----------+-------------------------------------
 id        | integer                     |           | not null | nextval('rawdata_id_seq'::regclass)
 timestamp | timestamp without time zone |           | not null |
 sn        | character varying(36)       |           | not null |
 raw_value | double precision            |           |          |
Indexes:
    "rawdata_pkey" PRIMARY KEY, btree (id)
    "rawdata_multicol_sn_and_timestamp_desc_idx" btree (sn, "timestamp" DESC)
Foreign-key constraints:
    "rawdata_sn_fkey" FOREIGN KEY (sn) REFERENCES devices(sn) ON UPDATE CASCADE ON DELETE CASCADE
Referenced by:
    TABLE "cleandata" CONSTRAINT "cleandata_raw_table_id_fkey" FOREIGN KEY (raw_table_id) REFERENCES rawdata(id)

                                      Table "public.cleandata"
    Column    |            Type             | Collation | Nullable |                Default
--------------+-----------------------------+-----------+----------+---------------------------------------
 id           | integer                     |           | not null | nextval('cleandata_id_seq'::regclass)
 timestamp    | timestamp without time zone |           | not null |
 sn           | character varying(36)       |           | not null |
 raw_table_id | integer                     |           |          |
 clean_value  | double precision            |           |          |
Indexes:
    "cleandata_pkey" PRIMARY KEY, btree (id)
    "cleandata_multicol_sn_and_timestamp_desc_idx" btree (sn, "timestamp" DESC)
Foreign-key constraints:
    "cleandata_raw_table_id_fkey" FOREIGN KEY (raw_table_id) REFERENCES rawdata(id)
    "cleandata_sn_fkey" FOREIGN KEY (sn) REFERENCES devices(sn) ON UPDATE CASCADE ON DELETE CASCADE

Rows are inserted into the raw table, then slightly later into the cleandata table, at a rate of ~1-5 million rows a day. The tables are on the order of hundreds of millions of rows. They are almost never updated or deleted, but are selected regularly. Technically, not every row in rawdata will have a corresponding row in cleandata, but in practice the vast majority do.

Thanks to the combined index on sn / timestamp, selecting from either of these tables on its own is fairly performant. Where things get hairy is joining between them. I can write a performant query that joins from clean to raw, but the opposite direction (which is not logically equivalent) is painful because raw_table_id is not indexed on cleandata. In the medium term it would be good to build that index, but in the short term I can speed things up by filtering both sides of the join on the indexed columns, like:

EXPLAIN ANALYZE
SELECT count(*)
FROM (
    SELECT *
    FROM rawdata
    WHERE timestamp >= NOW() - interval '2 weeks'
      AND sn = 'FOO'
) r
JOIN (
    SELECT *
    FROM cleandata
    WHERE timestamp >= NOW() - interval '2 weeks'
      AND sn = 'FOO'
) c ON r.id = c.raw_table_id;

which produces:
Hash Join  (cost=42.80..83.52 rows=1 width=44) (actual time=852.341..1338.713 rows=20141 loops=1)
  Hash Cond: (cleandata.raw_table_id = rawdata.id)
  ->  Index Scan using cleandata_multicol_sn_and_timestamp_desc_idx on cleandata  (cost=0.70..41.30 rows=47 width=12) (actual time=0.051..476.287 rows=20141 loops=1)
        Index Cond: (((sn)::text = 'FOO'::text) AND ("timestamp" >= (now() - '14 days'::interval)))
  ->  Hash  (cost=41.51..41.51 rows=47 width=40) (actual time=852.273..852.275 rows=20141 loops=1)
        Buckets: 32768 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1751kB
        ->  Index Scan using raw_multicol_sn_and_timestamp_desc_idx on rawdata  (cost=0.70..41.51 rows=47 width=40) (actual time=0.050..844.555 rows=20141 loops=1)
              Index Cond: (((sn)::text = 'FOO'::text) AND ("timestamp" >= (now() - '14 days'::interval)))
Planning Time: 2.211 ms
Execution Time: 1339.680 ms

The estimates are totally wrong, but the plan is good and the query is fast. However, when I change the filter slightly to `timestamp >= NOW() - interval '2 weeks' AND timestamp < NOW()`, which covers the same period (or really any other time period), the plan changes dramatically:
Nested Loop  (cost=1.42..5.47 rows=1 width=572) (actual time=0.100..185218.340 rows=20141 loops=1)
  Join Filter: (rawdata.id = cleandata.raw_table_id)
  Rows Removed by Join Filter: 405639740
  ->  Index Scan using raw_multicol_sn_and_timestamp_desc_idx on rawdata  (cost=0.71..2.73 rows=1 width=424) (actual time=0.056..100.014 rows=20141 loops=1)
        Index Cond: (((sn)::text = 'FOO'::text) AND ("timestamp" >= (now() - '14 days'::interval)) AND ("timestamp" < now()))
  ->  Index Scan using cleandata_multicol_sn_and_timestamp_desc_idx on cleandata  (cost=0.71..2.73 rows=1 width=148) (actual time=0.045..7.584 rows=20141 loops=20141)
        Index Cond: (((sn)::text = 'FOO'::text) AND ("timestamp" >= (now() - '14 days'::interval)) AND ("timestamp" < now()))
Planning Time: 0.450 ms
Execution Time: 185225.028 ms

The row estimate for each table gets even worse, and I guess this is enough to push Postgres toward a Nested Loop, with disastrous consequences.
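One stopgap that should at least pin the plan, I believe, is disabling nested loops for just this query (a sketch; SET LOCAL reverts at the end of the transaction, so nothing else is affected):

```sql
BEGIN;
-- Steer the planner away from the nested loop for this query only.
SET LOCAL enable_nestloop = off;

SELECT count(*)
FROM rawdata r
JOIN cleandata c ON r.id = c.raw_table_id
WHERE r.sn = 'FOO'
  AND r.timestamp >= NOW() - interval '2 weeks' AND r.timestamp < NOW()
  AND c.sn = 'FOO'
  AND c.timestamp >= NOW() - interval '2 weeks' AND c.timestamp < NOW();

COMMIT;
```

Though obviously I'd rather fix the underlying estimates than carry a planner override around in application code.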

So, obviously there's a statistics problem, which led me to realize that these tables have *never* been autovacuumed or analyzed, according to pg_stat_user_tables. I'm using a managed database, which makes this a little tricky to debug, but all the relevant settings (autovacuum, autovacuum_vacuum_threshold, autovacuum_analyze_threshold, autovacuum_vacuum_insert_threshold) are at their defaults, and I can see that other tables have been vacuumed recently.
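For reference, this is roughly how I'm checking (a sketch; all of these columns are in the pg_stat_user_tables view):

```sql
SELECT relname,
       n_live_tup,
       n_dead_tup,
       n_mod_since_analyze,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('rawdata', 'cleandata');
```

For these two tables the last_autovacuum / last_autoanalyze columns come back NULL, while other tables show recent timestamps.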

I assume this has something to do with the fact that these tables don't accumulate dead tuples, since they're basically append-only. But I'd still expect INSERTs to trigger an autovacuum/analyze eventually (at least once the table grows by 10%, because of autovacuum_analyze_scale_factor), and I'm confused why that doesn't seem to have happened. It also seems likely this is hurting my queries' performance elsewhere.
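As a stopgap, I assume I can just refresh the statistics by hand, and sanity-check where autoanalyze *should* kick in (a sketch, assuming the default autovacuum_analyze_threshold = 50 and autovacuum_analyze_scale_factor = 0.1):

```sql
-- Stopgap: refresh the statistics manually.
ANALYZE rawdata;
ANALYZE cleandata;

-- Rough trigger point for autoanalyze:
--   autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples
SELECT relname,
       reltuples,
       50 + 0.1 * reltuples AS approx_autoanalyze_trigger
FROM pg_class
WHERE relname IN ('rawdata', 'cleandata');
```

At ~1-5 million inserts a day against hundreds of millions of rows, I'd expect that trigger to be crossed within weeks, not never.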

So, my two big questions are:
- Is there a better way to write my query to hint away from the awful
nested loop join?
- Can anyone think of why autovacuum is declining to vacuum/analyze these
tables?
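(For completeness, the medium-term index I mentioned would be something like the following; the index name is just a placeholder, and CONCURRENTLY so it doesn't block the ongoing inserts:)

```sql
CREATE INDEX CONCURRENTLY cleandata_raw_table_id_idx
    ON cleandata (raw_table_id);
```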

Thanks for reading, and for any help with this!

--
Lincoln Swaine-Moore
