From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: To what extent should tests rely on VACUUM ANALYZE?
Date: 2024-03-29 08:59:59
Message-ID: ef34cee4-ddd3-30ec-3222-bee25ae60f8f@gmail.com
Lists: pgsql-hackers
28.03.2024 20:33, Tom Lane wrote:
>
>> But I'm a bit confused - how come the estimates do change at all? The
>> analyze simply fetches 30k rows, and tenk only has 10k of them. So we
>> should have *exact* numbers, and it should be exactly the same for all
>> the analyze runs. So how come it changes like this?
> It's plausible that the VACUUM ANALYZE done by test_setup fails
> ConditionalLockBufferForCleanup() sometimes because of concurrent
> activity like checkpointer writes. I'm not quite sure how we
> get from that to the observed symptom though. Maybe the
> VACUUM needs DISABLE_PAGE_SKIPPING?
Yeah, the way from ConditionalLockBufferForCleanup() returning false to
reltuples < 10000 is not a single step, as I thought initially. There is
also sanity_check doing a VACUUM in between. So, effectively, the
troublesome scenario is:
VACUUM ANALYZE tenk2; -- with the cleanup lock not granted for some blocks
VACUUM tenk2;
In this scenario, lazy_scan_heap() -> vac_estimate_reltuples() is called
twice.
First, it is called with rel_pages: 384, vacrel->scanned_pages: 384,
vacrel->live_tuples: 10000, which results in
vacrel->new_live_tuples = 10000.
Second, it is called with rel_pages: 345, vacrel->scanned_pages: 80,
vacrel->live_tuples: 2315 (for instance), and we get
vacrel->new_live_tuples = 9996.
With an unmodified ConditionalLockBufferForCleanup(), the second call is
performed with rel_pages: 345, vacrel->scanned_pages: 1,
vacrel->live_tuples: 24, and it returns 10000.
This simple change fixes the issue for me:
-VACUUM ANALYZE tenk2;
+VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk2;
But it looks like subselect is not the only test that can fail due to
VACUUM instability. I see that create_index also suffers from a cranky
ConditionalLockBufferForCleanup() (with "if (rand() % 10 == 0)
return false;" injected), although it is placed in parallel_schedule
before sanity_check, so this failure needs another explanation:
- QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
- Index Cond: (thousand < 2)
- Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+ QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+ Sort Key: thousand
+ -> Index Only Scan using tenk1_thous_tenthous on tenk1
+ Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)
Best regards,
Alexander