Quick Links

To what extent should tests rely on VACUUM ANALYZE?

From:	Alexander Lakhin <exclusion(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject:	To what extent should tests rely on VACUUM ANALYZE?
Date:	2024-03-28 15:00:00
Message-ID:	66eb9a6e-fc67-a230-c5b1-2a741e8b88c6@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello hackers,

When running multiple 027_stream_regress.pl test instances in parallel
(and with aggressive autovacuum) on a rather slow machine, I encountered
test failures due to the subselect test instability just as the following
failures on buildfarm:
1) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-03-27%2010%3A16%3A12

--- /home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/subselect.out 2024-03-19 22:20:34.435867114 +0000
+++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/subselect.out
2024-03-27 10:28:38.185776605 +0000
@@ -2067,16 +2067,16 @@
                    QUERY PLAN
-------------------------------------------------
Hash Join
-   Hash Cond: (c.odd = b.odd)
+   Hash Cond: (c.hundred = a.hundred)
    -> Hash Join
-         Hash Cond: (a.hundred = c.hundred)
-         -> Seq Scan on tenk1 a
+         Hash Cond: (b.odd = c.odd)
+         -> Seq Scan on tenk2 b
          -> Hash
                -> HashAggregate
                      Group Key: c.odd, c.hundred
                      -> Seq Scan on tenk2 c
    -> Hash
-         -> Seq Scan on tenk2 b
+         -> Seq Scan on tenk1 a
(11 rows)

2) https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-03-27%2009%3A49%3A38

(That query was added recently (by 9f1337639 from 2023-02-15) and the
failure evidentially depends on timing, so the number of the failures I
could find on buildfarm is moderate for now.)

With the subselect test modified as in attached, I could see what makes
the plan change:
- -> Seq Scan on public.tenk2 c (cost=0.00..445.00 rows=10000 width=8)
+ -> Seq Scan on public.tenk2 c (cost=0.00..444.95 rows=9995 width=8)

relname | relpages | reltuples | autovacuum_count | autoanalyze_count
---------+----------+-----------+------------------+-------------------
- tenk2 | 345 | 10000 | 0 | 0
+ tenk2 | 345 | 9995 | 0 | 0

Using the trick Thomas proposed in [1] (see my modification attached), I
could reproduce the failure easily on my workstation with no specific
conditions:
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup LOG: !!!ConditionalLockBufferForCleanup()
returning false
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup CONTEXT: while scanning block 29 of relation
"public.tenk2"
2024-03-28 14:05:13.792 UTC client backend[2358012] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk2;
...
relname | relpages | reltuples | autovacuum_count | autoanalyze_count
---------+----------+-----------+------------------+-------------------
- tenk2 | 345 | 10000 | 0 | 0
+ tenk2 | 345 | 9996 | 0 | 0
(1 row)

So it looks to me like a possible cause of the failure, and I wonder
whether checks for query plans should be immune to such changes or results
of VACUUM ANALYZE should be 100% stable?

[1] https://www.postgresql.org/message-id/CA%2BhUKGKYNHmL_DhmVRiidHv6YLAL8jViifwwn2ABY__Y3BCphg%40mail.gmail.com

Best regards,
Alexander

Attachment	Content-Type	Size
subselect-debugging.patch	text/x-patch	4.8 KB
cranky-ConditionalLockBufferForCleanup.patch	text/x-patch	862 bytes

Responses

Re: To what extent should tests rely on VACUUM ANALYZE? at 2024-03-28 17:04:54 from Tomas Vondra
Re: To what extent should tests rely on VACUUM ANALYZE? at 2024-03-29 07:06:07 from Richard Guo

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Melanie Plageman	2024-03-28 15:07:10	Re: Combine Prune and Freeze records emitted by vacuum
Previous Message	Tom Lane	2024-03-28 14:59:01	Re: [PATCH] plpython function causes server panic