Re: AIO v2.5

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: AIO v2.5
Date: 2025-04-01 21:47:51
Message-ID: uc62i6vi5gd4bi6wtjj5poadqxolgy55e7ihkmf3mthjegb6zl@zqo7xez7sc2r
Lists: pgsql-hackers

Hi,

On 2025-04-01 11:55:20 -0400, Andres Freund wrote:
> I haven't yet pushed the changes, but will work on that in the afternoon.

There are three different types of failures in the test_aio test so far:

1) TEMP_CONFIG

See https://postgr.es/m/zh5u22wbpcyfw2ddl3lsvmsxf4yvsrvgxqwwmfjddc4c2khsgp%40gfysyjsaelr5

2) Failure on at least some Windows BF machines:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-04-01%2020%3A15%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2025-04-01%2019%3A03%3A07

Afaict the error is independent of AIO and is instead just related to CREATE
DATABASE ... STRATEGY wal_log failing on Windows. In contrast to dropdb(),
which does

    /*
     * Force a checkpoint to make sure the checkpointer has received the
     * message sent by ForgetDatabaseSyncRequests.
     */
    RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

    /* Close all smgr fds in all backends. */
    WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

createdb_failure_callback() does no such thing. But it's rather likely that
we, bgwriter, checkpointer (and now IO workers) have files open for the target
database.

Note that the test is failing even with "io_method=sync", which obviously
doesn't use IO workers, so it's not related to that.

It's probably not a good idea to request a checkpoint and a barrier, and block
waiting for both, inside a PG_TRY/PG_ENSURE_ERROR_CLEANUP() though, so this
would need a bit more rearchitecting.

I think I'm just going to make the test more lenient by not insisting that the
error is the first thing on psql's stderr.
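
To illustrate the difference between the strict and the lenient check, here is a minimal C sketch. It is not the actual test code (the real test is the Perl script t/001_aio.pl); the function names are hypothetical and only model "expected message must come first" versus "expected message may appear anywhere in stderr".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helpers, for illustration only. They model two ways a test
 * could check captured stderr text for an expected error message.
 */

/* Strict: the captured stderr must begin with the expected message. */
static bool
stderr_starts_with(const char *stderr_text, const char *expected)
{
    return strncmp(stderr_text, expected, strlen(expected)) == 0;
}

/* Lenient: the expected message may appear anywhere in the captured stderr. */
static bool
stderr_contains(const char *stderr_text, const char *expected)
{
    return strstr(stderr_text, expected) != NULL;
}
```

With the lenient variant, an unrelated message printed before the expected error no longer makes the check fail, while a missing error is still detected.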

3) Some subtests fail if RELCACHE_FORCE_RELEASE and CATCACHE_FORCE_RELEASE are defined:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2025-04-01%2019%3A23%3A07

# +++ tap check in src/test/modules/test_aio +++

# Failed test 'worker: batch_start() leak & cleanup in implicit xact: expected stderr'
# at t/001_aio.pl line 318.
# 'psql:<stdin>:4: ERROR: starting batch while batch already in progress'
# doesn't match '(?^:open AIO batch at end)'

The problem is basically that the test intentionally forgets to exit batchmode
- normally that would trigger an error at the end of the transaction, which
the test verifies. However, with RELCACHE_FORCE_RELEASE and
CATCACHE_FORCE_RELEASE defined, we get other code entering batchmode and
erroring out because batchmode isn't allowed to be entered recursively.

#0 pgaio_enter_batchmode () at ../../../../../home/andres/src/postgresql/src/backend/storage/aio/aio.c:997
#1 0x000055ec847959bf in read_stream_look_ahead (stream=0x55ecbcfda098)
at ../../../../../home/andres/src/postgresql/src/backend/storage/aio/read_stream.c:438
#2 0x000055ec84796514 in read_stream_next_buffer (stream=0x55ecbcfda098, per_buffer_data=0x0)
at ../../../../../home/andres/src/postgresql/src/backend/storage/aio/read_stream.c:890
#3 0x000055ec8432520b in heap_fetch_next_buffer (scan=0x55ecbcfd1c00, dir=ForwardScanDirection)
at ../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:679
#4 0x000055ec843259a4 in heapgettup_pagemode (scan=0x55ecbcfd1c00, dir=ForwardScanDirection, nkeys=1, key=0x55ecbcfd1620)
at ../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:1041
#5 0x000055ec843263ba in heap_getnextslot (sscan=0x55ecbcfd1c00, direction=ForwardScanDirection, slot=0x55ecbcfd0e18)
at ../../../../../home/andres/src/postgresql/src/backend/access/heap/heapam.c:1420
#6 0x000055ec8434ebe5 in table_scan_getnextslot (sscan=0x55ecbcfd1c00, direction=ForwardScanDirection, slot=0x55ecbcfd0e18)
at ../../../../../home/andres/src/postgresql/src/include/access/tableam.h:1041
#7 0x000055ec8434f786 in systable_getnext (sysscan=0x55ecbcfd8088) at ../../../../../home/andres/src/postgresql/src/backend/access/index/genam.c:541
#8 0x000055ec849c784a in SearchCatCacheMiss (cache=0x55ecbcf81000, nkeys=1, hashValue=3830081846, hashIndex=2, v1=403, v2=0, v3=0, v4=0)
at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1543
#9 0x000055ec849c76f9 in SearchCatCacheInternal (cache=0x55ecbcf81000, nkeys=1, v1=403, v2=0, v3=0, v4=0)
at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1464
#10 0x000055ec849c73ec in SearchCatCache1 (cache=0x55ecbcf81000, v1=403) at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1332
#11 0x000055ec849e5ae3 in SearchSysCache1 (cacheId=2, key1=403) at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/syscache.c:228
#12 0x000055ec849d8c78 in RelationInitIndexAccessInfo (relation=0x7f6a85901c20)
at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1456
#13 0x000055ec849d8471 in RelationBuildDesc (targetRelId=2703, insertIt=true)
at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1201
#14 0x000055ec849d9e9c in RelationIdGetRelation (relationId=2703) at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2100
#15 0x000055ec842d219f in relation_open (relationId=2703, lockmode=1) at ../../../../../home/andres/src/postgresql/src/backend/access/common/relation.c:58
#16 0x000055ec8435043c in index_open (relationId=2703, lockmode=1) at ../../../../../home/andres/src/postgresql/src/backend/access/index/indexam.c:137
#17 0x000055ec8434f2f9 in systable_beginscan (heapRelation=0x7f6a859353a8, indexId=2703, indexOK=true, snapshot=0x0, nkeys=1, key=0x7ffc11aa7c90)
at ../../../../../home/andres/src/postgresql/src/backend/access/index/genam.c:400
#18 0x000055ec849c782c in SearchCatCacheMiss (cache=0x55ecbcfa0e80, nkeys=1, hashValue=2659955452, hashIndex=60, v1=2278, v2=0, v3=0, v4=0)
at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1533
#19 0x000055ec849c76f9 in SearchCatCacheInternal (cache=0x55ecbcfa0e80, nkeys=1, v1=2278, v2=0, v3=0, v4=0)
at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1464
#20 0x000055ec849c73ec in SearchCatCache1 (cache=0x55ecbcfa0e80, v1=2278) at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1332
#21 0x000055ec849e5ae3 in SearchSysCache1 (cacheId=82, key1=2278) at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/syscache.c:228
#22 0x000055ec849d0375 in getTypeOutputInfo (type=2278, typOutput=0x55ecbcfd15d0, typIsVarlena=0x55ecbcfd15d8)
at ../../../../../home/andres/src/postgresql/src/backend/utils/cache/lsyscache.c:2995
#23 0x000055ec842d1a57 in printtup_prepare_info (myState=0x55ecbcfcec00, typeinfo=0x55ecbcfd0588, numAttrs=1)
at ../../../../../home/andres/src/postgresql/src/backend/access/common/printtup.c:277
#24 0x000055ec842d1ba6 in printtup (slot=0x55ecbcfd0b28, self=0x55ecbcfcec00)
at ../../../../../home/andres/src/postgresql/src/backend/access/common/printtup.c:315
#25 0x000055ec84541f54 in ExecutePlan (queryDesc=0x55ecbced4290, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection,
dest=0x55ecbcfcec00) at ../../../../../home/andres/src/postgresql/src/backend/executor/execMain.c:1814
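
The interaction above can be modelled with a small, self-contained C sketch. This is a toy model, not PostgreSQL code: every name below (ToyBackend, toy_enter_batchmode(), and so on) is hypothetical, and it assumes that the first error aborts the transaction and that error cleanup silently closes the leaked batch. Only the two message strings are taken from the test output above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct ToyBackend
{
    bool        in_batchmode;   /* is a batch currently open? */
    const char *first_error;    /* first error raised, or NULL */
} ToyBackend;

/* Record the first error; assume error cleanup closes any open batch. */
static void
toy_raise(ToyBackend *b, const char *msg)
{
    if (b->first_error == NULL)
        b->first_error = msg;
    b->in_batchmode = false;
}

/* Batch mode is not reentrant: entering twice is an error. */
static bool
toy_enter_batchmode(ToyBackend *b)
{
    if (b->in_batchmode)
    {
        toy_raise(b, "starting batch while batch already in progress");
        return false;
    }
    b->in_batchmode = true;
    return true;
}

static void
toy_exit_batchmode(ToyBackend *b)
{
    b->in_batchmode = false;
}

/* Stand-in for a catalog scan that brackets its reads in batch mode. */
static void
toy_catalog_scan(ToyBackend *b)
{
    if (toy_enter_batchmode(b))
        toy_exit_batchmode(b);
}

/* End of transaction: complain if a batch was leaked. */
static void
toy_end_xact(ToyBackend *b)
{
    if (b->in_batchmode)
        toy_raise(b, "open AIO batch at end");
}

/* The test scenario: leak a batch, optionally hit a forced cache miss. */
static const char *
toy_run_leak_test(bool force_cache_miss)
{
    ToyBackend  b = {false, NULL};

    toy_enter_batchmode(&b);    /* the test intentionally never exits */
    if (force_cache_miss)
        toy_catalog_scan(&b);   /* models the cache reload seen in the backtrace */
    toy_end_xact(&b);
    return b.first_error;
}
```

In this model, toy_run_leak_test(false) reports the end-of-transaction message the test expects, whereas toy_run_leak_test(true) reports the "already in progress" error first, which is the mismatch seen on prion.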

I don't really have a good idea how to deal with that yet.

Greetings,

Andres
