BTScanOpaqueData size slows down tests

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: BTScanOpaqueData size slows down tests
Date: 2025-04-02 15:20:58
Message-ID: kgz63a4hp6s22egd47mlgngkjsz44t6wgojzlzi67zgrx2mzl3@dntq6nrahdgr
Lists: pgsql-hackers

Hi,

I was a bit annoyed at test times just now. Ran a profile on the entire
regression tests in a cassert -Og build.

Unsurprisingly most of the time is spent in AllocSetCheck(). I was mildly
surprised to see how expensive the new compact attribute checks are.

What I was more surprised to realize is how much of the time is spent in
freeing and allocating BTScanOpaqueData.

+ 6.94% postgres postgres [.] AllocSetCheck
- 4.96% postgres libc.so.6 [.] __memset_evex_unaligned_erms
- 1.94% memset(at)plt
- 1.12% _int_malloc
- 1.11% malloc
- 0.90% AllocSetAllocLarge
- AllocSetAlloc
- 0.77% palloc
- 0.63% btbeginscan
- index_beginscan_internal
- 0.63% index_beginscan
- 0.61% systable_beginscan
+ 0.22% SearchCatCacheMiss
+ 0.07% ScanPgRelation
+ 0.05% RelationBuildTupleDesc
+ 0.04% findDependentObjects
0.03% GetNewOidWithIndex
+ 0.02% deleteOneObject
+ 0.02% shdepDropDependency
+ 0.02% DeleteComments
+ 0.02% SearchCatCacheList
+ 0.02% DeleteSecurityLabel
+ 0.02% DeleteInitPrivs
+ 0.04% text_to_cstring
+ 0.02% cstring_to_text_with_len
+ 0.02% datumCopy
+ 0.02% tuplesort_begin_batch
+ 0.11% palloc_extended
+ 0.01% AllocSetRealloc
+ 0.20% AllocSetAllocFromNewBlock
+ 0.82% _int_free_merge_chunk
- 1.90% __memset_evex_unaligned_erms
- 1.82% wipe_mem
- 1.33% AllocSetFree
- 1.33% pfree
+ 0.73% btendscan
+ 0.22% freedfa
+ 0.06% ExecAggCopyTransValue
+ 0.04% freenfa
+ 0.03% enlarge_list
+ 0.03% ExecDropSingleTupleTableSlot
+ 0.02% xmlconcat
+ 0.01% RemoveLocalLock
+ 0.01% errcontext_msg
+ 0.01% IndexScanEnd
0.01% heap_free_minimal_tuple
+ 0.49% AllocSetReset
0.02% palloc0
0.01% PageInit
+ 0.01% wipe_mem
+ 0.59% alloc_perturb
+ 0.46% asm_exc_page_fault
+ 0.03% asm_sysvec_apic_timer_interrupt
+ 0.02% wipe_mem

Looking at the size of BTScanOpaqueData I am less surprised:

/* --- cacheline 1 boundary (64 bytes) --- */
char * currTuples; /* 64 8 */
char * markTuples; /* 72 8 */
int markItemIndex; /* 80 4 */

/* XXX 4 bytes hole, try to pack */

BTScanPosData currPos __attribute__((__aligned__(8))); /* 88 13632 */
/* --- cacheline 214 boundary (13696 bytes) was 24 bytes ago --- */
BTScanPosData markPos __attribute__((__aligned__(8))); /* 13720 13632 */

/* size: 27352, cachelines: 428, members: 17 */
/* sum members: 27340, holes: 4, sum holes: 12 */
/* forced alignments: 2, forced holes: 1, sum forced holes: 4 */
/* last cacheline: 24 bytes */
} __attribute__((__aligned__(8)));

Allocating, zeroing and freeing 28kB of memory for every syscache miss, yeah,
that's gonna hurt.

The reason BTScanPosData is that large is that it embeds an array of
MaxTIDsPerBTreePage * sizeof(BTScanPosItem) bytes:
BTScanPosItem items[1358] __attribute__((__aligned__(2))); /* 48 13580 */
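
The arithmetic behind those numbers can be checked with a small standalone
sketch. The struct layouts below are stand-ins, not the nbtree.h definitions:
the only facts taken from the pahole output are the array length (1358), the
array offset (48), the array size (13580, which implies 10 bytes per item) and
the 8-byte alignment. All names prefixed with Sketch/sketch_ are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 6-byte heap TID (assumed layout: three 16-bit fields). */
typedef struct SketchItemPointer
{
    uint16_t    block_hi;
    uint16_t    block_lo;
    uint16_t    offset;
} SketchItemPointer;

/* Stand-in for BTScanPosItem: 13580 / 1358 implies 10 bytes per item. */
typedef struct SketchScanPosItem
{
    SketchItemPointer heap_tid;     /* 6 bytes */
    uint16_t    index_offset;       /* 2 bytes */
    uint16_t    tuple_offset;       /* 2 bytes */
} SketchScanPosItem;

#define SKETCH_MAX_TIDS_PER_PAGE 1358   /* array length in the pahole line */
#define SKETCH_ITEMS_OFFSET      48     /* offset of items[] in the pahole line */

/* Bytes occupied by the items[] array alone. */
size_t
sketch_items_bytes(void)
{
    return SKETCH_MAX_TIDS_PER_PAGE * sizeof(SketchScanPosItem);
}

/* Header plus items[], rounded up to the struct's 8-byte alignment. */
size_t
sketch_pos_bytes(void)
{
    size_t      raw = SKETCH_ITEMS_OFFSET + sketch_items_bytes();

    return (raw + 7) & ~(size_t) 7;
}
```

With these assumptions the array is 13580 bytes and one position struct rounds
up to 13632 bytes, so currPos and markPos together account for 27264 of the
27352 bytes reported above.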

Could we perhaps allocate BTScanPosData->items dynamically if more than a
handful of items are needed?
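
As a rough illustration of that idea, here is a minimal standalone sketch that
keeps a handful of items inline and only moves to a heap array once more are
needed. It uses malloc/realloc rather than palloc so that it compiles on its
own, and every name in it (SketchScanPos, sketch_pos_append, the inline
capacity of 8, the doubling growth policy) is an assumption made for the
example, not anything in nbtree.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_INLINE_ITEMS 8       /* hypothetical "handful" kept inline */
#define SKETCH_MAX_ITEMS    1358    /* upper bound taken from items[1358] */

/* 10-byte stand-in for BTScanPosItem (assumed layout). */
typedef struct SketchItem
{
    uint16_t    tid[3];
    uint16_t    index_offset;
    uint16_t    tuple_offset;
} SketchItem;

typedef struct SketchScanPos
{
    int         nitems;
    int         capacity;
    SketchItem *items;              /* inline_items or a heap array */
    SketchItem  inline_items[SKETCH_INLINE_ITEMS];
} SketchScanPos;

void
sketch_pos_init(SketchScanPos *pos)
{
    pos->nitems = 0;
    pos->capacity = SKETCH_INLINE_ITEMS;
    pos->items = pos->inline_items;
}

/* Append one item, growing the array on demand; false if full or OOM. */
bool
sketch_pos_append(SketchScanPos *pos, SketchItem item)
{
    if (pos->nitems == pos->capacity)
    {
        int         newcap = pos->capacity * 2;
        SketchItem *newitems;

        if (pos->capacity >= SKETCH_MAX_ITEMS)
            return false;
        if (newcap > SKETCH_MAX_ITEMS)
            newcap = SKETCH_MAX_ITEMS;

        if (pos->items == pos->inline_items)
        {
            /* First overflow: move the inline items to the heap. */
            newitems = malloc((size_t) newcap * sizeof(SketchItem));
            if (newitems == NULL)
                return false;
            memcpy(newitems, pos->inline_items,
                   (size_t) pos->nitems * sizeof(SketchItem));
        }
        else
        {
            newitems = realloc(pos->items,
                               (size_t) newcap * sizeof(SketchItem));
            if (newitems == NULL)
                return false;
        }
        pos->items = newitems;
        pos->capacity = newcap;
    }
    pos->items[pos->nitems++] = item;
    return true;
}

/* Free any heap array and return to the empty inline state. */
void
sketch_pos_release(SketchScanPos *pos)
{
    if (pos->items != pos->inline_items)
        free(pos->items);
    sketch_pos_init(pos);
}
```

One caveat with this particular layout: because the struct holds a pointer
into itself, copying it byte-for-byte would leave the copy pointing at the
original's inline array, so any code that copies positions wholesale would
need to fix up the pointer.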

And/or perhaps we could allocate BTScanOpaqueData.markPos as a whole
only when mark/restore are used?
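
A minimal sketch of what lazy allocation of the mark position might look like,
again as standalone C with malloc instead of palloc. The types and functions
(SketchScanOpaque, sketch_mark, sketch_restore, sketch_end) are hypothetical
stand-ins and deliberately much simpler than the real scan state; the point is
only that the second large struct is not allocated until the first mark.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ITEMS 1358       /* from items[1358] in the pahole output */

/* 10-byte stand-in for BTScanPosItem (assumed layout). */
typedef struct SketchItem
{
    uint16_t    tid[3];
    uint16_t    index_offset;
    uint16_t    tuple_offset;
} SketchItem;

typedef struct SketchPos
{
    int         nitems;             /* number of valid entries in items[] */
    int         item_index;         /* current position within items[] */
    SketchItem  items[SKETCH_MAX_ITEMS];
} SketchPos;

typedef struct SketchScanOpaque
{
    SketchPos   currPos;
    SketchPos  *markPos;            /* NULL until a mark is first taken */
} SketchScanOpaque;

/* Only the header and the valid items need to be copied. */
static size_t
sketch_pos_used_bytes(const SketchPos *pos)
{
    return offsetof(SketchPos, items) +
        (size_t) pos->nitems * sizeof(SketchItem);
}

/* Remember the current position, allocating markPos on first use. */
bool
sketch_mark(SketchScanOpaque *so)
{
    if (so->markPos == NULL)
    {
        so->markPos = malloc(sizeof(SketchPos));
        if (so->markPos == NULL)
            return false;
    }
    memcpy(so->markPos, &so->currPos, sketch_pos_used_bytes(&so->currPos));
    return true;
}

/* Go back to the marked position; false if no mark was ever taken. */
bool
sketch_restore(SketchScanOpaque *so)
{
    if (so->markPos == NULL)
        return false;
    memcpy(&so->currPos, so->markPos, sketch_pos_used_bytes(so->markPos));
    return true;
}

/* Scans that never marked anything have nothing extra to free. */
void
sketch_end(SketchScanOpaque *so)
{
    free(so->markPos);
    so->markPos = NULL;
}
```

In this sketch a scan that never calls sketch_mark pays for only one position
struct, which would roughly halve the per-scan allocation described above.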

I wouldn't be surprised if this is an issue not just for tests, but also in a
few real workloads.

Greetings,

Andres Freund
