Re: should check interrupts in BuildRelationExtStatistics ?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: should check interrupts in BuildRelationExtStatistics ?
Date: 2022-06-05 01:42:33
Message-ID: 20220605014233.GR29853@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 03, 2022 at 10:28:37AM -0500, Justin Pryzby wrote:
> Maybe this should actually call vacuum_delay_point(), like
> compute_scalar_stats().

I think vacuum_delay_point() would be wrong for these cases, since they don't
call "fetchfunc()", like the other places which use vacuum_delay_point.

> For MCV, there seems to be no issue, since those
> functions are being called (but only for expressional stats). But maybe I've
> just failed to make a large enough, non-expressional MCV list for the problem
> to be apparent.

I reproduced the issue with MCV like this:

DROP TABLE IF EXISTS t; CREATE TABLE t AS SELECT a::text,b::text,c::text,d::text,e::text,f::text,g::text FROM generate_series(1000001,1000006)a,generate_series(1000001,1000006)b,generate_series(1000001,1000006)c,generate_series(1000001,1000006)d,generate_series(1000001,1000006)e,generate_series(1000001,1000006)f,generate_series(1000001,1000006)g,generate_series(1000001,1000006)h; VACUUM t;
DROP STATISTICS IF EXISTS stxx; CREATE STATISTICS stxx (mcv) ON a,b,c,d,e,f FROM t; ALTER STATISTICS stxx SET STATISTICS 9999; ANALYZE VERBOSE t;

This is slow (25 seconds) inside qsort:

(gdb) bt
#0 __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:1020
#1 0x00005653d8686fac in varstrfastcmp_locale (a1p=0x5653dce67d54 "1000004~", len1=7, a2p=0x5653e895ffa4 "1000004~", len2=7, ssup=ssup(at)entry=0x5653d98c37b8) at varlena.c:2444
#2 0x00005653d8687161 in varlenafastcmp_locale (x=94918188367184, y=94918384418720, ssup=0x5653d98c37b8) at varlena.c:2270
#3 0x00005653d85134d8 in ApplySortComparator (ssup=0x5653d98c37b8, isNull2=<optimized out>, datum2=<optimized out>, isNull1=<optimized out>, datum1=<optimized out>) at ../../../src/include/utils/sortsupport.h:224
#4 multi_sort_compare (a=0x7fa587b44e58, b=0x7fa5875f0dd0, arg=0x5653d98c37b0) at extended_stats.c:903
#5 0x00005653d8712eed in qsort_arg (data=data(at)entry=0x7fa5875f0050, n=<optimized out>, n(at)entry=1679616, element_size=element_size(at)entry=24, compare=compare(at)entry=0x5653d8513483 <multi_sort_compare>,
arg=arg(at)entry=0x5653d98c37b0) at ../../src/include/lib/sort_template.h:349
#6 0x00005653d851415f in build_sorted_items (data=data(at)entry=0x7fa58f2e1050, nitems=nitems(at)entry=0x7ffe4f764e5c, mss=mss(at)entry=0x5653d98c37b0, numattrs=6, attnums=0x7fa58f2e1078) at extended_stats.c:1134
#7 0x00005653d8515d84 in statext_mcv_build (data=data(at)entry=0x7fa58f2e1050, totalrows=totalrows(at)entry=1679616, stattarget=stattarget(at)entry=9999) at mcv.c:204
#8 0x00005653d8513819 in BuildRelationExtStatistics (onerel=onerel(at)entry=0x7fa5b26ef658, inh=inh(at)entry=false, totalrows=1679616, numrows=numrows(at)entry=1679616, rows=rows(at)entry=0x7fa5a4103050, natts=natts(at)entry=7,
vacattrstats=vacattrstats(at)entry=0x5653d98b76b0) at extended_stats.c:213

The fix seems to be to CHECK_FOR_INTERRUPTS() within multi_sort_compare().
That would supercede the other two CHECK_FOR_INTERRUPTS I'd proposed, and
handle mcv, depends, and ndistinct all at once.

Does that sound right ?

For MCV, there's also ~0.6sec spent in build_column_frequencies(), which (if
needed) would be addressed by adding CFI in sort_item_compare.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Guillaume Lelarge 2022-06-05 07:03:47 Re: Proposal: adding a better description in psql command about large objects
Previous Message Michael Paquier 2022-06-05 00:24:25 Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch