Quick Links

Re: Should from_collapse be switched off? (queries 10 times faster)

From:	Peter <pmc(at)citylink(dot)dinoex(dot)sub(dot)org>
To:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Should from_collapse be switched off? (queries 10 times faster)
Date:	2018-03-23 14:30:21
Message-ID:	20180323143021.GA31620@gate.oper.dinoex.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

The problem appeared when I found the queries suddenly taking longer
than usual. Investigation showed that execution time greatly depends
on the way the queries are invoked.
Consider fn(x) simply a macro containing a plain SQL SELECT statement
returning SETOF (further detail follows below):

# SELECT fn(x);
-> 6.3 sec.

# SELECT a from fn(x) as a;
-> 1.3 sec.

Further investigation with auto_explain shows different plans being
chosen. The slower one uses an Index Only Scan, which seems to perform
bad. Slightly increasing random_page_cost solves this, but this seems
the wrong way, because we are on SSD+ZFS, where random_page_cost
actually should be DEcreased, as there is no difference if random or
sequential.

During this effort I accidentally came upon from_collapse_limit,
and setting it off significantly changed things:

# SET from_collapse_limit = 1;

# SELECT fn(x);
-> 0.6 sec.

# SELECT a from fn(x) as a;
-> 1.2 sec.

The plans look different now (obviousely), and again the difference
between the two invocations comes from an an Index Only Scan, but
this time the Index Only Scan is faster. So now we can reduce
random_page_cost in order to better reflect physical circumstances,
and then both invocations will be fast.

From here it looks like from_collapse is the problem.

Now for the details:

VACUUM ANALYZE is up to date, and all respective configurations are as
default.

The query itself contains three nested SELECTS working all on the same
table. The table is 400'000 rows, 36 MB. (The machine is a pentium-3,
which is my router - so don't be surprized about the comparatively long
execution times.)

This is the (critical part of the) query - let $1 be something like
'2017-03-03':

SELECT MAX(quotes.datum) AS ratedate, aktkurs.*
FROM quotes, wpnames, places,
(SELECT quotes.datum, close, quotes.wpname_id, places.waehrung
FROM quotes, wpnames, places,
(SELECT MAX(datum) AS datum, wpname_id
FROM quotes
WHERE datum <= $1
GROUP BY wpname_id) AS newest
WHERE newest.datum = quotes.datum
AND newest.wpname_id = quotes.wpname_id
AND quotes.wpname_id = wpnames.id
AND wpnames.place_id = places.id) AS aktkurs
WHERE quotes.wpname_id = wpnames.id
AND wpnames.place_id = places.id AND places.platz = 'WAEHR'
AND wpnames.nummer = aktkurs.waehrung
AND quotes.datum <= aktkurs.datum
GROUP BY aktkurs.datum, aktkurs.close, aktkurs.wpname_id,
aktkurs.waehrung

Here are the (respective parts of the) tables:

CREATE TABLE public.quotes -- rows = 405466, 36 MB
(
id integer NOT NULL DEFAULT nextval('quotes_id_seq'::regclass),
wpname_id integer NOT NULL,
datum date NOT NULL,
close double precision NOT NULL,
CONSTRAINT quotes_pkey PRIMARY KEY (id),
CONSTRAINT fk_rails_626c320689 FOREIGN KEY (wpname_id)
REFERENCES public.wpnames (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
CREATE INDEX quotes_wd_idx -- 8912 kB
ON public.quotes
USING btree
(wpname_id, datum);

CREATE TABLE public.wpnames -- rows = 357, 40 kB
(
id integer NOT NULL DEFAULT nextval('wpnames_id_seq'::regclass),
place_id integer NOT NULL,
nummer text NOT NULL,
name text NOT NULL,
CONSTRAINT wpnames_pkey PRIMARY KEY (id),
CONSTRAINT fk_rails_18eae07552 FOREIGN KEY (place_id)
REFERENCES public.places (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE TABLE public.places -- rows = 11, 8192 b
(
id integer NOT NULL DEFAULT nextval('places_id_seq'::regclass),
platz text NOT NULL,
text text,
waehrung character varying(3) NOT NULL,
CONSTRAINT places_pkey PRIMARY KEY (id)
)

Hint: the quotes table contains daily stock quotes AND forex quotes,
and what the thing does is fetch the newest quotes before a given
date (inmost SELECT), fetch the respective currency ("waehrung") from
wpnames+places (next SELECT), and fetch the (date of the) respective
newest forex quote (last SELECT). (A final outermost fourth select
will then put it all together, but thats not part of the problem.)

Finally, the execution plans:

6 sec. index only scan with from_collapse:
https://explain.depesz.com/s/IPaT

1.3 sec. seq scan with from_collapse:
https://explain.depesz.com/s/Bxys

1.2 sec. seq scan w/o from_collapse:
https://explain.depesz.com/s/V02L

0.6 sec. index only scan w/o from_collapse:
https://explain.depesz.com/s/8Xh

Addendum: from the Guides for the mailing list, supplemental
information as requested. As this concerns planner strategy, which is
influenced by statistics, it appears difficult to me to create a
proper test-case, because I would need to know from where the planner
fetches the decision-relevant information - which is exactly my
question: how does it get the clue to choose the bad plans?

CPU: Intel Pentium III (945.02-MHz 686-class CPU)
avail memory = 2089263104 (1992 MB)
FreeBSD 11.1-RELEASE-p7
PostgreSQL 9.5.7 on i386-portbld-freebsd11.1, compiled by FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0), 32-bit

In response to

Re: Should from_collapse be switched off? (queries 10 times faster) at 2018-03-23 11:41:35 from Laurenz Albe

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Peter	2018-03-23 14:30:52	Re: Should from_collapse be switched off? (queries 10 times faster)
Previous Message	Tom Lane	2018-03-23 14:14:19	Re: Should from_collapse be switched off? (queries 10 times faster)