Re: FETCH FIRST clause PERCENT option

From: Ryan Lambert <ryan(at)rustprooflabs(dot)com>
To: Surafel Temesgen <surafel3000(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: FETCH FIRST clause PERCENT option
Date: 2019-07-09 13:13:10
Message-ID: CAN-V+g89y6ch0UPDRUCPJ4=3n3EfTtDeU8NSiU1oGZtCaJuQUA@mail.gmail.com
Lists: pgsql-hackers

Surafel,

> The cost of FETCH FIRST N PERCENT execution in the current implementation
> is the cost of pulling the full table plus the cost of storing and
> fetching the tuples from the tuplestore, so it cannot perform better than
> pulling the full table in any case. This is because we can't determine
> the number of rows to return without executing the plan to the end. We
> can get an estimate of the number of rows that will be returned from the
> planner, but that estimate is not exact.
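
Right, that makes sense. For my own notes: the PERCENT semantics are
roughly equivalent to the two-pass form below, which is why the full scan
can't be avoided (a sketch of the semantics only, not the actual executor
logic):

-- Rough equivalent of FETCH FIRST 95 PERCENT ROWS ONLY: the total row
-- count must be known before the limit can be computed, hence a full pass.
SELECT id, v1, v2
FROM r10mwide
LIMIT (SELECT ceil(count(*) * 0.95)::bigint FROM r10mwide);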

Ok, I can live with that for the normal use cases. The example from the
end of my previous message using 95% still seems like a problem, though; I
don't like syntax that unexpectedly kills performance like this. If this
can't be improved in the initial release of the feature, I'd suggest we at
least add a strong disclaimer to the docs, along the lines of:

"It is possible for FETCH FIRST N PERCENT to create poorly performing query
plans when the N supplied exceeds 50 percent. In these cases query
execution can take an order of magnitude longer to execute than simply
returning the full table. If performance is critical using an explicit row
count for limiting is recommended."
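
The docs could also point at a concrete workaround. Something along these
lines, leaning on the planner's reltuples estimate (which, as you noted,
is approximate):

-- Sketch of the explicit-count workaround: derive an approximate 95% row
-- count from the planner statistics, then use it as a plain LIMIT.
SELECT (reltuples * 0.95)::bigint AS approx_limit
FROM pg_class
WHERE relname = 'r10mwide';
-- If that returns e.g. 9500000:
--   SELECT id, v1, v2 FROM r10mwide LIMIT 9500000;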

I'm not certain that 50 percent is the true threshold where things start
to fall apart; I just used that as a likely guess for now. I can do some
more testing this week, along the lines of the sketch after the plan
below, to identify where performance starts to degrade. Thanks,

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
FROM r10mwide
FETCH FIRST 95 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=651432.48..651432.49 rows=1 width=24) (actual time=58981.043..58981.044 rows=1 loops=1)
   ->  Limit  (cost=230715.67..461431.34 rows=9500057 width=20) (actual time=0.017..55799.389 rows=9500000 loops=1)
         ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=10000060 width=20) (actual time=0.014..3847.146 rows=10000000 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 59079.680 ms
(5 rows)
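
For the threshold testing, my rough plan is to repeat the same aggregate
at a few percentages and compare each run against the plain full-table
baseline (an untested sketch, reusing r10mwide from above):

-- Baseline: the same aggregate with no percent limit; a single Seq Scan
-- and no tuplestore.
EXPLAIN (ANALYZE, COSTS)
SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM r10mwide;

-- The limited form at increasing percentages; compare Execution Time.
EXPLAIN (ANALYZE, COSTS)
WITH t AS (
    SELECT id, v1, v2
    FROM r10mwide
    FETCH FIRST 10 PERCENT ROWS ONLY  -- repeat with 25, 50, 75, 90
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t;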

Ryan Lambert
