Re: FETCH FIRST clause PERCENT option

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: ryan(at)rustprooflabs(dot)com
Cc: surafel3000(at)gmail(dot)com, thomas(dot)munro(at)gmail(dot)com, andres(at)anarazel(dot)de, tomas(dot)vondra(at)2ndquadrant(dot)com, vik(dot)fearing(at)2ndquadrant(dot)com, hornschnorter(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, andrew(at)tao11(dot)riddles(dot)org(dot)uk
Subject: Re: FETCH FIRST clause PERCENT option
Date: 2019-07-10 06:02:57
Message-ID: 20190710.150257.260806103.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Tue, 9 Jul 2019 21:56:32 -0600, Ryan Lambert <ryan(at)rustprooflabs(dot)com> wrote in <CAN-V+g-rwFp=xQEjOwbJuggNLegMi1qDhaJt3h1Eqm16yqwqmw(at)mail(dot)gmail(dot)com>
> I did some more testing. I initialized a database with 1 million rows with
> indexes and joins to test against and ran pgbench with a few different
> settings for % to return. I started with a base query not utilizing the
> new functionality. The queries used are similar to my prior examples, code
> at [1].
>
> createdb bench_test
> psql -d bench_test -f init/reporting.sql -v scale=10
>
> The following provided 3.21 TPS and an average latency of 623. The
> "per_change_" columns in the table below use those values.
>
> pgbench -c 2 -j 2 -T 600 -P 60 -s 10 \
> -f tests/reporting1.sql bench_test
>
> The remainder of the tests use the following, only adjusting fetch_percent
> value:
>
> pgbench -c 2 -j 2 -T 600 -P 60 -s 10 \
> --define=fetch_percent=1 \
> -f tests/reporting_fetch_percent.sql \
> bench_test
>
>
> Returning 1% it runs well. By 10% the TPS drops by 30% while the average
> latency increases by 43%. When returning 95% of the table latency has
> increased by 548%.
>
>
> fetch_percent | tps | latency_avg_ms | per_change_tps | per_change_latency
> ---------------+------+----------------+----------------+--------------------
> 1 | 3.37 | 593 | 0.05 | -0.05
> 5 | 2.85 | 700 | -0.11 | 0.12
> 10 | 2.24 | 891 | -0.30 | 0.43
> 25 | 1.40 | 1423 | -0.56 | 1.28
> 45 | 0.93 | 2147 | -0.71 | 2.45
> 95 | 0.49 | 4035 | -0.85 | 5.48
>
>
> I manually tested the inner select queries without the outer aggregation
> thinking it might be a different story with a simple select and no CTE.
> Unfortunately it showed the same overall characteristics. 1% returns in
> about 550 ms, 45% took 1950, and 95% took 4050.
>
> [1] https://github.com/rustprooflabs/pgbench-tests

It is seen by a simpler test.

create table t as select a from generate_series(0, 99999) a;
analyze t;
explain analyze select * from t order by a desc;
Execution Time: 116.613 ms
explain analyze select * from t order by a desc fetch first 1 percent rows only;
Execution Time: 158.458 ms
explain analyze select * from t order by a desc fetch first 100 percent rows only;
Execution Time: 364.442 ms

I didn't looked closer to the version. Fetching from tuplestore
and returning all tuples costs 206ms and it is exceeding the cost
of fething of the whole table and returning all tuples. I don't
believe tuplestore that isn't splling out to disk is so slower
than (cached) table access.

Other than that, we can rip the clause if it is 100%

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-07-10 06:03:31 Re: Copy data to DSA area
Previous Message Kyotaro Horiguchi 2019-07-10 05:51:18 make clean removes excesively