Re: Performance of Query 4 on TPC-DS Benchmark

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Ba Jinsheng <bajinsheng(at)u(dot)nus(dot)edu>, "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance of Query 4 on TPC-DS Benchmark
Date: 2024-11-11 09:41:01
Message-ID: bdb4da6e-4432-4cf3-8ec9-622e5fc64d83@gmail.com
Lists: pgsql-performance

On 11/11/24 02:35, Ba Jinsheng wrote:
> Hi all,
>
> Please see this case:
>
>
> Query 4 on TPC-DS benchmark:
Thank you for the interesting example!
Looking into the EXPLAIN outputs, I see two sorts:
-> Sort (cost=794037.94..794037.95 rows=1 width=132)
(actual time=3024403.310..3024403.313 rows=8 loops=1)
-> Sort (cost=794033.93..794033.94 rows=1 width=132)
(actual time=8068.869..8068.872 rows=8 loops=1)

Almost the same cost, but very different execution times. So I think
the core of the problem is the accuracy of selectivity estimation.
In this specific example I see lots of composite scan filters:
- ((sale_type = 'w'::text) AND (dyear = 2002))
- ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))
- ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))

Estimating such filters is always a challenge for PostgreSQL because it
has no information on the joint distribution of the columns.
Could you research this direction by building extended statistics on
these clauses? It could move the plan in a more optimal direction.
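
For illustration, here is a minimal self-contained sketch (on a toy
table, not the TPC-DS schema; the names corr_demo and corr_demo_stats
are hypothetical) of how correlated columns break the planner's
independence assumption, and how extended statistics can repair the
estimate:

  -- two perfectly correlated columns
  CREATE TABLE corr_demo AS
    SELECT i % 100 AS a, i % 100 AS b
    FROM generate_series(1, 100000) AS i;
  ANALYZE corr_demo;

  -- the planner multiplies per-column selectivities (1/100 * 1/100):
  -- it estimates ~10 rows, while the actual count is 1000
  EXPLAIN ANALYZE SELECT * FROM corr_demo WHERE a = 1 AND b = 1;

  -- capture the joint distribution of the two columns
  CREATE STATISTICS corr_demo_stats (dependencies, mcv)
    ON a, b FROM corr_demo;
  ANALYZE corr_demo;

  -- the estimate should now be close to the actual 1000 rows
  EXPLAIN ANALYZE SELECT * FROM corr_demo WHERE a = 1 AND b = 1;

Note that in Query 4 the filtered columns come from the year_total
CTE rather than a base table, so this approach may need to target the
underlying base columns instead.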

--
regards, Andrei Lepikhov
