Re: SELECT is faster on SQL Server

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Frank Millman <frank(at)chagford(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: SELECT is faster on SQL Server
Date: 2021-03-19 10:00:32
Message-ID: CAFj8pRBLakshW08Har67iJcx=OnSsazqXb8CeY-k3dLUpqSrwQ@mail.gmail.com
Lists: pgsql-general

On Fri, 19 Mar 2021 at 10:22, Frank Millman <frank(at)chagford(dot)com> wrote:

>
> On 2021-03-19 10:56 AM, Pavel Stehule wrote:
>
>
>
> On Fri, 19 Mar 2021 at 9:53, Frank Millman <frank(at)chagford(dot)com>
> wrote:
>
>>
>> On 2021-03-19 10:29 AM, Thomas Kellerer wrote:
>> > Frank Millman schrieb am 19.03.2021 um 09:19:
>> >> This may be a non-issue, and I don't want to waste your time. But
>> perhaps someone can have a look to see if there is anything obvious I have
>> missed.
>> >>
>> >> I am writing a cross-platform accounting app, and I test using Sql
>> >> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>> >> usually very similar, with a slight edge to PostgreSql. Now I have a
>> >> SELECT which runs over twice as fast on Sql Server compared to
>> >> PostgreSql.
>> >>
>> > Can you change the SELECT statement?
>> >
>> > Very often "distinct on ()" is faster in Postgres compared to the
>> equivalent solution using window functions
>> >
>> > Something along the lines (for the first derived table):
>> >
>> > SELECT ...
>> > FROM (
>> >     SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
>> >     FROM (
>> >         SELECT DISTINCT ON (location_row_id, function_row_id, source_code_id)
>> >                source_code_id, tran_tot
>> >         FROM prop.ar_totals
>> >         WHERE deleted_id = 0
>> >           AND tran_date <= '2018-03-31'
>> >           AND ledger_row_id = 1
>> >         ORDER BY location_row_id, function_row_id, source_code_id,
>> >                  tran_date DESC
>> >     ) AS a
>> >     GROUP BY a.source_code_id
>> > ) AS cl_bal
>> > ...
>> Thanks, Thomas
>>
>> I tried that, and it ran about 10% faster. Every little helps, but SQL
>> Server appears to have some secret sauce!
>>
>
> can you send a result of EXPLAIN ANALYZE?
>
>
>                                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Nested Loop Left Join  (cost=5.66..5.74 rows=1 width=132) (actual time=0.213..0.248 rows=5 loops=1)
>    Join Filter: (a_1.source_code_id = a.source_code_id)
>    Rows Removed by Join Filter: 4
>    ->  GroupAggregate  (cost=3.65..3.67 rows=1 width=36) (actual time=0.144..0.157 rows=5 loops=1)
>          Group Key: a.source_code_id
>          ->  Sort  (cost=3.65..3.65 rows=1 width=10) (actual time=0.131..0.135 rows=29 loops=1)
>                Sort Key: a.source_code_id
>                Sort Method: quicksort  Memory: 26kB
>                ->  Subquery Scan on a  (cost=2.36..3.64 rows=1 width=10) (actual time=0.063..0.116 rows=29 loops=1)
>                      Filter: (a.row_num = 1)
>                      Rows Removed by Filter: 3
>                      ->  WindowAgg  (cost=2.36..3.24 rows=32 width=34) (actual time=0.062..0.107 rows=32 loops=1)
>                            ->  Sort  (cost=2.36..2.44 rows=32 width=26) (actual time=0.054..0.059 rows=32 loops=1)
>                                  Sort Key: ar_totals.location_row_id, ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date DESC
>                                  Sort Method: quicksort  Memory: 27kB
>                                  ->  Seq Scan on ar_totals  (cost=0.00..1.56 rows=32 width=26) (actual time=0.014..0.028 rows=32 loops=1)
>                                        Filter: ((tran_date <= '2018-03-31'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
>    ->  GroupAggregate  (cost=2.01..2.03 rows=1 width=36) (actual time=0.017..0.017 rows=1 loops=5)
>          Group Key: a_1.source_code_id
>          ->  Sort  (cost=2.01..2.02 rows=1 width=10) (actual time=0.012..0.013 rows=8 loops=5)
>                Sort Key: a_1.source_code_id
>                Sort Method: quicksort  Memory: 25kB
>                ->  Subquery Scan on a_1  (cost=1.68..2.00 rows=1 width=10) (actual time=0.032..0.047 rows=8 loops=1)
>                      Filter: (a_1.row_num = 1)
>                      ->  WindowAgg  (cost=1.68..1.90 rows=8 width=34) (actual time=0.031..0.043 rows=8 loops=1)
>                            ->  Sort  (cost=1.68..1.70 rows=8 width=26) (actual time=0.023..0.024 rows=8 loops=1)
>                                  Sort Key: ar_totals_1.location_row_id, ar_totals_1.function_row_id, ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
>                                  Sort Method: quicksort  Memory: 25kB
>                                  ->  Seq Scan on ar_totals ar_totals_1  (cost=0.00..1.56 rows=8 width=26) (actual time=0.006..0.013 rows=8 loops=1)
>                                        Filter: ((tran_date < '2018-03-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
>                                        Rows Removed by Filter: 24
>  Planning Time: 0.479 ms
>  Execution Time: 0.344 ms
> (33 rows)
>
>
>
In this query the slowest operation is query planning. You are running your
tests on almost empty tables, which has no practical value. You should test
queries on tables of a size similar to your production size.

Timings under 1 ms have significant variance and are not comparable.
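
Note also that EXPLAIN ANALYZE's per-node timing instrumentation adds its own overhead at sub-millisecond scales; the TIMING OFF option keeps the overall runtime measurement while skipping it. A sketch against the table from the plan (the query here is a simplified stand-in, not the original):

```sql
-- Measure total runtime without per-node timing overhead; compare the
-- median of several runs rather than a single sub-millisecond sample.
EXPLAIN (ANALYZE, TIMING OFF)
SELECT source_code_id, SUM(tran_tot)
FROM prop.ar_totals
WHERE deleted_id = 0
  AND tran_date <= '2018-03-31'
  AND ledger_row_id = 1
GROUP BY source_code_id;
```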

Regards

Pavel
