Re: Sort is generating rows

From: Nicolas Seinlet <nicolas(at)seinlet(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Sort is generating rows
Date: 2018-05-31 13:16:17
Message-ID: CAMEUSaQp1C4PFfate=bcWWxOddfpqGUPk=aXGTcjVsQQYpgqSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2018-05-31 15:10 GMT+02:00 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>:

> On Thu, May 31, 2018 at 7:22 AM, Nicolas Seinlet <nicolas(at)seinlet(dot)com>
> wrote:
>
>> Hi,
>>
>> I have a query with a strange query plan.
>>
>> This query is roughly searching for sales, and convert them with a
>> currency rate. As currency rate changes from time to time, table contains
>> the currency, the company, the rate, the start date of availability of this
>> rate and the end date of availability.
>>
>> The join is done using :
>> left join currency_rate cr on (cr.currency_id = pp.currency_id and
>> cr.company_id = s.company_id and
>> cr.date_start <= coalesce(s.date_order, now()) and
>> (cr.date_end is null or cr.date_end > coalesce(s.date_order,
>> now())))
>>
>> The tricky part is the date range on the currency rate, which is not an
>> equality.
>>
>> the query plan shows:
>> -> Sort (cost=120.13..124.22 rows=1637 width=56) (actual
>> time=14.300..72084.758 rows=308054684 loops=1)
>> Sort Key: cr.currency_id, cr.company_id
>> Sort Method: quicksort Memory: 172kB
>> -> CTE Scan on currency_rate cr
>> (cost=0.00..32.74 rows=1637 width=56) (actual time=1.403..13.610 rows=1576
>> loops=1)
>>
>> There's 2 challenging things :
>> - planner estimates 1637 rows, and get 300 million lines
>> - sorting is generating lines
>>
>
> These are both explained by the same thing. The sort is feeding into a
> merge join. For every row in the other node which have the same value of
> the scan keys, the entire section of this sort with those same keys gets
> scanned again. The repeated scanning gets counted in the actual row count,
> but isn't counted in the expected row count, or the actual row count of the
> thing feeding into the sort (the CTE)
>
>
>>
>>
> For now, the more currency rates, the slowest the query. There's not that
>> much currency rates (1k in this case), as you can only have one rate per
>> day per currency.
>>
>
> If it is only per currency per day, then why is company_id present? In any
> case, you might be better off listing the rates per day, rather than as a
> range, and then doing an equality join.
>
> Cheers,
>
> Jeff
>

Hi,

Thanks for the answer.

You're right, company_id is present, so you can have one rate per day per
currency per company. I've tried to simplify the question without modifying
the query plan, so I didn't talk about it even if it's present. I will now
try to generate a virtual table of rates per dates.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Fred Habash 2018-06-05 14:17:08 Simple Query Elapsed Time ~ 3 hrs Using Bitmap Index/Heap Scan
Previous Message Jeff Janes 2018-05-31 13:10:56 Re: Sort is generating rows