Sort is generating rows

From: Nicolas Seinlet <nicolas(at)seinlet(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Sort is generating rows
Date: 2018-05-31 09:09:38
Message-ID: CAMEUSaRx0UZ4Z0JGuaxbz3UtfMG9_q+v6pV0t41XzGOuVY307Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

I have a query with a strange query plan.

This query is roughly searching for sales, and convert them with a currency
rate. As currency rate changes from time to time, table contains the
currency, the company, the rate, the start date of availability of this
rate and the end date of availability.

The join is done using :
left join currency_rate cr on (cr.currency_id = pp.currency_id and
cr.company_id = s.company_id and
cr.date_start <= coalesce(s.date_order, now()) and
(cr.date_end is null or cr.date_end > coalesce(s.date_order,
now())))

The tricky part is the date range on the currency rate, which is not an
equality.

the query plan shows:
-> Sort (cost=120.13..124.22 rows=1637 width=56) (actual
time=14.300..72084.758 rows=308054684 loops=1)
Sort Key: cr.currency_id, cr.company_id
Sort Method: quicksort Memory: 172kB
-> CTE Scan on currency_rate cr
(cost=0.00..32.74 rows=1637 width=56) (actual time=1.403..13.610 rows=1576
loops=1)

There's 2 challenging things :
- planner estimates 1637 rows, and get 300 million lines
- sorting is generating lines

later in the query plan, you find the join:
-> Merge Left Join (cost=341056.75..351344.76 rows=1165112 width=224)
(actual time=9792.635..269120.409 rows=1170055 loops=1)
Merge Cond: ((pp.currency_id = cr.currency_id) AND
(s.company_id = cr.company_id))
Join Filter: ((cr.date_start <=
COALESCE((s.date_order)::timestamp with time zone, now())) AND
((cr.date_end IS NULL) OR (cr.date_end > COALESCE((s.date_order)::timestamp
with time zone, now()))))
Rows Removed by Join Filter: 307266434

It seems the join deletes all the generated million lines, which is correct.

My question is then , is there a better way to join a table to another
using a date range, knowing that there's no overlap between date ranges?
Should we generate a virtual table with rates for all dates, and joining
using an equality?

For now, the more currency rates, the slowest the query. There's not that
much currency rates (1k in this case), as you can only have one rate per
day per currency.

Have a nice day,

Nicolas.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ashu Pachauri 2018-05-31 10:38:29 Long running DDL statements blocking all queries
Previous Message Federico Di Gregorio 2018-05-31 07:22:52 Re: License Question