ASOF join

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: ASOF join
Date: 2017-06-15 16:20:56
Message-ID: bc494762-26bd-b100-e1f9-a97901ddad57@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wonder if there were some discussion/attempts to add ASOF join to
Postgres (sorry, may be there is better term for it, I am refereeing
KDB definition: http://code.kx.com/wiki/Reference/aj ).
Such kind of join can be useful when we need to associate two
timeseries. It is quite popular in trading:

//join the eq price and size for the op trade time
a::aj[`underlyingSym`time;select time, underlyingSym, sym, putorcall,
ex, cond, price, seqnum, contracts, contractsTraded from t;eqtrades];

...and not only. Below is one example of how people now manually coding
ASOF join:

select
count(*),
count(*)
filter (where timedelta_prev < -30),
count(*)
filter (where ride_prev = ride_next),
... -- many different aggregates
from
(
select
p.provider_id,
p.ts,
(
select extract(epoch from t.ts - p.ts)
from providers_positions t
where p.provider_id = t.provider_id and t.ts < p.ts and
t.source = 'gps'
order by t.ts desc
limit 1
) as timedelta_prev,
(
select extract(epoch from t.ts - p.ts)
from providers_positions t
where p.provider_id = t.provider_id and t.ts > p.ts and
t.source = 'gps'
order by t.ts
limit 1
) as timedelta,
(
select ride_id
from providers_positions t
where p.provider_id = t.provider_id and t.ts < p.ts and
t.source = 'gps'
order by t.ts desc
limit 1
) as ride_prev,
(
select ride_id
from providers_positions t
where p.provider_id = t.provider_id and t.ts > p.ts and
t.source = 'gps'
order by t.ts
limit 1
) as ride_next
from (
select
provider_id,
ts,
event_name
from
lvl2_681_parsed p
) p
where
p.event_name = 'GPS signal restored'
-- offset 0
) z;

Without OFFSET 0 this query generates awful execution plans with
hundreds (!) of subplans corresponding to the subqueries.
Number of subplans (most of them are the same) is equal number of
occurrences of timedelta, timedelta_prev, ... columns in target aggregates.
OFFSET 0 reduce number of subplans to 4. And I expect that using LATERAL
join can reduce it to two and even without "OFFSET 0" trick.
But in any case - it is very complex and unnatural way of expressing
this not so complex query.
With ASOF join is can be written much simpler.

Also Postgres is implementing this query using nested loop with index
scan, which is definitely not the most efficient strategy.
The best way to implement ASOF join is to use something like mergejoin.
Usually there are indexes for both timeseries, so what we need is to
merge two ordered sets using ASOF join rules.
It will require minimal changes in SQL syntax, just adding ASOF keyword
seems to be enough:

select * from Trades ASOF JOIN EqTrades USING (underlyingSym,time);

It seems to me that adding ASOF joins should not require huge amount of
work and can be done with minimal number of changes in executor and
optimizer.
But may be there are some problems/challenges which I do not realize now?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2017-06-15 16:22:34 Re: Get stuck when dropping a subscription during synchronizing table
Previous Message Yuan Dong 2017-06-15 16:19:09 GiST API Adancement