Re: AXLE Plans for 9.5 and 9.6

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: AXLE Plans for 9.5 and 9.6
Date: 2014-04-22 12:30:24
Message-ID: 535660E0.4020003@krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/22/2014 02:04 PM, Simon Riggs wrote:
> On 22 April 2014 00:24, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> On 04/21/2014 03:41 PM, Simon Riggs wrote:
>>> Storage Efficiency
>>> * Compression
>>> * Column Orientation
>> You might look at turning this:
>>
>> http://citusdata.github.io/cstore_fdw/
>>
>> ... into a more integrated part of Postgres.
> Of course I'm aware of that work - credit to them. Certainly, many
> people feel that it is now time to do as you suggest and include
> column store features within PostgreSQL.
>
> As to turning it into a more integrated part of Postgres, we have a
> few problems there
>
> 1. cstore_fdw code has an incompatible licence
>
> 2. I don't think FDWs are the right place for complex new
> architectures such as column store, massively parallel processing or
> sharding.
I agree that FDW is not an end-all solution for all these, but it is a
reasonable starting point and it just might be that the extra things
needed could be added to our FDW API instead of sewing it directly
into backend guts.

I recently tried to implement sharding at FDW level and the main
problem I ran into was a missing join type for efficiently using it
for certain queries.

The specific use case was queries of form

select l.*, r*
from remotetable r
join localtable l
on l.key1 = r.id and l.n = N;

PostgreSQL offered only two options:

1) full scan on remote table

2) single id=$ selects

neither of which are what is actually needed, as firs performs badly
if there are more than a few rows in remote table and 2nd performs
badly if l.n = N returns more than a few rows

when I manually rewrote the query to

select l.*, r*
from remotetable r where r.id = ANY(ARRAY(select key1 from localtable
where n = N))
join localtable l
on l.key1 = r.id and l.n = N;

it run really well.

Unfortunately this is not something that postgreSQL considers by itself
while optimising.

BTW, this kind of optimisation should also be a win for really large IN
queries if we
could have an indexed IN whic would not start each lookup from the index
root, but
rather would sort the IN contents and do an index merge vis skipping
from current position.

Cheers

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-04-22 12:55:08 Re: Perfomance degradation 9.3 (vs 9.2) for FreeBSD
Previous Message Magnus Hagander 2014-04-22 12:22:05 Re: Perfomance degradation 9.3 (vs 9.2) for FreeBSD