From: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Cc: | shigeru(dot)hanada(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Question about optimising (Postgres_)FDW |
Date: | 2014-04-16 04:34:56 |
Message-ID: | 534E0870.3050309@2ndQuadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 04/16/2014 06:12 AM, Hannu Krosing wrote:
> On 04/16/2014 01:25 AM, Tom Lane wrote:
>> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
>>> Is there a way to force it to prefer a plan where the results of (select
>>> id from onemillion where data > '0.9' limit 100)
>>> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
>>> all at once ?
>> You could write the query like that:
>>
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
> My actual use-case was about a join between a local and a remote table
> and without rewriting the query (they come from ORM)
>
> I was hoping to be able to nudge postgresql towards a better plan via some
> tuning of table/fdw options or GUCs.
>
> for example, would postgresql use the WHERE id IN (...) query on remote
> side for a query like
>
> select r.data, l.data
> from onemillion_pgfdw r
> join onemillion l
> on r.id = l.id and l.data > '0.999';
>
> if it recognizes that the local side returns only 1000 rows ?
>
> or would it still use 1000 individual WHERE id = $1 queries.
>
> Is getting the foreign data via IN and then turning the data into a hash
> for joining one of the plans it considers at all ?
It sees that could we need an extra tuning parameter for choosing the
ID IN (...) + HASH plan over individual SELECT .. WHERE ID = $1
something between
`fdw_startup_cost` and `fdw_tuple_cost`
to signify that an IN query returning 1000 rows runs faster than 1000 =
queries
as I understan currently they both would be estimated as
fdw_startup_cost + 1000 * fdw_tuple_cost
the new parameter could be fdw_call_cost or fdw_query_cost and would
estimate
how much each individual call to fdw costs, thus favouring calls which
return more
data in one call
Cheers
Hannu
>
> Best
> Hannu
>
>> Or at least you should be able to, except when I try it I get
>>
>> explain analyze
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
>> ERROR: operator does not exist: integer = integer[]
>> HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
>> CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM public.onemillion WHERE ((id = ANY ((SELECT null::integer[]))))
>>
>> so there's something the remote-estimate code is getting wrong here.
>> (It seems to work without remote_estimate, though.)
>>
>> regards, tom lane
>
--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2014-04-16 04:44:09 | Re: Clock sweep not caching enough B-Tree leaf pages? |
Previous Message | Amit Kapila | 2014-04-16 04:27:10 | Re: Clock sweep not caching enough B-Tree leaf pages? |