Re: Indices and Foreign Tables

From: "Kohler Manuel (ID SIS)" <manuel(dot)kohler(at)id(dot)ethz(dot)ch>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "Kohler Manuel (ID SIS)" <manuel(dot)kohler(at)id(dot)ethz(dot)ch>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Indices and Foreign Tables
Date: 2014-04-04 15:14:44
Message-ID: 2C3C059B-73D0-492A-AAF5-8E4F00052F9E@bsse.ethz.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,
here are the Query plans. The first plan is on the source database directly. So no fdw involved:

source_db=# EXPLAIN ANALYZE select ds.code, count(*), sum(dsf.size_in_bytes) as "raw_size",pg_size_pretty(sum(dsf.size_in_bytes)) as "size" from data_set_files dsf, data_sets ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=341248.80..341281.56 rows=13103 width=34) (actual time=695.519..724.286 rows=13839 loops=1)
Sort Key: (sum(dsf.size_in_bytes))
Sort Method: quicksort Memory: 2283kB
-> HashAggregate (cost=340188.93..340352.71 rows=13103 width=34) (actual time=536.229..615.115 rows=13839 loops=1)
-> Nested Loop (cost=0.56..340057.90 rows=13103 width=34) (actual time=0.104..456.139 rows=13839 loops=1)
-> Seq Scan on data_sets ds (cost=0.00..385.39 rows=13839 width=34) (actual time=0.022..40.113 rows=13839 loops=1)
-> Index Scan using data_set_files_dase_id_parent_id_idx on data_set_files dsf (cost=0.56..24.43 rows=11 width=16) (actual time=0.015..0.019 rows=1 loops=13839)
Index Cond: (((dase_id)::bigint = ds.id) AND (parent_id IS NULL))
Total runtime: 752.695 ms
(9 rows)

Here is the same query with fdw:

db=# EXPLAIN VERBOSE select ds.code, count(*), sum(dsf.size_in_bytes) as "raw_size",pg_size_pretty(sum(dsf.size_in_bytes)) as "size" from data_set_files_fdw dsf, data_sets_fdw ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Sort (cost=327.81..327.97 rows=64 width=40)
Output: ds.code, (count(*)), (sum(dsf.size_in_bytes)), (pg_size_pretty(sum(dsf.size_in_bytes)))
Sort Key: (sum(dsf.size_in_bytes))
-> HashAggregate (cost=325.09..325.89 rows=64 width=40)
Output: ds.code, count(*), sum(dsf.size_in_bytes), pg_size_pretty(sum(dsf.size_in_bytes))
-> Hash Join (cost=270.61..324.45 rows=64 width=40)
Output: dsf.size_in_bytes, ds.code
Hash Cond: (ds.id = (dsf.dase_id)::bigint)
-> Foreign Scan on public.data_sets_fdw ds (cost=100.00..148.40 rows=1280 width=40)
Output: ds.id, ds.code, ds.location
Remote SQL: SELECT id, code FROM public.data_sets
-> Hash (cost=170.48..170.48 rows=10 width=16)
Output: dsf.size_in_bytes, dsf.dase_id
-> Foreign Scan on public.data_set_files_fdw dsf (cost=100.00..170.48 rows=10 width=16)
Output: dsf.size_in_bytes, dsf.dase_id
Filter: (dsf.parent_id IS NULL)
Remote SQL: SELECT dase_id, parent_id, size_in_bytes FROM public.data_set_files
(17 rows)

And also with ANALYZE:

db=# EXPLAIN ANALYZE select ds.code, count(*), sum(dsf.size_in_bytes) as "raw_size",pg_size_pretty(sum(dsf.size_in_bytes)) as "size" from data_set_files_fdw dsf, data_sets_fdw ds where dsf.parent_id is null and dsf.dase_id=ds.id group by ds.code order by raw_size desc;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=327.81..327.97 rows=64 width=40) (actual time=219401.864..219414.641 rows=13839 loops=1)
Sort Key: (sum(dsf.size_in_bytes))
Sort Method: quicksort Memory: 2283kB
-> HashAggregate (cost=325.09..325.89 rows=64 width=40) (actual time=219327.664..219363.709 rows=13839 loops=1)
-> Hash Join (cost=270.61..324.45 rows=64 width=40) (actual time=219127.848..219277.308 rows=13839 loops=1)
Hash Cond: (ds.id = (dsf.dase_id)::bigint)
-> Foreign Scan on data_sets_fdw ds (cost=100.00..148.40 rows=1280 width=40) (actual time=1.057..77.415 rows=13839 loops=1)
-> Hash (cost=170.48..170.48 rows=10 width=16) (actual time=219126.713..219126.713 rows=13839 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 649kB
-> Foreign Scan on data_set_files_fdw dsf (cost=100.00..170.48 rows=10 width=16) (actual time=1082.614..219083.326 rows=13839 loops=1)
Filter: (parent_id IS NULL)
Rows Removed by Filter: 35726596
Total runtime: 219438.925 ms
(13 rows)

Regards
Manuel

--
Kohler Manuel (ID SIS)
Research Informatics, Scientific IT Services (ID ETHZ)
Quantitative Genomics Facility (QGF), D-BSSE
ETH Zurich, Mattenstr. 26 (1078 1.02), CH-4058 Basel, +41 61 387 3132

On 04 Apr 2014, at 17:01, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:

> Kohler Manuel wrote:
>> I have a question regarding the use of indices when querying foreign data tables (using postgres_fdw
>> of 9.3 to another postgres DB).
>> Everything works fine beside the fact that the indices which are defined in the foreign DB are not
>> used at all when I do query it through the foreign data wrapper. This leads to an extremely slow query
>> performance.
>>
>> Here is what I did:
>>
>> CREATE EXTENSION postgres_fdw;
>>
>> CREATE SERVER app_db
>> FOREIGN DATA WRAPPER postgres_fdw
>> OPTIONS (dbname ‘<DBNAME>', host 'localhost');
>>
>> CREATE USER MAPPING for openbis
>> SERVER app_db
>> OPTIONS (user ‘<USRE>', password ‘<PW>');
>>
>> CREATE FOREIGN TABLE data_sets_fdw
>> (
>> id bigint,
>> code code,
>> location file_path
>> )
>> SERVER app_db OPTIONS (table_name 'data_sets’);
>>
>> The data_sets_fdw corresponds to the table in the other DB which is defined as this:
>>
>> db=> \d data_sets
>> Table "public.data_sets"
>> Column | Type | Modifiers
>> ----------+-----------+--------------------------------------------------------
>> id | bigint | not null default nextval('data_sets_id_seq'::regclass)
>> code | code | not null
>> location | file_path | not null
>> Indexes:
>> "data_sets_pkey" PRIMARY KEY, btree (id)
>> "data_sets_code_key" UNIQUE CONSTRAINT, btree (code)
>> "data_sets_code_idx" btree (code)
>>
>> When I realised that the queries are so slow I ran an EXPLAIN ANALYZE which shows that the indices
>> are not used. I can also post the query plan if it helps.
>>
>> So am I doing something wrong here, or is this not possible that the indices are used by the fdw?
>
> The indices should be used.
>
> Did you ANALYZE the remote table on the remote database?
> What is the remote query (EXPLAIN VERBOSE)?
> What do you get for EXPLAIN ANALYZE of the remote query when executed on the remote database?
>
> Yours,
> Laurenz Albe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Oleg Bartunov 2014-04-04 15:15:29 Re: hstore - jsonb
Previous Message Albe Laurenz 2014-04-04 15:01:44 Re: Indices and Foreign Tables