From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Daniel Farina <drfarina(at)gmail(dot)com>, Hannu Krosing <hannu(at)krosing(dot)net>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <dfarina(at)truviso(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION |
Date: | 2009-11-25 10:32:17 |
Message-ID: | 162867790911250232u75985e29mba6310358d4b2911@mail.gmail.com |
Lists: | pgsql-hackers |
2009/11/25 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Wed, 2009-11-25 at 09:23 +0100, Pavel Stehule wrote:
>> > If SRFs use a tuplestore in that situation, it sounds like that should
>> > be fixed. Why do we need to provide alternate syntax involving COPY?
>>
>> It isn't a problem of the SRF design. It allows both modes - row
>> and tuplestore.
>
> select * from generate_series(1,1000000000) limit 1;
>
> That statement takes a long time, which indicates to me that it's
> materializing the result of the SRF. And there's no insert there.
This is a missing optimisation. If I understand correctly, PostgreSQL
waits for the complete result set, so in that case materialisation is
necessary. In your query it materialises too early.
postgres=# select * from generate_series(1,100000) limit 1;
generate_series
─────────────────
1
(1 row)
Time: 59,540 ms
postgres=# select generate_series(1,100000) limit 1;
generate_series
─────────────────
1
(1 row)
Time: 1,107 ms
But usually we process all the rows from an SRF, so the problem with
LIMIT isn't significant.
I am testing:
1.
postgres=# select count(*) from generate_series(1,1000000);
count
─────────
1000000
(1 row)
Time: 930,720 ms
2.
postgres=# select count(*) from (select generate_series(1,1000000)) x;
count
─────────
1000000
(1 row)
Time: 276,511 ms
Case 2 is significantly faster than case 1 (there is no SRF materialisation).
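A way to see the difference is EXPLAIN: placed in the FROM clause the SRF shows up as its own scan node whose output can be materialised, while in the target list it is evaluated row by row. The exact plan node names below are illustrative only and vary between PostgreSQL versions:

```sql
-- Sketch: compare the plans for the two placements of the SRF.
-- Plan shapes shown in comments are assumptions; check your own server.
EXPLAIN select count(*) from generate_series(1,1000000);
--   Aggregate
--     ->  Function Scan on generate_series    -- whole result set built first

EXPLAIN select count(*) from (select generate_series(1,1000000)) x;
--   Aggregate
--     ->  Result                              -- rows produced one at a time
```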
postgres=# create table foo(a integer);
CREATE TABLE
postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 4274,869 ms
postgres=# insert into foo select * from generate_series(1,1000000);
INSERT 0 1000000
Time: 4814,756 ms
postgres=# copy foo to '/tmp/xxx';
COPY 1000000
Time: 1033,078 ms
postgres=# set synchronous_commit to off;
SET
postgres=# copy foo from '/tmp/xxx';
COPY 1000000
Time: 2749,277 ms
postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 3948,734 ms
generate_series is fast and simple, but COPY is still about 30% faster.
>
>> This is problem of INSERT statement, resp. INSERT INTO
>> SELECT implementation.
>
> If "tmp" is a new table, and "zero" is a table with a million zeros in
> it, then:
> insert into tmp select 1/i from zero;
> fails instantly. That tells me that it's not materializing the result of
> the select; rather, it's feeding the rows in one at a time.
>
I think materialisation happens every time you use an SQL statement
without a cursor.
> Can show me in more detail what you mean? I'm having difficulty
> understanding your short replies.
I think COPY tab FROM/TO function() should be used for very large
import/export, where INSERT ... SELECT needs at least one
materialisation.
p.s. I am sorry - I am not a native speaker, so I reply briefly.
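For illustration only: the feature under discussion in this thread is not in any released PostgreSQL, and the function names and exact grammar below are my assumptions, not the patch's actual syntax. The idea is that COPY streams rows to or from a function without an intermediate materialisation:

```sql
-- Hypothetical sketch of COPY TO/FROM FUNCTION; names and grammar
-- are assumptions for illustration, not committed features.
COPY foo TO FUNCTION my_row_consumer();    -- each row is handed to the function
COPY foo FROM FUNCTION my_row_producer();  -- the function supplies rows one by one
```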
Pavel
>
> Regards,
> Jeff Davis
>
>