Re: Performance problem with query

From: "Christian Rengstl" <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de>
To: "Q" <qdolan(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Performance problem with query
Date: 2006-07-19 10:49:50
Message-ID: 44BE2A67.0AD0.0080.0@klinik.uni-regensburg.de
Lists: pgsql-general

Obviously it had something to do with the rule, because now everything finished within 20 minutes. The problem is just that I don't really want to give up the inheritance design. Is there a way to maintain the inheritance that doesn't cause this huge performance problem?
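For instance, I suppose I could keep the inheritance but bypass the rule during bulk loads by inserting straight into the appropriate child table (just a sketch; "master_chr1" stands for whichever of my child tables the data belongs in):

    INSERT INTO public.master_chr1 (pid, smid, val1, val2, chr)
    SELECT pid, smid, val1, val2,
           CAST(split_part((SELECT chr FROM public.temp_table LIMIT 1), '_', 2) AS int2)
    FROM public.temp_table;

Queries against public.master would still see those rows through the inheritance.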

Q <qdolan(at)gmail(dot)com> wrote on 07/19/06 11:54 am:
> On 19/07/2006, at 6:32 PM, Christian Rengstl wrote:
>
>> The analyze is from the exact query, and I dropped the indexes
>> before the insert as well, without improvement. The target table is
>> completely empty, and the insert is supposed to write, in this
>> case, more or less 8 million lines into the table. There is a
>> rule, though, because I have an inheritance table structure with one
>> master table and around 20 child tables.
>
> I would say the problem is in the rule. Try doing the insert into a
> duplicate table with no rules or inheritance and see how long it takes.
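>
> A sketch of such a test (assuming a table master_test that does not
> exist yet; LIKE copies the column definitions but not the indexes or
> rules):
>
>     CREATE TABLE master_test (LIKE public.master);
>     INSERT INTO master_test (pid, smid, val1, val2, chr)
>     SELECT pid, smid, val1, val2,
>            CAST(split_part((SELECT chr FROM public.temp_table LIMIT 1), '_', 2) AS int2)
>     FROM public.temp_table;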
>
> Perhaps you should provide the actual schema of tables and rules that
> are involved in the query in question.
>
>> Q <qdolan(at)gmail(dot)com> wrote on 07/19/06 4:37 am:
>>>
>>> On 19/07/2006, at 4:24 AM, Christian Rengstl wrote:
>>>
>>>> Now, finally, after a long time, I have the query plan for the whole
>>>> filled table. I hope somebody can explain to me why it takes so much
>>>> longer...
>>>
>>>
>>> These explain analyze results don't appear to be from the queries you
>>> posted previously. For these results to mean anything you need to
>>> include the EXACT queries you used to generate them.
>>>
>>>> Result  (cost=0.06..254643.06 rows=8044000 width=39) (actual time=0.056..655772.273 rows=8044000 loops=1)
>>>>   One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1)
>>>>   InitPlan
>>>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 rows=1 loops=1)
>>>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.002..0.002 rows=1 loops=1)
>>>>     ->  Limit  (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 rows=1 loops=1)
>>>>           ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=10) (actual time=0.004..0.004 rows=1 loops=1)
>>>>   ->  Seq Scan on temp_table  (cost=0.00..194313.00 rows=8044000 width=39) (actual time=0.002..191672.344 rows=8044000 loops=1)
>>>> Total runtime: 62259544.896 ms
>>>
>>>
>>> This is the query you want to be interested in; the others took no
>>> time at all.
>>>
>>> As a guess I would say the query is an INSERT INTO ... FROM
>>> SELECT ... WHERE (split_part(???, '_', 2))::smallint = 1 statement.
>>> The majority of the time appears to be taken up on the actual INSERT
>>> and not the SELECT part.
>>>
>>> How many rows are in the target table and what indexes does it have?
>>> Does it have any triggers, check constraints, or rules applied to it?
>>> All these things can make the insert take longer as the number of
>>> rows you have already in the table increases.
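>>>
>>> To check what is attached to the target table, "\d public.master" in
>>> psql lists its indexes, triggers, and rules, and the standard view
>>> pg_indexes can list the indexes alone:
>>>
>>>     SELECT indexname, indexdef
>>>     FROM pg_indexes
>>>     WHERE schemaname = 'public' AND tablename = 'master';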
>>>
>>> More than likely you have a target table with a LOT of rows and a
>>> bunch of indexes on it, and your disks are being thrashed because the
>>> indexes are not able to stay cached in RAM. At this point you should
>>> ensure your machine is not swapping to disk, and at the very least
>>> you should go through one of the many tuning guidelines available and
>>> ensure you have allocated the appropriate amount of memory to
>>> PostgreSQL for your needs.
>>>
>>> You may also want to consider dropping the indexes before you do the
>>> INSERT and recreate them afterwards.
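>>>
>>> For example (the index name below is made up for illustration):
>>>
>>>     DROP INDEX master_pid_idx;
>>>     -- ... run the big INSERT ... SELECT here ...
>>>     CREATE INDEX master_pid_idx ON public.master (pid);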
>>>
>>>
>>>
>>>>>>> "Christian Rengstl" <Christian(dot)Rengstl(at)klinik(dot)uni-regensburg(dot)de>
>>>>>>> 13.07.06, 8:37 a.m. >>>
>>>> Good morning list,
>>>>
>>>> the following query takes about 15 to 20 minutes for around 2
>>>> million lines in the file myfile.txt, but with 8 million lines it
>>>> takes around 5 hours, and I just don't understand why there is such
>>>> a huge discrepancy in performance.
>>>>
>>>> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t';
>>>>
>>>> INSERT INTO public.master(pid, smid, val1, val2, chr)
>>>> SELECT pid, smid, val1, val2, CAST(split_part((SELECT
>>>> chr FROM public.temp_table LIMIT 1), '_', 2) AS int2)
>>>> FROM public.temp_table;
>>>>
>>>> INSERT INTO public.values(smid, pos, chr)
>>>> SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr FROM
>>>> public.temp_table LIMIT 1), '_', 2) AS int2)
>>>> FROM public.temp_table;
>>>>
>>>> I came up with this query because I wanted to use the COPY command
>>>> to load huge files into the db, but I don't want to load all the
>>>> columns contained in the file into only one table; I want to copy
>>>> some of them into one table and some into a second table. As I found
>>>> out with logging, the data is loaded into temp_table within 15
>>>> minutes, and transferring it from the temp_table again takes only
>>>> something like 10 minutes. Can it be that the cast takes up so much
>>>> more time than when reading and transferring 2 million lines?
>>>
>>>
>>> --
>>> Seeya...Q
>>>
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>>>
>>> _____ / Quinton Dolan - qdolan(at)gmail(dot)com
>>> __ __/ / / __/ / /
>>> / __ / _/ / / Gold Coast, QLD, Australia
>>> __/ __/ __/ ____/ / - / Ph: +61 419 729 806
>>> _______ /
>>> _\
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Christian Rengstl M.A.
>> Klinik und Poliklinik für Innere Medizin II
>> Kardiologie - Forschung
>> Universitätsklinikum Regensburg
>> B3 1.388
>> Franz-Josef-Strauss-Allee 11
>> 93053 Regensburg
>> Tel.: +49-941-944-7230
>
>
> --
> Seeya...Q
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
> _____ / Quinton Dolan - qdolan(at)gmail(dot)com
> __ __/ / / __/ / /
> / __ / _/ / / Gold Coast, QLD, Australia
> __/ __/ __/ ____/ / - / Ph: +61 419 729 806
> _______ /
> _\
>
>

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230
