Re: performance problem on big tables

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: Daniel Blanch Bataller <daniel(dot)blanch(dot)bataller(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance problem on big tables
Date: 2017-08-17 09:00:18
Message-ID: CA+t6e1kijisMi3cjNy3HEEvDMMH_u9WkzS3ViQGZ9cPTZ3YtWQ@mail.gmail.com
Lists: pgsql-performance

I checked with the storage team in the company, and they saw that I have
a lot of I/O on the server. How can I reduce the I/O that PostgreSQL
generates?
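One way to measure how much of the elapsed time PostgreSQL itself spends waiting on the disks is the track_io_timing statistics; a minimal sketch (enabling the setting requires superuser, or ALTER SYSTEM plus a reload):

```sql
-- Enable timing of data-file block reads/writes (adds a small overhead):
SET track_io_timing = on;

-- Cumulative milliseconds spent reading and writing data blocks:
SELECT datname, blk_read_time, blk_write_time
FROM pg_stat_database
WHERE datname = current_database();
```

If blk_write_time grows in step with the wall-clock time of the load, the bottleneck is the disks rather than PostgreSQL itself.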

2017-08-17 9:25 GMT+03:00 Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>:

> Hi Daniel,
> I already tried setting the destination table to unlogged - it improved the
> performance slightly. Is there a way to confirm that I/O is the problem?
>
> 2017-08-17 0:46 GMT+03:00 Daniel Blanch Bataller <
> daniel(dot)blanch(dot)bataller(at)gmail(dot)com>:
>
>> It seems your disks are too slow. On my laptop (nothing special, just one
>> disk), dumping 3G with COPY takes ~20 secs, loading it takes 120 secs, and a
>> bare file copy of 3G takes 10 secs.
>>
>> The same proportions you had, but much faster.
>>
>> Confirm that I/O is your bottleneck, and tell us how you solve the problem.
>>
>> Anyway, you can cut the import time roughly in half if you set your
>> destination table to unlogged (Postgres will write half the data, since it
>> skips the transaction log). Remember to set it back to logged when finished!!
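Assuming PostgreSQL 9.5 or later (earlier versions can only create a table as unlogged, not switch an existing one), the toggle looks like this; the table name is the one used elsewhere in this thread:

```sql
-- Before the bulk load: skip WAL writes for this table.
ALTER TABLE local_postresql_table SET UNLOGGED;

-- ... run the COPY / INSERT ... SELECT here ...

-- After the load: restore crash safety (this writes the whole table to the WAL).
ALTER TABLE local_postresql_table SET LOGGED;
```

Note that an unlogged table is truncated after a crash, so only do this when the load can be repeated from the source.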
>>
>>
>> Regards,
>>
>> Daniel
>>
>> El 16 ago 2017, a las 16:32, Mariel Cherkassky <
>> mariel(dot)cherkassky(at)gmail(dot)com> escribió:
>>
>> My server is virtual, and it has a virtual disk from a VNX storage array.
>> The logs and the data are on the same disk.
>>
>> 2017-08-16 17:04 GMT+03:00 Daniel Blanch Bataller <
>> daniel(dot)blanch(dot)bataller(at)gmail(dot)com>:
>>
>>> Considering it has to write logs and data at checkpoints, I don’t see it as
>>> particularly slow compared to the extract phase. What kind of disks do you
>>> have, SSD or regular disks? Different disks for transaction logs and data?
>>>
>>>
>>> El 16 ago 2017, a las 15:54, Mariel Cherkassky <
>>> mariel(dot)cherkassky(at)gmail(dot)com> escribió:
>>>
>>> I ran the COPY command via psql to create a local dump of a 3G table, and
>>> it took 134059.732 ms ≈ 2 minutes. After that I imported the data via COPY,
>>> and it took 458648.677 ms ≈ 7 minutes. So the COPY command works, but it is
>>> pretty slow.
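For reference, the two measurements above can be reproduced in psql with \timing; the file path and table names follow the commands quoted later in this thread:

```sql
\timing on
-- Dump the 3G table to a local file (~2 minutes here):
\copy (select * from remote_oracle_table) to '/tmp/tmp' with binary
-- Load the dump into the local table (~7 minutes here):
\copy local_postresql_table from '/tmp/tmp' with binary
```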
>>>
>>> 2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller <
>>> daniel(dot)blanch(dot)bataller(at)gmail(dot)com>:
>>>
>>>> See if the COPY command is actually working; COPY should be very fast
>>>> from your local disk.
>>>>
>>>>
>>>> El 16 ago 2017, a las 14:26, Mariel Cherkassky <
>>>> mariel(dot)cherkassky(at)gmail(dot)com> escribió:
>>>>
>>>>
>>>> After all the changes to the memory parameters, the same operation
>>>> (without the COPY utility) didn't run much faster - it took one minute
>>>> less. I made a test with the COPY command (without the 'with binary'), and
>>>> it took 1.5 hours to create the dump file on my local PostgreSQL server.
>>>> Then I tried to run the COPY from the local dump, and it has already been
>>>> running for two hours without finishing. I looked at the server log and
>>>> saw that I ran the COPY command at 13:18:05; 3 minutes later a checkpoint
>>>> started and completed, and there are no messages in the log after that.
>>>> What can I do? Improving the memory parameters and the memory on the
>>>> server didn't help, and for now the COPY command doesn't help either.
>>>>
>>>>
>>>>
>>>>
>>>> 2017-08-15 20:14 GMT+03:00 Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>:
>>>>
>>>>> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
>>>>> <mariel(dot)cherkassky(at)gmail(dot)com> wrote:
>>>>> > Hi,
>>>>> > So I ran the checks that Jeff mentioned:
>>>>> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary -
>>>>> > 1 hour and 35 minutes
>>>>>
>>>>> So 26G takes 95 minutes, or about 280 MB/minute, i.e. ~4.7 MB/second?
>>>>> Sounds about right (it's early, I haven't had enough coffee, please
>>>>> check my math). That's pretty slow unless you're working across pretty
>>>>> big distances with mediocre connections. My home internet downloads at
>>>>> about 100 MB/s by comparison.
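Since the mail above invites checking the math, redoing it in SQL (26 GB over 95 minutes) gives:

```sql
SELECT round(26 * 1024.0 / 95)           AS mb_per_minute,  -- ~280
       round(26 * 1024.0 / (95 * 60), 1) AS mb_per_second;  -- ~4.7
```

Still more than an order of magnitude slower than a local disk or a fast LAN can sustain.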
>>>>>
>>>>> > \copy local_postresql_table from /tmp/tmp with binary - didn't run,
>>>>> > because the remote oracle database is currently under maintenance work.
>>>>>
>>>>> You shouldn't need the remote Oracle server if you've already copied the
>>>>> data over; you're just copying from local disk into the local pgsql db.
>>>>> Unless I'm missing something.
>>>>>
>>>>> > So I decided to follow MichaelDBA's tips: I set the RAM on my machine
>>>>> > to 16G and configured effective_cache_size to 14G, shared_buffers to
>>>>> > 2G, and maintenance_work_mem to 4G.
>>>>>
>>>>> Good settings. Maybe set work_mem to 128MB or so while you're at it.
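The work_mem suggestion can be applied per-session for just the import, or persisted cluster-wide (PostgreSQL 9.4+) with ALTER SYSTEM:

```sql
-- For the loading session only:
SET work_mem = '128MB';

-- Or cluster-wide, surviving restarts:
ALTER SYSTEM SET work_mem = '128MB';
SELECT pg_reload_conf();
```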
>>>>>
>>>>> > I started running the copy checks again, and so far it has copied 5G
>>>>> > in 10 minutes. I have some questions:
>>>>> > 1) When I run insert into local_postresql_table select * from
>>>>> > remote_oracle_table, is the data inserted into the local table in bulk
>>>>> > or row by row? If in bulk, then why is COPY a better option for this
>>>>> > case?
>>>>>
>>>>> An INSERT INTO ... SELECT from the remote Oracle table is one big copy,
>>>>> but it will take at least as long as copying from Oracle over the
>>>>> network took. Compare that to the same thing, but using file_fdw on the
>>>>> file locally.
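A sketch of the file_fdw route; the column list here is hypothetical and must match the real table definition, and the dump is assumed to be CSV (file_fdw accepts the same format options as COPY):

```sql
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER local_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical columns; replace with the actual table definition.
CREATE FOREIGN TABLE oracle_dump (id int, payload text)
    SERVER local_files
    OPTIONS (filename '/tmp/tmp', format 'csv');

-- The load then runs entirely from local disk:
INSERT INTO local_postresql_table SELECT * FROM oracle_dump;
```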
>>>>>
>>>>> > 2) Should the copy from the dump into the PostgreSQL database take
>>>>> > less time than the copy to the dump?
>>>>>
>>>>> Yes. The copy from Oracle to your local drive is painfully slow for a
>>>>> modern network connection.
>>>>>
>>>>> > 3) What do you think about the new memory parameters that I
>>>>> > configured?
>>>>>
>>>>> They should be OK. I'm more worried about the performance of the I/O
>>>>> subsystem, to be honest.
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
