From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Craig Gibson <craiggib(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance issue with cross table updates
Date: 2012-09-12 04:01:20
Message-ID: CAMkU=1zLUykQs9FuJhLOg6wWQNm8LRJgU-c-mjzhd1CZvLE-jw@mail.gmail.com
Lists: pgsql-general
On Mon, Sep 10, 2012 at 9:34 AM, Craig Gibson <craiggib(at)gmail(dot)com> wrote:
> Hi all
>
> I am no database wizard so I am hoping someone may be able to assist me :)
>
> I get a daily CSV file of 6.5 million records. I create a temporary
> table and COPY them in. On completion I create an index on the mdnid
> column. This column is also indexed in table 2. This part is very
> fast. I had some 'checkpoint too often' issues, but I have since
> resolved those.
>
> I then use the following procedure to update all the records, and if a
> record does not exist, insert it instead:
>
...
>
> From my understanding, a FOR loop is implemented with a cursor
> internally anyway, so there is no need to use one explicitly. Am I
> fundamentally doing something wrong, since the operation is as slow
> as molasses?
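A minimal sketch of the load path described above (the staging table name, CSV path, and COPY options are assumptions; only mdnid is named in the original):

```sql
-- Bulk-load the daily file into a staging table, then index the
-- join column, as described above. Done in one transaction so a
-- failed load leaves nothing behind.
BEGIN;
CREATE TEMP TABLE e_lookup_stage (LIKE e_lookup INCLUDING DEFAULTS);
COPY e_lookup_stage FROM '/path/to/daily.csv' WITH (FORMAT csv);
CREATE INDEX ON e_lookup_stage (mdnid);
ANALYZE e_lookup_stage;  -- give the planner stats before the big join
COMMIT;
```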
How big is the parent table? Are you CPU limited or IO limited?
If you are not CPU limited, then I would guess that the indexes on the
parent table do not fit in RAM or shared_buffers and that maintaining
the indexes on the parent table during the updates/inserts is the
bottleneck.
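Aside from the index question, the per-row loop itself can usually be replaced by two set-based statements, which lets the planner use a hash or merge join instead of 6.5 million individual index probes. A sketch, assuming a staging table e_lookup_stage and a payload column (both names are assumptions; PostgreSQL of this era has no INSERT ... ON CONFLICT, so update-then-insert is the usual pattern):

```sql
-- Update every row that already exists, in one pass.
UPDATE e_lookup e
SET    payload = s.payload
FROM   e_lookup_stage s
WHERE  e.mdnid = s.mdnid;

-- Then insert the rows that did not match anything.
INSERT INTO e_lookup (mdnid, payload)
SELECT s.mdnid, s.payload
FROM   e_lookup_stage s
WHERE  NOT EXISTS (SELECT 1 FROM e_lookup e WHERE e.mdnid = s.mdnid);
```

Run both in one transaction; with a single daily batch writer there is no race between the UPDATE and the INSERT.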
> Maybe there is a better way altogether
> that I have not thought of? The bottom line is that at no point can
> the e_lookup table be unavailable to clients,
Does it always have to be available for updates, or just for selects?
> else I would have just
> done a simple drop and rename post the COPY.
Maybe you can still do an atomic rename: build the new table alongside
the live one, then swap the names in a single transaction.
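DDL is transactional in PostgreSQL, so the swap is atomic: clients see either the old table or the new one, never a missing one. A sketch (the _new/_old names are assumptions):

```sql
-- Build e_lookup_new off to the side, then swap names atomically.
BEGIN;
ALTER TABLE e_lookup     RENAME TO e_lookup_old;
ALTER TABLE e_lookup_new RENAME TO e_lookup;
COMMIT;
DROP TABLE e_lookup_old;
```

The renames take a brief ACCESS EXCLUSIVE lock, so concurrent queries block for a moment rather than failing.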
Cheers,
Jeff