From: François Beausoleil <francois(at)teksol(dot)info>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Most efficient way to insert without duplicates
Date: 2013-04-17 20:19:46
Message-ID: 6DAD4014-38AD-43E5-8D19-B36164E79DC9@teksol.info
Lists: pgsql-general
On 2013-04-17 at 14:15, Jeff Janes wrote:
> On Wed, Apr 17, 2013 at 4:26 AM, François Beausoleil <francois(at)teksol(dot)info> wrote:
>
>
> Insert on public.persona_followers (cost=139261.12..20483497.65 rows=6256498 width=16) (actual time=4729255.535..4729255.535 rows=0 loops=1)
> Buffers: shared hit=33135295 read=4776921
> -> Subquery Scan on t1 (cost=139261.12..20483497.65 rows=6256498 width=16) (actual time=562265.156..578844.999 rows=6819520 loops=1)
>
>
> It looks like 12% of the time is being spent figuring out which rows to insert, and 88% actually doing the insertions.
>
> So I think that index maintenance is killing you. You could try adding a sort to your SELECT so that rows are inserted in index order, or inserting in batches partitioned by service_id (which is almost the same thing as sorting, since service_id is the lead column).
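Jeff's suggestion can be sketched as follows. The table name persona_followers and the lead column service_id come from the thread, but the exact column list, the staging table persona_followers_import, and the anti-join condition are assumptions, since the original query is not shown in full:

```sql
-- Sketch only: assumes persona_followers(service_id, follower_id) with a
-- unique index leading on service_id, and a hypothetical staging table
-- persona_followers_import holding the candidate rows.
INSERT INTO persona_followers (service_id, follower_id)
SELECT i.service_id, i.follower_id
FROM persona_followers_import i
WHERE NOT EXISTS (
    SELECT 1
    FROM persona_followers p
    WHERE p.service_id  = i.service_id
      AND p.follower_id = i.follower_id
)
-- Sorting in index-key order makes the index insertions hit pages in
-- sequence rather than at random, which is the point of the suggestion.
ORDER BY i.service_id, i.follower_id;
```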
In that case, partitioning the original table by service_id % N would help, since the index would be much smaller, right?
N would have to be reasonable: 10, 100, 256, or something similar.
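At the time of this thread, splitting a table by service_id % N meant inheritance-based partitioning with CHECK constraints; PostgreSQL 11 and later can express the same idea declaratively with hash partitions. A minimal sketch, with the column list assumed as above and N = 16 chosen arbitrarily:

```sql
-- PostgreSQL 11+ declarative hash partitioning; equivalent in spirit to
-- partitioning by service_id % N. Each partition carries its own, much
-- smaller primary-key index, which is the benefit being discussed.
CREATE TABLE persona_followers (
    service_id  bigint NOT NULL,
    follower_id bigint NOT NULL,
    PRIMARY KEY (service_id, follower_id)
) PARTITION BY HASH (service_id);

CREATE TABLE persona_followers_p0 PARTITION OF persona_followers
    FOR VALUES WITH (MODULUS 16, REMAINDER 0);
-- ... repeat for REMAINDER 1 through 15
```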
Thanks,
François