Quick Links

Re: Bulk Insert and Index use

From:	Rudi Starcevic <tech(at)wildcash(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Bulk Insert and Index use
Date:	2004-08-11 03:33:33
Message-ID:	4119938D.6000609@wildcash.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi Jim,

Thanks for your time.

> If the bulk load has the possibility of duplicating data

Yes, each row will require either:

a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE

I did think of using more than one table, ie. temp table.
As each month worth of data is added I expect to see
a change from lots of INSERTS to lots of UPDATES.

Perhaps when the UPDATES become more dominant it would
be best to start using Indexes.

While INSERTS are more prevelant perhaps a seq. scan is better.

I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.

Many thanks.

Jim J wrote:

> If the bulk load has the possibility of duplicating data, then you need
> to change methods. Try bulk loading into a temp table, index it like
> the original, eliminate the dups and merge the tables.
>
> It is also possible to do an insert from the temp table into the final
> table like:
> insert into original (x,x,x) (select temp.1, temp.2, etc from temp left
> join original on temp.street=original.street where original.street is null)
>
> Good Luck
> Jim
>
> Rudi Starcevic wrote:
>
>> Hi,
>>
>> I have a question on bulk checking, inserting into a table and
>> how best to use an index for performance.
>>
>> The data I have to work with is a monthly CD Rom csv data dump of
>> 300,000 property owners from one area/shire.
>>
>> So every CD has 300,000 odd lines, each line of data which fills the
>> 'property' table.
>>
>> Beginning with the first CD each line should require one SELECT and
>> one INSERT as it will be the first property with this address.
>>
>> The SELECT uses fields like 'street' and 'suburb', to check for an
>> existing property,
>> so I have built an index on those fields.
>>
>> My question is does each INSERT rebuild the index on the 'street' and
>> 'suburb' fields?
>> I believe it does but I'm asking to be sure.
>>
>> If this is the case I guess performance will suffer when I have, say,
>> 200,000
>> rows in the table.
>>
>> Would it be like:
>>
>> a) Use index to search on 'street' and 'suburb'
>> b) No result? Insert new record
>> c) Rebuild index on 'street' and 'suburb'
>>
>> for each row?
>> Would this mean that after 200,000 rows each INSERT will require
>> the index of 000's of rows to be re-indexed?
>>
>> So far I believe my only options are to use either and index
>> or sequential scan and see which is faster.
>>
>> A minute for your thoughts and/or suggestions would be great.
>>
>> Thanks.
>> Regards,
>> Rudi.
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Don't 'kill -9' the postmaster
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>
>

Regards,
Rudi.

Internet Media Productions

In response to

Re: Bulk Insert and Index use at 2004-08-11 02:31:48 from Jim J

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Gregory S. Williamson	2004-08-11 03:53:11	Re: Bulk Insert and Index use
Previous Message	Jim J	2004-08-11 02:31:48	Re: Bulk Insert and Index use