From: | Rudi Starcevic <tech(at)wildcash(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Bulk Insert and Index use |
Date: | 2004-08-11 03:33:33 |
Message-ID: | 4119938D.6000609@wildcash.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi Jim,
Thanks for your time.
> If the bulk load has the possibility of duplicating data
Yes, each row will require either:
a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE
I did think of using more than one table, ie. temp table.
As each month worth of data is added I expect to see
a change from lots of INSERTS to lots of UPDATES.
Perhaps when the UPDATES become more dominant it would
be best to start using Indexes.
While INSERTS are more prevelant perhaps a seq. scan is better.
I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.
Many thanks.
Jim J wrote:
> If the bulk load has the possibility of duplicating data, then you need
> to change methods. Try bulk loading into a temp table, index it like
> the original, eliminate the dups and merge the tables.
>
> It is also possible to do an insert from the temp table into the final
> table like:
> insert into original (x,x,x) (select temp.1, temp.2, etc from temp left
> join original on temp.street=original.street where original.street is null)
>
> Good Luck
> Jim
>
> Rudi Starcevic wrote:
>
>> Hi,
>>
>> I have a question on bulk checking, inserting into a table and
>> how best to use an index for performance.
>>
>> The data I have to work with is a monthly CD Rom csv data dump of
>> 300,000 property owners from one area/shire.
>>
>> So every CD has 300,000 odd lines, each line of data which fills the
>> 'property' table.
>>
>> Beginning with the first CD each line should require one SELECT and
>> one INSERT as it will be the first property with this address.
>>
>> The SELECT uses fields like 'street' and 'suburb', to check for an
>> existing property,
>> so I have built an index on those fields.
>>
>> My question is does each INSERT rebuild the index on the 'street' and
>> 'suburb' fields?
>> I believe it does but I'm asking to be sure.
>>
>> If this is the case I guess performance will suffer when I have, say,
>> 200,000
>> rows in the table.
>>
>> Would it be like:
>>
>> a) Use index to search on 'street' and 'suburb'
>> b) No result? Insert new record
>> c) Rebuild index on 'street' and 'suburb'
>>
>> for each row?
>> Would this mean that after 200,000 rows each INSERT will require
>> the index of 000's of rows to be re-indexed?
>>
>> So far I believe my only options are to use either and index
>> or sequential scan and see which is faster.
>>
>> A minute for your thoughts and/or suggestions would be great.
>>
>> Thanks.
>> Regards,
>> Rudi.
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 4: Don't 'kill -9' the postmaster
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>
>
--
Regards,
Rudi.
Internet Media Productions
From | Date | Subject | |
---|---|---|---|
Next Message | Gregory S. Williamson | 2004-08-11 03:53:11 | Re: Bulk Insert and Index use |
Previous Message | Jim J | 2004-08-11 02:31:48 | Re: Bulk Insert and Index use |