From: Rural Hunter <ruralhunter(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Julio Leyva <jcleyva(at)hotmail(dot)com>
Subject: Re: plsql gets "out of memory"
Date: 2011-09-03 03:30:26
Message-ID: 4E619F52.7050104@gmail.com
Lists: pgsql-admin
Hi Kevin,
I did another try with the following additional changes based on our discussion (a rough sketch of the settings is below):
1. use a TCP connection
2. turn off autovacuum
3. turn off full_page_writes
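A rough sketch of the changes, assuming a postgresql.conf edit plus a config reload (the file and database names below are stand-ins, not my actual ones):

    # postgresql.conf -- disable these for the load, restore afterwards
    autovacuum = off          # no autovacuum work during the bulk load
    full_page_writes = off    # fewer full-page WAL images during the load

    # run the import over TCP through localhost instead of the local socket:
    psql -h localhost -p 5432 -d mydb -f dump.sql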
I could import more than 30G of data in about 2 hours. That's totally acceptable performance to me with the current server's capability. There is a minor issue, though: I saw a few errors during the import:
ERROR: invalid byte sequence for encoding "UTF8": 0xe6272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe68e27
ERROR: invalid byte sequence for encoding "UTF8": 0xe7272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
ERROR: invalid byte sequence for encoding "UTF8": 0xe5a427
My data was exported from a UTF-8 MySQL database and my pgsql db is also UTF-8. I got only the 8 errors above out of about 3 million records imported. The strange thing is that I usually see the problematic SQL in the log when a statement fails, so I have a chance to fix the data manually. But for the errors above, no SQL is logged at all; the pgsql log just shows the same errors as above with no additional info:
2011-09-01 11:26:32 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe6272c
2011-09-01 11:26:47 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:26:53 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:26:58 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe68e27
2011-09-01 11:27:01 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe7272c
2011-09-01 11:27:06 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5272c
2011-09-01 11:27:15 CST ERROR: invalid byte sequence for encoding "UTF8": 0xe5a427
What could be the cause of that?
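Each of 0xe5/0xe6/0xe7 begins a three-byte UTF-8 sequence, yet the bytes that follow here are plain ASCII (0x27 is ' and 0x2c is ,), so it looks like multibyte characters were truncated in the export right before a quote or comma. A rough way to hunt for those byte runs in the dump file (the file name is a stand-in):

    # print line numbers of the exact byte runs from the errors above
    grep -naP '\xe6\x27\x2c|\xe5\x27\x2c|\xe6\x8e\x27|\xe7\x27\x2c|\xe5\xa4\x27' dump.sql

    # or have iconv stop and report at the first byte that is not valid UTF-8
    iconv -f UTF-8 -t UTF-8 dump.sql -o /dev/null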
On 2011/8/30 9:29, Rural Hunter wrote:
> Thank you. I didn't understand what 'vacuum freeze' actually does. I
> will check the details to see if it's good for my situation, and I will
> also test the load over a TCP connection. Thanks again for all your
> advice; it has really been very helpful to me!
>
> On 2011/8/30 0:06, Kevin Grittner wrote:
>> Rural Hunter <ruralhunter(at)gmail(dot)com> wrote:
>>> 2011/8/29 23:18, Kevin Grittner:
>>
>>>> I also recommend a VACUUM FREEZE ANALYZE on the database unless
>>>> most of these rows will be deleted or updated before you run a
>>>> billion database transactions. Otherwise you will get a painful
>>>> "anti-wraparound" autovacuum on everything, probably at a time
>>>> of heavy usage.
>>> hmm... I will try turning autovacuum off, though I didn't see any
>>> heavy resource usage caused by it.
>>
>> Well, turning off autovacuum during a bulk load is probably a net
>> gain if it's insert-only (i.e., no need to query just-loaded data to
>> decide what to do with new rows); but that's not what I was getting
>> at. Bulk loading 200 GB of data which is not going to be deleted or
>> updated heavily is setting a performance time bomb without a VACUUM
>> FREEZE. At some point, perhaps months later, it will be necessary
>> to freeze the tuples to prevent data loss, and since this occurs
>> based on a threshold of how many transaction IDs have been consumed,
>> it is most likely to happen at peak OLTP loads, when it will be the
>> biggest problem. A VACUUM FREEZE (and you might as well throw in
>> ANALYZE while you're at it) will take care of that up front. As a
>> side benefit it will keep SELECT statements from generating heavy
>> *write* loads on the first access to tuples, and will perform other
>> maintenance which will improve database performance.
>>
>> I just count the time for VACUUM FREEZE ANALYZE as part of the bulk
>> load time required before letting in users.
>>
>>>> Network latency?
>>
>>> No, I do the import locally on the db server, so network
>>> latency can be excluded.
>>
>> Hmm... I don't remember the details, but there was a problem at
>> some point where Linux pipe connections could introduce significant
>> latency, and you could get much better performance on a TCP
>> connection through localhost. It might be worth a try. (Maybe
>> someone else will remember the details.)
>>
>> -Kevin
>>
>
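For reference, the freeze step Kevin describes above boils down to a single statement run right after the load, before letting users in (a sketch; the per-table name is a stand-in):

    -- freeze all tuples and collect planner statistics in one pass
    VACUUM FREEZE ANALYZE;
    -- or per table, e.g.: VACUUM FREEZE ANALYZE mytable;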