From: | Andreas Pflug <pgadmin(at)pse-consulting(dot)de> |
---|---|
To: | "Jim C(dot) Nasby" <decibel(at)decibel(dot)org> |
Cc: | Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Performance problems testing with Spamassassin 3.1.0 |
Date: | 2005-07-31 18:19:11 |
Message-ID: | 42ED161F.1020408@pse-consulting.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Jim C. Nasby wrote:
> On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
>
>>Ok, here is the current plan.
>>
>>Change the spamassassin API to pass a hash of tokens into the storage
>>module, pass the tokens to the proc as an array, start a transaction,
>>load the tokens into a temp table using copy, select the tokens distinct
>>into the token table for new tokens, update the token table for known
>>tokens, then commit.
>
>
> You might consider:
> UPDATE tokens
> FROM temp_table (this updates existing records)
>
> INSERT INTO tokens
> SELECT ...
> FROM temp_table
> WHERE NOT IN (SELECT ... FROM tokens)
>
> This way you don't do an update to newly inserted tokens, which helps
> keep vacuuming needs in check.
The subselect might be quite a big set, so avoiding a full table scan
and materialization by
DELETE temp_table
WHERE key IN (select key FROM tokens JOIN temp_table);
INSERT INTO TOKENS SELECT * FROM temp_table;
or
INSERT INTO TOKENS
SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key)
WHERE tokens.key IS NULL
might be an additional win, assuming that only a small fraction of
tokens is inserted and updated.
Regards,
Andreas
From | Date | Subject | |
---|---|---|---|
Next Message | William Yu | 2005-07-31 21:11:58 | Re: Performance problems on 4/8way Opteron (dualcore) |
Previous Message | Jim C. Nasby | 2005-07-31 17:10:12 | Re: Performance problems testing with Spamassassin 3.1.0 |