From: | Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net> |
---|---|
To: | josh(at)agliodbs(dot)com |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Performance problems testing with Spamassassin 3.1.0 |
Date: | 2005-07-28 01:59:37 |
Message-ID: | 42E83C09.2040508@aptalaska.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Josh Berkus wrote:
> Matt,
>
>
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better. Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
>
>
> Well, my first thought is that this is a pretty complicated procedure for
> something you want to peform well. Is all this logic really necessary?
> How does it get done for MySQL?
>
I'm not sure if it's all needed, in mysql they have this simple schema:
===============================================
CREATE TABLE bayes_expire (
id int(11) NOT NULL default '0',
runtime int(11) NOT NULL default '0',
KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;
CREATE TABLE bayes_global_vars (
variable varchar(30) NOT NULL default '',
value varchar(200) NOT NULL default '',
PRIMARY KEY (variable)
) TYPE=MyISAM;
INSERT INTO bayes_global_vars VALUES ('VERSION','3');
CREATE TABLE bayes_seen (
id int(11) NOT NULL default '0',
msgid varchar(200) binary NOT NULL default '',
flag char(1) NOT NULL default '',
PRIMARY KEY (id,msgid)
) TYPE=MyISAM;
CREATE TABLE bayes_token (
id int(11) NOT NULL default '0',
token char(5) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
atime int(11) NOT NULL default '0',
PRIMARY KEY (id, token),
INDEX bayes_token_idx1 (token),
INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;
CREATE TABLE bayes_vars (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(200) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
token_count int(11) NOT NULL default '0',
last_expire int(11) NOT NULL default '0',
last_atime_delta int(11) NOT NULL default '0',
last_expire_reduce int(11) NOT NULL default '0',
oldest_token_age int(11) NOT NULL default '2147483647',
newest_token_age int(11) NOT NULL default '0',
PRIMARY KEY (id),
UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
===============================================
Then they do this to insert the token:
INSERT INTO bayes_token (
id,
token,
spam_count,
ham_count,
atime
) VALUES (
?,
?,
?,
?,
?
) ON DUPLICATE KEY
UPDATE
spam_count = GREATEST(spam_count + ?, 0),
ham_count = GREATEST(ham_count + ?, 0),
atime = GREATEST(atime, ?)
Or update the token:
UPDATE bayes_vars SET
$token_count_update
newest_token_age = GREATEST(newest_token_age, ?),
oldest_token_age = LEAST(oldest_token_age, ?)
WHERE id = ?
I think the reason why the procedure was written for postgres was
because of the greatest and least statements performing poorly.
Honestly, I'm not real up on writing procs, I was hoping the problem
would be obvious to someone.
schu
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2005-07-28 02:12:39 | Re: Performance problems testing with Spamassassin 3.1.0 Bayes module. |
Previous Message | Tom Lane | 2005-07-28 00:41:05 | Re: [Bizgres-general] Re: faster INSERT with possible |