Re: Performance problems testing with Spamassassin 3.1.0

From: Matthew Schumacher <matt(dot)s(at)aptalaska(dot)net>
To: josh(at)agliodbs(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance problems testing with Spamassassin 3.1.0
Date: 2005-07-28 01:59:37
Message-ID: 42E83C09.2040508@aptalaska.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Josh Berkus wrote:
> Matt,
>
>
>>After playing with various indexes and what not I simply am unable to
>>make this procedure perform any better. Perhaps someone on the list can
>>spot the bottleneck and reveal why this procedure isn't performing that
>>well or ways to make it better.
>
>
> Well, my first thought is that this is a pretty complicated procedure for
> something you want to peform well. Is all this logic really necessary?
> How does it get done for MySQL?
>

I'm not sure if it's all needed, in mysql they have this simple schema:

===============================================
CREATE TABLE bayes_expire (
id int(11) NOT NULL default '0',
runtime int(11) NOT NULL default '0',
KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;

CREATE TABLE bayes_global_vars (
variable varchar(30) NOT NULL default '',
value varchar(200) NOT NULL default '',
PRIMARY KEY (variable)
) TYPE=MyISAM;

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
id int(11) NOT NULL default '0',
msgid varchar(200) binary NOT NULL default '',
flag char(1) NOT NULL default '',
PRIMARY KEY (id,msgid)
) TYPE=MyISAM;

CREATE TABLE bayes_token (
id int(11) NOT NULL default '0',
token char(5) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
atime int(11) NOT NULL default '0',
PRIMARY KEY (id, token),
INDEX bayes_token_idx1 (token),
INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;

CREATE TABLE bayes_vars (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(200) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
token_count int(11) NOT NULL default '0',
last_expire int(11) NOT NULL default '0',
last_atime_delta int(11) NOT NULL default '0',
last_expire_reduce int(11) NOT NULL default '0',
oldest_token_age int(11) NOT NULL default '2147483647',
newest_token_age int(11) NOT NULL default '0',
PRIMARY KEY (id),
UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
===============================================

Then they do this to insert the token:

INSERT INTO bayes_token (
id,
token,
spam_count,
ham_count,
atime
) VALUES (
?,
?,
?,
?,
?
) ON DUPLICATE KEY
UPDATE
spam_count = GREATEST(spam_count + ?, 0),
ham_count = GREATEST(ham_count + ?, 0),
atime = GREATEST(atime, ?)

Or update the token:

UPDATE bayes_vars SET
$token_count_update
newest_token_age = GREATEST(newest_token_age, ?),
oldest_token_age = LEAST(oldest_token_age, ?)
WHERE id = ?

I think the reason why the procedure was written for postgres was
because of the greatest and least statements performing poorly.

Honestly, I'm not real up on writing procs, I was hoping the problem
would be obvious to someone.

schu

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2005-07-28 02:12:39 Re: Performance problems testing with Spamassassin 3.1.0 Bayes module.
Previous Message Tom Lane 2005-07-28 00:41:05 Re: [Bizgres-general] Re: faster INSERT with possible