Re: query speed joining tables

From: Tomasz Myrta <jasiek(at)klaster(dot)net>
To: Christopher Smith <christopherl_smith(at)yahoo(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: query speed joining tables
Date: 2003-01-13 22:01:39
Message-ID: 3E233743.2080100@klaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Christopher Smith wrote:

> I have 4 tables that I need to query... 3 of the tables are links by the
> field userid.
>
> below are table sql fragments. as well as the query. The subquery
> causes very high
>
> CPU usages. It typically returns ~3000 matches. Is there another way to
> rewrite this?
>
> SELECT user_login.userid FROM user_login,user_details_p,user_match_details
> WHERE user_login.userid = user_details_p.userid AND
> user_details_p.userid = user_match_details.userid AND
> user_details_p.gender ='W' AND
> user_details_p.seekgender ='M' AND
> user_details_p.age >=18 AND
> user_details_p.age <=50 AND
> user_match_details.min_age <= 30 AND
> user_match_details.max_age >= 30 AND
> user_details_p.ethnictype = 'Caucasian (White)' AND
> strpos(user_match_details.ethnicity,'Asian') !=0 AND
> user_details_p.zipcode in (select zips_max.destination from zips_max
> where zips_max.origin='90210' )
> order by user_login.last_login desc;

explicit joins show better idea of your query and helps postgres
choosing indexing.

select userid
from
user_login
join user_details using (userid)
join user_match_details using (userid)
where
user_details_p.gender ='W' AND
user_details_p.seekgender ='M' AND
user_details_p.age between 18 and 50 and
user_match_details.min_age <= 30 AND
user_match_details.max_age >= 30 AND
user_details_p.ethnictype = 'Caucasian (White)' AND
strpos(user_match_details.ethnicity,'Asian') !=0 AND
user_details_p.zipcode in (select zips_max.destination from zips_max
where zips_max.origin='90210' )
order by user_login.last_login desc;

How can I help you with subquery if you didn't write even zips_max
definition?!?

If origin is unique value in that table, you can change subquery into
join on "from" list.

> Table "public.user_login"
> Column | Type | Modifiers
> ------------+--------------------------+---------------
> userid | character varying(30) | not null
> password | character varying(30) | not null
> email | character varying(50) | not null
> last_login | timestamp with time zone | not null
> Indexes: user_login_pkey primary key btree (userid),

Do you really need userid as varchar?
indexing on int4 or int8 would be much faster than varchar

Why do you have 3 tables? It looks like only one table would be enough.
Remember, that null values don't take too much space.

>
>
>
>
> Table "public.user_details_p"
> Column | Type | Modifiers
> -----------------+--------------------------+---------------
> userid | character varying(30) | not null
> gender | character varying(1) |
> age | integer |
> height | character varying(10) |
> ethnicty pe | character varying(30) |
> education | character varying(30) |
> createdate | timestamp with time zone | default now()
> zipcode | character varying(5) |
> birthdate | date | default now()
> zodiac | character varying(40) |
> seekgender | character varying(2) |
> Indexes: user_details_p_pkey primary key btree (userid),
> user_details_p_age_idx btree (age),
> &nb sp; user_details_p_ethnic_idx btree (ethnictype),
> user_details_p_gender_idx btree (gender),

>
> user_details_p_last_login_idx btree (last_login),
> user_details_p_seekgender_idx btree (seekgender),
> user_details_p_state_idx btree (state)

There is too many indexes -
index on gender (2 possible values) is useless,
index on ethnic (how many values - I think not too many?) is possibly
useless

Consider creating single index on several fields for queries like this:
select
...
where
user_details_p.gender ='W' AND
user_details_p.seekgender ='M' AND
user_details_p.age between 18 and 50 and

index on (age,seekgender,gender)

>
> Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
> user_login(userid) ON UPDATE NO ACTI
> ON ON DELETE CASCADE
>
>
> Table "public.user_match_details"
> Column | Type | Modifiers
> ------------------+------------------------+-----------
> userid | character varying(30) | not null
> soughtmate | character varying(200) |
> ethnicity | character varying(200) |
> marital_status | character varying(200) |
> min_age | integer |
> max_age | integer &nbs p; |
> city | character varying(50) |
> state | character varying(2) |
> zipcode | integer |
> match_distance | integer |
> Indexes: user_match_details_pkey primary key btree (userid)
> Foreign Key constraints: $1 FOREIGN KEY (userid) REFERENCES
> user_login(userid) ON UPDATE NO ACTION ON DELETE CASCADE
>
> Table "public.zips_300"
> Column | Type | Modifiers
> -------------+----------------------+-----------
> origin | character varying(5) |
> destination | character varying(5) |
> Indexes: zips_300_origin_idx btree (origin)

If you need more specific answer, you have to add more information - how
many records do you have in your tables and how many possible values do
you use for example for zipcodes, ethnicity etc.

Regards,
Tomasz Myrta

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Christopher Smith 2003-01-13 22:03:00 Re: query speed joining tables
Previous Message Christopher Smith 2003-01-13 21:56:42 Re: query speed joining tables