From: Richard Huxton <dev(at)archonet(dot)com>
To: Adam Gundy <adam(at)starsilk(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: varchar index joins not working?
Date: 2008-04-10 08:46:09
Message-ID: 47FDD3D1.9070208@archonet.com
Lists: pgsql-performance
Adam Gundy wrote:
> I'm hitting an unexpected problem with postgres 8.3 - I have some
> tables which use varchar(32) for their unique IDs which I'm attempting
> to join using some simple SQL:
>
> select *
> from group_access, groups
> where group_access.groupid = groups.groupid and
> group_access.uid = '7275359408f44591d0717e16890ce335';
>
> there's a unique index on group_access.groupid, and a non-unique index
> on groups.groupid. both are non-null.
What about group_access.uid? I'd have thought that plus the groups
primary key is probably the sensible combination here.
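If an index on group_access.uid doesn't already exist, something along
these lines would be worth trying (the index name is just an example):

  CREATE INDEX group_access_uid_idx ON group_access (uid);
  ANALYZE group_access;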
> the problem is: if groupid (in both tables) is varchar, I cannot force
> postgres (no matter how hard I try) to do an index scan. it ends up
> reading the entire groups table (pretty large!):
OK
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
> Hash Join (cost=8.89..41329.88 rows=119940 width=287) (actual
> time=0.202..935.136 rows=981 loops=1)
That's because it's expecting 119,940 rows to match (rather than the
981 you actually get). If you really were getting that many results,
this would probably be a sensible plan.
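As a rough sanity check: if the planner is using something close to
1/ndistinct as the selectivity for the equality join, then
1,123,348 * 30 / 119,940 works out to about 280, i.e. it seems to
believe there are only a few hundred distinct groupids.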
> Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
> -> Seq Scan on groups (cost=0.00..31696.48 rows=1123348
> width=177) (actual time=0.011..446.091 rows=1125239 loops=1)
It's got a good idea of the total number of rows in groups.
> -> Hash (cost=8.51..8.51 rows=30 width=110) (actual
> time=0.148..0.148 rows=30 loops=1)
> -> Seq Scan on group_access (cost=0.00..8.51 rows=30
> width=110) (actual time=0.014..0.126 rows=30 loops=1)
And of group_access too. The seq scan doesn't really matter here,
though: it probably *is* faster to read all 30 rows in one burst than
to go to the index and then back to the table.
> Filter: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
> Total runtime: 935.443 ms
> (7 rows)
>
> if I disable seq_scan, I get this:
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
> Merge Join (cost=1.47..106189.61 rows=120004 width=287) (actual
> time=0.100..1532.353 rows=981 loops=1)
It's still thinking it's going to get 120 thousand rows.
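If the column statistics are just too coarse, it may be worth raising
the statistics target on the join columns and re-analysing (100 below
is only a guess; the 8.3 default is 10):

  ALTER TABLE groups ALTER COLUMN groupid SET STATISTICS 100;
  ALTER TABLE group_access ALTER COLUMN groupid SET STATISTICS 100;
  ANALYZE groups;
  ANALYZE group_access;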
> it's running an index scan across the entire table (no condition applied) :-(
>
> so, just for the hell of it, I tried making groupid a char(32),
> despite repeated assertions in this group that there's no performance
> difference between the two:
There's no performance difference between the two.
> Nested Loop (cost=4.48..253.85 rows=304 width=291) (actual
> time=0.715..22.906 rows=984 loops=1)
> (this last plan is actually against a smaller test DB, but I get the
> same behavior with it, seq scan for varchar or index scan for char,
> and the results returned are identical for this query)
The char(32) itself isn't important here; what is important is that
it's now expecting ~300 rows rather than 120,000. That's still wrong,
but close enough to make sense.
So the question is: why is PG expecting so many matches for your join?
How many distinct values do you have in groups.groupid and
group_access.groupid?
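Something along these lines should show both the real figures and what
the planner believes (pg_stats is the standard statistics view):

  SELECT count(DISTINCT groupid) FROM groups;
  SELECT count(DISTINCT groupid) FROM group_access;

  SELECT tablename, attname, n_distinct
  FROM pg_stats
  WHERE tablename IN ('groups', 'group_access')
    AND attname = 'groupid';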
--
Richard Huxton
Archonet Ltd