From: | Adam Gundy <adam(at)starsilk(dot)net> |
---|---|
To: | Richard Huxton <dev(at)archonet(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: varchar index joins not working? |
Date: | 2008-04-14 17:02:25 |
Message-ID: | 48038E21.50606@starsilk.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Richard Huxton wrote:
> Adam Gundy wrote:
>> On Thu, Apr 10, 2008 at 8:52 AM, Adam Gundy <adam(at)starsilk(dot)net> wrote:
>>> Richard Huxton wrote:
>>>> How many distinct values do you have in groups.groupid and
>>> group_access.group_id?
>>> for the small database (since it shows the same problem):
>>>
>>> group_access: 280/268
>>> groups: 2006/139
>>>
>>> for the large database:
>>>
>>> group_access: same
>>> groups: 1712647/140
>>>
>>> the groupid key is an MD5 hash, so it should be uniformly distributed.
>>> maybe that throws the stats? but, again, char works, varchar doesn't :-(
>>
>> OK, I'm thinking the varchar/char part is not the issue.
>
> Good, because it's not :-)
hmm. unfortunately it did turn out to be (part) of the issue. I've
discovered that mixing char and varchar in a stored procedure does not
coerce the types, and ends up doing seq scans all the time.
changing something like this:
proc x ( y char(32) )
{
select * from groups where groupid = y
}
into this:
proc x ( y varchar(32) )
{
select * from groups where groupid = y
}
and suddenly postgres does index lookups in the stored proc... way faster.
>> I *think* that when I analyze using char instead of varchar, it is
>> recording a stat for the large group, but for some reason with varchar
>> doesn't add a stat for that one.
>>
>> so, the real question is, how do I fix this? I can turn the stats way
>> up to 1000, but that doesn't guarantee that I'll get a stat for the
>> large group :-(
>
> Well, by default it will be tracking the 10 most common values (and how
> often they occur). As you say, this can be increased to 1000 (although
> it obviously takes longer to check 1000 rather than 10).
>
> We can have a look at the stats with something like:
> SELECT * FROM pg_stats WHERE tablename='group_access' AND attname='uid';
> You'll be interested in n_distinct, most_common_vals and most_common_freqs.
>
> However, I think the problem may be that PG doesn't track cross-column
> stats, so it doesn't know that a particular uid implies one or more
> particular groupid values.
I doubt we could get stats stable enough for this. the number of groups
will hopefully be much larger at some point.
it's a shame the index entries can't be used to provide information to
the planner, eg a rough count of the number of entries for a given key
(or subset). it would be nice to be able to create eg a counted btree
when you know you have this kind of data as a hint to the planner.
>> can I turn the statistics off completely for this column? I'm guessing
>> that if I can, that will mean it takes a guess based on the number of
>> distinct values in the groups table, which is still large number of
>> records, possibly enough to trigger the seqscan anyway.
>
> No - can't disable stats. Besides, you want it the other way around -
> index scans for all groups except the largest.
actually, disabling seqscan at the server level gives extremely good
response times. I ended up rewriting a few queries that were scanning
the whole group for no good reason, and bitmap index hashing seems to
take care of things nicely.
queries have gone from 30+ seconds to < 0.1 seconds.
>> does postgres have a way of building a 'counted index' that the
>> planner can use for it's record counts? some way of forcibly
>> maintaining a stat for every group?
>
> No, but let's see what's in pg_stats.
no real help there. either it hits the group being read, and does a good
plan, or it doesn't, and tries to seqscan (unless I disable it). even
forcing stats to 1000 only bandaids the situation, given the number of
groups will eventually exceed that..
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-04-14 17:46:06 | Re: varchar index joins not working? |
Previous Message | Greg Smith | 2008-04-14 15:44:44 | Re: shared_buffers performance |