From: Richard Huxton <dev(at)archonet(dot)com>
To: WireSpot <wirespot(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: query time
Date: 2005-02-02 15:50:55
Message-ID: 4200F6DF.6080504@archonet.com
Lists: pgsql-general
WireSpot wrote:
> On Wed, 02 Feb 2005 14:48:41 +0000, Richard Huxton <dev(at)archonet(dot)com> wrote:
>
>>Think about it, you'd need an index that ordered use_name so that
>>(john_doe, AAAAAjohn_doe, Zjohn_doe1234) were all next to each other.
>>
>>If you anchor the search (LIKE 'john_doe%') and are using the C locale
>>then an index can be used (because it's the same as >='john_doe' AND
>><'john_dof').
>
>
> Unfortunately, all my cases are LIKE '%john_doe'. So I'm guessing I'll
> never get an index.
Well, if you want names *ending* in john_doe you can define a reverse()
function and then build a functional index on reverse(user_name). If you
want something in the middle, tough.
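A sketch of that approach, assuming the log_entries/user_name names from earlier in the thread (recent PostgreSQL versions ship a built-in reverse(); on older releases you can define one in PL/pgSQL, roughly like this):

```sql
-- Define reverse() if your PostgreSQL version lacks it.
CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS $$
DECLARE
    result text := '';
    i int;
BEGIN
    FOR i IN REVERSE length($1)..1 LOOP
        result := result || substr($1, i, 1);
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

-- Functional index on the reversed column.
CREATE INDEX log_entries_user_name_rev_idx
    ON log_entries (reverse(user_name));

-- A suffix match becomes an anchored prefix match on the reversed
-- value, which the index can serve (given the C locale, as above).
SELECT *
FROM log_entries
WHERE reverse(user_name) LIKE reverse('john_doe') || '%';
```

Note the function must be declared IMMUTABLE, or PostgreSQL will refuse to use it in an index.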
> How about the response time when doing both LIKE user_name and ORDER
> BY timestamp? Why does it get blown out of the water like that, from 5
> and 5 to 2000 ms? If a LIKE by itself takes 5 ms and an ORDER by
> itself takes 5 ms... Doesn't it grab the results matching the LIKE and
> the ORDER only those?
In your first '%john_doe%' example you had a LIMIT 100 without a sort.
So - the sequential scan started and when PG found the 100th match it
stopped. If you order by timestamp, it has to find all the matches, sort
them by timestamp and then discard the 101st onwards.
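Side by side, the two queries in question (names taken from the thread):

```sql
-- Seq scan can stop as soon as it has found 100 matching rows:
SELECT * FROM log_entries
WHERE user_name LIKE '%john_doe%'
LIMIT 100;

-- Must find *every* matching row, sort the whole result set by
-- timestamp, and only then throw away rows 101 onwards:
SELECT * FROM log_entries
WHERE user_name LIKE '%john_doe%'
ORDER BY timestamp
LIMIT 100;
```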
Now, the reason it doesn't use the timestamp index instead is that it
thinks that your LIKE is going to be very specific:
(cost=0.00..8250.17 rows=1 width=175)
In actual fact, there isn't one matching row; there are 15,481:
(actual time=0.432..1051.036 rows=15481 loops=1)
I'm not sure that there's any way for PG to know how many matches there
are going to be - it keeps track of values, but there's no easy way to
model substrings of a column.
Are you sure you need to match user_name against a double-wildcarded
value? If you do, all I could suggest is perhaps limiting the search to
one day/week/whatever at a time, which might make the timestamp index
seem appealing.
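For example, something along these lines (the column names and date range are just illustrative):

```sql
-- Bounding the query to one week gives the planner a small slice it
-- can fetch via the timestamp index, filtering by LIKE afterwards:
SELECT * FROM log_entries
WHERE user_name LIKE '%john_doe%'
  AND timestamp >= '2005-01-26'
  AND timestamp <  '2005-02-02'
ORDER BY timestamp
LIMIT 100;
```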
> While we're on the subject of indexes, is there any way I can speed up
> a SELECT DISTINCT user_name FROM log_entries? With 200.000 entries I
> get like 46 seconds on this one.
Not sure there's a simple way to avoid a seq-scan of the whole table. PG
has good concurrency support with MVCC, but the downside is that the
indexes don't store whether something is visible or not. That means you
can't just look at the index to determine what values are visible to
your current transaction.
> I apologise if these things are already in the manual, I'm only now
> getting used to it and I don't find some things right away.
No problem - all sensible questions. Worth checking the mailing-list
archives too though.
--
Richard Huxton
Archonet Ltd