From: | "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com> |
---|---|
To: | "Kai-Uwe Sattler" <kus(at)tu-ilmenau(dot)de> |
Cc: | "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Gurjeet Singh" <gurjeet(dot)singh(at)enterprisedb(dot)com> |
Subject: | Re: [Fwd: Index Advisor] |
Date: | 2006-11-20 05:35:46 |
Message-ID: | 65937bea0611192135q2f9e8a74he5b84b5b0880f5be@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11/19/06, Kai-Uwe Sattler <kus(at)tu-ilmenau(dot)de> wrote:
>
> Hi,
> > .) The SELECTs in the pg_advise are returning wrong results, when
> > the same index is suggested twice, because of the SUM() aggregates.
> I don't think that this is a bug. If the same index is recommended
> for two different queries it will appear two times in
> pg_indexadvisor. So, if you want to calculate the overall benefit of
> this index, then you have to sum up the local benefits for each query.
If this is intended behaviour, then its okay.
> .) I doubt that on a table t(a,b), for a suggestion of idx(b,a),
> > pg_advise will
> > suggest idx(a,b);
> ?? Not sure, if I understand you right. idx(b,a) and idx(a,b) are
> completely different indexes. Why should pg_advise suggest idx(a,b).
I am referring to the way get_column_names() is coded. First, the SQL for
the portal does not guarantee any order of the result; secondly, the 'for'
loops that follow, will always output the columns in their increasing order
of attribute number. Here's a small way to reproduce the bug, that I cooked
up just now:
Change the SQL in read_advisor_output() to:
res = PQexec(conn, "DECLARE myportal CURSOR FOR "
"SELECT relname,"
"int2vector_to_string(index_attrs) AS colids,"
"MAX(index_pages) AS size_in_pages,"
"SUM(profit) AS benefit,"
"SUM(profit)/MAX(index_pages) AS gain "
"FROM pg_indexadvisor,"
"pg_class "
"WHERE backend_pid = 0 "
"AND rel_oid = pg_class.oid "
"GROUP BY relname, colids "
"ORDER BY gain DESC");
Notice the backend_pid = 0. Now insert the following into pg_indexadvisor:
insert into pg_indexadvisor values( 1259, '2 1', 2, 1000, 20,0,0 );
This should prompt the advisor to generate the statement:
create index advidx_1 on pg_class (relnamespace,relname);
But instead, it will output this:
create index advidx_1 on pg_class (relname,relnamespace);
Now run the advisor with any workload, and inspect the output.
$ pg_advise.exe -d postgres -h localhost -p 5432 -U gsk -o create_index.sql
workload.sql
We should tokenize the list of attribute numbers (column_ids variable) in
get_column_names() and query them individually.
But there is another bug: if there are recommendations like idx
> (a,b,c), idx(a,b) and idx(a) it would be a good idea to create just
> idx(a). I will add this to pg_advise as an optional feature.
>
I'd say it's a new feature request and not a bug :) But I don't understand
why would you want to not build idx(a,b,c) in such a situation? idx(a,b,c)
can be useful in places where idx(a,b) or idx(a) is required, but the same
can't be said about idx(a) or idx(a,b) being useful where idx(a,b,c) is
required!
Best regards,
--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | yahoo }.com
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Fuhr | 2006-11-20 05:54:57 | Statistics visibility in SERIALIZABLE transactions |
Previous Message | Alvaro Herrera | 2006-11-19 22:37:53 | Re: [GENERAL] Allowing SYSDATE to Work |