From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Wes Palmer <Wesley(dot)R(dot)Palmer(at)syntegra(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Compound keys and foreign constraints
Date: 2004-04-06 12:59:33
Message-ID: no6570l605ah5rc3m9i7mu3is64sbffdal@email.aon.at
Lists: pgsql-general
On Mon, 05 Apr 2004 11:55:17 -0500, Wes Palmer
<Wesley(dot)R(dot)Palmer(at)syntegra(dot)com> wrote:
>> match, there's no need to fetch the heap tuple. Unfortunately the
>> planner doesn't believe that this is possible:
>> /* Don't believe estimates less than 1... */
>> if (tuples_fetched < 1.0)
>> tuples_fetched = 1.0;
>
>Unless you are fetching a majority of the rows, wouldn't it always be
>cheaper to consult the index instead of reading the full record?
In many (if not most) cases yes; always no. Think of a table with
100 rows per page. To select 1% of the rows you might still have to
visit every page of the table -- at least theoretically, in the worst
case, but you get the point. It is the optimizer's job to find the
best of a possibly large number of query plans. Unfortunately the
optimizer is not perfect (yet :-)). The code snippet above might be a
spot where some improvement is possible. If this really contributes to
your case, you have to live with it for now.
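To illustrate (a sketch with invented table and column names, not your
schema): when the matching values are scattered over the whole heap, a
1% selectivity can still force a fetch from nearly every page.

```sql
-- Hypothetical demo: 1,000,000 rows, values randomly scattered, so
-- rows matching a 1% predicate are spread across almost all pages.
CREATE TABLE demo AS
    SELECT (random() * 10000)::int AS val,
           repeat('x', 70)         AS filler
    FROM generate_series(1, 1000000);
CREATE INDEX demo_val ON demo (val);
ANALYZE demo;

-- Matches ~1% of the rows, but an index scan may still touch
-- nearly every heap page because of the poor physical correlation:
EXPLAIN SELECT * FROM demo WHERE val < 100;
```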
>> -> Nested Loop (cost=0.00..75565.60 rows=1 width=0)
>> -> Nested Loop (cost=0.00..73745.26 rows=11 width=0)
>In the above example, the first case is where both the compound and the
>single-column indexes existed. I ran the test, deleted the single-column
>index, then ran the test again.
Yes, this is what I understood. What I don't understand is why the
optimizer didn't choose the second plan in the first run.
> I did not run vacuum or analyze in between.
Auto-vacuum?
>> more readable with \x):
>See enclosed tar file. f1 is the first couple of commands. f1a is with
>random_page_cost=4 and f1b is with random_page_cost=.8
SELECT * FROM pg_stats WHERE tablename='messages' ?
What were the other settings (sort_mem, effective_cache_size)?
>You can see that I get the same plan (filter vs. index) even keeping the
>addresses table out of the picture.
Now that you join only two tables you could experiment with forcing
other join methods (SET enable_nestloop, enable_mergejoin,
enable_hashjoin).
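For example (a sketch; I'm substituting your messages /
message_recipients join, adjust the query to your real one):

```sql
-- Disable one join method at a time and compare the plans and
-- actual timings EXPLAIN ANALYZE reports for the alternatives.
SET enable_nestloop = off;
EXPLAIN ANALYZE
    SELECT count(*)
    FROM message_recipients r
    JOIN messages m ON (r.message_key = m.message_key);
RESET enable_nestloop;

SET enable_mergejoin = off;
EXPLAIN ANALYZE
    SELECT count(*)
    FROM message_recipients r
    JOIN messages m ON (r.message_key = m.message_key);
RESET enable_mergejoin;
```

These settings only affect the current session, so there is no risk to
other connections while you experiment.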
>It occurs to me that de-normalizing it a bit and putting a duplicate
>message_date in the message_recipients may have been a good thing to do.
>The result set could have been obtained quicker. But, I was trying to keep
>everything normalized as much as possible and wanted to keep the
>message_recipients as small as possible because it would be growing the
>fastest. Trying to put that in now would be a bit of a challenge, since I'd
>have to update 70 million records based on the value in 20 million records.
Another de-normalization idea:
ALTER TABLE messages ADD COLUMN mkey2 numeric(12,0);
UPDATE messages SET mkey2 = message_key;
Do this in batches and run VACUUM between the UPDATEs.
DROP INDEX messages_i_id_mdate;
CREATE INDEX ... ON messages(mkey2, message_date);
SELECT ...
FROM a INNER JOIN r ON (...)
INNER JOIN m ON (r.message_key = m.mkey2)
WHERE ...
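The batched UPDATE step could look like this (a sketch; the key ranges
are invented and should be sized to your data):

```sql
-- Backfill mkey2 in key-range batches; VACUUM between batches
-- reclaims the dead tuples each UPDATE leaves behind.
UPDATE messages SET mkey2 = message_key
 WHERE message_key >= 0 AND message_key < 1000000
   AND mkey2 IS NULL;
VACUUM messages;

UPDATE messages SET mkey2 = message_key
 WHERE message_key >= 1000000 AND message_key < 2000000
   AND mkey2 IS NULL;
VACUUM messages;

-- ...and so on until every row is populated.
```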
I guess your hack leads to better speed, but mine is limited to a
single table, which might be considered cleaner.
Servus
Manfred