Re: Compound keys and foreign constraints

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Wes Palmer <Wesley(dot)R(dot)Palmer(at)syntegra(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Compound keys and foreign constraints
Date: 2004-04-06 12:59:33
Message-ID: no6570l605ah5rc3m9i7mu3is64sbffdal@email.aon.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 05 Apr 2004 11:55:17 -0500, Wes Palmer
<Wesley(dot)R(dot)Palmer(at)syntegra(dot)com> wrote:
>> match, there's no need to fetch the heap tuple. Unfortunately the
>> planner doesn't believe that this is possible:
>> /* Don't believe estimates less than 1... */
>> if (tuples_fetched < 1.0)
>> tuples_fetched = 1.0;
>
>Unless you are fetching a majority of the rows, wouldn't it always be
>cheaper to consult the index instead of reading the full record?

In many (if not most) cases yes; always no. Think of a table having
100 rows per page. To select 1% of the rows you might have to access
every row of the table -- at least theoretically in the worst case, but
you get the point. It is the optimizer's job to find out which one of a
possibly large number of query plans is the best. Unfortunately the
optimizer is not perfect (yet :-)). This code snippet above might be a
spot where some improvement is possible. If this really contributes to
your case, you have to live with if for now.

>> -> Nested Loop (cost=0.00..75565.60 rows=1 width=0)
>> -> Nested Loop (cost=0.00..73745.26 rows=11 width=0)

>In the above example, the first case is where both the compound and the
>single-column indexes existed. I ran the test, deleted the single-column
>index, then ran the test again.

Yes, this is what I understood. What I don't understand is why the
optimizer didn't choose the second plan in the first run.

> I did not run vacuum or analyze in between.

Auto-vacuum?

>> more readable with \x):
>See enclosed tar file. f1 is the first couple of commands. f1a is with
>random_page_cost=4 and f1b is with random_page_cost=.8

SELECT * FROM pg_stats WHERE tablename='messages' ?

What were the other settings (sort_mem, effective_cache_size)?

>You can see that I get the same plan (filter vs. index) even keeping the
>addresses table out of the picture.

Now that you join only two tables you could experiment with forcing
other join methods (SET enable_nestloop, enable_mergejoin,
enable_hashjoin).

>It occurs to me that de-normalizing it a bit and putting a duplicate
>message_date in the message_recipients may have been a good thing to do.
>The result set could have been obtained quicker. But, I was trying to keep
>everything normalized as much as possible and wanted to keep the
>message_recipients as small as possible because it would be growing the
>fastest. Trying to put that in now would be a bit of a challenge, since I'd
>have to update 70 million records based on the value in 20 million records.

Another de-normalization idea:

ALTER TABLE messages ADD COLUMN mkey2 numeric(12,0);
UPDATE messages SET mkey2 = message_key;

Do this in batches and run VACUUM between the UPDATEs.

DROP INDEX messages_i_id_mdate;
CREATE INDEX ... ON messages(mkey2, message_date);

SELECT ...
FROM a INNER JOIN r ON (...)
INNER JOIN m ON (r.message_key = m.mkey2)
WHERE ...

I guess your hack leads to better speed, but mine is limited to only one
table which might be considered cleaner.

Servus
Manfred

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ericson Smith 2004-04-06 13:27:04 Re: Large DB
Previous Message Bill Moran 2004-04-06 12:47:48 Re: pg_autovacuum won't make on FreeBSD