Quick Links

Re: fast DISTINCT or EXIST

From:	Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To:	Tilo Buschmann <mailinglist(dot)postgresql(dot)performance(at)b-n-w(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: fast DISTINCT or EXIST
Date:	2007-04-07 17:28:52
Message-ID:	4617D4D4.2080509@tweakers.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On 7-4-2007 18:24 Tilo Buschmann wrote:
> Unfortunately, the query above will definitely not work correctly, if
> someone searches for "a" or "the".

That are two words you may want to consider not searching on at all.

As Tom said, its not very likely to be fixed in PostgreSQL. But you can
always consider using application logic (or a pgpsql function, you could
even use a set returning function to replace the double-limit subselects
in your in-statement) which will automatically fetch more records when
the initial guess turns out to be wrong, obviously using something like
a NOT IN to remove the initially returned cd.id's for the next batches.
Then again, even 'a' or 'the' will not likely be in *all* tracks of a
cd, so you can also use the 'average amount of tracks per cd' (about 10
or 11?) as your multiplier rather than my initial 3. Obviously you'll
loose performance with each increment of that value.

Best regards,

Arjen

In response to

Re: fast DISTINCT or EXIST at 2007-04-07 16:24:07 from Tilo Buschmann

Browse pgsql-performance by date

	From	Date	Subject
Next Message	david	2007-04-07 21:42:47	Re: SCSI vs SATA
Previous Message	Tom Lane	2007-04-07 16:39:38	Re: fast DISTINCT or EXIST