Re: Mistake in my query or Index Scan on subquery failure?

From: Nick Barr <nicky(at)chuckie(dot)co(dot)uk>
To: "Gellert, Andre" <AGellert(at)ElectronicPartner(dot)de>
Cc: "'pgsql-general(at)postgresql(dot)org'" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Mistake in my query or Index Scan on subquery failure?
Date: 2004-03-03 18:03:41
Message-ID: 40461DFD.40705@chuckie.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Gellert, Andre wrote:

>Hello,
>I have a problem with this simple query :
>
>explain select * from ref_artikel a where a.artnr in ( 351275 , 351346 ,
>293082 ) LIMIT 20 OFFSET 0 ;
> QUERY PLAN
>
>----------------------------------------------------------------------------
>------------------------------------------------------
> Limit (cost=0.00..9.06 rows=3 width=299)
> -> Index Scan using ref_artikel_pkey, ref_artikel_pkey, \
> ref_artikel_pkey on ref_artikel a (cost=0.00..9.06 rows=3 width=299)
> Index Cond: ((artnr = 351275) OR (artnr = 351346) OR (artnr =
>293082))
>(3 rows)
>
>Is it okay , that the word ref_artikel_pkey will be repeated for each
>condition ?
>
>How can i optimize this, i have a number of up to 3000 "artnr" cond. to
>check for equality
>( and more tables joined over this query which i left away for better
>understanding ) ,
>is a "(x=1) or (x=2)..." faster than "x in 1,2... " at this large number of
>checks ?
>
>Maybe this is the hash-joined bug marked for this 7.4 postgresql....
>I consider upgrading to 7.41 on another testing server....
>
>Thx for ideas,
>Andre
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
As far as I can see this is fine, especially for a small number of
values in the IN clause. If I understand you correctly the number of
values in the IN clause might extend to 3000. This would not be
particularly efficient and might end in a sequential scan.

For example,

explain select * from ref_artikel a where a.artnr in ( 351275 , 351346 ,
293082, ... <2997 more values> ) LIMIT 20 OFFSET 0 ;

A better method if you are going to use a lot of numbers would be something like the following:

-----
BEGIN;

CREATE TEMPORARY TABLE temp_num_2004_03_03 (temp_num int4);

INSERT INTO temp_num_2004_03_03 (351275);
...
INSERT INTO temp_num_2004_03_03 (293082);

CREATE INDEX temp_num_2004_03_03_temp_num_idx ON temp_num_2004_03_03 (temp_num);

SELECT * FROM ref_artikel a, temp_num_2004_03_03 b WHERE a.artnr=b.temp_num ORDER BY a.artnr LIMIT 20 OFFSET 0;

DROP INDEX temp_num_2004_03_03_temp_num_idx;
DROP TABLE temp_num_2004_03_03;

COMMIT/ABORT;
-----

This is especially true if you are going to use the set of numbers again in other queries that follow.

BTW, you will need to put an ORDER BY clause in your SELECT to guarentee the order of the rows that come back, especially when using the LIMIT/OFFSET clauses.

Nick

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Paulovič Michal 2004-03-03 18:36:02 Re: Moving from MySQL to PGSQL....some questions (multilevel
Previous Message Bruno Wolff III 2004-03-03 17:48:20 Re: Moving from MySQL to PGSQL....some questions (multilevel