Re: Equivalence Classes when using IN

From: Kim Rose Carlsen <krc(at)hiper(dot)dk>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Equivalence Classes when using IN
Date: 2017-10-09 09:39:39
Message-ID: VI1PR05MB1709102944A48BCC52D460F1C7740@VI1PR05MB1709.eurprd05.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


>On 9 October 2017 at 08:01, Kim Rose Carlsen <krc(at)hiper(dot)dk> wrote:

>> Is this because postgres never consider IN clause when building equivalence
>> class's?
>
>Only btree equality operators are considered at the moment.

After good night sleep and reading the previous discussion, I am no longer sure I have reduced my original problem to the right example. If we continue from previous setup and add the following:

           ALTER TABLE customer ADD COLUMN age INTEGER;
           UPDATE customer SET age = customer_id / 5;

CREATE INDEX ON customer (age);
CREATE INDEX ON product (customer_id);

               SET enable_hashjoin = false;
               SET enable_mergejoin = false;
                
            EXPLAIN ANALYZE
             SELECT *
               FROM customer
               JOIN view_customer
                 ON customer.customer_id = view_customer.customer_id
              WHERE age < 20;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=139.00..10392.96 rows=668 width=16) (actual time=0.528..35.120 rows=200 loops=1)
Join Filter: (c.customer_id = product.customer_id)
Rows Removed by Join Filter: 199900
-> Nested Loop (cost=0.28..199.21 rows=334 width=12) (actual time=0.075..1.146 rows=100 loops=1)
-> Seq Scan on customer (cost=0.00..21.51 rows=334 width=8) (actual time=0.067..0.282 rows=100 loops=1)
Filter: (age < 20)
Rows Removed by Filter: 901
-> Index Only Scan using customer_pkey on customer c (cost=0.28..0.53 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=100)
Index Cond: (customer_id = customer.customer_id)
Heap Fetches: 100
-> Materialize (cost=138.73..173.75 rows=2001 width=8) (actual time=0.005..0.130 rows=2001 loops=100)
-> Sort (cost=138.73..143.73 rows=2001 width=8) (actual time=0.448..0.588 rows=2001 loops=1)
Sort Key: product.customer_id, product.product_id
Sort Method: quicksort Memory: 142kB
-> Seq Scan on product (cost=0.00..29.01 rows=2001 width=8) (actual time=0.006..0.215 rows=2001 loops=1)
Planning time: 0.214 ms
Execution time: 35.284 ms

The planner prefer to use hash and merge joins which is ok, when many rows are to be joined, I don't think any condition can be merged to make these case faster. I have disabled merge and hash joins to get to a nested loop join instead, in this case it would be much better if customer_id can be pulled inside the loop, so it can look at only the relevant rows and not all rows for each loop. I somehow inferred that this would be the same as selecting from the view using IN clause, now I'm not so sure anymore.

I can see there is a trade off between planner time and how exotic the case is. If you want to be able to hide abstraction through views I guess the nature becomes more OLAP oriented than OLTP.

Best Regards
Kim Carlsen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2017-10-09 10:47:59 Re: Equivalence Classes when using IN
Previous Message David Rowley 2017-10-09 06:39:54 Re: OR-clause support for indexes