Re: Slow SELECT by primary key? Postgres 9.1.2

From: Roman Konoval <rkonoval(at)gmail(dot)com>
To: Evgeniy Shishkin <itparanoia(at)gmail(dot)com>
Cc: John Mudd <johnbmudd(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow SELECT by primary key? Postgres 9.1.2
Date: 2013-05-28 08:10:24
Message-ID: CABcZEEBpWumTFjvJZe7eFk-Lb5xpTvb1yXVThvaWK2+hgHE4eQ@mail.gmail.com
Lists: pgsql-performance

John,

And can you please include BUFFERS to ANALYZE?
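
For reference, the option syntax (available since PostgreSQL 9.0) looks like
this; BUFFERS adds shared/local/temp block counts to each plan node, which
shows how much of each cold-cache run is real disk I/O:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM test_select WHERE id = 500000;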

Regards,
Roman Konoval

On Tue, May 28, 2013 at 9:48 AM, Evgeniy Shishkin <itparanoia(at)gmail(dot)com> wrote:

>
> On 28.05.2013, at 2:17, John Mudd <johnbmudd(at)gmail(dot)com> wrote:
>
> Thanks again.
>
> Well, I have two problems with using the CLUSTER option. It's only
> temporary: any updates, depending on how much free space is reserved per
> page, require re-running the CLUSTER. And my primary concern is that it
> arbitrarily gives an unfair advantage to the primary key SELECT. Still,
> it's easy to test, so here are the results. The primary key still loses
> even with the CLUSTER. Granted it is close, but considering this is now an
> unfair comparison it still doesn't make sense to me. How can a search for
> a specific row, which should be fairly straightforward, take longer than a
> search that includes an ORDER BY clause?
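>
> (As an aside: the per-page free space mentioned above is what the
> fillfactor storage parameter controls; a sketch, not something tuned here:
>
>     ALTER TABLE test_select SET (fillfactor = 90);
>
> after which a CLUSTER rewrites the table leaving that much slack per page.)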
>
>
> Well, you get just a regular index scan because of the LIMIT 1.
>
> And now it is just a matter of index size and table organization.
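>
> For example, one quick way to compare the two on disk (a sketch using the
> standard size functions, with the index names from your schema):
>
>     SELECT relname, pg_size_pretty(pg_relation_size(oid))
>     FROM pg_class
>     WHERE relname IN ('test_select_pkey', 'my_key');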
>
> I also don't understand why you consider CLUSTER unfair - the way you
> populated the table was already a natural clustering over my_key.
>
> But it bothers me why my_key is always better. Can you please test it on
> different values but the same rows? Right now the two queries fetch two
> different tuples, and with a cold cache every I/O counts.
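>
> Something like this (a sketch; the second statement assumes you fill in
> the key values the first one returns):
>
>     SELECT key1, key2, key3 FROM test_select WHERE id = 500000;
>
>     EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_select
>     WHERE key1 = ... AND key2 = ... AND key3 = ... AND id = 500000;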
>
>
> test=# CLUSTER test_select USING test_select_pkey ;
> CLUSTER
> test=# VACUUM ANALYZE test_select ;
> VACUUM
>
> (stopped postgres; reset O/S cache; started postgres)
>
>
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000
> ORDER BY key1, key2, key3, id LIMIT 1;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 loops=1)
>    ->  Index Scan using my_key on test_select  (cost=0.00..41938.15 rows=499992 width=21) (actual time=19.428..19.428 rows=1 loops=1)
>          Index Cond: (key1 >= 500000)
>  Total runtime: 19.526 ms
>
>
> (stopped postgres; reset O/S cache; started postgres)
>
>
> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>                                                            QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=21.070..21.072 rows=1 loops=1)
>    Index Cond: (id = 500000)
>  Total runtime: 21.178 ms
>
>
>
>
> On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin <itparanoia(at)gmail(dot)com> wrote:
>
>>
>> On May 27, 2013, at 6:35 PM, John Mudd <johnbmudd(at)gmail(dot)com> wrote:
>>
>> Thanks, that's easy enough to test. Didn't seem to help though.
>>
>>
>> OK. And what if you CLUSTER the table USING the PK?
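>>
>> I.e., something like (using the table and index names from your schema):
>>
>>     CLUSTER test_select USING test_select_pkey;
>>     ANALYZE test_select;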
>>
>>
>> test=# REINDEX index test_select_pkey;
>> REINDEX
>> test=# VACUUM ANALYZE test_select ;
>> VACUUM
>>
>>
>> (stopped postgres; reset O/S cache; started postgres)
>>
>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000
>> ORDER BY key1, key2, key3, id LIMIT 1;
>>                                                               QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 loops=1)
>>    ->  Index Scan using my_key on test_select  (cost=0.00..41981.16 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>>          Index Cond: (key1 >= 500000)
>>  Total runtime: 16.444 ms
>>
>>
>> (stopped postgres; reset O/S cache; started postgres)
>>
>> test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>>                                                            QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------
>>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
>>    Index Cond: (id = 500000)
>>  Total runtime: 23.192 ms
>>
>>
>>
>>
>> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin <itparanoia(at)gmail(dot)com> wrote:
>>
>>>
>>> On May 27, 2013, at 6:02 PM, John Mudd <johnbmudd(at)gmail(dot)com> wrote:
>>>
>>> > Postgres 9.1.2 on Ubuntu 12.04
>>> >
>>> > Any reason why a select by primary key would be slower than a select
>>> that includes an ORDER BY? I was really hoping using the primary key would
>>> give me a boost.
>>> >
>>>
>>> You created my_key after data loading, while the PK index was there all
>>> the time; an index built in one pass after the load is typically more
>>> compact than one maintained incrementally through a million single-row
>>> inserts. If you REINDEX the PK, I bet it will be as fast.
>>>
>>> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
>>> /proc/sys/vm/drop_caches" between the runs.
>>> >
>>> >
>>> >
>>> > test=# VACUUM ANALYZE test_select;
>>> > VACUUM
>>> >
>>> > (stopped postgres; reset O/S cache; started postgres)
>>> >
>>> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 500000
>>> ORDER BY key1, key2, key3, id LIMIT 1;
>>> >                                                               QUERY PLAN
>>> > --------------------------------------------------------------------------------------------------------------------------------------
>>> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
>>> >    ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>>> >          Index Cond: (key1 >= 500000)
>>> >  Total runtime: 12.678 ms
>>> >
>>> > (stopped postgres; reset O/S cache; started postgres)
>>> >
>>> > test=# explain analyze SELECT * FROM test_select WHERE id = 500000;
>>> >                                                            QUERY PLAN
>>> > ---------------------------------------------------------------------------------------------------------------------------
>>> >  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
>>> >    Index Cond: (id = 500000)
>>> >  Total runtime: 31.504 ms
>>> >
>>> >
>>> >
>>> > Schema:
>>> >
>>> > test=# \d test_select
>>> >                           Table "public.test_select"
>>> >  Column |     Type     |                        Modifiers
>>> > --------+--------------+-----------------------------------------------------------
>>> >  id     | integer      | not null default nextval('test_select_id_seq'::regclass)
>>> >  key1   | integer      |
>>> >  key2   | integer      |
>>> >  key3   | integer      |
>>> >  data   | character(4) |
>>> > Indexes:
>>> >     "test_select_pkey" PRIMARY KEY, btree (id)
>>> >     "my_key" btree (key1, key2, key3, id)
>>> >
>>> > test=#
>>> >
>>> >
>>> >
>>> > Sample data:
>>> >
>>> > test=# SELECT * FROM test_select LIMIT 10;
>>> > id | key1 | key2 | key3 | data
>>> > ----+--------+--------+--------+------
>>> > 1 | 984966 | 283954 | 772063 | x
>>> > 2 | 817668 | 393533 | 924888 | x
>>> > 3 | 751039 | 798753 | 454309 | x
>>> > 4 | 128505 | 329643 | 280553 | x
>>> > 5 | 105600 | 257225 | 710015 | x
>>> > 6 | 323891 | 615614 | 83206 | x
>>> > 7 | 194054 | 63506 | 353171 | x
>>> > 8 | 212068 | 881225 | 271804 | x
>>> > 9 | 644180 | 26693 | 200738 | x
>>> > 10 | 136586 | 498699 | 554417 | x
>>> > (10 rows)
>>> >
>>> >
>>> >
>>> >
>>> > Here's how I populated the table:
>>> >
>>> > import random
>>> > import psycopg2
>>> >
>>> > conn = psycopg2.connect('dbname=test')
>>> > cur = conn.cursor()
>>> >
>>> > def random_int():
>>> >     n = 1000000
>>> >     return random.randint(0, n)
>>> >
>>> > def random_key():
>>> >     return random_int(), random_int(), random_int()
>>> >
>>> > def create_table():
>>> >     # Rebuild the table from scratch.
>>> >     cur.execute('''
>>> >         DROP TABLE IF EXISTS test_select;
>>> >
>>> >         CREATE TABLE test_select (
>>> >             id SERIAL PRIMARY KEY,
>>> >             key1 INTEGER,
>>> >             key2 INTEGER,
>>> >             key3 INTEGER,
>>> >             data char(4)
>>> >         );
>>> >     ''')
>>> >     conn.commit()
>>> >
>>> >     # Load one million rows with random keys; ids are assigned
>>> >     # sequentially by the SERIAL column.
>>> >     n = 1000000
>>> >     for i in range(n):
>>> >         cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
>>> >     conn.commit()
>>> >
>>> >     # my_key is built in a single pass, after the load.
>>> >     cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
>>> >     conn.commit()
>>> >
>>> > create_table()
>>> >
>>>
>>>
>>
>>
>
