Re: Slow planning time for simple query

From: Maksim Milyutin <milyutinma(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Slow planning time for simple query
Date: 2018-06-09 18:49:47
Message-ID: 36adf760-680b-7a4a-e019-64f4eaaf6ff7@gmail.com
Lists: pgsql-general pgsql-hackers

On a hot standby I have run into a similar problem.

The following simple query:

SELECT array_to_json(array_agg(t)) from (
    select *
        from main.message m
            join main.message_instance mi on m.message_id = mi.message_id
            join main.channel_type ct on mi.channel_type_id = ct.channel_type_id
        where m.user_id = 2152595
            and ct.name = 'android'
            and m.user_delete_ts is null
        order by
            case
                when read_ts is null then 0
                else 1
            end,
            m.message_id desc
        limit 100
    ) t;

is planned in 4.940 ms on the master and in *254.741* ms on the standby.
I tried to investigate the reasons for such long planning on the replica
and determined that index accesses at the planning stage result in
multiple heap scans.
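
For reference, a minimal sketch of how the planning time can be measured
(assuming PostgreSQL 9.4 or later, where EXPLAIN ANALYZE appends a
"Planning time" line to the output):

EXPLAIN (ANALYZE, BUFFERS)
SELECT array_to_json(array_agg(t)) from (
    select *
        from main.message m
            join main.message_instance mi on m.message_id = mi.message_id
            join main.channel_type ct on mi.channel_type_id = ct.channel_type_id
        where m.user_id = 2152595
            and ct.name = 'android'
            and m.user_delete_ts is null
        order by
            case when read_ts is null then 0 else 1 end,
            m.message_id desc
        limit 100
    ) t;
-- The "Planning time: ... ms" line at the end of the output, run on the
-- master and on the standby, gives the 4.940 ms / 254.741 ms figures above.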

Here are the execution plan statistics for the query above:

 Aggregate (actual time=0.641..0.642 rows=1 loops=1)
   Buffers: shared hit=14
   ->  Subquery Scan on t (actual time=0.337..0.360 rows=2 loops=1)
         Buffers: shared hit=14
         ->  Limit (actual time=0.292..0.301 rows=2 loops=1)
               Buffers: shared hit=14
               ->  Sort (actual time=0.287..0.291 rows=2 loops=1)
                     Sort Key: (CASE WHEN (m.read_ts IS NULL) THEN 0 ELSE 1 END), m.message_id DESC
                     Sort Method: quicksort  Memory: 27kB
                     Buffers: shared hit=14
                     ->  Nested Loop (actual time=0.157..0.219 rows=2 loops=1)
                           Buffers: shared hit=14
                           ->  Seq Scan on channel_type ct (actual time=0.043..0.048 rows=1 loops=1)
                                 Filter: (name = 'android'::text)
                                 Rows Removed by Filter: 7
                                 Buffers: shared hit=1
                           ->  Nested Loop (actual time=0.098..0.148 rows=2 loops=1)
                                 Buffers: shared hit=13
                                 ->  Index Scan using message_user_id_idx1 on message m (actual time=0.055..0.063 rows=2 loops=1)
                                       Index Cond: (user_id = 2152595)
                                       Buffers: shared hit=5
                                 ->  Index Scan using message_instance_pkey on message_instance mi (actual time=0.026..0.029 rows=1 loops=2)
                                       Index Cond: ((message_id = m.message_id) AND (channel_type_id = ct.channel_type_id))
                                       Buffers: shared hit=8

So there are only 14 buffer cache accesses during query execution. But
the pg_statio_user_tables and pg_statio_user_indexes views show a
different picture:

select
    schemaname, relname, indexrelname, idx_blks_hit, idx_blks_read,
    pg_size_pretty(pg_relation_size(indexrelid::regclass))
from pg_statio_user_indexes
where idx_blks_hit + idx_blks_read > 0;

 schemaname |     relname      |     indexrelname      | idx_blks_hit | idx_blks_read | pg_size_pretty
------------+------------------+-----------------------+--------------+---------------+----------------
 main       | channel_type     | channel_type_pkey     |            2 |             0 | 16 kB
 main       | message_instance | message_instance_pkey |          666 |             0 | 345 MB
 main       | message          | message_pkey          |           56 |             0 | 53 MB
 main       | message          | message_user_id_idx1  |            3 |             0 | 17 MB

select
    schemaname, relname, heap_blks_read, heap_blks_hit
from pg_statio_user_tables
where heap_blks_read + heap_blks_hit + idx_blks_hit + idx_blks_read > 0;

 schemaname |     relname      | heap_blks_read | heap_blks_hit
------------+------------------+----------------+---------------
 main       | channel_type     |              0 |             3
 main       | message          |              0 |          8682
 main       | message_instance |              0 | 114922

(This experiment was carried out on a test hot standby with no parallel
activity and with cleared system statistics.)
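
For reference, a sketch of how the counters can be cleared before such
an experiment (assuming superuser rights; pg_stat_reset() zeroes the
pg_stat_* and pg_statio_* counters for the current database):

SELECT pg_stat_reset();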

Running VACUUM on the problem tables (main.message and
main.message_instance) on the master node somehow resolves the problem,
but running it frequently slows down all queries and generally
increases IO.
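
For completeness, the manual workaround looks like this (run on the
master; the VERBOSE option is only there to watch the dead-tuple
cleanup and is not required):

VACUUM (VERBOSE) main.message;
VACUUM (VERBOSE) main.message_instance;

An alternative I have not evaluated in depth would be making autovacuum
more aggressive on just these two tables, e.g.
ALTER TABLE main.message SET (autovacuum_vacuum_scale_factor = 0.01);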

Is there any way to overcome this problem, or is it a fundamental issue
and the query has to be rewritten to simplify planning?

--
Regards,
Maksim Milyutin
