Re: Very slow Query compared to Oracle / SQL - Server

From: Semen Yefimenko <semen(dot)yefimenko(at)gmail(dot)com>
To: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Very slow Query compared to Oracle / SQL - Server
Date: 2021-05-07 15:57:19
Message-ID: CAKnyMoJb=+hBwOG97zd2Jbumg1BZuYpEqTPWStaFFNmfArBvQA@mail.gmail.com
Lists: pgsql-performance

For testing purposes I set up a separate PostgreSQL 13.2 instance on Windows.
To my surprise, it works perfectly fine. The indexes also have about 1/4 of
the size they had on 12.6.
I'll try setting up a new 12.6 instance and see if I can reproduce anything.
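
For comparing the two instances, something like this sketch (table name as
used in this thread) lists the per-index sizes:

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'logtable'
ORDER BY pg_relation_size(indexrelid) DESC;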

This explain plan is from a local PostgreSQL 13.2 instance on an SSD, with
default settings and without changing random_page_cost.

Gather Merge (cost=19444.07..19874.60 rows=3690 width=2638) (actual
time=41.633..60.538 rows=7087 loops=1)
Output: column1, .. ,column54
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=2123
-> Sort (cost=18444.05..18448.66 rows=1845 width=2638) (actual
time=4.057..4.595 rows=2362 loops=3)
Output: column1, .. ,column54
Sort Key: logtable.timestampcol1 DESC
Sort Method: quicksort Memory: 3555kB
Buffers: shared hit=2123
Worker 0: actual time=0.076..0.077 rows=0 loops=1
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=7
Worker 1: actual time=0.090..0.091 rows=0 loops=1
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=7
-> Parallel Bitmap Heap Scan on schema.logtable
(cost=61.84..16243.96 rows=1845 width=2638) (actual time=0.350..2.419
rows=2362 loops=3)
Output: column1, .. ,column54
Recheck Cond: ((logtable.entrytype = 4000) OR
(logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
Filter: (logtable.tfnlogent_archivestatus <= 1)
Heap Blocks: exact=2095
Buffers: shared hit=2109
Worker 0: actual time=0.030..0.030 rows=0 loops=1
Worker 1: actual time=0.035..0.036 rows=0 loops=1
-> BitmapOr (cost=61.84..61.84 rows=4428 width=0) (actual
time=0.740..0.742 rows=0 loops=1)
Buffers: shared hit=14
-> Bitmap Index Scan on idx_entrytype
(cost=0.00..19.50 rows=1476 width=0) (actual time=0.504..0.504 rows=5475
loops=1)
Index Cond: (logtable.entrytype = 4000)
Buffers: shared hit=7
-> Bitmap Index Scan on idx_entrytype
(cost=0.00..19.50 rows=1476 width=0) (actual time=0.056..0.056 rows=830
loops=1)
Index Cond: (logtable.entrytype = 4001)
Buffers: shared hit=3
-> Bitmap Index Scan on idx_entrytype
(cost=0.00..19.50 rows=1476 width=0) (actual time=0.178..0.179 rows=782
loops=1)
Index Cond: (logtable.entrytype = 4002)
Buffers: shared hit=4
Planning Time: 0.212 ms
Execution Time: 61.692 ms

I've also installed a locally running 12.6 on Windows.
Unfortunately I couldn't reproduce the issue there. I loaded the data with a
tool I wrote a few months ago, which inserts the data and creates the
sequences and indexes independently of that database.
The query also finishes in about 70 ms. I then tried restoring a pg_dump into
a different database on the same dev server (where the slow query still
exists). The performance is just as bad in that database, and the indexes are
all around 300 MB there as well (whereas on my locally running instance
they're around 80 MB).
Now I'm inserting the data into the remote dev database with the same tool I
used for my local installations.
This will still take some time, so I will update once I have tested this.
So far it seems like there is something skewed going on with the development
database.
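
If the dev database's indexes really are bloated, rebuilding one of them and
re-checking its size should confirm it; a sketch (index name from earlier in
this thread, CONCURRENTLY requires PostgreSQL 12+):

REINDEX INDEX CONCURRENTLY idx_entrytype;
SELECT pg_size_pretty(pg_relation_size('idx_entrytype'));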

On Fri, 7 May 2021 at 11:56, Vijaykumar Jain <
vijaykumarjain(dot)github(at)gmail(dot)com> wrote:

> Is this on windows ?
>
> I see a thread that mentions a performance penalty due to parallel workers
>
>
> There is a mailing thread with subject line -
> Huge performance penalty with parallel queries in Windows x64 v. Linux x64
>
>
>
> On Fri, 7 May 2021 at 2:33 PM Semen Yefimenko <semen(dot)yefimenko(at)gmail(dot)com>
> wrote:
>
>> As mentioned in the Slack comments:
>>
>> SELECT pg_size_pretty(pg_relation_size('logtable')) as table_size,
>> pg_size_pretty(pg_relation_size('idx_entrytype')) as index_size,
>> (pgstattuple('logtable')).dead_tuple_percent;
>>
>> table_size | index_size | dead_tuple_percent
>> ------------+------------+--------------------
>> 3177 MB | 289 MB | 0
>>
>> I have roughly 6 indexes which all have around 300 MB
>>
>> SELECT pg_relation_size('logtable') as table_size,
>> pg_relation_size('idx_entrytype') as index_size,
>> 100-(pgstatindex('idx_entrytype')).avg_leaf_density as bloat_ratio;
>>
>> table_size | index_size | bloat_ratio
>> ------------+------------+-------------------
>> 3331694592 | 302555136 | 5.219999999999999
>>
>> Your queries:
>>
>> n_live_tup n_dead_tup
>> 14118380 0
>>
>>
>> For testing, I've also been running VACUUM and ANALYZE pretty much before
>> every test run.
>>
>> On Fri, 7 May 2021 at 10:44, Vijaykumar Jain <
>> vijaykumarjain(dot)github(at)gmail(dot)com> wrote:
>>
>>> ok one last thing, not to be a PITA, but just in case if this helps.
>>>
>>> postgres=# SELECT * FROM pg_stat_user_indexes where relname =
>>> 'logtable'; postgres=# SELECT * FROM pg_stat_user_tables where relname =
>>> 'logtable';
>>> basically, i just want to verify that the table is not bloated.
>>> looking at *n_live_tup* vs *n_dead_tup* would help understand it.
>>>
>>> if you see too many dead tuples,
>>> vacuum (ANALYZE,verbose) logtable; -- would get rid of dead tuples (if
>>> there are no tx using the dead tuples)
>>>
>>> and then run your query.
>>>
>>> Thanks,
>>> Vijay
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 7 May 2021 at 13:34, Semen Yefimenko <semen(dot)yefimenko(at)gmail(dot)com>
>>> wrote:
>>>
>>>> Sorry if I'm answering everyone cumulatively in one e-mail; I'm not
>>>> sure how I'm supposed to do it (single e-mails vs. many).
>>>>
>>>>
>>>>> Can you try tuning by increasing the shared_buffers slowly in steps
>>>>> of 500MB, and running explain analyze against the query.
>>>>
>>>>
>>>> -- 2500 MB shared buffers - random_page_cost = 1;
>>>> Gather Merge (cost=343085.23..392186.19 rows=420836 width=2542)
>>>> (actual time=2076.329..3737.050 rows=516517 loops=1)
>>>> Output: column1, .. , column54
>>>> Workers Planned: 2
>>>> Workers Launched: 2
>>>> Buffers: shared hit=295446
>>>> -> Sort (cost=342085.21..342611.25 rows=210418 width=2542) (actual
>>>> time=2007.487..2202.707 rows=172172 loops=3)
>>>> Output: column1, .. , column54
>>>> Sort Key: logtable.timestampcol DESC
>>>> Sort Method: quicksort Memory: 65154kB
>>>> Worker 0: Sort Method: quicksort Memory: 55707kB
>>>> Worker 1: Sort Method: quicksort Memory: 55304kB
>>>> Buffers: shared hit=295446
>>>> Worker 0: actual time=1963.969..2156.624 rows=161205 loops=1
>>>> Buffers: shared hit=91028
>>>> Worker 1: actual time=1984.700..2179.697 rows=161935 loops=1
>>>> Buffers: shared hit=92133
>>>> -> Parallel Bitmap Heap Scan on schema.logtable
>>>> (cost=5546.39..323481.21 rows=210418 width=2542) (actual
>>>> time=322.125..1618.971 rows=172172 loops=3)
>>>> Output: column1, .. , column54
>>>> Recheck Cond: ((logtable.entrytype = 4000) OR
>>>> (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>> Filter: (logtable.archivestatus <= 1)
>>>> Heap Blocks: exact=110951
>>>> Buffers: shared hit=295432
>>>> Worker 0: actual time=282.201..1595.117 rows=161205
>>>> loops=1
>>>> Buffers: shared hit=91021
>>>> Worker 1: actual time=303.671..1623.299 rows=161935
>>>> loops=1
>>>> Buffers: shared hit=92126
>>>> -> BitmapOr (cost=5546.39..5546.39 rows=510578 width=0)
>>>> (actual time=199.119..199.119 rows=0 loops=1)
>>>> Buffers: shared hit=1334
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..682.13 rows=67293 width=0) (actual time=28.856..28.857
>>>> rows=65970 loops=1)
>>>> Index Cond: (logtable.entrytype = 4000)
>>>> Buffers: shared hit=172
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..2223.63 rows=219760 width=0) (actual time=108.871..108.872
>>>> rows=225283 loops=1)
>>>> Index Cond: (logtable.entrytype = 4001)
>>>> Buffers: shared hit=581
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..2261.87 rows=223525 width=0) (actual time=61.377..61.377
>>>> rows=225264 loops=1)
>>>> Index Cond: (logtable.entrytype = 4002)
>>>> Buffers: shared hit=581
>>>> Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem =
>>>> '1GB'
>>>> Planning Time: 0.940 ms
>>>> Execution Time: 4188.083 ms
>>>>
>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> -- 3000 MB shared buffers - random_page_cost = 1;
>>>> Gather Merge (cost=343085.23..392186.19 rows=420836 width=2542)
>>>> (actual time=2062.280..3763.408 rows=516517 loops=1)
>>>> Output: column1, .. , column54
>>>> Workers Planned: 2
>>>> Workers Launched: 2
>>>> Buffers: shared hit=295446
>>>> -> Sort (cost=342085.21..342611.25 rows=210418 width=2542) (actual
>>>> time=1987.933..2180.422 rows=172172 loops=3)
>>>> Output: column1, .. , column54
>>>> Sort Key: logtable.timestampcol DESC
>>>> Sort Method: quicksort Memory: 66602kB
>>>> Worker 0: Sort Method: quicksort Memory: 55149kB
>>>> Worker 1: Sort Method: quicksort Memory: 54415kB
>>>> Buffers: shared hit=295446
>>>> Worker 0: actual time=1963.059..2147.916 rows=159556 loops=1
>>>> Buffers: shared hit=89981
>>>> Worker 1: actual time=1949.726..2136.200 rows=158554 loops=1
>>>> Buffers: shared hit=90141
>>>> -> Parallel Bitmap Heap Scan on schema.logtable
>>>> (cost=5546.39..323481.21 rows=210418 width=2542) (actual
>>>> time=340.705..1603.796 rows=172172 loops=3)
>>>> Output: column1, .. , column54
>>>> Recheck Cond: ((logtable.entrytype = 4000) OR
>>>> (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>> Filter: (logtable.archivestatus <= 1)
>>>> Heap Blocks: exact=113990
>>>> Buffers: shared hit=295432
>>>> Worker 0: actual time=317.918..1605.548 rows=159556
>>>> loops=1
>>>> Buffers: shared hit=89974
>>>> Worker 1: actual time=304.744..1589.221 rows=158554
>>>> loops=1
>>>> Buffers: shared hit=90134
>>>> -> BitmapOr (cost=5546.39..5546.39 rows=510578 width=0)
>>>> (actual time=218.972..218.973 rows=0 loops=1)
>>>> Buffers: shared hit=1334
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..682.13 rows=67293 width=0) (actual time=37.741..37.742
>>>> rows=65970 loops=1)
>>>> Index Cond: (logtable.entrytype = 4000)
>>>> Buffers: shared hit=172
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..2223.63 rows=219760 width=0) (actual time=119.120..119.121
>>>> rows=225283 loops=1)
>>>> Index Cond: (logtable.entrytype = 4001)
>>>> Buffers: shared hit=581
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..2261.87 rows=223525 width=0) (actual time=62.097..62.098
>>>> rows=225264 loops=1)
>>>> Index Cond: (logtable.entrytype = 4002)
>>>> Buffers: shared hit=581
>>>> Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem =
>>>> '1GB'
>>>> Planning Time: 2.717 ms
>>>> Execution Time: 4224.670 ms
>>>>
>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> -- 3500 MB shared buffers - random_page_cost = 1;
>>>> Gather Merge (cost=343085.23..392186.19 rows=420836 width=2542)
>>>> (actual time=3578.155..4932.858 rows=516517 loops=1)
>>>> Output: column1, .. , column54
>>>> Workers Planned: 2
>>>> Workers Launched: 2
>>>> Buffers: shared hit=14 read=295432 written=67
>>>> -> Sort (cost=342085.21..342611.25 rows=210418 width=2542) (actual
>>>> time=3482.159..3677.227 rows=172172 loops=3)
>>>> Output: column1, .. , column54
>>>> Sort Key: logtable.timestampcol DESC
>>>> Sort Method: quicksort Memory: 58533kB
>>>> Worker 0: Sort Method: quicksort Memory: 56878kB
>>>> Worker 1: Sort Method: quicksort Memory: 60755kB
>>>> Buffers: shared hit=14 read=295432 written=67
>>>> Worker 0: actual time=3435.131..3632.985 rows=166842 loops=1
>>>> Buffers: shared hit=7 read=95783 written=25
>>>> Worker 1: actual time=3441.545..3649.345 rows=179354 loops=1
>>>> Buffers: shared hit=5 read=101608 written=20
>>>> -> Parallel Bitmap Heap Scan on schema.logtable
>>>> (cost=5546.39..323481.21 rows=210418 width=2542) (actual
>>>> time=345.111..3042.932 rows=172172 loops=3)
>>>> Output: column1, .. , column54
>>>> Recheck Cond: ((logtable.entrytype = 4000) OR
>>>> (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>> Filter: (logtable.archivestatus <= 1)
>>>> Heap Blocks: exact=96709
>>>> Buffers: shared hit=2 read=295430 written=67
>>>> Worker 0: actual time=300.525..2999.403 rows=166842
>>>> loops=1
>>>> Buffers: shared read=95783 written=25
>>>> Worker 1: actual time=300.552..3004.859 rows=179354
>>>> loops=1
>>>> Buffers: shared read=101606 written=20
>>>> -> BitmapOr (cost=5546.39..5546.39 rows=510578 width=0)
>>>> (actual time=241.996..241.997 rows=0 loops=1)
>>>> Buffers: shared hit=2 read=1332
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..682.13 rows=67293 width=0) (actual time=37.129..37.130
>>>> rows=65970 loops=1)
>>>> Index Cond: (logtable.entrytype = 4000)
>>>> Buffers: shared read=172
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..2223.63 rows=219760 width=0) (actual time=131.051..131.052
>>>> rows=225283 loops=1)
>>>> Index Cond: (logtable.entrytype = 4001)
>>>> Buffers: shared hit=1 read=580
>>>> -> Bitmap Index Scan on idx_entrytype
>>>> (cost=0.00..2261.87 rows=223525 width=0) (actual time=73.800..73.800
>>>> rows=225264 loops=1)
>>>> Index Cond: (logtable.entrytype = 4002)
>>>> Buffers: shared hit=1 read=580
>>>> Settings: random_page_cost = '1', temp_buffers = '80MB', work_mem =
>>>> '1GB'
>>>> Planning Time: 0.597 ms
>>>> Execution Time: 5389.811 ms
>>>>
>>>>
>>>> This doesn't seem to have had an effect.
>>>> Thanks for the suggestion.
>>>>
>>>>> Have you tried excluding NOT NULL from the index? Can you give the
>>>>> dispersion of archivestatus?
>>>>>
>>>>
>>>> Yes I have; it yielded the same performance boost as:
>>>>
>>>> create index test on logtable(entrytype) where archivestatus <= 1;
>>>>
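>>>> For the dispersion part of the question, something like this sketch
>>>> (names as used elsewhere in this thread) would show the value
>>>> distribution of archivestatus:
>>>>
>>>> SELECT archivestatus, count(*) AS cnt
>>>> FROM logtable
>>>> GROUP BY archivestatus
>>>> ORDER BY cnt DESC;
>>>>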
>>>> I wonder what the old query plan was...
>>>>> Would you include links to your prior correspondence?
>>>>
>>>>
>>>> The prior execution plans are present in the SO post.
>>>> The other forums I've tried are the official Slack channel:
>>>> https://postgresteam.slack.com/archives/C0FS3UTAP/p1620286295228600
>>>> And SO :
>>>> https://stackoverflow.com/questions/67401792/slow-running-postgresql-query
>>>> But I think most of the points discussed in these posts have already
>>>> been mentioned by you, except bloating of indexes.
>>>>
>>>> Oracle is apparently doing a single scan on "entrytype".
>>>>> As a test, you could try forcing that, like:
>>>>> begin; SET enable_bitmapscan=off ; explain (analyze) [...]; rollback;
>>>>> or
>>>>> begin; DROP INDEX idx_arcstatus; explain (analyze) [...]; rollback;
>>>>
>>>>
>>>> I've tried enable_bitmapscan=off but it didn't yield any good results.
>>>>
>>>> -- 2000 MB shared buffers - random_page_cost = 4 - enable_bitmapscan to
>>>> off
>>>> Gather Merge (cost=543949.72..593050.69 rows=420836 width=2542)
>>>> (actual time=7716.031..9043.399 rows=516517 loops=1)
>>>> Output: column1, .., column54
>>>> Workers Planned: 2
>>>> Workers Launched: 2
>>>> Buffers: shared hit=192 read=406605
>>>> -> Sort (cost=542949.70..543475.75 rows=210418 width=2542) (actual
>>>> time=7642.666..7835.527 rows=172172 loops=3)
>>>> Output: column1, .., column54
>>>> Sort Key: logtable.timestampcol DESC
>>>> Sort Method: quicksort Memory: 58803kB
>>>> Worker 0: Sort Method: quicksort Memory: 60376kB
>>>> Worker 1: Sort Method: quicksort Memory: 56988kB
>>>> Buffers: shared hit=192 read=406605
>>>> Worker 0: actual time=7610.482..7814.905 rows=177637 loops=1
>>>> Buffers: shared hit=78 read=137826
>>>> Worker 1: actual time=7607.645..7803.561 rows=167316 loops=1
>>>> Buffers: shared hit=80 read=132672
>>>> -> Parallel Seq Scan on schema.logtable (cost=0.00..524345.70
>>>> rows=210418 width=2542) (actual time=1.669..7189.365 rows=172172 loops=3)
>>>> Output: column1, .., column54
>>>> Filter: ((logtable.archivestatus <= 1) AND
>>>> ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR
>>>> (logtable.entrytype = 4002)))
>>>> Rows Removed by Filter: 4533459
>>>> Buffers: shared hit=96 read=406605
>>>> Worker 0: actual time=1.537..7158.286 rows=177637 loops=1
>>>> Buffers: shared hit=30 read=137826
>>>> Worker 1: actual time=1.414..7161.670 rows=167316 loops=1
>>>> Buffers: shared hit=32 read=132672
>>>> Settings: enable_bitmapscan = 'off', temp_buffers = '80MB', work_mem =
>>>> '1GB'
>>>> Planning Time: 0.725 ms
>>>> Execution Time: 9500.928 ms
>>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>> -- 2000 MB shared buffers - random_page_cost = 1 - enable_bitmapscan to off
>>>> Gather Merge (cost=543949.72..593050.69 rows=420836 width=2542)
>>>> (actual time=7519.032..8871.433 rows=516517 loops=1)
>>>> Output: column1, .., column54
>>>> Workers Planned: 2
>>>> Workers Launched: 2
>>>> Buffers: shared hit=576 read=406221
>>>> -> Sort (cost=542949.70..543475.75 rows=210418 width=2542) (actual
>>>> time=7451.958..7649.480 rows=172172 loops=3)
>>>> Output: column1, .., column54
>>>> Sort Key: logtable.timestampcol DESC
>>>> Sort Method: quicksort Memory: 58867kB
>>>> Worker 0: Sort Method: quicksort Memory: 58510kB
>>>> Worker 1: Sort Method: quicksort Memory: 58788kB
>>>> Buffers: shared hit=576 read=406221
>>>> Worker 0: actual time=7438.271..7644.241 rows=172085 loops=1
>>>> Buffers: shared hit=203 read=135166
>>>> Worker 1: actual time=7407.574..7609.922 rows=172948 loops=1
>>>> Buffers: shared hit=202 read=135225
>>>> -> Parallel Seq Scan on schema.logtable (cost=0.00..524345.70
>>>> rows=210418 width=2542) (actual time=2.839..7017.729 rows=172172 loops=3)
>>>> Output: column1, .., column54
>>>> Filter: ((logtable.archivestatus <= 1) AND
>>>> ((logtable.entrytype = 4000) OR (logtable.entrytype = 4001) OR
>>>> (logtable.entrytype = 4002)))
>>>> Rows Removed by Filter: 4533459
>>>> Buffers: shared hit=480 read=406221
>>>> Worker 0: actual time=2.628..7006.420 rows=172085 loops=1
>>>> Buffers: shared hit=155 read=135166
>>>> Worker 1: actual time=3.948..6978.154 rows=172948 loops=1
>>>> Buffers: shared hit=154 read=135225
>>>> Settings: enable_bitmapscan = 'off', random_page_cost = '1',
>>>> temp_buffers = '80MB', work_mem = '1GB'
>>>> Planning Time: 0.621 ms
>>>> Execution Time: 9339.457 ms
>>>>
>>>>> Have you tuned shared_buffers enough? Each block is 8 kB by default.
>>>>> BTW, please try to reset random_page_cost.
>>>>
>>>>
>>>> Look above.
>>>>
>>>> I will try upgrading the minor version next.
>>>> I will also try setting up a 13.X version locally and import the data
>>>> from 12.2 to 13.X and see if it might be faster.
>>>>
>>>>
>>>> On Thu, 6 May 2021 at 23:16, Imre Samu <
>>>> pella(dot)samu(at)gmail(dot)com> wrote:
>>>>
>>>>> *> Postgres Version : *PostgreSQL 12.2,
>>>>> > ... ON ... USING btree
>>>>>
>>>>> IMHO:
>>>>> The next minor (bugfix & security) release is near (expected ~ May 13th,
>>>>> 2021) https://www.postgresql.org/developer/roadmap/
>>>>> so you can update your PostgreSQL to 12.7 (+ full reindexing
>>>>> recommended!)
>>>>>
>>>>> You can find a lot of B-tree index-related fixes.
>>>>> https://www.postgresql.org/docs/12/release-12-3.html Release date:
>>>>> 2020-05-14
>>>>> - Fix possible undercounting of deleted B-tree index pages in VACUUM
>>>>> VERBOSE output
>>>>> - Fix wrong bookkeeping for oldest deleted page in a B-tree index
>>>>> - Ensure INCLUDE'd columns are always removed from B-tree pivot tuples
>>>>> https://www.postgresql.org/docs/12/release-12-4.html
>>>>> - Avoid repeated marking of dead btree index entries as dead
>>>>> https://www.postgresql.org/docs/12/release-12-5.html
>>>>> - Fix failure of parallel B-tree index scans when the index
>>>>> condition is unsatisfiable
>>>>> https://www.postgresql.org/docs/12/release-12-6.html Release date:
>>>>> 2021-02-11
>>>>>
>>>>>
>>>>> > COLLATE pg_catalog."default"
>>>>>
>>>>> You can test the "C" collation on some columns (keys?); in
>>>>> theory, it should be faster:
>>>>> "The drawback of using locales other than C or POSIX in PostgreSQL is
>>>>> its performance impact. It slows character handling and prevents ordinary
>>>>> indexes from being used by LIKE. For this reason use locales only if you
>>>>> actually need them."
>>>>> https://www.postgresql.org/docs/12/locale.html
>>>>>
>>>>> https://www.postgresql.org/message-id/flat/CAF6DVKNU0vb4ZeQQ-%3Dagg69QJU3wdjPnMYYrPYY7CKc6iOU7eQ%40mail.gmail.com
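>>>>>
>>>>> As a sketch, a "C"-collated index on one of the text keys could be
>>>>> compared against the existing one (idx_key_c is just a hypothetical
>>>>> name):
>>>>>
>>>>> CREATE INDEX idx_key_c ON schema.logtable (key COLLATE "C");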
>>>>>
>>>>> Best,
>>>>> Imre
>>>>>
>>>>>
>>>>> Semen Yefimenko <semen(dot)yefimenko(at)gmail(dot)com> wrote (on Thu, 6 May
>>>>> 2021 at 16:38):
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I've recently been involved in migrating our old system to SQL Server
>>>>>> and then PostgreSQL. Everything has been working fine so far but now after
>>>>>> executing our tests on Postgres, we saw a very slow running query on a
>>>>>> large table in our database.
>>>>>> I have tried asking on other platforms but no one has been able to
>>>>>> give me a satisfying answer.
>>>>>>
>>>>>> *Postgres Version : *PostgreSQL 12.2, compiled by Visual C++ build
>>>>>> 1914, 64-bit
>>>>>> No notable errors in the server log or from the Postgres server itself.
>>>>>>
>>>>>> The table structure :
>>>>>>
>>>>>> CREATE TABLE logtable
>>>>>> (
>>>>>> key character varying(20) COLLATE pg_catalog."default" NOT NULL,
>>>>>> id integer,
>>>>>> column3 integer,
>>>>>> column4 integer,
>>>>>> column5 integer,
>>>>>> column6 integer,
>>>>>> column7 integer,
>>>>>> column8 integer,
>>>>>> column9 character varying(128) COLLATE pg_catalog."default",
>>>>>> column10 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column11 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column12 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column13 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column14 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column15 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column16 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column17 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column18 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column19 character varying(2048) COLLATE pg_catalog."default",
>>>>>> column21 character varying(256) COLLATE pg_catalog."default",
>>>>>> column22 character varying(256) COLLATE pg_catalog."default",
>>>>>> column23 character varying(256) COLLATE pg_catalog."default",
>>>>>> column24 character varying(256) COLLATE pg_catalog."default",
>>>>>> column25 character varying(256) COLLATE pg_catalog."default",
>>>>>> column26 character varying(256) COLLATE pg_catalog."default",
>>>>>> column27 character varying(256) COLLATE pg_catalog."default",
>>>>>> column28 character varying(256) COLLATE pg_catalog."default",
>>>>>> column29 character varying(256) COLLATE pg_catalog."default",
>>>>>> column30 character varying(256) COLLATE pg_catalog."default",
>>>>>> column31 character varying(256) COLLATE pg_catalog."default",
>>>>>> column32 character varying(256) COLLATE pg_catalog."default",
>>>>>> column33 character varying(256) COLLATE pg_catalog."default",
>>>>>> column34 character varying(256) COLLATE pg_catalog."default",
>>>>>> column35 character varying(256) COLLATE pg_catalog."default",
>>>>>> entrytype integer,
>>>>>> column37 bigint,
>>>>>> column38 bigint,
>>>>>> column39 bigint,
>>>>>> column40 bigint,
>>>>>> column41 bigint,
>>>>>> column42 bigint,
>>>>>> column43 bigint,
>>>>>> column44 bigint,
>>>>>> column45 bigint,
>>>>>> column46 bigint,
>>>>>> column47 character varying(128) COLLATE pg_catalog."default",
>>>>>> timestampcol timestamp without time zone,
>>>>>> column49 timestamp without time zone,
>>>>>> column50 timestamp without time zone,
>>>>>> column51 timestamp without time zone,
>>>>>> column52 timestamp without time zone,
>>>>>> archivestatus integer,
>>>>>> column54 integer,
>>>>>> column55 character varying(20) COLLATE pg_catalog."default",
>>>>>> CONSTRAINT pkey PRIMARY KEY (key)
>>>>>> USING INDEX TABLESPACE tablespace
>>>>>> )
>>>>>>
>>>>>> TABLESPACE tablespace;
>>>>>>
>>>>>> ALTER TABLE schema.logtable
>>>>>> OWNER to user;
>>>>>>
>>>>>> CREATE INDEX idx_timestampcol
>>>>>> ON schema.logtable USING btree
>>>>>> ( timestampcol ASC NULLS LAST )
>>>>>> TABLESPACE tablespace ;
>>>>>>
>>>>>> CREATE INDEX idx_test2
>>>>>> ON schema.logtable USING btree
>>>>>> ( entrytype ASC NULLS LAST)
>>>>>> TABLESPACE tablespace
>>>>>> WHERE archivestatus <= 1;
>>>>>>
>>>>>> CREATE INDEX idx_arcstatus
>>>>>> ON schema.logtable USING btree
>>>>>> ( archivestatus ASC NULLS LAST)
>>>>>> TABLESPACE tablespace;
>>>>>>
>>>>>> CREATE INDEX idx_entrytype
>>>>>> ON schema.logtable USING btree
>>>>>> ( entrytype ASC NULLS LAST)
>>>>>> TABLESPACE tablespace ;
>>>>>>
>>>>>>
>>>>>> The table contains 14,000,000 entries and has about 3.3 GB of data.
>>>>>> No triggers; inserts are probably 5-20 K per day.
>>>>>>
>>>>>> SELECT relname, relpages, reltuples, relallvisible, relkind,
>>>>>> relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class
>>>>>> WHERE relname='logtable';
>>>>>>
>>>>>> relname  |relpages|reltuples|relallvisible|relkind|relnatts|relhassubclass|reloptions|pg_table_size|
>>>>>> ---------|--------|---------|-------------|-------|--------|--------------|----------|-------------|
>>>>>> logtable |  405988| 14091424|       405907|r      |      54|false         |NULL      |   3326803968|
>>>>>>
>>>>>>
>>>>>> The slow running query:
>>>>>>
>>>>>> SELECT column1,..., column54 FROM logtable where ((entrytype = 4000 or
>>>>>> entrytype = 4001 or entrytype = 4002) and (archivestatus <= 1)) order by
>>>>>> timestampcol desc;
>>>>>>
>>>>>>
>>>>>> This query runs in about 45-60 seconds.
>>>>>> The same query runs in about 289 ms in Oracle and 423 ms in SQL Server.
>>>>>> Now I understand that actually loading all the results (about 520 K rows)
>>>>>> would take a while.
>>>>>> But that shouldn't be exactly what happens, right? There should be a
>>>>>> result-set iterator which can retrieve all the data but doesn't do so
>>>>>> from the get-go.
>>>>>>
>>>>>> With the help of some people in the Slack and SO threads, I've found a
>>>>>> configuration parameter which helps performance:
>>>>>>
>>>>>> set random_page_cost = 1;
>>>>>>
>>>>>> This improved performance from 45-60 s to 15-35 s (since we are
>>>>>> using SSDs).
>>>>>> Still not acceptable, but definitely an improvement.
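>>>>>>
>>>>>> If this keeps helping, a sketch for persisting the setting beyond the
>>>>>> session (same value as tested here) would be:
>>>>>>
>>>>>> ALTER SYSTEM SET random_page_cost = 1;
>>>>>> SELECT pg_reload_conf();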
>>>>>> Some possibly relevant system parameters:
>>>>>>
>>>>>> effective_cache_size 4GB
>>>>>> maintenance_work_mem 1GB
>>>>>> shared_buffers 2GB
>>>>>> work_mem 1GB
>>>>>>
>>>>>>
>>>>>> Currently I'm accessing the data through DBeaver (JDBC -
>>>>>> postgresql-42.2.5.jar) and our Java application (JDBC -
>>>>>> postgresql-42.2.19.jar). Both use defaultRowFetchSize=5000 so as not to
>>>>>> load everything into memory and to limit the results.
>>>>>> The explain plan:
>>>>>>
>>>>>> EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)...
>>>>>> (Above Query)
>>>>>>
>>>>>>
>>>>>> Gather Merge (cost=347142.71..397196.91 rows=429006 width=2558)
>>>>>> (actual time=21210.019..22319.444 rows=515841 loops=1)
>>>>>> Output: column1, .. , column54
>>>>>> Workers Planned: 2
>>>>>> Workers Launched: 2
>>>>>> Buffers: shared hit=141487 read=153489
>>>>>> -> Sort (cost=346142.69..346678.95 rows=214503 width=2558)
>>>>>> (actual time=21148.887..21297.428 rows=171947 loops=3)
>>>>>> Output: column1, .. , column54
>>>>>> Sort Key: logtable.timestampcol DESC
>>>>>> Sort Method: quicksort Memory: 62180kB
>>>>>> Worker 0: Sort Method: quicksort Memory: 56969kB
>>>>>> Worker 1: Sort Method: quicksort Memory: 56837kB
>>>>>> Buffers: shared hit=141487 read=153489
>>>>>> Worker 0: actual time=21129.973..21296.839 rows=166776 loops=1
>>>>>> Buffers: shared hit=45558 read=49514
>>>>>> Worker 1: actual time=21114.439..21268.117 rows=165896 loops=1
>>>>>> Buffers: shared hit=45104 read=49506
>>>>>> -> Parallel Bitmap Heap Scan on schema.logtable
>>>>>> (cost=5652.74..327147.77 rows=214503 width=2558) (actual
>>>>>> time=1304.813..20637.462 rows=171947 loops=3)
>>>>>> Output: column1, .. , column54
>>>>>> Recheck Cond: ((logtable.entrytype = 4000) OR
>>>>>> (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
>>>>>> Filter: (logtable.archivestatus <= 1)
>>>>>> Heap Blocks: exact=103962
>>>>>> Buffers: shared hit=141473 read=153489
>>>>>> Worker 0: actual time=1280.472..20638.620 rows=166776
>>>>>> loops=1
>>>>>> Buffers: shared hit=45551 read=49514
>>>>>> Worker 1: actual time=1275.274..20626.219 rows=165896
>>>>>> loops=1
>>>>>> Buffers: shared hit=45097 read=49506
>>>>>> -> BitmapOr (cost=5652.74..5652.74 rows=520443
>>>>>> width=0) (actual time=1179.438..1179.438 rows=0 loops=1)
>>>>>> Buffers: shared hit=9 read=1323
>>>>>> -> Bitmap Index Scan on idx_entrytype
>>>>>> (cost=0.00..556.61 rows=54957 width=0) (actual time=161.939..161.940
>>>>>> rows=65970 loops=1)
>>>>>> Index Cond: (logtable.entrytype = 4000)
>>>>>> Buffers: shared hit=1 read=171
>>>>>> -> Bitmap Index Scan on idx_entrytype
>>>>>> (cost=0.00..2243.22 rows=221705 width=0) (actual time=548.849..548.849
>>>>>> rows=224945 loops=1)
>>>>>> Index Cond: (logtable.entrytype = 4001)
>>>>>> Buffers: shared hit=4 read=576
>>>>>> -> Bitmap Index Scan on idx_entrytype
>>>>>> (cost=0.00..2466.80 rows=243782 width=0) (actual time=468.637..468.637
>>>>>> rows=224926 loops=1)
>>>>>> Index Cond: (logtable.entrytype = 4002)
>>>>>> Buffers: shared hit=4 read=576
>>>>>> Settings: random_page_cost = '1', search_path = '"$user", schema,
>>>>>> public', temp_buffers = '80MB', work_mem = '1GB'
>>>>>> Planning Time: 0.578 ms
>>>>>> Execution Time: 22617.351 ms
>>>>>>
>>>>>> As mentioned before, Oracle does this much faster.
>>>>>>
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------------------------------------
>>>>>> | Id | Operation | Name
>>>>>> | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------------------------------------
>>>>>> | 0 | SELECT STATEMENT |
>>>>>> | 6878 | 2491K| | 2143 (1)| 00:00:01 |
>>>>>> | 1 | SORT ORDER BY |
>>>>>> | 6878 | 2491K| 3448K| 2143 (1)| 00:00:01 |
>>>>>> | 2 | INLIST ITERATOR |
>>>>>> | | | | | |
>>>>>> |* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| logtable
>>>>>> | 6878 | 2491K| | 1597 (1)| 00:00:01 |
>>>>>> |* 4 | INDEX RANGE SCAN | idx_entrytype
>>>>>> | 6878 | | | 23 (0)| 00:00:01 |
>>>>>>
>>>>>> -------------------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> Is there anything else I can analyze, or any information you might need
>>>>>> to analyze this further?
>>>>>>
>>>>>
>>>
>>> --
>>> Thanks,
>>> Vijay
>>> Mumbai, India
>>>
>> --
> Thanks,
> Vijay
> Mumbai, India
>
