Re: Very slow Query compared to Oracle / SQL - Server

From: Imre Samu <pella(dot)samu(at)gmail(dot)com>
To: Semen Yefimenko <semen(dot)yefimenko(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Very slow Query compared to Oracle / SQL - Server
Date: 2021-05-06 21:16:45
Message-ID: CAJnEWwmJg1aUbv5h2zsuWD2i0K-Jc052TTCGAspwv2BYejH3xQ@mail.gmail.com
Lists: pgsql-performance

> Postgres Version: PostgreSQL 12.2,
> ... ON ... USING btree

IMHO:
The next minor (bugfix & security) release is near (expected ~ May 13th, 2021):
https://www.postgresql.org/developer/roadmap/
so you can update your PostgreSQL to 12.7 (a full reindex is recommended
afterwards!).
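
After a minor-version upgrade, the indexes can be rebuilt without blocking
writes (REINDEX ... CONCURRENTLY is available from PostgreSQL 12). A sketch,
using the index and table names from your schema:

```sql
-- Rebuild one index without blocking concurrent writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_entrytype;

-- Or rebuild every index on the table in one statement
REINDEX TABLE CONCURRENTLY schema.logtable;
```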

You can find a lot of B-tree index-related fixes:

https://www.postgresql.org/docs/12/release-12-3.html (release date: 2020-05-14)
- Fix possible undercounting of deleted B-tree index pages in VACUUM VERBOSE output
- Fix wrong bookkeeping for the oldest deleted page in a B-tree index
- Ensure INCLUDE'd columns are always removed from B-tree pivot tuples
https://www.postgresql.org/docs/12/release-12-4.html
- Avoid repeated marking of dead B-tree index entries as dead
https://www.postgresql.org/docs/12/release-12-5.html
- Fix failure of parallel B-tree index scans when the index condition is unsatisfiable
https://www.postgresql.org/docs/12/release-12-6.html (release date: 2021-02-11)

> COLLATE pg_catalog."default"

You can test the "C" collation on some columns (the keys?); in theory, it
should be faster:
"The drawback of using locales other than C or POSIX in PostgreSQL is its
performance impact. It slows character handling and prevents ordinary
indexes from being used by LIKE. For this reason use locales only if you
actually need them."
https://www.postgresql.org/docs/12/locale.html
https://www.postgresql.org/message-id/flat/CAF6DVKNU0vb4ZeQQ-%3Dagg69QJU3wdjPnMYYrPYY7CKc6iOU7eQ%40mail.gmail.com
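
For example (a sketch using the column names from your schema; the index name
idx_key_c is hypothetical and this is untested against your data):

```sql
-- Hypothetical: an index using the "C" collation on the key column;
-- comparisons and LIKE prefix searches against it skip locale-aware handling
CREATE INDEX idx_key_c ON schema.logtable (key COLLATE "C");

-- A query must request the same collation to be able to use such an index:
SELECT * FROM schema.logtable WHERE key COLLATE "C" = 'ABC123';
```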

Best,
Imre

Semen Yefimenko <semen(dot)yefimenko(at)gmail(dot)com> wrote (on Thu, May 6,
2021, 16:38):

> Hi there,
>
> I've recently been involved in migrating our old system to SQL Server and
> then PostgreSQL. Everything has been working fine so far but now after
> executing our tests on Postgres, we saw a very slow running query on a
> large table in our database.
> I have tried asking on other platforms but no one has been able to give me
> a satisfying answer.
>
> Postgres Version: PostgreSQL 12.2, compiled by Visual C++ build 1914,
> 64-bit
> No notable errors in the Server log and the Postgres Server itself.
>
> The table structure :
>
> CREATE TABLE logtable
> (
> key character varying(20) COLLATE pg_catalog."default" NOT NULL,
> id integer,
> column3 integer,
> column4 integer,
> column5 integer,
> column6 integer,
> column7 integer,
> column8 integer,
> column9 character varying(128) COLLATE pg_catalog."default",
> column10 character varying(2048) COLLATE pg_catalog."default",
> column11 character varying(2048) COLLATE pg_catalog."default",
> column12 character varying(2048) COLLATE pg_catalog."default",
> column13 character varying(2048) COLLATE pg_catalog."default",
> column14 character varying(2048) COLLATE pg_catalog."default",
> column15 character varying(2048) COLLATE pg_catalog."default",
> column16 character varying(2048) COLLATE pg_catalog."default",
> column17 character varying(2048) COLLATE pg_catalog."default",
> column18 character varying(2048) COLLATE pg_catalog."default",
> column19 character varying(2048) COLLATE pg_catalog."default",
> column21 character varying(256) COLLATE pg_catalog."default",
> column22 character varying(256) COLLATE pg_catalog."default",
> column23 character varying(256) COLLATE pg_catalog."default",
> column24 character varying(256) COLLATE pg_catalog."default",
> column25 character varying(256) COLLATE pg_catalog."default",
> column26 character varying(256) COLLATE pg_catalog."default",
> column27 character varying(256) COLLATE pg_catalog."default",
> column28 character varying(256) COLLATE pg_catalog."default",
> column29 character varying(256) COLLATE pg_catalog."default",
> column30 character varying(256) COLLATE pg_catalog."default",
> column31 character varying(256) COLLATE pg_catalog."default",
> column32 character varying(256) COLLATE pg_catalog."default",
> column33 character varying(256) COLLATE pg_catalog."default",
> column34 character varying(256) COLLATE pg_catalog."default",
> column35 character varying(256) COLLATE pg_catalog."default",
> entrytype integer,
> column37 bigint,
> column38 bigint,
> column39 bigint,
> column40 bigint,
> column41 bigint,
> column42 bigint,
> column43 bigint,
> column44 bigint,
> column45 bigint,
> column46 bigint,
> column47 character varying(128) COLLATE pg_catalog."default",
> timestampcol timestamp without time zone,
> column49 timestamp without time zone,
> column50 timestamp without time zone,
> column51 timestamp without time zone,
> column52 timestamp without time zone,
> archivestatus integer,
> column54 integer,
> column55 character varying(20) COLLATE pg_catalog."default",
> CONSTRAINT pkey PRIMARY KEY (key)
> USING INDEX TABLESPACE tablespace
> )
>
> TABLESPACE tablespace;
>
> ALTER TABLE schema.logtable
> OWNER to user;
>
> CREATE INDEX idx_timestampcol
> ON schema.logtable USING btree
> ( timestampcol ASC NULLS LAST )
> TABLESPACE tablespace ;
>
> CREATE INDEX idx_test2
> ON schema.logtable USING btree
> ( entrytype ASC NULLS LAST)
> TABLESPACE tablespace
> WHERE archivestatus <= 1;
>
> CREATE INDEX idx_arcstatus
> ON schema.logtable USING btree
> ( archivestatus ASC NULLS LAST)
> TABLESPACE tablespace;
>
> CREATE INDEX idx_entrytype
> ON schema.logtable USING btree
> ( entrytype ASC NULLS LAST)
> TABLESPACE tablespace ;
>
>
> The table contains 14,000,000 entries and holds about 3.3 GB of data.
> No triggers; inserts are probably 5-20K per day.
>
> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts,
> relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE
> relname='logtable';
>
> relname
> |relpages|reltuples|relallvisible|relkind|relnatts|relhassubclass|reloptions|pg_table_size|
>
> ------------------|--------|---------|-------------|-------|--------|--------------|----------|-------------|
> logtable | 405988| 14091424| 405907|r | 54|false
> |NULL | 3326803968|
>
>
> The slow running query:
>
> SELECT column1, ..., column54 FROM logtable WHERE ((entrytype = 4000 OR entrytype = 4001
> OR entrytype = 4002) AND (archivestatus <= 1)) ORDER BY timestampcol DESC;
>
>
> This query runs in about 45-60 seconds.
> The same query runs in about 289 ms in Oracle and 423 ms in SQL Server.
> Now I understand that actually loading all results (about 520K rows) would
> take a while.
> But that shouldn't be exactly what happens, right? There should be a
> resultset iterator which can retrieve all data but doesn't do so from the get-go.
>
> With the help of some people in the Slack and SO threads, I've found a
> configuration parameter which helps performance:
>
> set random_page_cost = 1;
>
> This improved performance from 45-60 s to 15-35 s (since we are using SSDs).
> Still not acceptable, but definitely an improvement.
> Some maybe relevant system parameters:
>
> effective_cache_size 4GB
> maintenance_work_mem 1GB
> shared_buffers 2GB
> work_mem 1GB
>
>
> Currently I'm accessing the data through DBeaver (JDBC -
> postgresql-42.2.5.jar) and our Java application (JDBC -
> postgresql-42.2.19.jar). Both use defaultRowFetchSize=5000 to avoid loading
> everything into memory and to limit the results.
> The explain plan:
>
> EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)...
> (Above Query)
>
>
> Gather Merge (cost=347142.71..397196.91 rows=429006 width=2558) (actual
> time=21210.019..22319.444 rows=515841 loops=1)
> Output: column1, .. , column54
> Workers Planned: 2
> Workers Launched: 2
> Buffers: shared hit=141487 read=153489
> -> Sort (cost=346142.69..346678.95 rows=214503 width=2558) (actual
> time=21148.887..21297.428 rows=171947 loops=3)
> Output: column1, .. , column54
> Sort Key: logtable.timestampcol DESC
> Sort Method: quicksort Memory: 62180kB
> Worker 0: Sort Method: quicksort Memory: 56969kB
> Worker 1: Sort Method: quicksort Memory: 56837kB
> Buffers: shared hit=141487 read=153489
> Worker 0: actual time=21129.973..21296.839 rows=166776 loops=1
> Buffers: shared hit=45558 read=49514
> Worker 1: actual time=21114.439..21268.117 rows=165896 loops=1
> Buffers: shared hit=45104 read=49506
> -> Parallel Bitmap Heap Scan on schema.logtable
> (cost=5652.74..327147.77 rows=214503 width=2558) (actual
> time=1304.813..20637.462 rows=171947 loops=3)
> Output: column1, .. , column54
> Recheck Cond: ((logtable.entrytype = 4000) OR
> (logtable.entrytype = 4001) OR (logtable.entrytype = 4002))
> Filter: (logtable.archivestatus <= 1)
> Heap Blocks: exact=103962
> Buffers: shared hit=141473 read=153489
> Worker 0: actual time=1280.472..20638.620 rows=166776 loops=1
> Buffers: shared hit=45551 read=49514
> Worker 1: actual time=1275.274..20626.219 rows=165896 loops=1
> Buffers: shared hit=45097 read=49506
> -> BitmapOr (cost=5652.74..5652.74 rows=520443 width=0)
> (actual time=1179.438..1179.438 rows=0 loops=1)
> Buffers: shared hit=9 read=1323
> -> Bitmap Index Scan on idx_entrytype
> (cost=0.00..556.61 rows=54957 width=0) (actual time=161.939..161.940
> rows=65970 loops=1)
> Index Cond: (logtable.entrytype = 4000)
> Buffers: shared hit=1 read=171
> -> Bitmap Index Scan on idx_entrytype
> (cost=0.00..2243.22 rows=221705 width=0) (actual time=548.849..548.849
> rows=224945 loops=1)
> Index Cond: (logtable.entrytype = 4001)
> Buffers: shared hit=4 read=576
> -> Bitmap Index Scan on idx_entrytype
> (cost=0.00..2466.80 rows=243782 width=0) (actual time=468.637..468.637
> rows=224926 loops=1)
> Index Cond: (logtable.entrytype = 4002)
> Buffers: shared hit=4 read=576
> Settings: random_page_cost = '1', search_path = '"$user", schema, public',
> temp_buffers = '80MB', work_mem = '1GB'
> Planning Time: 0.578 ms
> Execution Time: 22617.351 ms
>
> As mentioned before, Oracle does this much faster.
>
>
> -------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name |
> Rows | Bytes |TempSpc| Cost (%CPU)| Time |
>
> -------------------------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | |
> 6878 | 2491K| | 2143 (1)| 00:00:01 |
> | 1 | SORT ORDER BY | |
> 6878 | 2491K| 3448K| 2143 (1)| 00:00:01 |
> | 2 | INLIST ITERATOR | |
> | | | | |
> |* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| logtable |
> 6878 | 2491K| | 1597 (1)| 00:00:01 |
> |* 4 | INDEX RANGE SCAN | idx_entrytype |
> 6878 | | | 23 (0)| 00:00:01 |
>
> -------------------------------------------------------------------------------------------------------------------------
>
> Is there anything else I can analyze, or any information you might need to
> investigate this further?
>
