Quick Links

Re: Query slow for new participants

From:	MichaelDBA <MichaelDBA(at)sqlexec(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	support(at)mekong(dot)be, pgsql-performance(at)postgresql(dot)org
Subject:	Re: Query slow for new participants
Date:	2019-02-26 00:30:18
Message-ID:	5C74889A.3050007@sqlexec.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Regarding shared_buffers, please install the pg_buffercache extension
and run the recommended queries with that extension during high load
times to really get an idea about the right value for shared_buffers.
Let's take the guess work out of it.

Regards,
Michael Vitale

> Justin Pryzby <mailto:pryzby(at)telsasoft(dot)com>
> Monday, February 25, 2019 6:59 PM
> On Tue, Feb 26, 2019 at 12:22:39AM +0100, support(at)mekong(dot)be wrote:
>
>> Hardware
>> Standard DS15 v2 (20 vcpus, 140 GB memory)
>
>> "effective_cache_size" "105GB" "configuration file"
>> "effective_io_concurrency" "200" "configuration file"
>> "maintenance_work_mem" "2GB" "configuration file"
>> "max_parallel_workers" "20" "configuration file"
>> "max_parallel_workers_per_gather" "10" "configuration file"
>> "max_worker_processes" "20" "configuration file"
>> "random_page_cost" "1.1" "configuration file"
>> "shared_buffers" "35GB" "configuration file"
>> "work_mem" "18350kB" "configuration file"
>
> I don't know for sure, but 35GB is very possibly too large shared_buffers. The
> rule of thumb is "start at 25% of RAM" but I think anything over 10-15GB is
> frequently too large, unless you can keep the whole DB in RAM (can you?)
>
>> Table Metadata
>> relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)
>> "companyarticledb" 6191886 "5.40276e+08" 6188459 "r" 44 false "50737979392"
>
> work_mem could probably benefit from being larger (just be careful that you
> don't end up with 20x parallel workers running complex plans each node of which
> using 100MB work_mem).
>
>> Full Table and Index Schema
>> The difference is very bad for the new company, even on the simplest query
>>
>> SELECT * FROM CompanyArticleDB
>> WHERE CompanyId = '77'
>> AND ArticleId= '7869071'
>
> It sounds to me like the planner thinks that the distribution of companyID and
> articleID are independent, when they're not. For example it think that
> companyID=33 filters out 99% of the rows.
>
>> companyid | integer | | not null |
>> articleid | integer | | not null |
>
>> EXPLAIN (ANALYZE, BUFFERS), not just EXPLAIN
>> SELECT * FROM CompanyArticleDB
>> WHERE CompanyId = '77'
>> AND ArticleId= '7869071'
>> "Index Scan using ix_companyarticledb_company on companyarticledb (cost=0.57..2.80 rows=1 width=193) (actual time=1011.335..1011.454 rows=1 loops=1)"
>> " Index Cond: (companyid = 77)"
>> " Filter: (articleid = 7869071)"
>> " Rows Removed by Filter: 2674361"
>> " Buffers: shared hit=30287"
>
>> Example for another participant, there another index is used.
>> "Index Scan using pk_pricedb on companyarticledb (cost=0.57..2.79 rows=1 width=193) (actual time=0.038..0.039 rows=0 loops=1)"
>> " Index Cond: ((companyid = 39) AND (articleid = 7869071))"
>> " Buffers: shared hit=4"
>
>> I do not know why this participant is different than the others except that
>> it was recently added.
>
> Were the tables ANALYZEd since then ? You could check:
> SELECT * FROM pg_stat_user_tables WHERE relname='companyarticledb';
>
> If you have small number of companyIDs (~100), then the table statistics may
> incldue a most-common-values list, and companies not in the MCV list may end up
> with different query plans, even without correlation issues.
>
> It looks like the NEW company has ~3e6 articles, out of a total ~5e8 articles.
> The planner may think that companyID doesn't exist at all, so scanning the idx
> on companyID will be slightly faster than using the larger, composite index on
> companyID,articleID.
>
> Justin
>
>> Indexes:
>> "pk_pricedb" PRIMARY KEY, btree (companyid, articleid)
>> "EnabledIndex" btree (enabled)
>> "ix_companyarticledb_article" btree (articleid)
>> "ix_companyarticledb_company" btree (companyid)
>> "participantarticlecodeindex" btree (articlecode)
>> "participantdescriptionindex" gin (participantdescription gin_trgm_ops)
>> Foreign-key constraints:
>> "fk_companyarticledb_accountsdb" FOREIGN KEY (modifiedby) REFERENCES accountsdb(id)
>> "fk_companyarticledb_accountsdb1" FOREIGN KEY (createdby) REFERENCES accountsdb(id)
>> "fk_companyarticledb_accountsdb2" FOREIGN KEY (preventioncounselorid) REFERENCES accountsdb(id)
>> "fk_companyarticledb_articledb" FOREIGN KEY (articleid) REFERENCES articledb(id)
>> "fk_companyarticledb_companydb" FOREIGN KEY (companyid) REFERENCES companydb(id)
>> "fk_companyarticledb_interfaceaccountdb" FOREIGN KEY (interfaceaccountid) REFERENCES interfaceaccountdb(id)
>> "fk_companyarticledb_supplieraccountdb" FOREIGN KEY (createdbysupplier) REFERENCES supplieraccountdb(id)
>> "fk_companyarticledb_supplieraccountdb1" FOREIGN KEY (modifiedbysupplier) REFERENCES supplieraccountdb(id)
>

In response to

Re: Query slow for new participants at 2019-02-25 23:59:20 from Justin Pryzby

Browse pgsql-performance by date

	From	Date	Subject
Next Message	kimaidou	2019-02-26 12:54:00	Re: Aggregate and many LEFT JOIN
Previous Message	Justin Pryzby	2019-02-25 23:59:20	Re: Query slow for new participants