From: | "ideriha(dot)takeshi(at)fujitsu(dot)com" <ideriha(dot)takeshi(at)fujitsu(dot)com> |
---|---|
To: | 'Konstantin Knizhnik' <k(dot)knizhnik(at)postgrespro(dot)ru>, 'Amit Langote' <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, 'Thomas Munro' <thomas(dot)munro(at)gmail(dot)com> |
Subject: | RE: Global shared meta cache |
Date: | 2019-11-06 02:55:30 |
Message-ID: | OSAPR01MB1985C6A4FA02294DDF7A1056EA790@OSAPR01MB1985.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
>From: Konstantin Knizhnik [mailto:k(dot)knizhnik(at)postgrespro(dot)ru]
>If the assumption that working set of backend (set of tables accessed by this session)
>is small enough to fit in backend's memory is true, then global meta cache is not
>needed at all: it is enough to limit size of local cache and implement some eviction
>algorithm.
>If data is not found in local cache, then it is loaded from catalog in standard way.
>It is the simplest solution and may be it is good starting point for work in this direction.
Thank you for the reply.
I introduced a GUC so that users can choose whether to use this feature.
But as you stated, if the data size is not very large, my proposal does too much and a simple threshold is enough.
The threshold idea has been discussed in another thread, so I'd like to discuss it there.
Though that thread is not active these days, the ideas discussed so far are a memory limit, an access time limit, and a hybrid of the two.
It seems to me that the discussion has converged on eviction by access timestamp.
https://www.postgresql.org/message-id/flat/20161219(dot)201505(dot)11562604(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp
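To make the idea of eviction by access timestamp concrete, here is a minimal Python sketch. It is not code from either thread: the class name, the `max_idle_seconds` parameter, the `loader` callback, and the injectable `clock` are all hypothetical, and a real catalog cache would live in C inside the backend.

```python
import time


class TimestampEvictingCache:
    """Toy local cache: entries idle longer than max_idle_seconds are dropped.

    All names here are illustrative assumptions, not PostgreSQL APIs.
    """

    def __init__(self, loader, max_idle_seconds, clock=time.monotonic):
        self._loader = loader            # called on a miss (stands in for a catalog read)
        self._max_idle = max_idle_seconds
        self._clock = clock              # injectable so the behavior can be tested
        self._entries = {}               # key -> (value, last_access_time)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        # On a miss, load the value in the standard way; on a hit, reuse it.
        value = self._loader(key) if hit is None else hit[0]
        # Every access refreshes the timestamp.
        self._entries[key] = (value, now)
        return value

    def evict_stale(self):
        """Remove entries not accessed within max_idle_seconds; return the count."""
        now = self._clock()
        stale = [k for k, (_, ts) in self._entries.items()
                 if now - ts > self._max_idle]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def __len__(self):
        return len(self._entries)
```

An evicted entry is simply reloaded on its next access, so the only cost of eviction is one extra load.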
>If there are cases when application need to work with hundreds of tables
>(partitioning?) then we can either store in local cache references to global cache either
>perform two lookups: in local and global caches.
I think this is my target. Especially when there are many partitioned tables,
each with many (more than 100) columns, and many backends, sharing the
cache would bring more benefit for memory usage and performance
than a simple threshold alone.
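The two-lookup scheme mentioned in the quoted text (local cache first, then the global cache, then the catalog) could look roughly like the Python sketch below. This is only an illustration under assumptions: `TwoLevelCache`, `shared`, and `load_from_catalog` are hypothetical names, a plain dict stands in for shared memory, and the locking a real shared cache needs is omitted.

```python
class TwoLevelCache:
    """Toy per-backend cache that falls back to a cache shared by all backends.

    Hypothetical sketch: a dict stands in for shared memory, and no locking
    is done, which a real shared cache would require.
    """

    def __init__(self, shared, load_from_catalog):
        self.local = {}                  # per-backend references
        self.shared = shared             # one dict shared by every backend object
        self.load_from_catalog = load_from_catalog

    def lookup(self, key):
        # 1. Local lookup: cheapest path, no shared state touched.
        if key in self.local:
            return self.local[key]
        # 2. Global lookup: another backend may already have loaded the entry.
        entry = self.shared.get(key)
        if entry is None:
            # 3. Miss in both caches: load from the catalog and publish it.
            entry = self.load_from_catalog(key)
            self.shared[key] = entry
        # Keep only a reference locally, so the entry itself is stored once.
        self.local[key] = entry
        return entry
```

With this layout each backend holds only references, so the memory for an entry is paid once no matter how many backends use it.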
I ran an experiment earlier. Each parent table has about 145 integer columns and
is partitioned into about 350 child tables on average.
There are 11 parent tables and about 3850 tables in total.
When I ran "select * from parent_table" against the 11 parent tables, CacheMemoryContext alone
consumed about 0.37GB per backend, and with 100 backends it consumed about 37GB in total.
This is because the number of system catalog cache entries for pg_statistic is very large
(about 577,000 entries). This number is almost the same as the number of columns (145) times
the number of tables (3850). (Sorry that the model and figures are not simple to understand.)
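The figures above can be checked with a few lines of Python. The variable names are mine, and the inputs are the approximate values from this mail, so the results are estimates only.

```python
# Approximate inputs taken from the experiment described in this mail.
columns_per_table = 145
parent_tables = 11
children_per_parent = 350          # average
reported_entries = 577_000         # reported statistics cache entries
per_backend_gb = 0.37              # CacheMemoryContext for one backend
backends = 100

total_tables = parent_tables * children_per_parent       # 3850
expected_entries = columns_per_table * total_tables      # 558250

# One statistics entry per column per table roughly explains the cache size.
ratio = reported_entries / expected_entries

# Without sharing, every backend pays the full cost.
total_gb = per_backend_gb * backends

print(total_tables, expected_entries, round(ratio, 2), round(total_gb, 1))
```

The column-times-table product comes within a few percent of the reported entry count, and the memory cost scales linearly with the number of backends when nothing is shared.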
By the way, my current patch contains some redundant code.
For example, LWLocks are used in places where spinlocks would suffice.
Another is that the reference count of a local reference is incremented and decremented
even though the local reference cache doesn't need to be protected.
I'll fix these things and submit statistics on memory usage and performance.
Regards,
Takeshi Ideriha