Re: [Proposal] Global temporary tables

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-09 15:07:46
Message-ID: 3f360d7b-a46a-79f6-27b6-1b1df8c3ee82@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06.01.2020 14:01, Tomas Vondra wrote:
> On Mon, Jan 06, 2020 at 01:04:15PM +0800, 曾文旌(义从) wrote:
>> In the previous communication
>>
>> 1 we agreed on the general direction
>> 1.1 gtt use local (private) buffer
>> 1.2 no replica access in first version
>>
>
> OK, good.
>
>> 2 We feel that gtt needs to maintain statistics, but there is no
>> agreement on what it will be done.
>>
>
> I certainly agree GTT needs to maintain statistics, otherwise it'll lead
> to poor query plans. AFAIK the current patch stores the info in a hash
> table in a backend private memory, and I don't see how else to do that
> (e.g. storing it in a catalog would cause catalog bloat).
>
> FWIW this is a reasons why I think just using shared buffers (instead of
> local ones) is not sufficient to support parallel queriesl as proposed
> by Alexander. The workers would not know the stats, breaking planning of
> queries in PARALLEL SAFE plpgsql functions etc.

I do not think that "all or nothing" approach is so good for software
development as for database transactions.
Yes, if we have function in PL/pgSQL which performs queries om temporary
tables, then
parallel workers may build inefficient plan for this queries due to lack
of statistics.
From my point of view this is not a pitfall of GTT but result of lack
of global plan cache in Postgres. And it should be fixed not at GTT level.

Also I never see real use cases with such functions, even in the systems
which using hard temporary tables and stored procedures.
But there are many other real problems with temp tables  (except already
mentioned in this thread).
In PgPro/EE we have fixes for some of them, for example:

1. Do not reserve space in the file for temp relations. Right now append
of relation cause writing zero page to the disk by mdextend.
It cause useless disk IO for temp tables which in most cases fit in
memory and should not be written at disk.

2. Implicitly perform analyze of temp table intermediately after storing
data in it. Usually tables are analyzed by autovacuum in background.
But it doesn't work for temp tables which are not processes by
autovacuum and are accessed immediately after filling them with data and
lack of statistic  may cause
building very inefficient plan. We have online_analyze extension which
force analyze of the table after appending some bulk of data to it.
It can be used for normal table but most of all it is useful for temp
relations.

Unlike hypothetical example with parallel safe function working with
temp tables,
this are real problems observed by some of our customers.
Them are applicable both to local and global temp tables and this is why
I do not want to discuss them in context of GTT.

>
>> 3 Still no one commented on GTT's transaction information processing,
>> they include
>> 3.1 Should gtt's frozenxid need to be care?
>> 3.2 gtt’s clog clean
>> 3.3 How to deal with "too old" gtt data
>>
>
> No idea what to do about this.
>

I wonder what is the specific of GTT here?
The same problem takes place for normal (local) temp tables, doesn't it?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-01-09 15:09:11 Re: Removing pg_pltemplate and creating "trustable" extensions
Previous Message Tom Lane 2020-01-09 15:06:06 Re: Recognizing superuser in pg_hba.conf