Re: [Proposal] Global temporary tables

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-09 16:48:29
Message-ID: 20200109164829.mdl3z5xalmaqxs67@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 09, 2020 at 02:17:08PM +0300, Konstantin Knizhnik wrote:
>
>
>On 06.01.2020 8:04, 曾文旌(义从) wrote:
>>In the previous communication
>>
>>1 we agreed on the general direction
>>1.1 gtt use local (private) buffer
>>1.2 no replica access in first version
>>
>>2 We feel that gtt needs to maintain statistics, but there is no agreement on what it will be done.
>>
>>3 Still no one commented on GTT's transaction information processing, they include
>>3.1 Should gtt's frozenxid need to be care?
>>3.2 gtt’s clog clean
>>3.3 How to deal with "too old" gtt data
>>
>>I suggest we discuss further, reach an agreement, and merge the two patches to one.
>>
>
>I also hope that we should come to the common solution for GTT.
>If we do not try to address parallel execution issues and access to
>temp tables at replicas (and I agreed
>that it should be avoided in first version of the patch), then GTT
>patch becomes quite small.
>

Well, that was kinda my goal - making the patch as small as possible by
eliminating bits that are contentious or where we don't know the
solution (like planning for parallel queries).

>The most complex and challenged task is to support GTT for all kind of
>indexes. Unfortunately I can not proposed some good universal solution
>for it.
>Just patching all existed indexes implementation seems to be the only
>choice.
>

I haven't looked at the indexing issue closely, but IMO we need to
ensure that every session sees/uses only indexes on GTT that were
defined before the seesion started using the table.

Can't we track which indexes a particular session sees, somehow?

>Statistic is another important case.
>But once again I do not completely understand why we want to address
>all this issues with statistic in first version of the patch?

I think the question is which "issues with statistic" you mean. I'm sure
we can ignore some of them, e.g. the one with parallel workers not
having any stats (assuming we consider functions using GTT to be
parallel restricted).

>It contradicts to the idea to make this patch as small as possible.

Well, there's "making patch as small as possible" vs. "patch behaving
correctly" trade-off ;-)

>Also it seems to me that everybody agreed that users very rarely
>create indexes for temp tables and explicitly analyze them.

I certainly *disagree* with this.

We often see temporary tables as a fix or misestimates in complex
queries, and/or as a replacement for CTEs with statistics/indexes. In
fact it's a pretty valuable tool when helping customers with complex
queries affected by poor estimates.

>So I think GTT will be useful even with limited support of statistic.
>In my version statistics for GTT is provided by pushing correspondent
>information to backend's cache for pg_statistic table.

I think someone pointed out pushing stuff directly into the cache is
rather problematic, but I don't recall the details.

>Also I provided pg_temp_statistic view for inspecting it by users. The
>idea to make pg_statistic a view which combines statistic of normal
>and temporary tables is overkill from my point of view.
>
>I do not understand why do we need to maintain hash with some extra
>information for GTT in backends memory (as it was done in Wenjing
>patch).
>Also idea to use create extension for accessing this information seems
>to be dubious.
>

I think the extension was more a PoC rather than a final solution.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-01-09 17:10:28 Re: [HACKERS] pg_shmem_allocations view
Previous Message Stephen Frost 2020-01-09 16:36:38 Re: Recognizing superuser in pg_hba.conf