Re: Implementation of global temporary tables?

From: Zhaomo Yang <zhy001(at)cs(dot)ucsd(dot)edu>
To: pavel(dot)stehule(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, Kevin(dot)Grittner(at)wicourts(dot)gov, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Implementation of global temporary tables?
Date: 2015-07-08 07:08:50
Message-ID: CA+0EDdACCx8w4nK-wdj-eodbJn4juChnHoUWVMM3u3uhLVPnJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> more global temp tables are little bit comfortable for developers,
I'd like to emphasize this point. This feature does much more than saving a
developer from issuing a CREATE TEMP TABLE statement in every session. Here
are two common use cases and I'm sure there are more.

(1)
Imagine in a web application scenario, a developer wants to cache some
session information in a temp table. What's more, he also wants to specify
some rules which reference the session information. Without this feature,
the rules will be removed at the end of every session since they depend on
a temporary object. Global temp tables will allow the developer to define
the temp table and the rules once.

(2)
The second case is mentioned by Tom Lane back in 2010 in a thread about
global temp tables.
(http://www.postgresql.org/message-id/9319.1272130283@sss.pgh.pa.us)
"The context that I've seen it come up in is that people don't want to
clutter their functions with
create-it-if-it-doesn't-exist logic, which you have to have given the
current behavior of temp tables."

> 2.a - using on demand created temp tables - most simple solution, but
> doesn't help with catalogue bloating

I've read the thread and people disapprove this approach because of the
potential catalog bloat. However, I'd like to champion it. Indeed, this
approach may have a bloat issue. But for users who needs global temp
tables, they now have to create a new temp table in every session, which
means they already have the bloat problem and presumably they have some
policy to deal with it. In other words, implementing global temp tables by
this approach gives users the same performance, plus the convenience the
feature brings.

The root problem here is that whether "whether having the unoptimized
feature is better than
having no feature at all". Actually, there was a very similar discussion
back in 2009 on global temp tables. Let me borrow Kevin Grittner's and Tom
Lane's arguments here.

Kevin Grittner's argument:

http://www.postgresql.org/message-id/49F82AEA.EE98.0025.0@wicourts.gov
"... If you're saying we can implement the standard's global temporary
tables in a way that performs better than current temporary tables, that's
cool. That would be a nice "bonus" in addition to the application
programmer convenience and having another tick-mark on the standards
compliance charts. Do you think that's feasible? If not, the feature
would be useful to some with the same performance that temporary tables
currently provide."

Tom Lane's arguments:

http://www.postgresql.org/message-id/24110.1241035178@sss.pgh.pa.us
"I'm all for eliminating catalog overheads, if we can find a way to do
that. I don't think that you get to veto implementation of the feature
until we can find a way to optimize it better. The question is not about
whether having the optimization would be better than not having it --- it's
about whether having the unoptimized feature is better than having no
feature at all (which means people have to implement the same behavior by
hand, and they'll *still* not get the optimization)."

There have been several threads here discussing global temp table since
2007. Quite a few ideas aimed to avoid the bloat issue by not storing the
metadata of the session copy in the catalog. However, it seems that none of
them has been implemented, or even has a feasible design. So why don't we
implement it in a unoptimized way first?

> Is there still interest about this feature?
I'm very interested in this feature. I'm thinking about one implementation
which is similar to Pavel's 2009 proposal (
http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc7806f@mail.gmail.com)
Here are the major ideas of my design:

(1)
Creating the cross-session persistent schema as a regular table and
creating session-private temp tables when a session first accesses it.

(2)
For DML queries, The global temp table is overloaded by its session copy
after the relation is opened by an oid or a rangevar. For DDL queries,
which copy is used depends on whether the query needs to access the data or
metadata of the global temp table.

There are more differences between this design and Pavel's 2009 proposal
and I'd like to send a detailed proposal to the mailing list but first I
want to know if our community would accept a global temp table
implementation which provides the same performance as currently temp tables
do.

Thanks,
Zhaomo

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-07-08 07:52:46 Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Previous Message Pavel Stehule 2015-07-08 06:35:42 Re: PL/pgSQL, RAISE and error context