From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Zhaomo Yang <zhy001(at)cs(dot)ucsd(dot)edu> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: Implementation of global temporary tables? |
Date: | 2015-07-09 05:45:06 |
Message-ID: | CAFj8pRDiKOBDViR4767AXS64xw9_3LmY+wkXxYcMgbg2GQpeCw@mail.gmail.com |
Lists: | pgsql-hackers |
2015-07-09 7:32 GMT+02:00 Zhaomo Yang <zhy001(at)cs(dot)ucsd(dot)edu>:
> > I am not sure, if it is not useless work.
>
> I don't understand why an implementation taking approach 2.a would be
> useless. As I said, its performance will be no worse than current temp
> tables and it will provide a lot of convenience to users who need to create
> temp tables in every session.
>
Surely it would be a step forward. But you will have to solve a lot of
problems with "duplicated" tables in the system catalogue, and it still
doesn't solve the main problem with temporary tables - the bloating
catalogue - and the related performance degradation.

Although global temp tables are a nice-to-have feature (for PLpgSQL
developers), we can live without them - and with some patterns and
extensions, we are living well. But the performance issue cannot be fixed
by any pattern. So the major motivation for introducing global temp
tables is performance - about 90% of it. Solving it should be the primary
target for merging this feature upstream. I believe that when the bloating
is solved, the chance of this patch being accepted will be pretty high.
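
To make the bloat argument concrete, here is a toy sketch in Python. It is not PostgreSQL code: `ToyCatalog`, both `run_sessions_*` functions and the "one relation row plus one row per column" accounting are assumptions made only for illustration. It compares approach 2.a (a session copy registered in the catalogue on first access and dropped at session end) with keeping the per-session state in session memory.

```python
from dataclasses import dataclass


@dataclass
class ToyCatalog:
    """Toy stand-in for a system catalogue (hypothetical, not PostgreSQL's structures)."""
    live_rows: int = 0
    dead_rows: int = 0  # deleted rows that still occupy space until cleaned up

    def insert(self, n: int) -> None:
        self.live_rows += n

    def delete(self, n: int) -> None:
        # A deleted row is not reclaimed immediately; it becomes a dead row.
        self.live_rows -= n
        self.dead_rows += n


def run_sessions_2a(catalog: ToyCatalog, sessions: int, columns: int) -> None:
    """Approach 2.a: every session registers its own temp copy in the catalogue."""
    rows_per_table = 1 + columns          # assumed: one relation row + one row per column
    catalog.insert(rows_per_table)        # the shared definition, created once
    for _ in range(sessions):
        catalog.insert(rows_per_table)    # session copy created on first access
        catalog.delete(rows_per_table)    # session copy dropped at session end


def run_sessions_local_metadata(catalog: ToyCatalog, sessions: int, columns: int) -> None:
    """Alternative: the definition is stored once, per-session state stays in memory."""
    catalog.insert(1 + columns)           # the shared definition, created once
    for _ in range(sessions):
        session_state = {"storage": object(), "stats": {}}  # never written to the catalogue
        del session_state                 # discarded at session end


bloated = ToyCatalog()
run_sessions_2a(bloated, sessions=1000, columns=10)

lean = ToyCatalog()
run_sessions_local_metadata(lean, sessions=1000, columns=10)

print("2.a dead rows:", bloated.dead_rows)
print("local metadata dead rows:", lean.dead_rows)
```

In this model the catalogue churn of 2.a grows linearly with the number of sessions, exactly as it does today with hand-written CREATE TEMP TABLE, while the second variant leaves the catalogue untouched after the initial definition.
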
Regards
Pavel
>
> Thanks,
> Zhaomo
>
> On Tue, Jul 7, 2015 at 11:53 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>> Hi
>>
>>
>> 2015-07-08 9:08 GMT+02:00 Zhaomo Yang <zhy001(at)cs(dot)ucsd(dot)edu>:
>>
>>> > more global temp tables are little bit comfortable for developers,
>>> I'd like to emphasize this point. This feature does much more than
>>> saving a developer from issuing a CREATE TEMP TABLE statement in every
>>> session. Here are two common use cases and I'm sure there are more.
>>>
>>> (1)
>>> Imagine in a web application scenario, a developer wants to cache some
>>> session information in a temp table. What's more, he also wants to specify
>>> some rules which reference the session information. Without this feature,
>>> the rules will be removed at the end of every session since they depend on
>>> a temporary object. Global temp tables will allow the developer to define
>>> the temp table and the rules once.
>>>
>>> (2)
>>> The second case is mentioned by Tom Lane back in 2010 in a thread about
>>> global temp tables.
>>> (http://www.postgresql.org/message-id/9319.1272130283@sss.pgh.pa.us)
>>> "The context that I've seen it come up in is that people don't want to
>>> clutter their functions with
>>> create-it-if-it-doesn't-exist logic, which you have to have given the
>>> current behavior of temp tables."
>>>
>>> > 2.a - using on demand created temp tables - most simple solution, but
>>> > doesn't help with catalogue bloating
>>>
>>> I've read the thread and people disapprove of this approach because of the
>>> potential catalog bloat. However, I'd like to champion it. Indeed, this
>>> approach may have a bloat issue. But for users who need global temp
>>> tables, they now have to create a new temp table in every session, which
>>> means they already have the bloat problem and presumably they have some
>>> policy to deal with it. In other words, implementing global temp tables by
>>> this approach gives users the same performance, plus the convenience the
>>> feature brings.
>>>
>>> The root question here is "whether having the unoptimized feature is
>>> better than having no feature at all". Actually, there was a very
>>> similar discussion
>>> back in 2009 on global temp tables. Let me borrow Kevin Grittner's and Tom
>>> Lane's arguments here.
>>>
>>> Kevin Grittner's argument:
>>>
>>> http://www.postgresql.org/message-id/49F82AEA.EE98.0025.0@wicourts.gov
>>> "... If you're saying we can implement the standard's global temporary
>>> tables in a way that performs better than current temporary tables, that's
>>> cool. That would be a nice "bonus" in addition to the application
>>> programmer convenience and having another tick-mark on the standards
>>> compliance charts. Do you think that's feasible? If not, the feature
>>> would be useful to some with the same performance that temporary tables
>>> currently provide."
>>>
>>> Tom Lane's arguments:
>>>
>>> http://www.postgresql.org/message-id/24110.1241035178@sss.pgh.pa.us
>>> "I'm all for eliminating catalog overheads, if we can find a way to do
>>> that. I don't think that you get to veto implementation of the feature
>>> until we can find a way to optimize it better. The question is not about
>>> whether having the optimization would be better than not having it --- it's
>>> about whether having the unoptimized feature is better than having no
>>> feature at all (which means people have to implement the same behavior by
>>> hand, and they'll *still* not get the optimization)."
>>>
>>> There have been several threads here discussing global temp tables since
>>> 2007. Quite a few ideas aimed to avoid the bloat issue by not storing the
>>> metadata of the session copy in the catalog. However, it seems that none of
>>> them has been implemented, or even has a feasible design. So why don't we
>>> implement it in an unoptimized way first?
>>>
>>
>> I am not sure, if it is not useless work.
>>
>> Now I think the best implementation of global temp tables is to
>> enhance unlogged tables to have local content. All local data can be
>> kept in session memory. Usually it is less than 2KB with statistics, and
>> you don't need to store it in the catalogue. When anybody works with any
>> table, the related data are copied to the system cache - and an
>> implementation of global temp tables can be injected there.
>>
>> regards
>>
>> Pavel Stehule
>>
>>
>>>
>>> > Is there still interest about this feature?
>>> I'm very interested in this feature. I'm thinking about one
>>> implementation which is similar to Pavel's 2009 proposal (
>>> http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc7806f@mail.gmail.com)
>>> Here are the major ideas of my design:
>>>
>>> (1)
>>> Creating the cross-session persistent schema as a regular table and
>>> creating session-private temp tables when a session first accesses it.
>>>
>>> (2)
>>> For DML queries, the global temp table is overloaded by its session copy
>>> after the relation is opened by an oid or a rangevar. For DDL queries,
>>> which copy is used depends on whether the query needs to access the data or
>>> metadata of the global temp table.
>>>
>>> There are more differences between this design and Pavel's 2009 proposal
>>> and I'd like to send a detailed proposal to the mailing list but first I
>>> want to know if our community would accept a global temp table
>>> implementation which provides the same performance as current temp tables
>>> do.
>>>
>>> Thanks,
>>> Zhaomo
>>>
>>>
>>>
>>>
>>>
>>
>