From: | Peter Volk <peterb(dot)volk(at)gmx(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Followup Timestamp to timestamp with TZ conversion |
Date: | 2021-07-22 16:36:41 |
Message-ID: | CAFWHPeZWrdDd1zFpp+sS4zOO6c6jbD-cpsfh1KtG=PrcwnfS3w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Tom,
thanks for the reply, I do understand that if a rewrite of the table
needs to be avoided the binary image needs to be the same. Since PG 12
there is an optimisation to avoid a rewrite of timestamp columns if
they are converted to timestamp with tz and the target tz offset is 0
I am referring to the function
ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno)
in which the following is checked:
(b/src/backend/commands/tablecmds.c)
else if (IsA(expr, FuncExpr))
{
FuncExpr *f = (FuncExpr *) expr;
switch (f->funcid)
{
case F_TIMESTAMPTZ_TIMESTAMP:
case F_TIMESTAMP_TIMESTAMPTZ:
if (TimestampTimestampTzRequiresRewrite())
return true;
else
expr = linitial(f->args);
break;
default:
return true;
}
and TimestampTimestampTzRequiresRewrite checks if the offset is 0:
(b/src/backend/utils/adt/timestamp.c)
TimestampTimestampTzRequiresRewrite()
*
* Returns false if the TimeZone GUC setting causes timestamp_timestamptz and
* timestamptz_timestamp to be no-ops, where the return value has the same
* bits as the argument. Since project convention is to assume a GUC changes
* no more often than STABLE functions change, the answer is valid that long.
*/
bool
TimestampTimestampTzRequiresRewrite(void)
{
long offset;
if (pg_get_timezone_offset(session_timezone, &offset) && offset == 0)
PG_RETURN_BOOL(false);
PG_RETURN_BOOL(true);
}
So in this case it is already proven that there is a binary equality
between the data types timestamp and timestamp with tz if the offset
is considered with 0. Hence this type of optimisation should / could
also apply to indexes as well as the columns used in partitions
Thanks,
Peter
On Thu, Jul 22, 2021 at 5:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Peter Volk <peterb(dot)volk(at)gmx(dot)net> writes:
> > The problem is that I have a 60TB+ PG installation for which we need to
> > modify all of the timestamp columns to timestamp with tz. The data in the
> > columns are already in UTC so we can benefit from the patch listed above.
> > Yet there are 2 cases in which we are having an issue.
>
> > 1) Index rebuilds: The patch is only avoiding a rewrite of the table data
> > but is not avoiding a rebuild of the indexes. Following the logic in the
> > patch above this should also be avoidable under the same condition
>
> I don't think that follows. What we are concerned about when determining
> whether a heap rewrite can be skipped is whether the stored heap entries
> are bit-compatible between the two data types. To decide that an index
> rebuild is not needed, you'd need to further determine that their sort
> orders are equivalent (for a btree index, or who-knows-what semantic
> condition for other types of indexes). We don't attempt to do that,
> so index rebuilds are always needed.
>
> As a thought experiment to prove that this is an issue, suppose that
> somebody invented an unsigned integer type, and made the cast from
> regular int4 follow the rules of a C cast, so that e.g. -1 becomes
> 2^32-1. Given that, an ALTER TYPE from int4 to the unsigned type
> could skip the heap rewrite. But we absolutely would have to rebuild
> any btree index on the column, because the sort ordering of the two
> types is different. OTOH, it's quite likely that a hash index would
> not really need to be rebuilt. So this is a real can of worms and
> we've not cared to open it.
>
> > 2) Partitioned tables with the timestamp as partition column: In this case
> > the current version does not allow a modification of the column data type
> > at all.
>
> PG's partitioning features are still being built out, but I would not
> hold my breath for that specific thing to change. Again, the issue
> is that bit-compatibility of individual values doesn't prove much
> about comparison semantics, so it's not clear that a change of
> data type still allows the value-to-partition assignment to be
> identical. (This is clearly an issue for RANGE partitioning. Maybe
> it could be finessed for HASH or LIST, but you'd still be needing
> semantic assumptions that go beyond mere bit-compatibility of values.)
>
> regards, tom lane
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | tushar | 2021-07-22 17:14:31 | Re: refactoring basebackup.c |
Previous Message | Bauyrzhan Sakhariyev | 2021-07-22 16:24:35 | Re: truncating timestamps on arbitrary intervals |