Re: AS OF queries

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: AS OF queries
Date: 2017-12-27 07:29:35
Message-ID: CAMsr+YHz+TPuqeE3oKsbbCVKZEPSr2B06tD-tWWMX10DvFurNA@mail.gmail.com
Lists: pgsql-hackers

On 25 December 2017 at 15:59, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

>
>
> On 25.12.2017 06:26, Craig Ringer wrote:
>
> On 24 December 2017 at 04:53, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>>
>>
>> But what if I just forbid changing recent_global_xmin?
>> If it is stalled at FirstNormalTransactionId and never changed?
>> Will it protect all versions from being deleted?
>>
>
That's totally impractical; you'd have unbounded bloat and a nonfunctional
system in no time.
>
> You'd need a mechanism - akin to what we have with replication slots - to
> set a threshold for age.
>
>
> Well, there are systems with "never delete" and "append only" semantics.
> For example, I have participated in the SciDB project: a database for
> scientific applications.
> One of the key requirements for scientific research is reproducibility.
> From the database point of view, this means that we need to store all raw
> data and never delete it.
>

PostgreSQL can't cope with that for more than 2^31 xacts, you have to
"forget" details of which xacts created/updated tuples and the contents of
deleted tuples, or you exceed our xid limit. You'd need 64-bit XIDs, or a
redo-buffer based heap model (like the zheap stuff) with redo buffers
marked with an xid epoch, or something like that.

> I am not sure that it should be similar to a logical replication slot.
> Here the semantics are quite clear: we preserve segments of WAL until they
> are replicated to the subscribers.
>

Er, what?

This isn't to do with restart_lsn. That's why I mentioned *logical*
replication slots.

I'm talking about how they interact with GetOldestXmin using their xmin and
catalog_xmin.

You probably won't want to re-use slots, but you'll want something akin to
them: a transaction age threshold. Otherwise your system has a finite end
date after which it can no longer function due to the xid limit, or, if you
solve that, it'll slowly choke on table bloat etc. I guess if you're
willing to accept truly horrible performance...

> With time travel the situation is less obscure: we may specify some
> threshold for age - keep data, for example, for one year.
>

Sure. You'd likely do that by mapping commit timestamps => xids and using
an xid threshold though.

> But unfortunately the trick with the snapshot (no matter how we set up
> the oldest xmin horizon) affects all tables.
>

You'd need to be able to pass more info into HeapTupleSatisfiesMVCC etc. I
expect you'd probably add a new snapshot type (like logical decoding did
with historic snapshots), that has a new Satisfies function. But you'd have
to be able to ensure all snapshot Satisfies callers had the required extra
info - like maybe a Relation - which could be awkward for some call sites.

The user would have to be responsible for ensuring sanity of FK
relationships etc when specifying different snapshots for different
relations.

Per-relation time travel doesn't seem totally impractical so long as you
can guarantee that there is some possible snapshot for which the catalogs
defining all the relations and types are simultaneously valid, i.e. there's
no disjoint set of catalog changes. Avoiding messy performance implications
with normal queries might not even be too bad if you use a separate
snapshot model, so long as you can avoid callsites having to do extra work
in the normal case.

Dealing with dropped columns and rewrites would be a pain though. You'd
have to preserve the dropped column data when you re-projected the rewrite
tuples.

> There is a similar (but not the same) problem with logical replication:
> assume that we need to replicate only one small table, but we have to pin
> in WAL all updates of another huge table which is not involved in logical
> replication at all.
>

I don't really see how that's similar. It's concerned with WAL, whereas
what you're looking at is heaps and bloat from old versions. Completely
different, unless you propose to somehow reconstruct data from old WAL to
do historic queries, which would be o_O ...

> Well, I am really not sure about users' demands for time travel. This is
> one of the reasons for initiating this discussion on hackers... Maybe it
> is not the best place for such a discussion, because there are mostly
> Postgres developers here and not users...
> At least, from the experience of a few SciDB customers, I can tell that we
> didn't have problems with schema evolution: mostly the schema is simple,
> static and well defined.
> There were problems with incorrect import of data (this is why we had to
> add real delete), with splitting data into chunks (partitioning), ...
>

Every system I've ever worked with that has a "static" schema has landed up
not being so static after all.

I'm sure there are exceptions, but if you can't cope with catalog changes
you've excluded the immense majority of users. Even the ones who promise
they don't ever need to change anything ... land up changing things.

>> The question is how we should handle such catalog changes if they
>> happen. Ideally we should not allow moving back beyond this point.
>> Unfortunately it is not so easy to implement.
>>
>
> I think you can learn a lot from studying logical decoding here.
>
>
> Working with multimaster and shardman I had to learn a lot about logical
> replication.
> It is a really powerful and flexible mechanism ... with a lot of
> limitations and problems: lack of catalog replication, inefficient bulk
> insert, various race conditions, ...
> But I think that time travel and logical replication really serve
> different goals and so require different approaches.
>
>

Of course. I'm pointing out that we solve the catalog-change problem using
historic snapshots, and THAT is what you'd be wanting to look at. Also what
it does with the rewrite map.

However, you'd have a nightmare of a time getting the syscache to deliver
you different data depending on which table's catalogs you're looking for.
And what if there's some UDT that appears in >1 table with different AS OF
times, but with different definitions at different times? Yuck.

More importantly you can't construct a historic snapshot at some arbitrary
point in time. It depends on the maintenance of state that's done with
logical decoding and xlogreader. So I don't know how you'd construct a
historic snapshot for "June 24 at 2:01 am".

Ignoring concerns with catalog changes sounds convenient but in practice
it's a total waste of time IMO. If nothing else there's temp tables to deal
with.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
