From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: eXtensible Transaction Manager API
Date: 2015-11-09 08:38:55
Message-ID: 56405B9F.6030406@postgrespro.ru
Lists: pgsql-hackers
On 09.11.2015 07:46, Amit Kapila wrote:
> On Sat, Nov 7, 2015 at 10:23 PM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>
> The lock manager is one of the tasks we are currently working on.
> There are still a lot of open questions:
> 1. Should the distributed lock manager (DLM) do anything else besides
> detection of distributed deadlocks?
>
>
> I think so. Basically the DLM should be responsible for maintaining
> all the lock information, which in turn means that any backend process
> that needs to acquire/release a lock needs to interact with the DLM;
> without that I don't think even global deadlock detection can work (to
> detect deadlocks among all the nodes, it needs to know the lock info of
> all nodes).
I hope that this will not be needed, because otherwise it will add a
significant performance penalty.
Unless I have missed something, locks can still be managed locally, but
we need the DLM to detect global deadlocks.
Deadlock detection in PostgreSQL is performed only after the lock-wait
timeout (deadlock_timeout) expires. Only then is the lock graph analyzed
for the presence of loops. At this moment we can send our local lock
graph to the DLM. If it really is a distributed deadlock, then sooner or
later all nodes involved in this deadlock will send their local graphs
to the DLM, and the DLM will be able to build the global graph and
detect the distributed deadlock. In this scenario the DLM will be
accessed very rarely - only when a lock can not be granted within
deadlock_timeout.
One possible problem is false detection of a global deadlock: the local
lock graphs correspond to different moments of time, so we can find a
"false" loop. I am still thinking about the best solution to this problem.
> This is somewhat in line with what we currently do during lock
> conflicts, i.e. if the backend incurs a lock conflict and decides to
> sleep, before sleeping it tries to detect an early deadlock, so if we
> make the DLM responsible for managing locks, the same could be achieved
> even in a distributed system.
>
> 2. Should the DLM be part of the XTM API, or should it be a separate API?
>
>
> We might be able to do it either way (make it part of the XTM API or
> devise a separate API). I think the more important point here is to
> first get the high level design for Distributed Transactions (which
> primarily includes consistent Commit/Rollback, snapshots, a distributed
> lock manager (DLM) and recovery).
>
> 3. Should the DLM be implemented by a separate process, or should it
> be part of the arbiter (dtmd)?
>
>
> That's an important decision. I think it will depend on which kind of
> design we choose for the distributed transaction manager (an arbiter
> based solution or a non-arbiter based solution, something like tsDTM).
> I think the DLM should be separate, else the arbiter will become a
> hot-spot with respect to contention.
There are pros and cons.
Pros for integrating the DLM in the DTM:
1. The DTM arbiter has information about the local-to-global transaction
ID mapping, which may be needed by the DLM.
2. If my assumptions about the DLM are correct, then it will be accessed
relatively rarely and should not have a significant impact on performance.
Cons:
1. tsDTM doesn't need a centralized arbiter but still needs a DLM.
2. Logically, the DLM and the DTM are independent components.
>
> Can you please explain more about the tsDTM approach: how timestamps
> are used, what exactly is a CSN (is it a Commit Sequence Number) and
> how is it used in the prepare phase? Is the CSN used as a timestamp?
> Is the coordinator also one of the PostgreSQL instances?
In the tsDTM approach, system time (in microseconds) is used as the CSN
(commit sequence number).
We also enforce that assigned CSNs are unique and monotonic within a
PostgreSQL instance.
CSNs are assigned locally and do not require interaction with other
cluster nodes.
This is why, in theory, the tsDTM approach should provide good scalability.
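
For illustration, CSN assignment could look roughly like this (a sketch
only, with made-up names; a real implementation would keep last_csn in
shared memory protected by a spinlock):

/*
 * Sketch of tsDTM-style CSN assignment: the CSN is the current system
 * time in microseconds, bumped when necessary so that CSNs stay unique
 * and monotonic within this PostgreSQL instance. No communication with
 * other cluster nodes is needed.
 */
#include <stdint.h>
#include <sys/time.h>

typedef uint64_t csn_t;

static csn_t last_csn = 0;   /* would be spinlock-protected shared state */

static csn_t
current_time_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (csn_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

csn_t
assign_csn(void)
{
    csn_t now = current_time_usec();

    if (now <= last_csn)
        now = last_csn + 1;  /* enforce uniqueness and monotonicity */
    last_csn = now;
    return now;
}
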
>> I think in this patch, it is important to see the completeness of all
>> the APIs that need to be exposed for the implementation of distributed
>> transactions, and that is difficult to visualize without having a
>> complete picture of all the components that have some interaction with
>> the distributed transaction system. On the other hand we can do it in
>> an incremental fashion as and when more parts of the design are clear.
>
> That is exactly what we are going to do - we are trying to
> integrate DTM with existing systems (pg_shard, postgres_fdw, BDR)
> to find out what is missing and should be added. In parallel we
> are trying to compare the efficiency and scalability of different
> solutions.
>
>
> One thing I have noticed is that in the DTM approach, you seem to
> be considering a centralized (arbiter) transaction manager and a
> centralized lock manager, which seems to be a workable approach,
> but I think such an approach won't scale in write-heavy or mixed
> read-write workloads. Have you thought about distributing the
> responsibility of global transaction and lock management?
From my point of view there are two different scenarios:
1. When most transactions are local and only a few of them are global
(for example, most operations in a branch of a bank are performed on
accounts of clients of this branch, but a few transactions involve
accounts from different branches).
2. When most or all transactions are global.
It seems to me that the first scenario is more common in real life, and
good performance of a distributed system can actually be achieved only
when most transactions are local (involve only one node). There are
several approaches that allow local transactions to be optimized, for
example the one used in SAP HANA
(http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf).
We also have a DTM implementation based on this approach, but it is not
working yet.
If most transactions are global, they affect random subsets of the
cluster nodes (so it is not possible to logically split the cluster into
groups of tightly coupled nodes), and the number of nodes is not very
large (<10), then I do not think there can be a better alternative (from
the performance point of view) than a centralized arbiter.
But these are only my speculations, and it would be really very
interesting for me to learn the access patterns of real customers using
distributed systems.
> I think such a system
> might be somewhat difficult to design, but the scalability will be better.
>
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com