Quick Links

Re: Introducing an advanced Frequent Update

From:	"Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Introducing an advanced Frequent Update
Date:	2006-11-06 23:46:01
Message-ID:	3E37B936B592014B978C4415F90D662D04E6EB21@MI8NYCMAIL06.Mi8.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Simon,

Bring it on! We at GP have been evaluating various approaches to index
organized tables which unify index with heap storage to solve some of
the problems you mention. Split index and heap is a big issue in
Postgres and we'd all welcome a good solution to it, even for limited
circumstances like single index organization or the like.

- Luke

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Simon Riggs
> Sent: Monday, November 06, 2006 1:51 PM
> To: pgsql-hackers(at)postgresql(dot)org
> Subject: [HACKERS] Introducing an advanced Frequent Update
> Optimization
>
> EnterpriseDB has been running a research project to improve
> the performance of heavily updated tables. We have a number
> of approaches prototyped and we'd like to discuss the best of
> these now on -hackers for community input and patch
> submission to PostgreSQL core.
>
> The most important step with any proposal is to agree that we
> have an issue that needs improvement, discuss how widespread
> that issue is and find some clear test cases that show up the
> problems. Tests are:
>
> 1. pgbench reveals performance that will degrade over a long period.
>
> 2. DBT-2 reveals performance that will degrade over a long
> period. Many tests over a 2 hour period don't fully show
> this, especially when the test is cafeully tuned.
>
> 3. Some common scenarios in applications are where some rows
> of a table are "hot" from being constantly updated, while
> others are not. An example of such a test case is the
> truckin' test, included here. It's based directly on a
> specific customer application, but its been generalised to
> make sure the underlying design pattern is clear.
>
> These tests reveal the following issues, all of which are known:
>
> - update performs inserts into indexes, as well as into heap blocks
>
> - VACUUM can remove heap blocks easily, but performs much
> worse on indexes, making VACUUM a less good solution. We have
> now been able to speed up index VACUUM, but this require us
> to scan the whole index for correct locking. VACUUM scans the
> whole table, whereas dead rows may well be localised.
> Heap-needs-vacuum-bitmap has been proposed here, but no
> solution currently exists for vacuuming only parts of indexes
> and so proposals for concurrent vacuums are now being considered.
>
> - indexes that have been stretched apart by updates do not
> ever coalesce again and require regular REINDEX, which is not
> yet possible concurrently; the contention caused by this
> would be catastrophic for performance, even if anybody knew
> of a way to do this concurrently.
>
> - There are specific issues with the optimizer's ability to
> understand dead row numbers, which can in some cases lead to
> SeqScan plans that are inappropriate when tables grow because
> of updates. This is a red-herring that can lead to people
> thinking the situation is worse than it is; that needs
> fixing, but the core issues mentioned above remain.
>
> To alleviate these problems we've added features such as WITH
> fillfactor for heaps and table-level autovacuum tuning.
> Tuning all of these features to good effect is an art form
> that is beyond the reasonable for most users. Many internal
> optimizations have been made in this area and as can be seen,
> many are still required to achieve better performance.
>
> The proposal about to be made takes a more radical approach
> and re-examines the architecture of the heap, to allow us to
> consider much faster designs for heavy UPDATEs. Although
> initially radical, the proposal appears to be fully MVCC
> correct, crash safe as well as being much faster under heavy
> updates, while approximately neutral in other cases with no
> major downsides.
>
> Why should we care? The UPDATE case has obvious use-cases in
> a business design pattern I'll call CustomerAccountDebit
> which is pervasive in pay-per-use websites, banks, telephone
> companies, road traffic monitoring etc etc. It's also
> pervasive in Data Warehousing where summary
> tables/materialized views are regularly updated to maintain a
> current picture of spending, movements or any other
> accumulation of event detail. It's everywhere, basically.
>
> Your various viewpoints on the above are welcome, but
> assuming for the moment that you agree so far, we can move
> towards the proposal...
>
> These discussions will likely be lengthy if taken seriously
> and need to cover a range of different topics to ensure we
> cover what we know and ensure we listen to all the feedback
> everybody gives. To that end, I'd like to introduce two
> colleagues of mine to the community, Pavan Deolasee and
> Nikhil Sontakke who have been working hard on developing the
> prototypes and measuring/tuning them respectively.
>
> I would stress that we are not bringing our first prototype
> to the table, but actually design #5. We think you'll be
> interested, but we won't take that for granted.
>
> Our next steps will be to
>
> - discuss various other approaches to the problem, and why we
> are now proposing one specific approach and receive "why dont
> we..." feedback and additional ideas (Simon)
>
> - discuss the proposal in technical depth, explain the
> challenges that remain and ask for feedback and input on
> those, with specific regard to low-level coding (Pavan)
>
> - present details of performance testing done so far (Nikhil)
>
> - explain the measures we have taken to prove the correctness
> of our approach for MVCC, crash safety and PITR (Simon)
>
> Each of these areas will be started as a separate thread of
> discussion on -hackers, to allow us to stay focused on those topics.
>
> But before we do that, any comments on the above?
>
> ---
>
> The truckin test case included here consists of a complex
> update function that is executed by a custom pgbench script,
> on the postgres database.
>
> psql -f truckin.sql postgres
>
> for each test
>
> pgbench -n -f truckin.pgb postgres
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>

Responses

Block B-tree etc. (was Re: Introducing an advanced Frequent Update) at 2006-11-07 09:51:21 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Josh Berkus	2006-11-07 00:28:42	Re: Direct Client Req from ProV-----
Previous Message	David Fetter	2006-11-06 23:44:06	Re: Indicate disabled triggers in \d