From: | "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Pavan Deolasee" <pavan(dot)deolasee(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: HOT - whats next ? |
Date: | 2007-03-02 15:32:02 |
Message-ID: | 2e78013d0703020732y303c782cy59d6f8f55f1911db@mail.gmail.com |
Lists: | pgsql-hackers |
On 3/2/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Pavan Deolasee" <pavan(dot)deolasee(at)enterprisedb(dot)com> writes:
> > - Another problem with the current HOT patch is that it generates
> > tuple level fragmentation while reusing LP_DELETEd items when
> > the new tuple is of smaller size than the original one. Heikki
> > supported using best-fit strategy to reduce the fragmentation
> > and thats worth trying. But ISTM that we can also correct
> > row-level defragmentation whenever we run out of free space
> > and LP_DELETEd tuples while doing UPDATE. Since this does not
> > require moving tuples around, we can do this by a simple EXCLUSIVE
> > lock on the page.
>
> You are mistaken. To move existing tuples requires
> LockBufferForCleanup, the same as VACUUM needs; otherwise some other
> backend might continue to access a tuple it found previously.
I am not suggesting moving tuples around. This is a specific case
of reusing LP_DELETEd tuples. For example, say the HOT-update
chain had two tuples; the first one is of length 100 and the next one is
of length 125. When the first becomes dead, we remove it from the
chain and set its LP_DELETE flag. Now say this tuple is reused
to store a tuple of length 80; this results in tuple-level fragmentation
of 20 bytes. The information about the original size of the tuple is
lost. Later, when this tuple is also LP_DELETEd, we cannot
use it to store a tuple of size greater than 80, even though there is
unused free space of another 20 bytes.
What I am suggesting is to clean up this fragmentation (only
for LP_DELETEd tuples) by resetting the lp_len of these
tuples to the max possible value. None of the live tuples are
touched.
Btw, I haven't yet implemented this stuff, so I am seeking
opinions.
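To make the idea concrete, here is a rough sketch in C of what resetting lp_len could look like. It is not code from the patch. The LinePointer and ToyPage structs, the flag values and the function name reset_deleted_lengths are all hypothetical simplifications, not PostgreSQL's real ItemIdData or page header, and alignment (MAXALIGN) is ignored. The assumption is that a LP_DELETEd slot can grow up to the start of the next tuple stored on the page, or to the end of the tuple data area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified line pointer; NOT PostgreSQL's ItemIdData. */
#define LP_USED    0x01   /* slot holds a live tuple */
#define LP_DELETED 0x02   /* stands in for the LP_DELETE flag */

typedef struct {
    uint16_t lp_off;    /* offset of the tuple from the page start */
    uint16_t lp_len;    /* length currently recorded for the slot */
    uint8_t  lp_flags;  /* 0 means the slot has no storage */
} LinePointer;

typedef struct {
    LinePointer *items;
    size_t       nitems;
    uint16_t     data_end;  /* end of the tuple data area */
} ToyPage;

/*
 * For every LP_DELETED slot, set lp_len to the largest value that does not
 * overlap the next tuple on the page. No tuple data is moved and no live
 * slot is modified. Returns the number of bytes recovered.
 */
static size_t reset_deleted_lengths(ToyPage *page)
{
    size_t reclaimed = 0;

    for (size_t i = 0; i < page->nitems; i++) {
        LinePointer *lp = &page->items[i];
        if (!(lp->lp_flags & LP_DELETED))
            continue;

        /* Find the closest tuple that starts after this one. */
        uint16_t limit = page->data_end;
        for (size_t j = 0; j < page->nitems; j++) {
            const LinePointer *other = &page->items[j];
            if (j == i || other->lp_flags == 0)
                continue;
            if (other->lp_off > lp->lp_off && other->lp_off < limit)
                limit = other->lp_off;
        }

        assert(limit >= lp->lp_off);
        uint16_t max_len = (uint16_t) (limit - lp->lp_off);
        if (max_len > lp->lp_len) {
            reclaimed += (size_t) (max_len - lp->lp_len);
            lp->lp_len = max_len;
        }
    }
    return reclaimed;
}
```

With the example above (a 100-byte slot reused for an 80-byte tuple, followed on the page by the 125-byte tuple), this would bring the slot's lp_len back from 80 to 100 once it is LP_DELETEd again, and the 125-byte tuple is left alone.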
> How much testing of this patch's concurrent behavior has been done?
> I'm wondering if any other locking thinkos are in there ...
I have tested it with pgbench using up to 90 clients and a scaling
factor of 90, with 50000 txns/client (please see my other
post for preliminary results). I have done this quite a few times.
I am not saying there are no bugs, but I have good
confidence in the patch. These tests were done on SMP
machines. I also run data consistency checks at the end
of the pgbench runs to validate the UPDATEs.
I also ran 4-hour DBT2 tests 3-4 times and have not seen any failures.
I would appreciate any independent tests, maybe
in different setups.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com