Re: Reducing tuple overhead

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, hlinnaka <hlinnaka(at)iki(dot)fi>, Bruce Momjian <bruce(at)momjian(dot)us>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Reducing tuple overhead
Date: 2015-05-01 04:02:29
Message-ID: CAA4eK1+rUvqWPdQGwfudKiAOrPnDYsnJu21=gaNbdG5=rfhpyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 30, 2015 at 7:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Apr 30, 2015 at 9:46 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > As the index expression contain table columns and all the functions
> > or operators used in expression must be IMMUTABLE, won't that
> > guarantee to avoid such a situation?
>
> The concern is that they might be labeled as immutable but not
> actually behave that way.
>
> The other, related problem is that the ordering operator might start
> to return different results than it did at index creation time. For
> example, consider a btree index built on a text column. Now consider
> 'yum update'. glibc gets updated, collation ordering of various
> strings change, and now you've got tuples that are in the "wrong
> place" in the index, because when the index was built, we thought A <
> B, but now we think B < A. You would think the glibc maintainers
> might avoid such changes in minor releases, or that the Red Hat guys
> would avoid packaging and shipping those changes in minor releases,
> but you'd be wrong. We've seen cases where the master and the standby
> were both running RHEL X.Y (same X.Y on both servers) but they didn't
> agree on the collation definitions, so queries that worked on the
> master failed on the slave.
>

This is quite similar to IMMUTABLE function, but for an operator. I think
guaranteeing the immutable nature for a function is the job of
application/user.
Oracle uses something similar (DETERMINISTIC functions) for function based
indexes and asks user to ensure the DETERMINISTIC property of function.
I think having synchronous delete (delete from index at the same time as
from heap) is one of the main differentiating factor for not having bloat in
indexes in some of the other databases. If we don't want to change the
current property for functions or operators for indexes, then we can have a
new
type of index which can have visibility information and users are advised to
use such an index where they can ensure that functions or operators for
that index are IMMUTABLE. Here, there is one argument that users might or
might not be able to ensure the same, but I think if it can be ensured by
users
of other successful databases, then the same should be possible for
PostgreSQL
users as well, after all this can bring a lot of value on table (avoiding
index bloat)
for users.

> > I think it will still
> > give us lot of benefit in more common cases.
>
> It's hard to figure out exactly what can work here. Aside from
> correctness issues, the biggest problem with refinding the index
> tuples one-by-one and killing them is that it may suck for bulk
> operations. When you delete one tuple from the table, refinding the
> index tuples and killing them immediately is probably the smartest
> thing you can do. If you postpone it, somebody may be forced to split
> the page because it's full, whereas if you do it right away, the next
> guy who wants to put a tuple on that page may be able to make it fit.
> That's a big win.
>

Exactly, I have that big win in my mind.

> But if you want to delete 10 million tuples, doing an index scan for
> each one may induce a tremendous amount of random I/O on the index
> pages, possibly visiting and dirtying the same pages more than once.
> Scanning the whole index for tuples to kill avoids that.
>

Even doing it only from heap might not be cheap.
I think for doing BULK delete (as you described), users of other
databases have some smart ways like:
a.
If possible drop the indexes
b.
instead of deleting the records you don't want, SELECT OUT the
records you do -- drop the old table -- rename new table.
c.
Archive instead of delete Instead of deleting, just set a flag to mark
a row as archived, invalid, deleted, etc. This will avoid the immediate
overhead of index maintenance. Ignore the rows temporarily and let
some other job delete them later. The large downside to this is that it
affects any query on the table.
...
possibly some other ways.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2015-05-01 04:03:14 Re: shared_libperl, shared_libpython
Previous Message Abhijit Menon-Sen 2015-05-01 03:34:19 Re: initdb -S and tablespaces