Re: Reducing tuple overhead

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, hlinnaka <hlinnaka(at)iki(dot)fi>, Bruce Momjian <bruce(at)momjian(dot)us>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Reducing tuple overhead
Date: 2015-04-30 13:54:34
Message-ID: CA+TgmobXPQt_a-956+xDC++cYLdihUYfxS6Y+en3a0sTtORNCg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 30, 2015 at 9:46 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> As the index expression contains table columns and all the functions
> or operators used in the expression must be IMMUTABLE, won't that
> guarantee that we avoid such a situation?

The concern is that they might be labeled as immutable but not
actually behave that way.
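
To illustrate the hazard (a hypothetical sketch; the table, function,
and index names are all invented):

    -- Declared IMMUTABLE, but it reads from a table, so its result can
    -- change whenever fx_rates changes; the label is a lie.
    CREATE TABLE fx_rates (currency text PRIMARY KEY, to_usd numeric);

    CREATE FUNCTION price_in_usd(amount numeric, cur text)
    RETURNS numeric AS
    $$ SELECT amount * to_usd FROM fx_rates WHERE currency = cur $$
    LANGUAGE sql IMMUTABLE;

    CREATE INDEX orders_usd_idx ON orders (price_in_usd(amount, currency));

    -- After UPDATE fx_rates SET to_usd = ..., recomputing the indexed
    -- expression for an existing heap tuple no longer produces the key
    -- that was stored at insertion time, so any scheme that refinds
    -- index entries by recomputing the expression looks in the wrong
    -- place.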

The other, related problem is that the ordering operator might start
to return different results than it did at index creation time. For
example, consider a btree index built on a text column. Now consider
'yum update'. glibc gets updated, the collation ordering of various
strings changes, and now you've got tuples that are in the "wrong
place" in the index, because when the index was built, we thought A <
B, but now we think B < A. You would think the glibc maintainers
might avoid such changes in minor releases, or that the Red Hat guys
would avoid packaging and shipping those changes in minor releases,
but you'd be wrong. We've seen cases where the master and the standby
were both running RHEL X.Y (the same X.Y on both servers) but they
didn't agree on the collation definitions, so queries that worked on
the master failed on the standby.
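
To make that concrete (the strings here are illustrative; any pair
whose relative order differs between two versions of the collation
data will do):

    -- Under one version of the glibc collation data:
    SELECT 'a-b' < 'ab' COLLATE "en_US";  -- suppose this returns true
    -- After an update that changes how the hyphen is weighted, the
    -- same comparison can return false.  btree entries inserted under
    -- the old ordering are now out of order, so index searches can
    -- miss rows that are physically present in the index.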

> If not, then I think for such indexes we might need to search
> for the tuple offset in the entire index and mark it as deleted, or
> maybe follow the current mechanism for such indexes.

I think that *is* the current mechanism.

> I think it will still
> give us a lot of benefit in the more common cases.

It's hard to figure out exactly what can work here. Aside from
correctness issues, the biggest problem with refinding the index
tuples one by one and killing them is that it may suck for bulk
operations. When you delete one tuple from the table, refinding the
index tuples and killing them immediately is probably the smartest
thing you can do. If you postpone it, somebody may be forced to split
the page because it's full, whereas if you do it right away, the next
guy who wants to put a tuple on that page may be able to make it fit.
That's a big win.

But if you want to delete 10 million tuples, doing an index scan for
each one may induce a tremendous amount of random I/O on the index
pages, possibly visiting and dirtying the same pages more than once.
Scanning the whole index for tuples to kill avoids that.
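
Some rough arithmetic (the numbers are invented for illustration):
with a 3-level btree and ~1 million leaf pages, refinding 10 million
index tuples one at a time costs on the order of 10M descents * 3
page touches = 30M page accesses, in effectively random order, with
the same leaf pages potentially dirtied many times over. A single
sequential pass touches each of the ~1M pages once. Even though the
full scan visits pages with nothing to kill, it wins by a wide margin
once the number of deleted tuples is large relative to the size of
the index.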

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
