Quick Links

Re: pgstattuple extension for indexes

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pgstattuple extension for indexes
Date:	2006-08-17 19:54:20
Message-ID:	20060817195419.GD21363@pervasive.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-patches

On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout wrote:
> On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
> > I think this condition should be regarded as full-fragmented,
> > but pgstatindex reports that the leaf_fragmentation is 50%.
> >
> > Presently, fragmentation factor is computed as the code below:
> >
> > if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
> > stat->fragments++;
> >
> > But the method has the above problem. So I suggest to use whether
> > the right link points to the next adjacent page or not.
> >
> > if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
> > stat->fragments++;
> >
> > Do you think which method is better? Or do you have other ideas?
>
> If we do it your way, then every index will probably get a
> fragmentation of nearly 100%. That's not very useful. I don't agree
> that your example is fully fragmented, since on the first read the OS
> will read the next four (or more) blocks so all the others are
> zero-cost.

Ok, fine... expand the example out to an index that's not trivial in
size. Even with read-ahead, once you get to a few megs (which is
obviously not that big), you're seeking.

> A more useful definition of fragmentation would be: if you're scanning
> forward through an index, how often do you have to jump backwards
> again. The current calculation seems to get that fairly right...

Well, what really matters is how often you'll need to seek (either
intra-track or inter-track). Any time you need to seek you're hit with a
pretty serious penalty. And until we have an asyncronous prefetch
process, a forward seek will most likely be just as expensive as a
backwards seek, because by the time the data winds it's way from the
drive back to PostgreSQL and the next read request winds it's way back
down to the drive the data you wanted has probably flown past the head.
Granted, I'm ignoring OS read-ahead here, but in a heavily fragmented
index that you're actually reading off disk (ie: it's not trivially
small), that read-ahead isn't likely to help you too much.

Given all that, I'd argue that it's best to consider any page that isn't
in-order as another fragment.

On another note, now that scans happen at a per-page level, does that
make some kind of online index clustering command a possibility? Another
thought that comes to mind is putting enough brains in the indexes or
the FSM to request free pages that are in a specific region of the index
file. That would allow things to stay less fragmented. Of course a
similar method could be used to try and maintain a table heap in cluster
order, and I suspect that method would probably be a lot easier to
impliment.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Re: pgstattuple extension for indexes at 2006-08-17 12:23:48 from Martijn van Oosterhout

Responses

Re: pgstattuple extension for indexes at 2006-08-17 21:10:45 from Martijn van Oosterhout

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jim C. Nasby	2006-08-17 20:15:12	Re: Going for "all green" buildfarm results
Previous Message	Alvaro Herrera	2006-08-17 19:45:10	Re: BugTracker (Was: Re: 8.2 features status)

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Jim C. Nasby	2006-08-17 20:15:12	Re: Going for "all green" buildfarm results
Previous Message	Jie Zhang	2006-08-17 19:42:20	Re: [PATCHES] WIP: bitmap indexes