Re: How to best use 32 15k.7 300GB drives?

From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Robert Schnabel <schnabelr(at)missouri(dot)edu>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: How to best use 32 15k.7 300GB drives?
Date: 2011-02-04 09:19:13
Message-ID: 4D4BC491.7080908@gmail.com
Thread:
Lists: pgsql-performance

On 03.02.11 20:42, Robert Haas wrote:
> 2011/1/30 Віталій Тимчишин<tivv00(at)gmail(dot)com>:
>> I was thinking if a table file could be deleted if it has no single live
>> row. And if this could be done by vacuum. In this case vacuum on table that
>> was fully updated recently could be almost as good as cluster - any scan
>> would skip such non-existing files really fast. Also almost no disk space
>> would be wasted.
> VACUUM actually already does something along these lines. If there
> are 1 or any larger number of entirely-free pages at the end of a
> table, VACUUM will truncate them away. In the degenerate case where
> ALL pages are entirely-free, this results in zeroing out the file.
>
> The problem with this is that it rarely does much. Consider a table
> with 1,000,000 pages, 50% of which contain live rows. On average, how
> many pages will this algorithm truncate away? Answer: if the pages
> containing live rows are randomly distributed, approximately one.
Yes, but consider operations on tables that are clustered (for various
reasons), such as removing archived data. Yes, I know this is best done
with partitioning, but one must first reach the point of deciding to use
partitioning :).
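Robert's expected-value claim above (that random placement of live pages leaves roughly one truncatable page at the end) can be checked with a quick Monte Carlo sketch. This is a hypothetical illustration, not anything from PostgreSQL's code; page layout is reduced to a boolean "contains a live row":

```python
import random

def expected_trailing_free(n_pages, live_frac, trials=500):
    """Estimate the average number of entirely-free pages at the
    end of a table when live pages are randomly distributed."""
    total = 0
    for _ in range(trials):
        # randomly mark each page as containing at least one live row
        pages = [random.random() < live_frac for _ in range(n_pages)]
        # count entirely-free pages at the end of the "file" --
        # these are the only ones VACUUM's truncation can reclaim
        trailing = 0
        for has_live in reversed(pages):
            if has_live:
                break
            trailing += 1
        total += trailing
    return total / trials

print(expected_trailing_free(10000, 0.5))  # usually prints a value near 1.0
```

With 50% live pages the expected trailing free run is the geometric sum 0.5 + 0.25 + ... = 1, matching the "approximately one" in the quoted message.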
> Your idea of having a set of heaps rather than a single heap is an
> interesting one, but it's pretty much catering to the very specific
> case of a full-table update. I think the code changes needed would be
> far too invasive to seriously contemplate doing it just for that one
> case - although it is an important case that I would like to see us
> improve.
Why do you expect such invasive code changes? I know little about
PostgreSQL's code layering, but what I propose (replacing delete with
truncate) is:
1) Leave tuple addressing as it is now.
2) Allow truncated files, treating the non-existing part as if it
contained no used tuples.
3) Make vacuum truncate a file if it has no used tuples at the end.
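The steps above, in particular point 2, can be sketched in a few lines. This is a toy Python model, not PostgreSQL's actual storage-manager code; `PAGE_SIZE` and `read_page` are hypothetical names, and the idea is only that a page address past the physical end of a truncated file behaves like an all-free page:

```python
import os

PAGE_SIZE = 8192  # PostgreSQL's default block size

def read_page(path, page_no, nominal_pages):
    """Read one page of a heap file. Pages past the physical end of
    a truncated file are treated as if they contained no used tuples,
    so tuple addressing (step 1) can stay unchanged."""
    assert page_no < nominal_pages, "address beyond the table's nominal size"
    size = os.path.getsize(path)
    if page_no * PAGE_SIZE >= size:
        return None  # truncated away: behaves like an entirely-free page
    with open(path, "rb") as f:
        f.seek(page_no * PAGE_SIZE)
        return f.read(PAGE_SIZE)
```

A scan that sees `None` simply skips the page, which is what makes a scan over a mostly-truncated table fast.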

The only (relatively) tricky thing I can see is synchronizing the
truncation with a parallel ongoing scan.

Best regards, Vitalii Tymchyshyn
