Quick Links

Clustering with enough work_mem: copy heap in mem?

From:	Scara Maccai <m_lists(at)yahoo(dot)it>
To:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Clustering with enough work_mem: copy heap in mem?
Date:	2009-08-20 14:28:12
Message-ID:	943092.19231.qm@web24612.mail.ird.yahoo.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi,

I have a table with 15M rows. Table is around 5GB on disk.

Clustering the table takes 5 minutes.

A seq scan takes 20 seconds.

I guess clustering is done using a seq scan on the index and then fetching the proper rows in the heap.
If that's the case, fetching random rows on disk is the cause of the enormous time it takes to cluster the table.

Since I can set work_mem > 5GB. couldn't postgres do something like:

- read the whole table in memory
- access the table in memory instead of the disk when reading the "indexed" data

I mean: there's access exclusive lock on the table while clustering, so I don't see any problem in doing it... this way you could

- avoid sorting (which is what is used in the method "create newtable as select * from oldtable order by mycol", and can be slow with 15M rows, plus in my case uses 8GB of ram...)
- avoid random-reading on disk

Am I missing something or it's just that "hasn't been done yet"?

Responses

Re: Clustering with enough work_mem: copy heap in mem? at 2009-08-20 14:35:25 from Scott Marlowe
Re: Clustering with enough work_mem: copy heap in mem? at 2009-08-20 14:48:24 from Alvaro Herrera

Browse pgsql-general by date

	From	Date	Subject
Next Message	Scott Marlowe	2009-08-20 14:35:25	Re: Clustering with enough work_mem: copy heap in mem?
Previous Message	Tom Lane	2009-08-20 14:20:15	Re: ERROR: could not access file "$libdir/xxid": No such file or directory