From: | Hannu Krosing <hannu(at)skype(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Zeugswetter Andreas DCP SD <ZeugswetterA(at)spardat(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rod Taylor <pg(at)rbt(dot)ca>, "Bort, Paul" <pbort(at)tmwsystems(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Compression and on-disk sorting |
Date: | 2006-05-19 19:02:50 |
Message-ID: | 1148065370.3833.9.camel@localhost.localdomain |
Lists: | pgsql-hackers |
On a fine day, Fri, 2006-05-19, at 14:53, Tom Lane wrote:
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > On Fri, May 19, 2006 at 09:29:03AM +0200, Martijn van Oosterhout wrote:
> >> I'm seeing 250,000 blocks being cut down to 9,500 blocks. That's almost
> >> unbelievable. What's in the table? It would seem to imply that our
> >> tuple format is far more compressible than we expected.
>
> > It's just SELECT count(*) FROM (SELECT * FROM accounts ORDER BY bid) a;
> > If the tape routines were actually storing visibility information, I'd
> > expect that to be pretty compressible in this case since all the tuples
> > were presumably created in a single transaction by pgbench.
>
> It's worse than that: IIRC what passes through a heaptuple sort are
> tuples manufactured by heap_form_tuple, which will have consistently
> zeroed header fields. However, the above isn't very helpful since the
> rest of us have no idea what that "accounts" table contains. How wide
> is the tuple data, and what's in it?
Wasn't he using pgbench data?
> (This suggests that we might try harder to strip unnecessary header info
> from tuples being written to tape inside tuplesort.c. I think most of
> the required fields could be reconstructed given the TupleDesc.)
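Tom's suggestion of writing stripped-down tuples to tape can be sketched roughly as follows. This is a hypothetical, self-contained C illustration, not tuplesort.c code: `DemoHeader`, `DemoTuple`, `demo_write_minimal` and `demo_read_minimal` are invented names, and the header layout only loosely imitates a heap tuple header. Only a length word and the user data are written; on read-back the header is rebuilt with zeroed fields, on the assumption (from the discussion above) that those fields carry nothing a sort needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a heap tuple header; it is NOT PostgreSQL's
 * HeapTupleHeaderData, and the field names are illustrative only. */
typedef struct DemoHeader {
    uint32_t xmin;        /* inserting transaction id */
    uint32_t xmax;        /* deleting transaction id */
    uint32_t cid;         /* command id */
    uint32_t ctid_block;  /* item pointer: block number */
    uint16_t ctid_offset; /* item pointer: line pointer offset */
    uint16_t infomask;    /* status flag bits */
    uint8_t  hoff;        /* offset from tuple start to user data */
} DemoHeader;

typedef struct DemoTuple {
    DemoHeader    hdr;
    uint32_t      data_len;   /* number of valid bytes in data[] */
    unsigned char data[256];  /* the user columns, already formatted */
} DemoTuple;

/* Write a tuple to a tape buffer as a 4-byte length word followed by the
 * user data only; the header is dropped entirely. Returns bytes written. */
size_t demo_write_minimal(const DemoTuple *tup, unsigned char *out)
{
    assert(tup->data_len <= sizeof tup->data);
    memcpy(out, &tup->data_len, sizeof tup->data_len);
    memcpy(out + sizeof tup->data_len, tup->data, tup->data_len);
    return sizeof tup->data_len + tup->data_len;
}

/* Read a tuple back from a tape buffer. Header fields come back zeroed;
 * hoff is set to the fixed header size here, whereas a real implementation
 * would have to derive such fields from the tuple descriptor.
 * Returns bytes consumed, or 0 if the stored length is too large. */
size_t demo_read_minimal(const unsigned char *in, DemoTuple *tup)
{
    memset(&tup->hdr, 0, sizeof tup->hdr);
    memcpy(&tup->data_len, in, sizeof tup->data_len);
    if (tup->data_len > sizeof tup->data)
        return 0;
    memcpy(tup->data, in + sizeof tup->data_len, tup->data_len);
    tup->hdr.hoff = (uint8_t) sizeof(DemoHeader);
    return sizeof tup->data_len + tup->data_len;
}
```

In this sketch each tuple carries 4 bytes of overhead on tape instead of a full header; any header field the sort really does need would have to be stored or recomputed explicitly.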
I guess that tape files compress better than an average table because they
are sorted, and thus at least a little more repetitive than the rest.
If there are varlen types, then the data usually also contains an abundance
of small 4-byte integers (such as their 4-byte length headers), which should
compress better than 4:1, maybe a lot better.
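The point about sorted data being more repetitive can be illustrated with a crude model. The C sketch below is a swapped-in stand-in, not the compressor used in the tests discussed in this thread: it counts runs of identical consecutive 4-byte keys, for which a word-level run-length encoder would emit one (value, count) pair each. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count runs of identical consecutive 32-bit words. A word-level
 * run-length encoder emits one (value, count) pair per run. */
size_t count_word_runs(const uint32_t *vals, size_t n)
{
    assert(vals != NULL || n == 0);
    if (n == 0)
        return 0;
    size_t runs = 1;
    for (size_t i = 1; i < n; i++)
        if (vals[i] != vals[i - 1])
            runs++;
    return runs;
}

/* Encoded size in bytes under this model: a 4-byte value plus a
 * 4-byte count for every run. */
size_t word_rle_size(const uint32_t *vals, size_t n)
{
    return count_word_runs(vals, n) * 2 * sizeof(uint32_t);
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *) a;
    uint32_t y = *(const uint32_t *) b;
    return (x > y) - (x < y);
}

/* Sort keys ascending, the order in which a sorted run would be written. */
void sort_u32(uint32_t *vals, size_t n)
{
    qsort(vals, n, sizeof *vals, cmp_u32);
}
```

With 1000 interleaved keys (0, 1, ..., 9, 0, 1, ...) every neighbour differs, giving 1000 runs, so this naive encoding actually expands the data to 8000 bytes; after sorting, the same keys form 10 runs and encode in 80 bytes. Real LZ-style compressors exploit longer repeated byte strings rather than whole-word runs, but sorting would typically help them in the same direction.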
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing