Re: [SQL] Re: lztext and compression ratios...

From:	JanWieck(at)t-online(dot)de (Jan Wieck)
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL HACKERS <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL GENERAL <pgsql-general(at)postgresql(dot)org>, PostgreSQL SQL <pgsql-sql(at)postgresql(dot)org>
Subject:	Re: [SQL] Re: lztext and compression ratios...
Date:	2000-07-06 21:09:27
Message-ID:	200007062109.XAA19800@hot.jw.home
Lists:	pgsql-general pgsql-hackers pgsql-sql

Tom Lane wrote:
> JanWieck(at)t-online(dot)de (Jan Wieck) writes:
> >> As long as you brought it up: how sure are you that the method you've
> >> used is not subject to any patents?
>
> > Now that you ask for it: I'm not sure. Could be.
>
> >> If you can show that this method uses no ideas not found in zlib,
> >> then I'll feel reassured
>
> > I don't know enough about the algorithms used in zlib to do
> > so. Is there someone out here who could verify that if I
> > describe what our compression code does in enough detail?
>
> After a quick look at the code, I don't think there is anything
> problematic about the data representation or the decompression
> algorithm. The compression algorithm is another story, and it's
> not real well commented :-(. The important issues are how you
> search for matches in the past text and how you decide which match
> is the best one to use. Please update the code comments to describe
> that, and I'll take another look.

Done. You'll find a new section in the top comments.

While writing it I noticed that the algorithm is really
expensive for big items. The history lookup table allocated
is 8 times (on 32-bit architectures) the size of the input.
So if you want to have 1MB compressed, it'll allocate 8MB for
the history. It hit me when I was hunting a bug in the
toaster earlier today. Doing an update to a toasted item of
5MB, resulting in a new value of 10MB, the backend blew up to
290MB of virtual memory - oh boy. I definitely need to make
that smarter.
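
To put numbers on that, here is a minimal sketch of the arithmetic.
It assumes one history entry per input byte and an entry made of two
4-byte fields standing in for two 32-bit pointers; the names
HistEntry32 and history_bytes are made up for illustration and are
not the names used in the compressor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical history entry as it would be laid out on a 32-bit
 * platform. Two uint32_t fields stand in for two pointers, so the
 * entry is 8 bytes no matter where this sketch is compiled.
 * The layout is an assumption, not taken from the real code.
 */
typedef struct HistEntry32
{
    uint32_t next;  /* stand-in for a pointer to the next entry */
    uint32_t pos;   /* stand-in for a pointer into the input    */
} HistEntry32;

/*
 * Bytes needed for a history table that holds one entry per
 * input byte (assumed sizing rule).
 */
static size_t
history_bytes(size_t input_len)
{
    /* guard against overflow in the multiplication below */
    assert(input_len <= SIZE_MAX / sizeof(HistEntry32));
    return input_len * sizeof(HistEntry32);
}
```

Under those assumptions history_bytes(1024 * 1024) comes out at 8MB,
which is the 1MB case above, and a 10MB value would need 80MB for the
history table alone.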

When I wrote it I never thought about items that big. It was
before we had the idea of TOAST.

This all might open another discussion I'll start in a
separate thread.

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Re: lztext and compression ratios... at 2000-07-06 14:22:55 from Tom Lane

Responses

Re: lztext and compression ratios... at 2000-07-07 05:30:40 from Tom Lane
Re: Re: [SQL] Re: [GENERAL] lztext and compression ratios... at 2000-07-07 13:14:07 from eisentrp