Re: [SQL] Re: lztext and compression ratios...

From: JanWieck(at)t-online(dot)de (Jan Wieck)
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL HACKERS <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL GENERAL <pgsql-general(at)postgresql(dot)org>, PostgreSQL SQL <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [SQL] Re: lztext and compression ratios...
Date: 2000-07-06 21:09:27
Message-ID: 200007062109.XAA19800@hot.jw.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers pgsql-sql

Tom Lane wrote:
> JanWieck(at)t-online(dot)de (Jan Wieck) writes:
> >> As long as you brought it up: how sure are you that the method you've
> >> used is not subject to any patents?
>
> > Now that you ask for it: I'm not sure. Could be.
>
> >> If you can show that this method uses no ideas not found in zlib,
> >> then I'll feel reassured
>
> > To do so I don't know enough about the algorithms used in
> > zlib. Is there someone out here who could verify that if I
> > detailed enough describe what our compression code does?
>
> After a quick look at the code, I don't think there is anything
> problematic about the data representation or the decompression
> algorithm. The compression algorithm is another story, and it's
> not real well commented :-(. The important issues are how you
> search for matches in the past text and how you decide which match
> is the best one to use. Please update the code comments to describe
> that, and I'll take another look.

Done. You'll find a new section in the top comments.

While writing it I noticed that the algorithm is really
expensive for big items. The history lookup table allocated
is 8 times (on 32 bit architectures) the size of the input.
So if you want to have 1MB compressed, it'll allocate 8MB for
the history. It hit me when I was hunting a bug in the
toaster earlier today. Doing an update to a toasted item of
5MB, resulting in a new value of 10MB, the backend blew up to
290MB of virtual memory - oh boy. I definitely need to make
that smarter.

When I wrote it I never thought about items that big. It was
before we had the idea of TOAST.

This all might open another discussion I'll start in a
separate thread.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Helge Haugland 2000-07-06 21:12:59 Re: Find all the dates in the calendar week?
Previous Message Jeffrey A. Rhines 2000-07-06 20:33:20 OUTER JOIN workaround... ideas?

Browse pgsql-hackers by date

  From Date Subject
Next Message The Hermit Hacker 2000-07-06 21:13:31 Re: Article on MySQL vs. Postgres
Previous Message Pavel Janík ml. 2000-07-06 21:06:09 Re: current CVS: undefined reference to `PGLZ_RAW_SIZE'

Browse pgsql-sql by date

  From Date Subject
Next Message Jan Wieck 2000-07-06 21:31:44 Re: confused by select.
Previous Message Richard 2000-07-06 21:05:56 Re: confused by select.