From: | JanWieck(at)t-online(dot)de (Jan Wieck) |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL HACKERS <pgsql-hackers(at)postgresql(dot)org>, PostgreSQL GENERAL <pgsql-general(at)postgresql(dot)org>, PostgreSQL SQL <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: [SQL] Re: lztext and compression ratios... |
Date: | 2000-07-06 21:09:27 |
Message-ID: | 200007062109.XAA19800@hot.jw.home |
Lists: | pgsql-general pgsql-hackers pgsql-sql |
Tom Lane wrote:
> JanWieck(at)t-online(dot)de (Jan Wieck) writes:
> >> As long as you brought it up: how sure are you that the method you've
> >> used is not subject to any patents?
>
> > Now that you ask for it: I'm not sure. Could be.
>
> >> If you can show that this method uses no ideas not found in zlib,
> >> then I'll feel reassured
>
> > I don't know enough about the algorithms used in zlib to do
> > so. Is there someone out here who could verify that if I
> > describe what our compression code does in enough detail?
>
> After a quick look at the code, I don't think there is anything
> problematic about the data representation or the decompression
> algorithm. The compression algorithm is another story, and it's
> not real well commented :-(. The important issues are how you
> search for matches in the past text and how you decide which match
> is the best one to use. Please update the code comments to describe
> that, and I'll take another look.

Done. You'll find a new section in the top comments.

While writing it I noticed that the algorithm is really
expensive for big items. The history lookup table it allocates
is 8 times (on 32-bit architectures) the size of the input.
So if you want to compress 1MB, it'll allocate 8MB for
the history. It hit me when I was hunting a bug in the
toaster earlier today. Doing an update to a toasted item of
5MB, resulting in a new value of 10MB, the backend blew up to
290MB of virtual memory - oh boy. I definitely need to make
that smarter.
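
To make the 8x figure concrete, here is a minimal sketch of the arithmetic. It assumes the history table holds one entry per input byte and that each entry consists of two pointer-sized fields; that layout, and the names HistEntrySketch, history_bytes and history_bytes_native, are illustrative assumptions and are not taken from the actual compression code.

```c
#include <stddef.h>

/*
 * Hypothetical history entry: two pointer-sized fields per input byte.
 * ASSUMPTION for illustration only; not the real struct from the
 * compressor source.
 */
typedef struct HistEntrySketch
{
    struct HistEntrySketch *next;   /* next entry in the same chain */
    const char             *pos;    /* position in the input buffer */
} HistEntrySketch;

/*
 * Estimated history table size in bytes for a given pointer width,
 * assuming one two-pointer entry per input byte. No overflow check;
 * this is only meant for back-of-the-envelope numbers.
 */
static size_t
history_bytes(size_t input_len, size_t pointer_size)
{
    return input_len * 2 * pointer_size;
}

/* Same estimate, using the pointer width of the machine compiling this. */
static size_t
history_bytes_native(size_t input_len)
{
    return input_len * sizeof(HistEntrySketch);
}
```

Under these assumptions, history_bytes(1024 * 1024, 4) is 8MB, which matches the 1MB example above, and a 10MB input would need 80MB for the history alone. The sketch does not try to account for the rest of the 290MB.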

When I wrote it I never thought about items that big. It was
before we had the idea of TOAST.

This all might open another discussion I'll start in a
separate thread.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #