From: | Benedikt Grundmann <bgrundmann(at)janestreet(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Claudio Freire <klaussfreire(at)gmail(dot)com>, Takeshi Yamamuro <yamamuro(dot)takeshi(at)lab(dot)ntt(dot)co(dot)jp>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Improve compression speeds in pg_lzcompress.c |
Date: | 2013-01-09 07:56:12 |
Message-ID: | CADbMkNPrKe2P7Oku=2sNGyLrd8+wQad_YBpvJtmJBtV17Tmf4A@mail.gmail.com |
Lists: | pgsql-hackers |
> Personally, my biggest gripe about the way we do compression is that
> it's easy to detoast the same object lots of times. More generally,
> our in-memory representation of user data values is pretty much a
> mirror of our on-disk representation, even when that leads to excess
> conversions. Beyond what we do for TOAST, there's stuff like numeric
> where we not only detoast but then post-process the results into yet
> another internal form before performing any calculations - and then of
> course we have to convert back before returning from the calculation
> functions. And for things like XML, JSON, and hstore we have to
> repeatedly parse the string, every time someone wants to do anything
> with it. Of course, solving this is a very hard problem, and not
> solving it isn't a reason not to have more compression options - but
> more compression options will not solve the problems that I personally
> have in this area, by and large.
>

At the risk of saying something totally obvious and stupid (I haven't
looked at the actual representation), this sounds like a memoisation
problem. In OCaml terms:

    type 'a rep =
      | On_disk_rep of bytes  (* raw on-disk byte sequence *)
      | In_memory_rep of 'a

    type 'a t = 'a rep ref

    (* Convert on first access and cache the result in place. *)
    let get_mem_rep t converter =
      match !t with
      | On_disk_rep seq ->
        let res = converter seq in
        t := In_memory_rep res;
        res
      | In_memory_rep x -> x
    ;;

... (if you need the other direction, then it's straightforward too) ...

Translating this into C is relatively straightforward if you have the
luxury of a fresh start and don't have to be super efficient:

    #include <stddef.h>

    typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

    typedef struct {
        rep_kind_t rep_kind;
        union {
            char *on_disk;
            void *in_memory;
        } rep;
    } t;

    /* Convert on first access, then cache the in-memory form in place. */
    void *get_mem_rep(t *v, void *(*converter)(char *)) {
        void *res;
        switch (v->rep_kind) {
        case ON_DISK_REP:
            res = converter(v->rep.on_disk);
            v->rep.in_memory = res;
            v->rep_kind = IN_MEMORY_REP;
            return res;
        case IN_MEMORY_REP:
            return v->rep.in_memory;
        }
        return NULL; /* not reached for a valid rep_kind */
    }

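
To make the memoisation visible, here is a minimal usage sketch. The names `lazy_t`, `parse_long`, `conversions` and `demo` are made up for illustration and are not from the message above; the converter simply parses a decimal string into a heap-allocated `long`. Reading the value twice should run the converter only once.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;
    union {
        char *on_disk;
        void *in_memory;
    } rep;
} lazy_t;

/* Counts converter runs so the caching is observable. */
static int conversions = 0;

/* Hypothetical converter: parse a decimal string into a heap-allocated long. */
void *parse_long(char *on_disk) {
    long *out = malloc(sizeof *out);
    if (out == NULL)
        abort();
    *out = strtol(on_disk, NULL, 10);
    conversions++;
    return out;
}

/* Same idea as the C translation: convert once, then reuse the result. */
void *get_mem_rep(lazy_t *v, void *(*converter)(char *)) {
    if (v->rep_kind == ON_DISK_REP) {
        void *res = converter(v->rep.on_disk);
        v->rep.in_memory = res;
        v->rep_kind = IN_MEMORY_REP;
    }
    assert(v->rep_kind == IN_MEMORY_REP);
    return v->rep.in_memory;
}

/* Reads the same value twice and returns how many conversions that took. */
int demo(void) {
    int before = conversions;
    char raw[] = "12345";
    lazy_t v = { ON_DISK_REP, { raw } };   /* initialises rep.on_disk */
    long *first = get_mem_rep(&v, parse_long);
    long *second = get_mem_rep(&v, parse_long);
    assert(first == second && *first == 12345);
    free(first);
    return conversions - before;
}
```

Calling `demo()` returns 1: the second `get_mem_rep` call hands back the cached pointer instead of parsing again.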
Now of course fitting this into the existing types, and ensuring that
memory is neither freed too early nor leaked and that no other bugs creep
in, is probably a nightmare, which is why you said that this is a hard
problem.
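
One concrete instance of the ownership question: writing `in_memory` into the union overwrites the only copy of the `on_disk` pointer, so if the wrapper owns that buffer it has to be released at exactly that point. The following is only a sketch under strong assumptions that do not come from the message above: the wrapper owns both buffers, both come from `malloc`, and the converter's result does not alias its input. Real PostgreSQL code would use its own allocator and memory contexts rather than `malloc`/`free`. The names `lazy_t`, `measure`, `get_mem_rep_owned` and `lazy_release` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ON_DISK_REP, IN_MEMORY_REP } rep_kind_t;

typedef struct {
    rep_kind_t rep_kind;
    union {
        char *on_disk;
        void *in_memory;
    } rep;
} lazy_t;

/* Hypothetical converter: returns the string length in a malloc'd size_t. */
void *measure(char *on_disk) {
    size_t *out = malloc(sizeof *out);
    if (out == NULL)
        abort();
    *out = strlen(on_disk);
    return out;
}

/* Assumes the wrapper owns on_disk and the converter result does not alias it. */
void *get_mem_rep_owned(lazy_t *v, void *(*converter)(char *)) {
    if (v->rep_kind == ON_DISK_REP) {
        char *old = v->rep.on_disk;   /* save it: the union write below clobbers it */
        void *res = converter(old);
        free(old);                    /* otherwise the on-disk buffer would leak */
        v->rep.in_memory = res;
        v->rep_kind = IN_MEMORY_REP;
    }
    assert(v->rep_kind == IN_MEMORY_REP);
    return v->rep.in_memory;
}

/* Frees whichever representation is live; the value must not be read afterwards. */
void lazy_release(lazy_t *v) {
    if (v->rep_kind == ON_DISK_REP)
        free(v->rep.on_disk);
    else
        free(v->rep.in_memory);
    v->rep_kind = ON_DISK_REP;
    v->rep.on_disk = NULL;            /* makes a second release a harmless free(NULL) */
}
```

Even this small version shows the hazard: any caller still holding the old `on_disk` pointer, or the cached pointer after `lazy_release`, is left with a dangling reference.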
Cheers,
Bene