Quick Links

Planning a change of representation in the planner

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Planning a change of representation in the planner
Date:	2003-02-07 04:35:48
Message-ID:	23775.1044592548@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Currently, the planner spends a good deal of time pushing around lists
of small integers, because it uses such lists to identify join
relations. For example, given SELECT ... FROM a, b, c WHERE ...
the list (1,2) (or equivalently (2,1)) would represent the join of
a and b.

This representation is pretty clearly a hangover from the days when the
Postgres planner was written in Lisp :-(. It's inefficient --- common
operations like union, intersection, and is-subset tests require O(N^2)
steps. And it's error-prone: I just had my nose rubbed once again in
the nasty things that happen if you accidentally get some duplicate
entries in a relation ID list. (It's nasty because some but not all of
the low-level list-as-set operations depend on the assumption that the
elements of a given list are distinct.)

I'm thinking of replacing this representation by a
variable-length-bitmap representation. Basically it would be like

struct bitmapset {
int nwords; /* number of words in array */
int array[1]; /* really [nwords] */
};

Each array element would hold 32 bits; the integer i is a member of
the set iff (array[i/32] >> (i%32)) & 1 == 1. For sets containing no
elements over 31 (which would account for the vast majority of queries)
only a single word would be needed in the array. Operations like set
union, intersection, and subset test could process 32 bits at a time ---
they'd reduce to trivial C operations like |=, &=, & ~, applied once per
word. There would be a few things that would be slower (like iterating
through the actual integer elements of a set) but AFAICT this
representation is much better suited to the planner's needs than the
list method.

I've been thinking of doing this for a while just on efficiency grounds,
but kept putting it off because I don't expect much of any performance
gain on simple queries. (You need a dozen or so tables in a query
before the inefficiencies of the list representation really start to
hurt.) But tonight I'm thinking I'll do it anyway, because it'll also
be impervious to duplicate-element bugs.

Comments?

regards, tom lane

Responses

Re: Planning a change of representation in the planner at 2003-02-07 08:08:56 from Hannu Krosing

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Rajesh Kumar Mallah	2003-02-07 06:57:12	Re: [OpenFTS-general] Alpha version of contrib/tsearch is available for testing
Previous Message	Thomas O'Dowd	2003-02-07 04:05:36	Wrong charset mappings