Tightly packing expressions

From: Douglas Doole <dougdoole(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Tightly packing expressions
Date: 2017-08-03 18:01:06
Message-ID: CADE5jY+p2c-vPPK4zG1EyJz=Ttsak3WB+mCfYOAYMxb=ZZSxbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres,
Back when you were getting ready to commit your faster expressions, I
volunteered to look at switching from an array of fixed sized steps to
tightly packed structures. (
https://www.postgresql.org/message-id/20170314221648.jrcgh5n7ld4ej2o7@alap3.anarazel.de)
I've finally found time to make it happen.

Using tightly packed structures makes the expressions a lot smaller. I
instrumented some before and after builds and ran "make check" just to see
how much memory expressions were using. What I found was:

There were ~104K expressions.

Memory - bytes needed to hold the steps of the expressions
Allocated Memory - memory is allocated in blocks, this is the bytes
allocated to hold the expressions
Wasted Memory - the difference between the allocated and used memory

Original code:
Memory: min=64 max=9984 total=28232512 average=265
Allocated Memory: min=1024 max=16384 total=111171584 average=1045
Wasted Memory: min=0 max=6400 total=82939072 average=780

New code:
Memory: min=32 (50%) max=8128 (82%) total=18266712 (65%) average=175 (66%)
Allocated Memory: min=192 (19%) max=9408 (57%) total=29386176 (26%)
average=282 (27%)
Wasted Memory: min=0 (0%) max=1280 (20%) total=11119464 (13%) average=106
(14%)

So on average expressions are about a third smaller than with the fixed
size array. The new approach doesn't overallocate as badly as the array
approach so the packed approach cuts memory consumption by over 70%. (Of
course, the array code could be tuned to reduce the overhead as well.)

Another benefit of the way I did this is that the expression memory is
never reallocated. This means that it is safe for one step to point
directly to a field in another step without needing to separately palloc
storage. That should allow us to simplify the code and reduce pointer
traversals. (I haven't exploited this in the patch. I figured it would be a
task for round 2 assuming you like what I've done here.)

The work was mostly mechanical. The only tricky bit was dealing with the
places where you jump to step n+1 while building step n. Since we can't
tell the address of step n+1 until it is actually built, I had to defer
resolution of the jumps. So all the interesting bits of this patch are in
ExprEvalPushStep().

Let me know what you think.

- Doug
Salesforce

Attachment Content-Type Size
exprBuild.patch text/x-patch 257.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-03 18:01:46 Re: Unused variable scanned_tuples in LVRelStats
Previous Message Robert Haas 2017-08-03 17:45:02 Re: Add Roman numeral conversion to to_number