From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | John Naylor <jcnaylor(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
Date: | 2018-12-22 17:20:00 |
Message-ID: | 31489.1545499200@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
John Naylor <jcnaylor(at)gmail(dot)com> writes:
> Using a single file also gave me another idea: Take value and category
> out of ScanKeyword, and replace them with an index into another array
> containing those, which will only be accessed in the event of a hit.
> That would shrink ScanKeyword to 4 bytes (offset, index), further
> increasing locality of reference. Might not be worth it, but I can try
> it after moving on to the core scanner.
I like that idea a *lot*, actually, because it offers the opportunity
to decouple this mechanism from all assumptions about what the
auxiliary data for a keyword is. Basically, we'd redefine
ScanKeywordLookup as having the API "given a string, return a
keyword index if it is a keyword, -1 if it isn't"; then the caller
would use the keyword index to look up the auxiliary data in a table
that it owns, and ScanKeywordLookup doesn't know about at all.
So that leads to a design like this: the master data is in a header
that's just like kwlist.h is today, except now we are thinking of
PG_KEYWORD as an N-argument macro not necessarily exactly 3 arguments.
The Perl script reads that, paying attention only to the first argument
of the macro calls, and outputs a file containing, say,
static const uint16 kw_offsets[] = { 0, 6, 15, ... };
static const char kw_strings[] =
"abort\0"
"absolute\0"
...
;
(it'd be a good idea to have a switch that allows specifying the
prefix of these constant names). Then ScanKeywordLookup has the
signature
int ScanKeywordLookup(const char *string_to_lookup,
const char *kw_strings,
const uint16 *kw_offsets,
int num_keywords);
and a file using this stuff looks something like
/* Payload data for keywords */
typedef struct MyKeyword
{
int16 value;
int16 category;
} MyKeyword;
#define PG_KEYWORD(kwname, value, category) {value, category},
static const MyKeyword MyKeywords[] = {
#include "kwlist.h"
};
/* String lookup table for keywords */
#include "kwlist_d.h"
/* Lookup code looks about like this: */
kwnum = ScanKeywordLookup(str,
kw_strings,
kw_offsets,
lengthof(kw_offsets));
if (kwnum >= 0)
... look into MyKeywords[kwnum] for info ...
Aside from being arguably better from the locality-of-reference
standpoint, this gets us out of the weird ifdef'ing you've got in
the v2 patch. The kwlist_d.h headers can be very ordinary headers.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-12-22 18:14:20 | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
Previous Message | Tom Lane | 2018-12-22 16:31:20 | Re: Joins on TID |