From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Tomas Vondra <tv(at)fuzzy(dot)cz>, Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org, pavel(dot)stehule(at)gmail(dot)com |
Subject: | Re: WIP: index support for regexp search |
Date: | 2012-11-26 19:49:23 |
Message-ID: | CAPpHfdvwGTEXHnxJXe-kne-Sr-DmJ0EiWJVYukgQR9Ophj3z=A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Nov 26, 2012 at 4:55 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:
>
> Great, that top-level comment helped tremendously! I feel enlightened.
>
> I fixed some spelling, formatting etc. trivial stuff while reading through
> the patch, see attached. Below is some feedback on the details:
>
> * I don't like the PG_TRY/CATCH trick. It's not generally safe to catch an
> error, without propagating it further or rolling back the whole
> (sub)transation. It might work in this case, as you're only suppressing
> errors with the special sqlcode that are used in the same file, but it
> nevertheless feels naughty. I believe none of the limits that are being
> checked are strict; it's OK to exceed the limits somewhat, as long as you
> terminate the processing in a reasonable time, in case of pathological
> input. I'd suggest putting an explicit check for the limits somewhere, and
> not rely on ereport(). Something like this, in the code that recurses:
>
> if (trgmCNFA->arcsCount > MAX_RESULT_ARCS ||
> hash_get_num_entries(trgmCNFA-**>states) > MAX_RESULT_STATES)
> {
> trgmCNFA->overflowed = true;
> return;
> }
>
PG_TRY/CATCH trick is replaced with some number of if/return. Code becomes
a bit more complicated, but your notes does matter.
And then check for the overflowed flag at the top level.
>
> * This part of the high-level comment was not clear to me:
>
> * States of the graph produced in the first stage are marked with
>> "keys". Key is a pair
>> * of a "prefix" and a state of the original automaton. "Prefix" is a last
>> * characters. So, knowing the prefix is enough to know what is a trigram
>> when we read some new
>> * character. However, we can know single character of prefix or don't
>> know any
>> * characters of it. Each state of resulting graph have an "enter key"
>> (with that
>> * key we've entered this state) and a set of keys which are reachable
>> without
>> * reading any predictable trigram. The algorithm of processing each state
>> * of resulting graph are so:
>> * 1) Add all keys which achievable without reading of any predictable
>> trigram.
>> * 2) Add outgoing arcs labeled with trigrams.
>> * Step 2 leads to creation of new states and recursively processing
>> them. So,
>> * we use depth-first algorithm.
>>
>
> I didn't understand that. Can you elaborate? It might help to work through
> an example, with some ascii art depicting the graph.
>
This comment is extended with example.
> * It would be nice to add some comments to TrgmCNFA struct, explaining
> which fields are valid at which stages. For example, it seems that 'trgms'
> array is calculated only after building the CNFA, by getTrgmVector()
> function, while arcsCount is updated on the fly, while recursing in the
> getState() function.
>
TrgmCNFA comment is extended with this.
> * What is the representation used for the path matrix? Needs a comment.
>
Comments are added to PackedTrgmPaths and TrgmStatePath.
> * What do the getColorinfo()
getColorInfo comment now references to ColorInfo struct which have comment.
and scanColorMap() functions do?
scanColorMap comment now have description of colormap structure.
> What exactly does a color represent?
This is added to the top comment.
> What's the tradeoff in choosing MAX_COLOR_CHARS?
Comment is added to the macro.
------
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
trgm-regexp-0.6.patch.gz | application/x-gzip | 14.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2012-11-26 19:54:19 | Re: WIP json generation enhancements |
Previous Message | Hannu Krosing | 2012-11-26 19:46:36 | Re: WIP json generation enhancements |