From: | "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> |
---|---|
To: | "Patches" <pgsql-patches(at)postgresql(dot)org> |
Subject: | tsearch refactorings |
Date: | 2007-09-03 11:46:10 |
Message-ID: | 46DBF402.4030301@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches |
I've been slowly reading through the tsearch code and refactoring things
as I go. Here's a patch of what I've done this far. It has grown larger
than I intended, but it has helped me a lot to understand the code.
I hope I didn't break anything; at least it passes regression tests. If
there's some changes that others don't agree with or feel uncomfortable
doing for 8.3, let me know and I can adjust the patch.
The usage of the QueryItem struct was very confusing. It was used for
both operators and operands. For operators, "val" was a single character
casted to a int4, marking the operator type. For operands, val was the
CRC-32 of the value. Other fields were used only either for operands or
for operators. The biggest change in the patch is that I broke the
QueryItem struct into QueryOperator and QueryOperand. Type was really
the only common field between them. QueryItem still exists, and is used
in the TSQuery struct as before, but it's now a union of the two. Many
other changes fell from that, like separation of pushval_asis function
into pushValue, pushOperator and pushStop.
Other changes include:
- Moved some structs that were for internal use only from header files
to the right .c-files.
- Moved tsvector parser to a new tsvector_parser.c file. Parser code was
about half of the size of tsvector.c, it's also used from tsquery.c, and
it has some data structures of its own, so it seems better to separate
it. Cleaned up the API so that TSVectorParserState is not accessed from
outside tsvector_parser.c.
- Separated enumerations (#defines, really) used for QueryItem.type
field and as return codes from gettoken_query. It was just accidental
code sharing.
- Removed ParseQueryNode struct used internally by makepol and friends.
push*-functions now construct QueryItems directly.
- Changed int4 variables to just ints for variables like "i" or "array
size", where the storage-size was not significant.
- Probably a lot of other stuff I can't remember right now
I have a few more things in mind I'm planning to look into:
- I'm not convinced that there's enough sanity checking in the input
functions. I think you can send queries into the server using the binary
protocol that you couldn't produce otherwise. For example, queries with
multiple VAL nodes, with no operator to connect them. Does that wreak
havoc in any of the functions? And the left-field of QueryOperator is an
int16, what happens if you have query that exceeds
that? And parse_query always produces trees that are in prefix notation,
so the left-field is really redundant, but using tsqueryrecv, you could
inject queries that are not in prefix notation; is there anything in the
code that depends on that?
- There's many internal intermediate representations of a query:
TSQuery, a QTNode-tree, NODE-tree (in tsquery_cleanup.c), prefix
notation stack of QueryItems (in parser), infix-tree. Could we remove
some of these?
- There's a lot of recursive functions. Are they all using
check_stack_depth?
- More commenting and documenting and little refactorings
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
tsearch-refactor-2.patch.gz | application/x-gzip | 26.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2007-09-03 12:37:24 | Re: Lazy xid assignment V3 |
Previous Message | Florian G. Pflug | 2007-09-03 02:18:03 | Lazy xid assignment V3 |