From: | "Jehan-Guillaume (ioguix) de Rorthais" <ioguix(at)free(dot)fr> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, David Fetter <david(at)fetter(dot)org>, Markus Wanner <markus(at)bluegap(dot)ch>, Michael Tharp <gxti(at)partiallystapled(dot)com> |
Subject: | providing tokenized version of parsed SQL script (was: nodeToString format and exporting the SQL parser) |
Date: | 2010-04-27 23:45:34 |
Message-ID: | 4BD7771E.8090805@free.fr |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 25/04/2010 03:02, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sat, Apr 24, 2010 at 8:07 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> Sounds useful to me, though as a function like suggested in a later
>>> email.
>
>> If tool-builders think this is useful, I have no problem with making
>> it available. It should be suitably disclaimed: "We reserve the right
>> to rip out the entire flex/yacc-based lexer and parser at any time and
>> replace them with a hand-coded system written in Prolog that emits
>> tokenization information only in ASN.1-encoded pig latin. If massive
>> changes in the way this function works - or its complete disappearance
>> - are going to make you grumpy, don't call it."
>
> I'm a bit concerned with the vagueness of the goals here. We started
> with a request to dump out node trees, ie, post-parsing representation;
> but the example use case of syntax highlighting would find that
> representation quite useless. (Example: foo::bar and CAST(foo AS bar)
> yield the same parse tree.)
Well, the tokenizer stuff was actually my understanding of the following
quote from Michael Tharp :
« ... making the internal SQL parser available to clients via a
C-language SQL function. ».
I thought Michael was trying to write a tokenizer based on node tree
returned by raw_parser. As it seems Michael is not even sure about what
he's trying to do, I prefer refocus a bit this thread
> A syntax highlighter might get some use
> out of the lexer-output token stream, but I'm afraid from the proposed
> output that people might be expecting more semantic information than
> the lexer can provide. The lexer doesn't, for example, have any clue
> that some keywords are commands and others aren't; nor any very clear
> understanding about the semantic difference between the tokens '='
> and ';'.
Exact, a proper tokenizer function should be able to give some (simple)
information about the type of each token. That is what I tried to define
in this draft with the "type" field :
=> SELECT pgtokenize($script$
SELECT 1;
UPDATE test SET "a"=2;
$script$);
type | pos | value | line
-------------+-----+----------+------
SQL_COMMAND | 1 | 'SELECT' | 1
CONSTANT | 8 | '1' | 1
DELIMITER | 9 | ';' | 1
SQL_COMMAND | 11 | 'UPDATE' | 2
IDENTIFIER | 18 | 'test' | 2
SQL_KEYWORD | 23 | 'SET' | 2
IDENTIFIER | 27 | '"a"' | 2
OPERATOR | 30 | '=' | 2
CONSTANT | 31 | '1' | 2
>
> Also, if all you want is the lexer, it's not that hard to steal psql's
> version and adapt it to your purposes. The lexer doesn't change very
> fast, and it's not that big either.
Stealing the lexer from psql is possible...for C application.
Don't know yet if we could port it to other languages easily and if a
simple lexer would really answer the use cases here.
>
> Anyway, it certainly wouldn't be hard for an add-on module to provide a
> SRF that calls the lexer (or parser) and returns some sort of tabular
> representation of the results. I'm just not sure how useful it'll be
> in the real world.
Well, I would prefer not to tell users of pgAdmin or phpPgAdmin that
they depend on a contrib module.
Moreover, PostgreSQL already expose a lot of informations about its
internal mechanisms, configuration, ddl etc. I think having a proper
tokenizer function is just a natural new functionality for core if possible.
Having dropped an eye here and there in the parser code, I am not sure
where I could get required info and mix them to produce something close
to my draft yet.
But I prefer to discussing first before spending too much time and
throwing any potential code after...
>
> regards, tom lane
- --
JGuillaume (ioguix) de Rorthais
http://www.dalibo.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkvXdxgACgkQxWGfaAgowiJujQCglXpCYpFttwHOkmkCd92zMxnv
r00An1sjmRrR6u61VjCtXputcNBevHsz
=ri3i
-----END PGP SIGNATURE-----
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-04-28 00:13:09 | Re: testing HS/SR - 1 vs 2 performance |
Previous Message | Alvaro Herrera | 2010-04-27 23:45:15 | road.thepath no longer in pg_stats? |