From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|
To: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | craig(dot)ringer(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BUG: pg_stat_statements query normalization issues with combined queries |
Date: | 2016-12-27 17:16:45 |
Message-ID: | alpine.DEB.2.20.1612271718150.4911@lancre |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Kyotaro-san,
> In nonterminal stmtmulti, setQueryLocation is added and used to
> calcualte and store the length of every stmt's. This needs an
> extra storage in bse_yy_extra_type
The extra storage is one int.
> and introduces a bit complicated stuff. But I think raw_parser can do
> that without such extra storage and labor,
How? The issue is that stmtmulti silently skip some ';' when empty
statements are found, so I need to keep track of that to know where to
stop, using the beginning location of the next statement, which is
probably your idea, does not work.
> then gram.y gets far simpler.
> The struct member to store the location and length is named
> 'qlocation', and 'qlength' in the added ParseNode. But many nodes
> already have 'location' of exactly the same purpopse. I don't see
> a necessity to differentiate them.
Alas, the "CreateTableSpaceStmt" struct already had a "char *location"
that I did not want to change... If I use "location", then I have to
change it, why not...
Another reason for the name difference is that qlocation/qlength
convention is slightly different from location when not set: location is
set manually to -1 when the information is not available, but this
convention is not possible for statements without changing all their
allocations and initializations (there are hundreds...), so the convention
I used for qlocation/qlength is that it is not set if qlength is zero,
which is the default value set by makeNode, so no change was needed...
Changing this point would mean a *lot* of possibly error prone changes in
the parser code everywhere statements are allocated...
> Putting the two above together, the following is my suggestion
> for the parser part.
>
> - Make all parse nodes have the following two members at the
> beginning. This unifies all parse node from the view of setting
> their location. Commenting about this would be better.
>
> | NodeTag type;
> | int location;
Currently only a few parse nodes have locations, those about variables and
constants that can be designated by an error message. I added the
information to about 100 statements, but for the many remaining parse
nodes the information does not exist and is not needed, so I would prefer
to avoid adding a field both unset and unused.
> - Remove struct ParseNode
"ParseNode" is needed to access the location and length of statements,
otherwise they are only "Node", which does not have a location.
> and setQueryLocation.
The function is called twice, I wanted to avoid duplicating code.
> stmtmulti sets location in the same manner to other kind of nodes, just
> doing '= @n'. base_yyparse() doesn't calculate statement length.
But...
> - Make raw_parser caluclate statement length then store it into
> each stmt scanning the yyextra.parsetree.
... I cannot calculate the statement length cleanly because of empty
statements. If I know where the last ';' is, then the length computation
must be when the reduction occurs, hence the function called from the
stmtmulti rule.
> What do you think about this?
I think that I do not know how to compute the length without knowing where
the last ';' was, because of how empty statements are silently skipped
around the stmtmulti/stmt rules, so I think that the length computation
should stay where it is.
What I can certainly do is:
- add more comments to explain why it is done like that
What I could do with some inputs from reviewers/committers is:
- rename the "char *location" field of the create table space statement
to "directory" or something else... but what?
- change qlocation/qlength to location/length...
However, then the -1 convention for location not set is not true, which
is annoying. Using this convention means hundreds of changes because every
statement allocation & initialization must be changed. This is
obviously possible, but is a much larger patch, which I cannot say
would be much better than this one, it would just be different.
What I'm reserved about:
- adding a location field to nodes that do not need it.
--
Fabien.
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2016-12-27 17:17:36 | Re: pg_stat_activity.waiting_start |
Previous Message | Claudio Freire | 2016-12-27 17:14:39 | Re: Vacuum: allow usage of more than 1GB of work mem |