From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | some notes about implementation of SQL/PSM in PostgreSQL |
Date: | 2010-11-15 07:59:46 |
Message-ID: | AANLkTi=fqsdx6fW5DiXmAeLe7TFO6c2vNTRocZdTvK06@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello
I try to implement SQL/PSM in PostgreSQL. I experimented with
rewriting a plpgsql - you can check this implementation
http://postgres.cz/index.php/SQL/PSM_Manual.
I though so we can have some common PL stack. But I don't think it
now. It means a some more uglification of current code or uglification
of our SQL/PSM implementation.
Why? PL/pgSQL language is extremely adjusted to PostgreSQL - it's some
combination of strong static language based on relative high dynamic
interpret.
For example:
* variables are static (left part of assign statement)
* assign statement and expressions are "pseudo static" - type is
specified when query is first evaluated, and cannot be changes later
* there are not a strong relation between left part and right part of
assign statement - PLpgSQL use a cast or IO cast when it's necessary
PL/pgSQL needs to know all objects before first using - it doesn't to
use a "reflection" on SQL level - see different a usage of plpgsql FOR
IN query and PL/SQL FOR IN query
Current implementation has some large advantages or disadvantages - I
like PLpgSQL and I am sure so it is impossible to change current
behave now. This is just recapitulation:
+ there is a less dependency between functions - a function has to
exists just in moment when it is exactly executed
+ there isn't a complication with recompilation of functions - all
work is transparent
+ current implementation is fast - use a three level for access to
variables scalar, row, record
+ there is zero relation between tables and functions
+ functions is a dynamic - respect a changes in search_path (sometimes)
+ language is relative rich and allow very comfortable work with dynamic SQL
+ there are not standard - we are able to enhance language
- current implementation is too complex - there are a three different
access to variables
- there is relative big overhead when functions is starting first time
- there are lot of SQL parsing
- current validation checking only syntax - it doesn't check a
references to SQL objects - so some errors can be found too late
- the zero relation between expression and variable means a lot of
casts and some IO casts
- older code can be quietly broken by ALTER TABLE ADD COLUMN statement
- there can be a collision between PL and SQL identifiers
- functions is too dynamic (search_path can be a security hole)
- there are a still some dependency issues - somewhere is necessary to
logout/loging
- isn't possible to call plpgsql function like coroutines - so there
are not possibility to effectively limit SRF functions
- there is issue with blind SQL optimalization - bad plans
- there is issue with slow varlena data type modifications in large loops
- there is issue with iteration over record
I don't thing so all these issues can be solved - some needs a
dependency between functions and it means a problem with updates on
production's servers. Solving a other can means a breaking
compatibility etc..
Because SQL/PSM is different language and because it's not implemented
yet, we can to address some issues - absolutely not all
* simpler interpret - there must not be a different access to scalar,
row or record variable
* language can be a more dynamic or more static - current level of
PLpgSQL is little bit strange - with large impact on interpret
complexity
* more static behave can accelerate a first start - SQL are analyzed once time
* we can use a simple interpret where is possible to store state in
every moment and call PSM function like couroutine (stack can be
stored inside fn_extra).
I am thinking about some low level PL language - like PLSPI or some
similar - SQL/PSM functions should be compiled to this low level
language. This language can be simply parsed (first start will be
faster) and can be simply serialized and deserialized - it can be
important for people who afraid about their source codes - they can
compile to PLPSM and they can distribute only this pcode.
There are a two issues of standard
1. it doesn't to know RETURN NEXT, and what I know all SQL/PSM
implementation doesn't support similar construct - so It's should not
be in pgPSM too.
2. there are very specific statement "FOR IN query" - the PL variables
are created automatically based on query result. This isn't possible
in PL/pgSQL, because we must not to analyze query in validation time.
This is real problem - it can have a impact on pg_dump, because it
needs to dump functions with empty body first, and then dump full
function again.
Because PL/pgSQL has a long history and we cannot to leave it, I am
inclined to think so better variant for implementation of SQL/PSM is
starting with a different runtime. Probably only one interesting part
that can be shared is expression evaluator. I am thinking, so this
moving this to SPI can be interesting for all PL.
Regards
Pavel Stehule
From | Date | Subject | |
---|---|---|---|
Next Message | Itagaki Takahiro | 2010-11-15 08:47:15 | Re: MULTISET and additional functions for ARRAY |
Previous Message | Heikki Linnakangas | 2010-11-15 07:15:08 | Latches with weak memory ordering (Re: max_wal_senders must die) |