From: | "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> |
Subject: | Re: Tuplestore trimming in window-functions patch |
Date: | 2008-12-27 13:26:23 |
Message-ID: | e08cc0400812270526g66eee3a6k839b39f6d75fee3c@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2008/12/27 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> The last bit of performance-related hacking that seems to be needed in
> the window functions patch is to fix things so that we can trim old
> rows from the underlying tuplestore when they're no longer needed.
> In particular I think it's critical to be able to do this for the
> simple cases of lead() and lag() with constant offsets. This topic
> has been mentioned repeatedly in the discussions to date, but nothing
> much got done.
>
> The components we seem to be missing are:
>
> 1. lead/lag have to be able to tell whether their offset arguments are
> constants.
>
> 2. The WindowObject API has to be extended so that a window function can
> tell the executor that it will never again fetch a row before row X.
> (This has to be cheap enough that we can do it on each call.)
>
> 3. The tuplestore API itself seems a bit shy of a load --- what we do
> not have is a concept that a particular read pointer can move back and
> forth, but is guaranteed never to move back before some-defined-place.
> (This is clearly an oversight in my recent redesign of that API.)
>
> I don't think it's hard to fix #1. What I propose we do is extend the
> existing fmgr.h APIs that allow inquiry about the data types of function
> arguments -- say, something along the line of
> extern bool get_fn_arg_is_const(FmgrInfo *flinfo, int argnum)
> This would at bottom be an "IsA(arg_expr, Const)" type of test but it
> seems best to keep the details hidden from caller functions. (Note:
> I've already fixed the WindowAgg infrastructure to support
> get_fn_expr_argtype and friends --- this really isn't optional if you
> expect polymorphic window functions to be common, and the evidence of
> the built-in ones is that they are. It's pure luck that none of the
> built-in ones has had any need to explicitly find out what data type
> it's processing.)
>
> #2 is also easy given a solution to #3; the WinForgetBefore() call would
> just translate to some appropriate action on the tuplestore read pointer
> associated with the window function's WindowObject.
>
> So I think the problem really reduces to fixing the tuplestore API some
> more. One way to go at it would be to re-introduce an explicit notion
> of a "mark" pointer associated with (some or all) read pointers, and
> have a capability bit saying that a read pointer is allowed to go
> backward so long as it doesn't go before its mark pointer. However
> I really like the simplicity of the current API design where all read
> pointers are alike and there's no explicit notion of a mark pointer.
> We could keep that simplicity by having a capability bit that says
> "this pointer can move backward so long as it doesn't become the oldest
> one". In this design you still need a mark pointer to delimit how
> much tuplestore you want to keep, but there's no explicit link between
> any one mark pointer and any one active read pointer so far as
> tuplestore.c is aware. (Note this'd imply that a WindowObject contains
> both a mark pointer and a read pointer for its window function, and the
> WinForgetBefore() call actually moves the mark pointer.) A possible
> objection is that it's a bit more error-prone to not explicitly say
> which other read pointer is intended to be the movement fence for your
> read pointer; but I'm not convinced whether that's important or not.
Yeah, this is the best for window functions, but I'm afraid that it
affects too much to other nodes like materialize, and cte, etc. I
like the current design of tuplestore, too. So if we have choice not
to change tuplestore API, it is supposed to be better.
> Also, we could really do this without any tuplestore API changes at all,
> if we were willing to implement any backing-up of the read pointer as
> copying the read pointer from the mark pointer and then advancing the
> read pointer. However that could get pretty inefficient if there were
> any significant distance between the mark and read pointers.
>
> Another thing that I'm thinking is that it was a mistake to not
> explicitly expose tuplestore_trim() as part of the tuplestore API.
> It seems that the callers always have a much better idea than
> tuplestore.c of when it would be worth trying to trim the tuplestore.
> The current WF patch wants to put trim calls into such bizarre places
> as select_read_pointer, which is utterly counterintuitive and usually
> a waste of cycles too. If we changed the rule to be that we trim when
> the caller tells us to, we could do something saner --- I think that
> really WindowAgg only needs to trim once per input tuple. We'd have to
> add a line or two in some other places like mergejoin, but it'd really
> not be much of a problem.
I agree with tuplestore_trim() should be exposed as a part of API. I
have not seen the reason it isn't. As you pointed, I rewrote
select_read_pointer() so that trim() check works appropriately but
actually it makes the executer havier than before. If the executor
tells trim() at the best timing, I am glad.
Regards,
--
Hitoshi Harada
From | Date | Subject | |
---|---|---|---|
Next Message | Hitoshi Harada | 2008-12-27 13:48:21 | Re: Frames vs partitions: is SQL2008 completely insane? |
Previous Message | Fujii Masao | 2008-12-27 12:28:52 | pg_start_backup without checkpoint patch (a part of Synch Rep) |