From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Adam Berlin <berlin(dot)ab(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: COPY FROM WHEN condition |
Date: | 2019-04-02 01:11:26 |
Message-ID: | 20190402011126.jvy5oun2nsdy4pqr@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2019-04-02 14:06:52 +1300, David Rowley wrote:
> On Tue, 2 Apr 2019 at 13:59, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > On 2019-04-02 13:41:57 +1300, David Rowley wrote:
> > > On Tue, 2 Apr 2019 at 05:19, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > Thanks! I'm not quite clear whether you planning to continue working on
> > > > this, or whether this is a handoff? Either is fine with me, just trying
> > > > to avoid unnecessary work / delay.
> > >
> > > I can, if you've not. I was hoping to gauge if you thought the
> > > approach was worth pursuing.
> >
> > I think it's worth pursuing, with the caveats below. I'm going to focus
> > on docs the not-very-long rest of today, but I definitely could work on
> > this afterwards. But I also would welcome any help. Let me know...
>
> I'm looking now. I'll post something when I get it into some better
> shape than it us now.
Cool.
> > > > It still seems wrong to me to just perform a second hashtable search
> > > > here, givent that we've already done the partition dispatch.
> > >
> > > The reason I thought this was a good idea is that if we use the
> > > ResultRelInfo to buffer the tuples then there's no end to how many
> > > tuple slots can exist as the code in copy.c has no control over how
> > > many ResultRelInfos are created.
> >
> > To me those aren't contradictory - we're going to have a ResultRelInfo
> > for each partition either way, but there's nothing preventing copy.c
> > from cleaning up subsidiary data in it. What I was thinking is that
> > we'd just keep track of a list of ResultRelInfos with bulk insert slots,
> > and occasionally clean them up. That way we avoid the secondary lookup,
> > while also managing the amount of slots.
>
> The problem that I see with that is you can't just add to that list
> when the partition changes. You must check if the ResultRelInfo is
> already in the list or not since we could change partitions and change
> back again. For a list with just a few elements checking
> list_member_ptr should be pretty cheap, but I randomly did choose that
> we try to keep just the last 16 partitions worth of buffers. I don't
> think checking list_member_ptr in a 16 element list is likely to be
> faster than a hash table lookup, do you?
Why do we need that list membership check? If you append the
ResultRelInfo to the list when creating the ResultRelInfo's slots array,
you don't need to touch the list after a partition lookup - you know
it's a member if the ResultRelInfo has a slot array. Then you only need
to iterate the list when you want to drop slots to avoid using too much
memory - and that's also a sequential scan of the hash table in your
approach, right?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Langote | 2019-04-02 01:26:32 | Re: Ordered Partitioned Table Scans |
Previous Message | David Rowley | 2019-04-02 01:06:52 | Re: COPY FROM WHEN condition |