From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Subject: | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date: | 2024-03-14 10:55:34 |
Message-ID: | CAFiTN-vx1H0tCckC2ueiNTj+ibLHMGF4HYZyv99neHZkAzbWbQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 14, 2024 at 4:07 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> > Yeah, that's a very valid point. So I think now Heikki/Melanie might
> > have got an answer to their question, about the thought process behind
> > serializing the snapshot for each scan node. And the same thing is
> > followed for BitmapHeapNode as well.
>
> I see. Thanks, understanding the thought process helps.
>
> So when a parallel table or index scan runs in the executor as part of a
> query, we could just use the active snapshot. But there are some other
> callers of parallel table scans that don't use the executor, namely
> parallel index builds. For those it makes sense to pass the snapshot for
> the scan independent of the active snapshot.
Right
> A parallel bitmap heap scan isn't really a parallel scan as far as the
> table AM is concerned, though. It's more like an independent bitmap heap
> scan in each worker process, nodeBitmapHeapscan.c does all the
> coordination of which blocks to scan. So I think that
> table_parallelscan_initialize() was the wrong role model, and we should
> still remove the snapshot serialization code from nodeBitmapHeapscan.c.
I think that seems right.
> Digging deeper into the question of whether es_snapshot ==
> GetActiveSnapshot() is a valid assumption:
>
> <deep dive>
>
> es_snapshot is copied from the QueryDesc in standard_ExecutorStart().
> Looking at the callers of ExecutorStart(), they all get the QueryDesc by
> calling CreateQueryDesc() with GetActiveSnapshot(). And I don't see any
> callers changing the active snapshot between the ExecutorStart() and
> ExecutorRun() calls either. In pquery.c, we explicitly
> PushActiveSnapshot(queryDesc->snapshot) before calling ExecutorRun(). So
> no live bug here AFAICS, es_snapshot == GetActiveSnapshot() holds.
>
> _SPI_execute_plan() has code to deal with the possibility that the
> active snapshot is not set. That seems fishy; do we really support SPI
> without any snapshot? I'm inclined to turn that into an error. I ran the
> regression tests with an "Assert(ActiveSnapshotSet())" there, and
> everything worked.
IMHO, we can call SPI_Connect() and SPI_Execute() from any C
extension, so I don't think there we can guarantee that the snapshot
must be set, do we?
> If es_snapshot was different from the active snapshot, things would get
> weird, even without parallel query. The scans would use es_snapshot for
> the visibility checks, but any functions you execute in quals would use
> the active snapshot.
>
> We could double down on that assumption, and remove es_snapshot
> altogether and use GetActiveSnapshot() instead. And perhaps add
> "PushActiveSnapshot(queryDesc->snapshot)" to ExecutorRun().
>
> </deep dive>
>
> In summary, this es_snapshot stuff is a bit confusing and could use some
> cleanup. But for now, I'd like to just add some assertions and a
> comments about this, and remove the snapshot serialization from bitmap
> heap scan node, to make it consistent with other non-parallel scan nodes
> (it's not really a parallel scan as far as the table AM is concerned).
> See attached patch, which is the same as previous patch with some extra
> assertions.
Maybe for now we can just handle this specific case to remove the
snapshot serializing for the BitmapHeapScan as you are doing in the
patch. After looking into the code your theory seems correct that we
are just copying the ActiveSnapshot while building the query
descriptor and from there we are copying into the Estate so logically
there should not be any reason for these two to be different.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2024-03-14 11:07:56 | Re: UUID v7 |
Previous Message | Heikki Linnakangas | 2024-03-14 10:53:48 | Re: Can Execute commands for different portals interleave? |