From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PgHacker <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: contrib/cache_scan (Re: What's needed for cache-only table scan?)
Date: 2014-01-14 15:06:42
Message-ID: CADyhKSWORjVYOWxP7x6XMtSt4yUQErSDLf5EjVNM4a+U=1EfMA@mail.gmail.com
Lists: pgsql-hackers
Hello,
The attached patch is what we discussed just before the November commit-fest.
It implements an alternative way to scan a particular table, using an in-memory
cache instead of the usual heap access method. Unlike the buffer cache, this
mechanism caches only a limited number of columns in memory, so memory
consumption per tuple is much smaller than with the regular heap access method,
which allows a much larger number of tuples to be kept in memory.
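Just to illustrate the idea (a hypothetical layout I'm sketching here, not the
actual structure used in the patch), a cache entry only needs the tuple's
identity, its visibility fields, and the values of the cached columns:

#include "postgres.h"
#include "storage/itemptr.h"

/* Hypothetical sketch of a column-subset cache entry; the real layout
 * in contrib/cache_scan may differ. */
typedef struct ccache_tuple_sketch
{
    ItemPointerData ctid;       /* heap TID, to stay in sync with pruning */
    TransactionId   xmin;       /* visibility fields copied from the */
    TransactionId   xmax;       /* original heap tuple header */
    uint16          infomask;
    Datum           values[FLEXIBLE_ARRAY_MEMBER];  /* cached columns only */
} ccache_tuple_sketch;

Uncached columns are simply not stored, which is where the per-tuple memory
saving comes from.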
I'd like to extend this idea into a feature that caches data in a
column-oriented data structure, to utilize parallel calculation processors
such as the CPU's SIMD units or simple GPU cores. (It probably makes sense to
evaluate multiple records with a single vector instruction if the contents of
a particular column are laid out as one large array.)
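For instance (a generic C sketch to illustrate the point, not code from the
patch), once a column lives in one contiguous array, a qual such as
"x > threshold" becomes a tight loop that compilers can auto-vectorize into
SIMD instructions:

#include <stdint.h>

/* Evaluate "x > threshold" over a column stored as one large array.
 * With row-oriented storage, each value sits at a different offset in a
 * different tuple, so no such vectorizable loop exists. */
static int
count_matches(const int32_t *column, int nrows, int32_t threshold)
{
    int     matched = 0;

    for (int i = 0; i < nrows; i++)
        matched += (column[i] > threshold);
    return matched;
}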
However, this patch still keeps all the tuples in row-oriented format, because
row <=> column translation would make the patch considerably bigger than its
current form (about 2K lines), and GPU integration would require linking a
proprietary library (CUDA or OpenCL), which I thought was not preferable for
the upstream code.
Also note that this patch requires the part-1 ~ part-3 patches of the
CustomScan APIs as prerequisites, because it is implemented on top of those APIs.
One thing I have to apologize for is the lack of documentation and source code
comments in the contrib/ code. Please give me a couple of days to clean up
the code.
Aside from the extension code, I made two enhancements to the core code, as
follows. I'd like to have a discussion about the adequacy of these enhancements.
The first enhancement is a hook on heap_page_prune() that lets an extension
synchronize its internal state with changes to the heap image on disk.
The cache unavoidably accumulates garbage over time, so it needs to be cleaned
up, just as the vacuum process does for the heap. The best time to do this is
when dead tuples are reclaimed, because at that point it is certain that
nobody will reference those tuples any more.
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
bool marked[MaxHeapTuplesPerPage + 1];
} PruneState;
+/* Callback for each page pruning */
+heap_page_prune_hook_type heap_page_prune_hook = NULL;
+
/* Local functions */
static int heap_prune_chain(Relation relation, Buffer buffer,
OffsetNumber rootoffnum,
@@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
* and update FSM with the remaining space.
*/
+ /*
+ * This callback allows extensions to synchronize their own status with
+ * the heap image on disk when this buffer page is vacuumed.
+ */
+ if (heap_page_prune_hook)
+ (*heap_page_prune_hook)(relation,
+ buffer,
+ ndeleted,
+ OldestXmin,
+ prstate.latestRemovedXid);
return ndeleted;
}
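For reference, an extension would install this hook in the usual PostgreSQL
way, chaining to any previously installed hook. This is a sketch with
hypothetical names (ccache_on_page_prune, and the assumption that the patch
declares the hook in access/heapam.h), not the actual contrib/cache_scan code:

#include "postgres.h"
#include "access/heapam.h"      /* assumed to declare the new hook */

static heap_page_prune_hook_type prev_prune_hook = NULL;

/* Hypothetical callback: forget cached copies of tuples on this page
 * whose dead heap versions have just been reclaimed, then chain to any
 * previously installed hook. */
static void
ccache_on_page_prune(Relation relation, Buffer buffer, int ndeleted,
                     TransactionId OldestXmin,
                     TransactionId latestRemovedXid)
{
    /* ... invalidate cache entries that reference this buffer's page ... */

    if (prev_prune_hook)
        (*prev_prune_hook)(relation, buffer, ndeleted,
                           OldestXmin, latestRemovedXid);
}

void
_PG_init(void)
{
    prev_prune_hook = heap_page_prune_hook;
    heap_page_prune_hook = ccache_on_page_prune;
}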
The second enhancement makes SetHintBits() accept InvalidBuffer and skip all
its work in that case. We need to check the visibility of cached tuples when
the custom-scan node scans the cached table instead of the heap.
Even though we can use an MVCC snapshot to check a tuple's visibility, the
check may internally set hint bits on the tuple, so we currently must always
pass a valid buffer pointer to HeapTupleSatisfiesVisibility(). Unfortunately,
having to load the heap buffer associated with a cached tuple would kill all
the benefit of the table cache.
So, I'd like SetHintBits() to handle a special dry-run case when InvalidBuffer
is given.
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index f626755..023f78e 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
*
* The caller should pass xid as the XID of the transaction to check, or
* InvalidTransactionId if no check is needed.
+ *
+ * If the supplied HeapTuple is not associated with a particular buffer,
+ * this function just returns without doing anything. That can happen when
+ * an extension caches tuples in its own way.
*/
static inline void
SetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid)
{
+ if (BufferIsInvalid(buffer))
+ return;
+
if (TransactionIdIsValid(xid))
{
/* NB: xid must be known committed here! */
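With that change in place, the cache-scan code can check the visibility of a
cached tuple without touching any heap buffer, roughly like this (a sketch;
it assumes the cached tuple kept its original header fields, and
ccache_tuple_visible is a hypothetical name):

#include "postgres.h"
#include "access/htup.h"
#include "utils/snapshot.h"
#include "utils/tqual.h"

/* Test a cached tuple against an MVCC snapshot. Passing InvalidBuffer
 * makes SetHintBits() a no-op, so no heap page needs to be loaded. */
static bool
ccache_tuple_visible(HeapTuple tuple, Snapshot snapshot)
{
    return HeapTupleSatisfiesVisibility(tuple, snapshot, InvalidBuffer);
}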
Thanks,
2013/11/13 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> 2013/11/12 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> writes:
>>> So, are you thinking it is a feasible approach to focus on custom-scan
>>> APIs during the upcoming CF3, then table-caching feature as use-case
>>> of this APIs on CF4?
>>
>> Sure. If you work on this extension after CF3, and it reveals that the
>> custom scan stuff needs some adjustments, there would be time to do that
>> in CF4. The policy about what can be submitted in CF4 is that we don't
>> want new major features that no one has seen before, not that you can't
>> make fixes to previously submitted stuff. Something like a new hook
>> in vacuum wouldn't be a "major feature", anyway.
>>
> Thanks for this clarification.
> 3 days are too short to write a patch; however, 2 months may be sufficient
> to develop a feature on top of the scheme discussed in the previous
> commitfest.
>
> Best regards,
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
Attachment: pgsql-v9.4-custom-scan.part-4.v5.patch (application/octet-stream, 62.3 KB)