From: | "Gunnar \"Nick\" Bluth" <gunnar(dot)bluth(at)pro-open(dot)de> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | gkokolatos(at)protonmail(dot)com |
Subject: | [PATCH] pg_stat_toast |
Date: | 2021-12-12 16:20:58 |
Message-ID: | a08b54fa-7b13-9531-6233-33a3d23773a8@pro-open.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello -hackers!
Please have a look at the attached patch, which implements some
statistics for TOAST.
The idea (and patch) have been lurking here for quite a while now, so I
decided to dust it off, rebase it to HEAD and send it out for review today.
A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
hacking, some very useful hints and valueable feedback early this year.
I'd like to get some feedback about the general idea, approach, naming
etc. before refining this further.
I'm not a C person and I s**k at git, so please be kind with me! ;-)
Also, I'm not subscribed here, so a CC would be much appreciated!
Why gather TOAST statistics?
============================
TOAST is transparent and opaque at the same time.
Whilst we know that it's there and we know _that_ it works, we cannot
generally tell _how well_ it works.
What we can't answer (easily) are questions like e.g.
- how many datums have been externalized?
- how many datums have been compressed?
- how often has a compression failed (resulted in no space saving)?
- how effective is the compression algorithm used on a column?
- how much time did the DB spend compressing/decompressing TOAST values?
The patch adds some functionality that will eventually be able to answer
these (and probably more) questions.
Currently, #1 - #4 can be answered based on the view contained in
"pg_stats_toast.sql":
postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test SELECT
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
generate_series(0,100000) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]--------+----------
schemaname | public
reloid | 16829
attnum | 2
relname | test
attname | lz4
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 6299710
originalsizesum | 320403204
-[ RECORD 2 ]--------+----------
schemaname | public
reloid | 16829
attnum | 3
relname | test
attname | std
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 8198819
originalsizesum | 320403204
Implementation
==============
I added some callbacks in backend/access/table/toast_helper.c to
"pgstat_report_toast_activity" in backend/postmaster/pgstat.c.
The latter (and the other additions there) are essentially 1:1 copies of
the function statistics.
Those were the perfect template, as IMHO the TOAST activities (well,
what we're interested in at least) are very much comparable to function
calls:
a) It doesn't really matter if the TOASTed data was committed, as "the
damage is done" (i.e. CPU cycles were used) anyway
b) The information can (thus/best) be stored on DB level, no need to
touch the relation or attribute statistics
I didn't find anything that could have been used as a hash key, so the
PgStat_StatToastEntry
uses the shiny new
PgStat_BackendAttrIdentifier
(containing relid Oid, attr int).
For persisting in the statsfile, I chose the identifier 'O' (as 'T' was
taken).
What's working?
===============
- Gathering of TOAST externalization and compression events
- collecting the sizes before and after compression
- persisting in statsfile
- not breaking "make check"
- not crashing anything (afaict)
What's missing (yet)?
===============
- proper definition of the "pgstat_track_toast" GUC
- Gathering of times (for compression [and decompression?])
- improve "pg_stat_toast" view and include it in the catalog
- documentation (obviously)
- proper naming (of e.g. the hash key type, functions, view columns etc.)
- would it be necessary to implement overflow protection for the size &
time sums?
Thanks in advance & best regards,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar(dot)bluth(at)pro-open(dot)de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato
Attachment | Content-Type | Size |
---|---|---|
0001-initial-patch-of-pg_stat_toast-for-hackers.patch | text/x-patch | 29.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2021-12-12 17:45:43 | Re: extended stats on partitioned tables |
Previous Message | Zhihong Yu | 2021-12-12 15:37:01 | Re: extended stats on partitioned tables |