From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: Large files for relations |
Date: | 2023-06-12 08:52:56 |
Message-ID: | 05cc050b-af2c-7504-c628-63bdcf104702@pgmasters.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 5/28/23 08:48, Thomas Munro wrote:
>
> Alright, since I had some time to kill in an airport, here is a
> starter patch for initdb --rel-segsize.
I've gone through this patch and it looks pretty good to me. A few things:
+ * rel_setment_size, we will truncate the K+1st segment to 0 length
rel_setment_size -> rel_segment_size
+ * We used a phony GUC with a custome show function, because we don't
custome -> custom
+ if (strcmp(endptr, "kB") == 0)
Why kB here instead of KB to match MB, GB, TB below?
+ int64 relseg_size; /* blocks per segment of large relation */
This will require PG_CONTROL_VERSION to be bumped -- but you are
probably waiting until commit time to avoid annoying conflicts, though I
don't think it is as likely as with CATALOG_VERSION_NO.
> Some random thoughts:
>
> Another potential option name would be --segsize, if we think we're
> going to use this for temp files too eventually.
I feel like temp file segsize should be separately configurable for the
same reason that we are leaving it as 1GB for now.
> Maybe it's not so beautiful to have that global variable
> rel_segment_size (which replaces REL_SEGSIZE everywhere).
Maybe not, but it is the way these things are done in general, .e.g.
wal_segment_size, so I don't think it will be too controversial.
> Another
> idea would be to make it static in md.c and call smgrsetsegmentsize(),
> or something like that. That could be a nice place to compute the
> "shift" value up front, instead of computing it each time in
> blockno_to_segno(), but that's probably not worth bothering with (?).
> BSR/LZCNT/CLZ instructions are pretty fast on modern chips. That's
> about the only place where someone could say that this change makes
> things worse for people not interested in the new feature, so I was
> careful to get rid of / and % operations with no-longer-constant RHS.
Right -- not sure we should be troubling ourselves with trying to
optimize away ops that are very fast, unless they are computed trillions
of times.
> I had to promote segment size to int64 (global variable, field in
> control file), because otherwise it couldn't represent
> --rel-segsize=32TB (it'd be too big by one). Other ideas would be to
> store the shift value instead of the size, or store the max block
> number, eg subtract one, or use InvalidBlockNumber to mean "no limit"
> (with more branches to test for it). The only problem I ran into with
> the larger type was that 'SHOW segment_size' now needs a custom show
> function because we don't have int64 GUCs.
A custom show function seems like a reasonable solution here.
> A C type confusion problem that I noticed: some code uses BlockNumber
> and some code uses int for segment numbers. It's not really a
> reachable problem for practical reasons (you'd need over 2 billion
> directories and VFDs to reach it), but it's wrong to use int if
> segment size can be set as low as BLCKSZ (one file per block); you
> could have more segments than an int can represent. We could go for
> uint32, BlockNumber or create SegmentNumber (which I think I've
> proposed before, and lost track of...). We can address that
> separately (perhaps by finding my old patch...)
I think addressing this separately is fine, though maybe enforcing some
reasonable minimum in initdb would be a good idea for this patch. For my
2c SEGSIZE == BLOCKSZ just makes very little sense.
Lastly, I think the blockno_to_segno(), blockno_within_segment(), and
blockno_to_seekpos() functions add enough readability that they should
be committed regardless of how this patch proceeds.
Regards,
-David
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2023-06-12 09:13:53 | Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG |
Previous Message | Fujii.Yuki@df.MitsubishiElectric.co.jp | 2023-06-12 08:51:30 | RE: Partial aggregates pushdown |