From: José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: Sequence Access Method WIP
Date: 2014-12-03 14:50:00
Message-ID: 547F2318.6080207@adv-solutions.net
Lists: pgsql-hackers
On 12/03/2014 11:24 AM, Andres Freund wrote:
> On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
>> [snip]
> I don't think the WAL logging would need to change much in comparison to
> the current solution. We'd just add the page number to the WAL record.
>
> The biggest advantage would be to require fewer heavyweight locks,
> because the pg_sequence one could be a single fastpath lock. Currently
> we have to take the sequence's relation lock (heavyweight) and then the
> page level lock (lwlock) for every single sequence used.
Got it, thank you for the explanation.
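To make the saving concrete, a rough back-of-the-envelope sketch of the lock counts (illustrative numbers only, not taken from the patch; it assumes one sequence per page in the proposed single pg_sequence relation):

```python
# Hypothetical lock counts for touching N distinct sequences.

def locks_current(n_sequences):
    # Today: each sequence is its own one-page relation, so we take one
    # heavyweight relation lock plus one page-level lwlock per sequence.
    return {"heavyweight": n_sequences, "lwlock": n_sequences}

def locks_proposed(n_sequences, seqs_per_page=1):
    # Proposed: a single (fastpath-eligible) heavyweight lock on
    # pg_sequence, plus one page-level lwlock per touched page.
    pages = -(-n_sequences // seqs_per_page)  # ceiling division
    return {"heavyweight": 1, "lwlock": pages}
```

So a transaction touching, say, 100 sequences would go from 100 heavyweight locks down to a single one.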
>> May I possibly suggest a file-per-schema model instead? This approach would
>> certainly solve the excessive i-node consumption problem that --I guess--
>> Andres is trying to address here.
> I don't think that really has any advantages.
Just spreading the I/O load, nothing more, it seems. To elaborate a bit
on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB and the page size is
8KB, we'd have (1048576/8) = 128k sequences per relation segment, at one
sequence per page.
Arguably, not many real use cases will have that many sequences... save
for *massively* multi-tenant databases.
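The arithmetic above, spelled out (assuming the standard build defaults of 8KB pages and 1GB segment files):

```python
# Sequences that fit in one relation segment, at one sequence per page.
SEGMENT_BYTES = 1 * 1024 * 1024 * 1024        # 1GB segment (default)
PAGE_BYTES = 8 * 1024                         # 8KB page (default)
SEQS_PER_SEGMENT = SEGMENT_BYTES // PAGE_BYTES  # 131072, i.e. 128k
```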
The downside is that all that random I/O --- in general, it can't
really be sequential unless there are very, very few sequences --- can't
be spread to other spindles. Maybe create a "sequence_default_tablespace"
GUC plus ALTER SEQUENCE ... SET TABLESPACE, so that an SSD could be used
for this purpose?
(I could take a shot at the patch, if deemed worthwhile)
>> [snip]
> The overhead of a file is much more than wasting the remainder of a
> page. Alone the requirement of separate fsyncs and everything is pretty
> bothersome. The generated IO patterns are also much worse...
Yes, you are right. I stand corrected.
> [snip]
> I think there's pretty much no chance of accepting more than one page
> per sequence
Definitely.
Thanks,
J.L.