From: | Michael Banck <mbanck(at)gmx(dot)net> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, 成之焕 <zhcheng(at)ceresdata(dot)com> |
Subject: | Re: [PATCH] Exponential backoff for auth_delay |
Date: | 2024-03-04 19:42:55 |
Message-ID: | 65e62440.170a0220.f6a7c.6f7f@mx.google.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Wed, Feb 21, 2024 at 10:26:11PM +0100, Tomas Vondra wrote:
> Thanks for the patch. I took a closer look at v3, so let me share some
> review comments. Please push back if you happen to disagree with some of
> it, some of this is going to be more a matter of personal preference.
Thanks! As my patch was based on a previous patch, some of decisions
were carry-overs I am not overly attached to.
> 1) I think it's a bit weird to have two options specifying amount of
> time, but one is in seconds and one in milliseconds. Won't that be
> unnecessarily confusing? Could we do both in milliseconds?
Alright, I changed that.
See below for a discussion about the GUCs in general.
> 2) The C code defines the GUC as auth_delay.exponential_backoff, but the
> SGML docs say <varname>auth_delay.exp_backoff</varname>.
Right, an oversight from the last version where the GUC name got changed
but I forgot to change the documentation, fixed.
> 3) Do we actually need the exponential_backoff GUC? Wouldn't it be
> sufficient to have the maximum value, and if it's -1 treat that as no
> backoff?
That is a good question, I guess that makes sense.
The next question then is: what should the default for (now)
auth_delay.max_milliseconds be in this case, -1? Or do we say that as
the default for auth_delay.milliseconds is 0 anyway, why would somebody
not want exponential backoff when they switch it on and keep it at the
current 10s/10000ms)?
I have not changed that for now, pending further input.
> 4) I think the SGML docs should more clearly explain that the delay is
> initialized to auth_delay.milliseconds, and reset to this value after
> successful authentication. The wording kinda implies that, but it's not
> quite clear I think.
Ok, I added some text to that end. I also added a not that
auth_delay.max_milliseconds will mean that the delay doubling never
stops.
> 4) I've been really confused by the change from
>
> if (status != STATUS_OK)
> to
> if (status == STATUS_ERROR)
>
> in auth_delay_checks, until I realized those two codes are not the only
> ones, and we intentionally ignore STATUS_EOF. I think it'd be good to
> mention that in a comment, it's not quite obvious (I only realized it
> because the e-mail mentioned it).
Yeah I agree, I tried to explain that now.
> 5) I kinda like the custom that functions in a contrib module start with
> a clear common prefix, say auth_delay_ in this case. Matter of personal
> preference, ofc.
Ok, I changed the functions to have an auth_delay_ prefix throughout..
> 6) I'm a bit skeptical about some acr_array details. Firstly, why would
> 50 entries be enough? Seems like a pretty low and arbitrary number ...
> Also, what happens if someone attempts to authenticate, fails to do so,
> and then disconnects and never tries again? Or just changes IP? Seems
> like the entry will remain in the array forever, no?
Yeah, that is how v3 of this patch worked. I have changed that now, see
below.
> Seems like a way to cause a "limited" DoS - do auth failure from 50
> different hosts, to fill the table, and now everyone has to wait the
> maximum amount of time (if they fail to authenticate).
Right, though the problem would only exist on authentication failures,
so it is really rather limited.
> I think it'd be good to consider:
>
> - Making the acr_array a hash table, and larger than 50 entries (I
> wonder if that should be dynamic / customizable by GUC?).
I would say a GUC should be overkill for this as this would mostly be an
implementation detail.
More generally, I think now that entries are expired (see below), this
should be less of a concern, so I have not changed this to a hash table
for now but doubled MAX_CONN_RECORDS to 100 entries.
> - Make sure the entries are eventually expired, based on time (for
> example after 2*max_delay?).
I went with 5*max_milliseconds - the AuthConnRecord struct now has a
last_failed_auth timestamp member; if we increase the delay for a host,
we check if any other host expired in the meantime and remove it if so.
> - It would be a good idea to log something when we get into the "full
> table" and start delaying everyone by max_delay_seconds. (How would
> anyone know that's what's happening, seems rather confusing.)
Right, I added a log line for that. However, I made it LOG instead of
WARNING as I don't think the client would ever see it, would he?
Attached is v4 with the above changes.
Cheers,
Michael
Attachment | Content-Type | Size |
---|---|---|
v4-0001-Add-optional-exponential-backoff-to-auth_delay-co.patch | text/x-diff | 10.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2024-03-04 19:47:09 | Re: Add system identifier to backup manifest |
Previous Message | Robert Haas | 2024-03-04 18:37:39 | Re: incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY |