Re: [PoC] Delegating pg_ident to a third party

From: Jacob Champion <pchampion(at)vmware(dot)com>
To: "sfrost(at)snowman(dot)net" <sfrost(at)snowman(dot)net>
Cc: "peter(dot)eisentraut(at)enterprisedb(dot)com" <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Delegating pg_ident to a third party
Date: 2022-01-04 23:56:02
Message-ID: d34a1f6719c27cfc2eb242b4be00524211817111.camel@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2022-01-03 at 19:42 -0500, Stephen Frost wrote:
> * Jacob Champion (pchampion(at)vmware(dot)com) wrote:
> >
> > That last point was my motivation for the authn_id patch [1] -- so that
> > auditing could see the actual user _and_ the generic role. The
> > information is already there to be used, it's just not exposed to the
> > stats framework yet.
>
> While that helps, and I generally support adding that information to the
> logs, it's certainly not nearly as good or useful as having the actual
> user known to the database.

Could you talk more about the use cases for which having the "actual
user" is better? From an auditing perspective I don't see why
"authenticated as jacob(at)example(dot)net, logged in as admin" is any worse
than "logged in as jacob".

> > Forcing one role per individual end user is wasteful and isn't really
> > making good use of the role-based system that you already have.
> > Generally speaking, when administering hundreds or thousands of users,
> > people start dividing them up into groups as opposed to dealing with
> > them individually. So I don't think new features should be taking away
> > flexibility in this area -- if one role per user already works well for
> > you, great, but don't make everyone do the same.
>
> Using the role system we have to assign privileges certainly is useful
> and sensible, of course, though I don't see where you've actually made
> an argument for why one role per individual is somehow wasteful or
> somehow takes away from the role system that we have for granting
> rights.

I was responding more to your statement that "Being able to have roles
and memberships automatically created is much more the direction that
I'd say we should be going in". It's not that one-role-per-user is
inherently wasteful, but forcing role proliferation where it's not
needed is. If all users have the same set of permissions, there doesn't
need to be more than one role. But see below.

> I'm also not suggesting that we make everyone do the same
> thing, indeed, later on I was supportive of having an external system
> provide the mapping. Here, I'm just making the point that we should
> also be looking at automatic role/membership creation.

Gotcha. Agreed; that would open up the ability to administer role
privileges externally too, which would be cool. That could be used in
tandem with something like this patchset.

> > > I'd go a step further and suggest that the way to do this is with a
> > > background worker that's started up and connects to an LDAP
> > > infrastructure and listens for changes, allowing the system to pick up
> > > on new roles/memberships as soon as they're created in the LDAP
> > > environment. That would then be controlled by appropriate settings in
> > > postgresql.conf/.auto.conf.
> >
> > This is roughly what you can already do with existing (third-party)
> > tools, and that approach isn't scaling out in practice for some of our
> > existing customers. The load on the central server, for thousands of
> > idle databases dialing in just to see if there are any new users, is
> > huge.
>
> If you're referring specifically to cron-based tools which are
> constantly hammering on the LDAP servers running the same queries over
> and over, sure, I agree that that's creating load on the LDAP
> infrastructure (though, well, it was kind of designed to be very
> scalable for exactly that kind of load, no? So I'm not really sure why
> that's such an issue..).

I don't have hands-on experience here -- just going on what I've been
told via field/product teams -- but it seems to me that there's a big
difference between asking an LDAP server to give you information on a
user at the time that user logs in, and asking it to give a list of
_all_ users to every single Postgres instance you have on a regular
timer. The latter is what seems to be problematic.

> That's also why I specifically wasn't
> suggesting that and was instead suggesting that we have something that's
> connected to one of the (hopefully, many, many) LDAP servers and is
> doing change monitoring, allowing changes to be pushed down to PG,
> rather than cronjobs constantly running the same queries and re-checking
> things over and over. I appreciate that that's also not free, but I
> don't believe it's nearly as bad as the cron-based approach and it's
> certainly something that an LDAP infrastructure should be really rather
> good at.

I guess I'd have to see an implementation -- I was under the impression
that persistent search wasn't widely implemented?

> > > All that said, I do see how having the ability to call out to another
> > > system for mappings may be useful, so I'm not sure that we shouldn't
> > > consider this specific change and have it be specifically just for
> > > mappings, in which case pg_ident seems appropriate.
> >
> > Yeah, this PoC was mostly an increment on the functionality that
> > already existed. The division between what goes in pg_hba and what goes
> > in pg_ident is starting to blur with this patchset, though, and I think
> > Peter's point is sound.
>
> This part I tend to disagree with- pg_ident for mappings and for ways to
> call out to other systems to provide those mappings strikes me as
> entirely appropriate and doesn't blur the lines and that's really what
> this patch seems to be primarily about. Peter noted that there might be
> other things we want to do and argued that those might not be
> appropriate in pg_ident, which I tend to agree with, but I don't think
> we need to invent something entirely new for mappings when we have
> pg_ident already.

The current patchset here has pieces of what is usually contained in
HBA (the LDAP host/port/base/filter/etc.) effectively moved into
pg_ident, while other pieces (TLS settings) remain in the HBA and the
environment. That's what I'm referring to. If that is workable for you
in the end, that's fine, but for me it'd be much easier to maintain if
the mapping query and the LDAP connection settings for that mapping
query were next to each other.

> When it comes to the question of "how to connect to an LDAP server for
> $whatever", it seems like it'd be nice to be able to configure that once
> and reuse that configuration. Not sure I have a great suggestion for
> how to do that. The approach this patch takes of adding options to
> pg_hba for that, just like other options in pg_hba do, strikes me as
> pretty reasonable.

Right. That part seems less reasonable to me, given the current format
of the HBA. YMMV.

> I would advocate for other methods to work when it comes to
> authenticating to LDAP from PG though (such as GSSAPI, in particular,
> of course...).

I can take a look at the Cyrus requirements for the GSSAPI mechanism.
Might be tricky to add tests for it, though. Any others you're
interested in?

> > > I certainly don't think we should have this be limited to LDAP auth-
> > > such an external mapping ability is suitable for any authentication
> > > method that supports a mapping (thinking specifically of GSSAPI, of
> > > course..). Not sure if that's what was meant above but did want to
> > > make sure that was clear.
> >
> > You can't use usermaps with LDAP auth yet, so no, that's not what I
> > meant. (I have another patch for that feature in commitfest, which
> > would allow these two things to be used together.)
>
> Yes, I'm aware of the other patch, just wanted to make sure the intent
> is for this to work for all map-supporting auth methods. Figured that
> was the case but the examples in the prior email had me concerned and
> just wanted to make sure.

Correct. The new tests use cert auth, for instance.

Thanks,
--Jacob

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-01-04 23:58:12 Re: PoC: using sampling to estimate joins / complex conditions
Previous Message Andres Freund 2022-01-04 23:55:50 Re: using extended statistics to improve join estimates