Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

From: Tim Cross <theophilusx(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)
Date: 2020-04-29 05:30:03
Message-ID: 87pnbqsvtw.fsf@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers


Rui DeSousa <rui(at)crazybean(dot)net> writes:

>> On Apr 28, 2020, at 10:29 PM, raf <raf(at)raf(dot)org> wrote:
>>
>> Rui DeSousa wrote:
>>
>>>> On Apr 28, 2020, at 7:43 PM, raf <raf(at)raf(dot)org> wrote:
>>>>
>>>> I just use "text" for everything. It's less typing. :-)
>>>
>>> Ugh, I see it as sign that the designers of the schema didn’t fully
>>> think about the actual requirements or care about them and it usually
>>> shows.
>>
>> You are mistaken. I care a lot. That's why I
>> future-proof designs whenever possible by
>> not imposing arbitrarily chosen limits that
>> appear to suit current requirements.
>>
>> In other words, I know I'm not smart enough
>> to predict the future so I don't let that
>> fact ruin my software. :-)
>>
>> cheers,
>> raf
>>
>
> Arbitrarily? What’s a cusip, vin, ssn? Why would you put a btree index on a text field? Because it’s not.
>
> What you’re advocating is a NoSQL design — defer your schema design. Letting the application code littered in multiple places elsewhere define what a cusip, etc. is.

I think the key term in this thread is 'arbitrary'. When implementing a
schema design, it should reflect the known constraints inherent in the
model, but it should avoid imposing arbitrary constraints if none exist
or cannot be determined.

So, if you know that a customer ID field has a current limitation of 50
characters, then use a definition which reflects that. It may be that at
some point in the future, this will be increased, but then again, it may
not and that bit of information provides useful information for
application developers and helps with consistency across APIs. Without
some guideline, different developers will impose different values,
leading to maintenance issues and bugs down the track.

On the other hand, imposing an arbitrary limitation, based on little
more than a guess by the designer, can cause enormous problems. As an
example, I was working on an identity management system where there was
a constraint of 8 characters on the username and password. This was an
arbitrary limit based on what was common practice, but was not a
limitation imposed by any of the systems the IAM system interacted with.
It was recognised that both fields were too small and needed to be
increased. The easy solution would have been to make these fields text.
However, that would cause a problem with some of the systems we needed
to integrate with because either they had a limit on username size or
they had a limit on password size. There were also multiple different
APIs which needed to work with this system and when we performed
analysis, they had varying limits on both fields.

What we did was look at all the systems we had to integrate with and
found the maximum supported username and password lengths for each
system and set the fields to have the maximum length supported by the
systems with the shortest lengths. Having that information in the
database schema also informed those developing other interfaces what the
maximums were. It is quite likely these limits would be increased in the
future and the database definition would need to be increased - in fact,
some years after going into production, exactly this occurred with the
password field when a different encryption algorithm was adopted which
did not have the previous character limitation and the client wanted to
encourage users to use pass phrases rather than a word.

The point is, just using text for all character fields loses information
and results in your model and schema being less expressive. Providing
this information is sometimes critical in ensuring limits are maintained
and provides useful documentation about the model that developers can
use. However, imposing limits based on little more than a guess is
usually a bad idea and if you cannot find any reason to impose a limit,
then don't. I disagree with approaches which claim using text everywhere
is easier and future proofing. In reality, it is just pushing the
problem out for someone else to deal with. The best way to future proof
your application is to have a clear well defined data model that fits
the domain and is well documented and reflected in your database schema.

--
Tim Cross

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message David G. Johnston 2020-04-29 05:32:43 Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)
Previous Message Rui DeSousa 2020-04-29 05:26:03 Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2020-04-29 05:32:43 Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)
Previous Message Rui DeSousa 2020-04-29 05:26:03 Re: PostgreSQL CHARACTER VARYING vs CHARACTER VARYING (Length)