Re: text fields and performance for ETL

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Grega Jesih <Grega(dot)Jesih(at)actual-it(dot)si>
Cc: "grega(dot)jesih(at)gmail(dot)com" <grega(dot)jesih(at)gmail(dot)com>, Pg Docs <pgsql-docs(at)lists(dot)postgresql(dot)org>
Subject: Re: text fields and performance for ETL
Date: 2021-11-03 16:00:20
Message-ID: CAKFQuwbaVh+-gzTCnD654mqbdkGmjDKETV2M3oBtCO-f2mKzsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

On Wed, Nov 3, 2021 at 8:35 AM Grega Jesih <Grega(dot)Jesih(at)actual-it(dot)si> wrote:

> Dear David,
>
> > The goal in our docs is to point out that using an arbitrary length
> specification is not required in PostgreSQL.
>
> Then it should say so. Because unexperienced reader then uses this limited
> focus and generalizes.
>

It does (but maybe it could be improved, see the FAQ entry linked below for
more detail).

>
>
> It is for the very same reason that progreammers don't perceive the need
> to limit the string size to its realistic size that ETL processes
>
> are slowed down.
>

Given the number of people I find agreeing with "just use text" and the
general lack of people making good arguments for using varchar(n) I'm
inclined to believe the status quo best reflects the majority of usage in
the wild.

For example a currency where 3-char encoding is used, the field should be
> char(3) and not text.
>

The char data type has its own problems (see the other FAQ entry linked
below)

> > The documentation assumes that the sizes involved here are reasonable
> for such behavior
>
> On the contrary. When you say "performance is the same.." then you make a
> wrong impression it is an unversal case. But, if fact it depends.
>

As far as PostgreSQL itself is concerned there is no "it depends". That is
all we are claiming here. And, frankly, generalization is correct in
probably 90% of situations. Maybe there is room for improvement, in
documentation that is usually the case, do you have a concrete suggestion
to offer?

> When you include such table into some dataflow from server A to server B,
> this process will get slowed down. I explained why.
> It is important to teach people that update the model to use logical
> values that make sense for such cases.
>

Teaching data modelling techniques isn't really a goal for our
documentation. We aim to inform how the PostgreSQL works.

> IF you teach instead "ah it is not important, you may use text", then you
> actually ignore a part of informatic team that provides the usage of this
> data.
>

In most cases (how wide should a name field be) there is no good length to
choose. For the currency abbreviation example I would add a "check
length(currency_name) = 3" constraint alongside a unique constraint on the
lookup table - but the data type would still just be text. I can also
enforce all uppercase and letters only for the symbol in the formal
constraint while the char(3) will happily allow a value of "u6D". In
either case the actual performance of processing that text field (input and
output) would be the same in PostgreSQL. So if servers A and B are both
PostgreSQL you are simply incorrect. If they are not then the example is
largely out-of-scope for our documentation.

(I refer to SSIS in this context) that provide a very fast dataflow in
> case there is a known record size.
>

varchar(n) says nothing about the minimum size allowed which means it does
nothing to help for "known record size". For that you need, and have,
actual constraints.

If anything the documentation lacks in making these points clear by the
very evidence of two FAQ entries covering the topic.

https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_varchar.28n.29_by_default
https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_char.28n.29

David J.

In response to

Browse pgsql-docs by date

  From Date Subject
Next Message Tom Lane 2021-11-03 17:37:28 Re: text fields and performance for ETL
Previous Message Grega Jesih 2021-11-03 15:35:51 RE: text fields and performance for ETL