Re: Feature Request: Extending PostgreSQL's Identifier Length Limit

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: David HJ <chuxiongzhong(at)gmail(dot)com>
Subject: Re: Feature Request: Extending PostgreSQL's Identifier Length Limit
Date: 2024-07-18 09:25:01
Message-ID: CAJ7c6TOurV4uA5Yz=aJ-ae4czL_zdFNqxbu47eyVrYFefrWoog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

> As a long-time PostgreSQL user, I've increasingly run into issues with the 63-byte limit for identifiers, particularly table names. This limit, while historically sufficient, is becoming a significant pain point in modern database design and usage.

I can understand your pain. Unfortunately there are a number of
complications involved.

Take pg_class catalog table [1] as an example and its column `relname`
of type `name` [2]. On disk it is stored as an fixed-sized array of
chars:

```
typedef struct nameData
{
char data[NAMEDATALEN];
} NameData;
typedef NameData *Name;
```

Why not use TEXT? Mostly for performance reasons. In general case TEXT
data can be TOASTed. When using TEXT one should do an additional call
of heap_deform_tuple().

Using NAME allows the code to interpret tuple data as is, e.g.:
```
typedef struct FormData_phonebook
{
int32 id;
NameData name;
int32 phone;
} FormData_phonebook;

typedef FormData_phonebook* Form_phonebook;

/* ... */

while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
Form_phonebook record = (Form_phonebook) GETSTRUCT(tup);

if(strcmp(record->name.data, name->data) == 0)
{
found_phone = record->phone;
break;
}
}
```

So if you change NAME definition several things will happen:

1. You have to rebuild catalog tables. This can't be done in-place
because larger tuples may not fit into pages. Note that page size may
also vary depending on how PostgreSQL was compiled. Note that the
indexes will also be affected.

2. You will break all the extensions that use NAME and the
corresponding heap_* and index_* APIs.

3. The performance will generally decrease - many existing
applications will just waste memory or do unnecessary work due to
extra calls to heap_deform_tuple().

If (1) is doable in theory, I don't think (2) and (3) are something we
do in this project.

On top of that there is are relatively simple workarounds for the situation:

1. An application may have a function like shorten_name(x) = substr(x,
1, 50) || '_' || substr(md5(x), 1, 8). So instead of `SELECT * FROM x`
you just do `SELECT * FROM shorten_name(x)`.

2. You may fork the code and enlarge NAMEDATALEN. This is not
recommended and not guaranteed to work but worth a try.

This makes me think that solving the named limitation isn't worth the effort.

Personally I'm not opposed to the idea in general but you will need to
come up with a specific RFC that explains how exactly you propose to
solve named problems.

[1]: https://www.postgresql.org/docs/current/catalog-pg-class.html
[2]: https://www.postgresql.org/docs/current/datatype-character.html

--
Best regards,
Aleksander Alekseev

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nitin Motiani 2024-07-18 09:35:26 Re: long-standing data loss bug in initial sync of logical replication
Previous Message Ashutosh Bapat 2024-07-18 09:23:58 Re: Add mention of execution time memory for enable_partitionwise_* GUCs