Re: Inconsistent behavior on Array & Is Null?

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Inconsistent behavior on Array & Is Null?
Date: 2004-04-03 19:33:37
Message-ID: 873c7kkga6.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Joe Conway <mail(at)joeconway(dot)com> writes:

> Greg Stark wrote:
> > This really ought to work, it obviously shouldn't allow you to set a[5] and
> > then surreptitiously move it to a[1]. But nor should it generate an error,
> > since I may well have a specific meaning for a[5] and may be planning to fill
> > in a[1]..a[4] later.
> > The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
> > null. This could be implemented by actually storing the NULLs or else storing
> > some notation that's used to adjust the base of the index to save space.
>
> I agree. I had always envisioned something exactly like that once we supported
> NULL elements. As far as the implementation goes, I think it would be very
> similar to tuples -- a null bitmask that would exist if any elements are NULL.

Well you might still want to store an internal "all indexes below this are
null". That way update foo set a[1000]=1 doesn't require storing even a bitmap
for the first 999 elements. Though might make maintaining the bitmap kind of a
pain. Maintaining the bitmap might be kind of a pain anyways though because
unlike tuples the array size isn't constant.

> A related question is how to deal with non-existing array elements. Until now,
> you could do:

I would have to think about it some more, but my first reaction is that
looking up [0] should generate an error if there can never be a valid entry at
[0]. But looking up indexes above the highest index should return NULL.

There are two broad use cases I see for arrays. Using them to represent tuples
where a[i] means something specific for each i, and using them to represent
sets where order doesn't matter.

In the former case I might want to initialize my column to an empty array and
set only the relevant columns as needed. In that case returning NULL for
entries that haven't been set yet whether they're above the last entry set or
below is most consistent.

In the latter case you really don't want to be looking up anything past the
end and don't want to be storing NULLs at all. So it doesn't really matter
what the behaviour is for referencing elements past the end, but you might
conceivably want to write code like "while (e = a[i++]) ...".

Incidentally I'm using both of these models in my current project.

I use text[] to represent localized strings, str[1] is always English and
str[2] is always French. When I need to expand to more languages I'll add
str[3] for Spanish or whatever else. It would be a problem if I stored
something in str[2] and then found it in str[1] later. And it could be a bit
awkward to have to prefill str[3] everywhere in the whole database when the
time comes. Having it just silently return NULL would be more convenient.

I also use arrays for sets in a cache table. In that case there would never be
NULLs and the arrays are variable sized. Sometimes with thousands of entries.
The purpose of the cache table is to speed things up so storing the arrays
densely is important.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2004-04-03 19:42:09 Re: Better support for whole-row operations and composite
Previous Message Andrew Dunstan 2004-04-03 16:08:30 Re: [HACKERS] Function to kill backend