Re: substring on bit(n) and bytea types is slow

From:	Evgeny Morozov <evgeny(dot)morozov+list+pgsql(at)shift-technology(dot)com>
To:	Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com>
Cc:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: substring on bit(n) and bytea types is slow
Date:	2016-03-02 09:48:07
Message-ID:	CALtd4uUVwaVnDeRkgV8cHM+C95OoPgYninnzxOrwm0DVDZV=Uw@mail.gmail.com
Lists:	pgsql-general

On 2 March 2016 at 00:33, Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com> wrote:

>
> On Feb 29, 2016 22:26, "Evgeny Morozov" <evgeny(dot)morozov+list+pgsql(at)shift-technology(dot)com> wrote:
> > -- bitarray is a column of type bit(64000000)
> > SELECT substring(bitarray from (32 * (n - 1) + 1) for 32)
> > FROM array_test_bit
> > JOIN generate_series(1, 10000) n ON true;
>
> Substring on a bit string is not optimized for long TOASTed values.
> Substring on text is optimized for that. The current code fetches the whole
> 8MB from the table every time.
>
I see, thanks. Is there a better way to pack a large number of integers
efficiently with reasonable read/write performance?

I tried arrays of bit varying, which seemed perfect, but in practice, when
I stored 4M integers in one, each taking as few bits as possible, the table
takes 13MB, the same as if I store all of them as bit(24). In fact, an
array of 4M bit(10) integers also takes 13MB, while bit(8) takes only 0.7MB.
bit(9) is where things get weird: for the integers 1 to 4M it takes 13MB,
but if I multiply them by 2 (i.e. store 4M even integers) it takes 0.7MB!
So there must be some kind of compression going on, but I don't understand
how it works.
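
As a back-of-the-envelope check on those numbers, here is a rough sketch in
Python (not PostgreSQL) of what tightly packed fixed-width values would occupy
as raw payload, plus the offset arithmetic used by the substring query. The
helper names are hypothetical, "4M" is assumed to mean 4,000,000, and the
sizes ignore any per-value or per-element storage overhead.

```python
# Hypothetical helpers for illustration only; they model raw payload size
# and bit offsets, not PostgreSQL's actual on-disk format.

def raw_packed_bytes(count: int, bits_per_value: int) -> int:
    """Bytes needed for `count` values of `bits_per_value` bits, back to back."""
    total_bits = count * bits_per_value
    return (total_bits + 7) // 8  # round up to whole bytes


def pack(values, width: int) -> int:
    """Concatenate non-negative integers into one big integer, `width` bits each."""
    packed = 0
    for v in values:
        if v < 0 or v >= (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        packed = (packed << width) | v
    return packed


def extract(packed: int, count: int, width: int, n: int) -> int:
    """Return the n-th value (1-based), i.e. the bits that
    substring(... from width * (n - 1) + 1 for width) would select."""
    if not 1 <= n <= count:
        raise IndexError(f"n must be between 1 and {count}")
    shift = (count - n) * width  # bits to the right of the n-th slot
    return (packed >> shift) & ((1 << width) - 1)


if __name__ == "__main__":
    n_values = 4_000_000  # assumption: "4M" read as 4,000,000
    for width in (8, 9, 10, 24):
        print(f"bit({width}): {raw_packed_bytes(n_values, width)} bytes raw")
```

Under these assumptions the raw payload is 4,000,000 bytes at 8 bits and
12,000,000 bytes at 24 bits, so a 0.7MB table is well below the raw payload,
which fits the guess that some compression is being applied.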
