From: Evgeny Morozov <evgeny(dot)morozov+list+pgsql(at)shift-technology(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: substring on bit(n) and bytea types is slow
Date: 2016-02-29 19:50:31
Message-ID: CALtd4uXPx1pGvsgmAJ9er=JFwY_emUGumHXHi9=nk22FDP8w-w@mail.gmail.com
Lists: pgsql-general
Hi,
Queries like this:
SELECT substring(bitarray from (32 * (n - 1) + 1) for 32)
  -- bitarray is a column of type bit(64000000)
FROM array_test_bit
JOIN generate_series(1, 10000) n ON true;

SELECT substring(bytearr from (8 * (n - 1) + 1) for 8)
  -- bytearr is a column of type bytea
FROM array_test_bytea
JOIN generate_series(1, 10000) n ON true;
...are really slow: each takes over a minute, and a Postgres backend process uses 100% of a CPU while the query runs. The same thing in SQL Server 2014 (using varbinary(max) columns) runs fast - about 20 seconds for 4 million rows. Are bit/byte arrays just inherently slow in Postgres, or is substring the wrong function to use on them?
The context is that I want to store many integers efficiently. The obvious answer is integer[], but most of my integers fit into fewer than 32 bits, so I'd like to see if I can pack them more compactly.
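(One experiment that may be worth trying, sketched here against the table and column names from the queries above: PostgreSQL's documentation notes that substring on wide bytea values is optimized to fetch only the required part of an out-of-line value when the column uses EXTERNAL, i.e. uncompressed, TOAST storage. Whether it helps here depends on the values actually being stored out of line.)

```sql
-- Sketch: switch the bytea column to uncompressed out-of-line storage,
-- so substring() can fetch just the referenced slice instead of
-- detoasting the whole 64 MB value on every call. This only affects
-- values written after the change, so existing rows would need to be
-- rewritten (e.g. by re-inserting the data).
ALTER TABLE array_test_bytea
  ALTER COLUMN bytearr SET STORAGE EXTERNAL;
```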
Regards,
Evgeny Morozov