From: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: jsonb and nested hstore |
Date: | 2014-03-11 22:58:30 |
Message-ID: | 531F9516.8080808@fuzzy.cz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I've spent a few hours stress-testing this a bit - loading a mail
archive with ~1M of messages (with headers stored in a jsonb column) and
then doing queries on that. Good news - no crashes or any such issues so
far. The queries that I ran manually seem to return sane results.
The only problem I ran into is with limited index row size with GIN
indexes. I understand it's not a bug, but I admit I haven't realized I
might run into it in this case ...
The data I used for testing is just a bunch of e-mail messages, with
headers stored as jsonb, so each row has something like this in
"headers" column:
{
"from" : "John Doe <john(at)example(dot)com>",
"to" : ["Jane Doe <jane(at)example(dot)com>", "Jack Doe <jack(at)example(dot)com>"],
"cc" : ...,
"bcc" : ...,
... various other headers ...
}
The snag is that some of the header values may be very long, exceeding
the limit of 1352 bytes and causing errors like this:
ERROR: index row size 1416 exceeds maximum 1352 for index "gin_idx"
A good example of such header is "dkim-signature" which basically
contains the whole message digitally signed with DKIM. The signature
tends to be long and non-compressible, thanks to the signature.
I'm wondering what's the best way around this, because I suspect many
new users (especially those attracted by jsonb and GIN improvements)
will run into this. Maybe not immediately, but eventully they'll try to
insert a jsonb with long value, and it will fail ...
With btree indexes on text I would probably create an index on
substr(column,0,1000) or something like that, but doing that with JSON
seems a bit strange.
I assume we need to store the actual values in the GIN index (so a hash
is not sufficient), right?
GIST indexes work, but with that I have to give up the significant
performance gains that we got thanks to Alexander's GIN patches.
regards
Tomas
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-03-11 23:41:00 | Re: db_user_namespace a "temporary measure" |
Previous Message | Tomas Vondra | 2014-03-11 22:09:24 | Re: GIN improvements part2: fast scan |