Re: BUG #12202: json operator ->>with offset

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Matt Freeman <matt(at)nonuby(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12202: json operator ->>with offset
Date: 2014-12-11 20:06:57
Message-ID: 20141211200657.GO1768@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Matt Freeman wrote:

> SELECT blob->>'firstName' FROM couchcontacts OFFSET 10000 LIMIT 10
>
> Takes upwards of 10 seconds!! Noted inefficiencies of offset aside why is
> this presumably causing 10,010 json decode ops? As the projection has no
> side-effects I don't understand the reason this can't be fast?

With respect to the tuples emitted, an Offset node runs later than
whatever is emitting the tuples in the first place. So the expressions
in the target list (blob->>'firstName' in this example) are executed for
all tuples, not just the ones past the 10000 offset. I guess we could
have an optimization to skip evaluating expressions for tuples that are
not going to be emitted, if none of them are volatile (bonus points if
only the volatile ones are evaluated, I guess). I don't think we have
that today; it certainly doesn't sound trivial.

This is not a JSON thing: you would probably see the same behavior with
expensive expressions involving any other type.

As a workaround you could grab the rows you want in a subselect in FROM,
and evaluate the expression in the outer query:

SELECT blob->>'firstName' FROM
(SELECT blob
FROM couchcontacts
ORDER BY something
OFFSET 10000 LIMIT 10) f;

The OFFSET in the subquery prevents the optimizer from flattening this
into a simple scan.

Note that if you don't have an ORDER BY clause, there is no guarantee
about which rows are going to be returned; if you have many of these
queries running concurrently, two executions in the same session could
get completely disparate result sets (i.e. if you first ask for OFFSET
9990 and then OFFSET 10000, you might not get consecutive rows as you
probably expect), for example because of synchronized seqscans.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrew Sackville-West 2014-12-11 21:28:55 Re: regression, deadlock in high frequency single-row UPDATE
Previous Message Tom Lane 2014-12-11 19:49:40 Re: BUG #12203: No password dialog when access the postgresql server