Shaun Thomas <sthomas(at)optionshouse(dot)com> wrote:
> So, we know we have a way of doing a loose index scan with CTEs:
>
> http://wiki.postgresql.org/wiki/Loose_indexscan
I tried this on a table in production with 23 million rows for a
column with 45 distinct values which is the high-order column of a
four-column index. This ran in 445 ms first time and 2 ms on the
second and subsequent tries. The equivalent SELECT DISTINCT ran in
30 seconds first time, and got down to 11.5 seconds after a few
runs. So roughly two orders of magnitude faster with a cold cache
and three orders of magnitude faster with a warm cache.
That sure would be a nice optimization to have in the planner.
> But that got me wondering. The planner knows from pg_stats that
> col1 could have low cardinality. If that's the case, and a WHERE
> clause uses a two column index, and col2 is specified, why can't
> it walk each individual bucket in the two-column index, and use
> col2? So I forced such a beast with a CTE:
>
> WITH RECURSIVE t AS (
> SELECT min(col1) AS col1
> FROM tablename
> UNION ALL
> SELECT (SELECT min(col1)
> FROM tablename
> WHERE col1 > t.col1)
> FROM t
> WHERE t.col1 IS NOT NULL
> )
> SELECT p.*
> FROM t
> JOIN tablename p USING (col1)
> where p.col2 = 12345
>
> I ask, because while the long-term fix would be to re-order the
> index to (col2, col1), this seems like a situation the planner
> could easily detect and compensate for. In our particular example,
> execution time went from 160ms to 2ms with the CTE rewrite.
Well, that'd be the icing on the cake. I'd be overjoyed to get the
cake. :-)
-Kevin