Re: Planner cost adjustments

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Daniel Begin <jfd553(at)hotmail(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Melvin Davidson <melvin6925(at)gmail(dot)com>
Subject: Re: Planner cost adjustments
Date: 2015-06-11 14:04:50
Message-ID: CA+bJJbzMgRCKcy7ZPpEYUv2ng6=a83GncirQ8gy=3_BrvgsZgw@mail.gmail.com
Lists: pgsql-general

Hi Daniel:

On Thu, Jun 11, 2015 at 2:38 PM, Daniel Begin <jfd553(at)hotmail(dot)com> wrote:
.....
> The remaining problem seems related to the statistics of some large tables.
> On one hand, I might increase the statistic target for these tables to 500,
> or even to 1000 and look at the results (but I have doubts it will help). On
> the other hand, I could simply force enable_seqscan to OFF for queries
> dealing with them.
>
> Still not sure about the best solution but the problem is definitely
> narrower :-)

One somewhat unrelated point. IIRC, your problems were related to
queries that join a selected set of indexed IDs from a smaller table
against some very large tables (I'm not able to follow too well which
is which, users vs. changesets, I'm a bit lost). Given that your
runtimes are always high (in the seconds range, so wire speed /
latencies are not much of an issue) and that selectivity estimates on
huge tables are always problematic and may be thwarting your plans,
you may be able to get faster results by splitting your query.

If I read your plans correctly, that would mean selecting your 600
users in one query, then preparing the changeset query for a single
user_id (which should be indexed) and looping it for every user; a
sketch follows below. Given that current machines can easily send and
receive 600 queries per second, this may lead to a simpler solution.
It means you're using the DB as a somewhat intelligent plain old
indexed file, but sometimes that is the simpler approach (heck, some
of my code uses algorithms from the tape era because they were the
fastest way I could do it).
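
Something like this minimal sketch, in Python with psycopg2; the
connection string and the table and column names (users, changesets,
user_id) are just guesses from the thread, so adjust to your schema:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
    cur = conn.cursor()

    # First query: fetch the ~600 user ids in one shot
    # (replace with whatever actually selects your users).
    cur.execute("SELECT user_id FROM users LIMIT 600")
    user_ids = [row[0] for row in cur.fetchall()]

    # Then loop from the app: one indexed lookup on the big table
    # per user. There is no selectivity guessing left for the
    # planner; an index scan on user_id is the only sensible plan.
    results = []
    for uid in user_ids:
        cur.execute("SELECT * FROM changesets WHERE user_id = %s",
                    (uid,))
        results.extend(cur.fetchall())

    cur.close()
    conn.close()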

I needed to do this in one of my programs: the optimizer kept
selecting bad plans, so I ran the first query, held the results in
memory, and then prepared and executed the second query in a loop
from the app. My query selected a couple thousand values from a
sub-million-row table and joined them against a multimillion-row one,
getting a couple hundred matches per original value. Splitting it
made the queries on the big table always run indexed and fast (and,
as a side bonus, avoided sending the first record's columns over the
wire a hundred times each, which was nice since the short table was
wide and I only needed 3 short fields from the second one; that made
the first query run at wire speed and the second at disk speed).
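
Continuing the sketch above, that pattern with an explicit
server-side prepared statement looks roughly like this (again, the
table and column names are made up); the big-table query is planned
once, and only the few short fields travel on the wire:

    # Plan the big-table lookup once, server side; only the three
    # short fields come back, not the wide first-table columns.
    cur.execute("PREPARE by_key (bigint) AS "
                "SELECT f1, f2, f3 FROM big_table WHERE key_id = $1")

    # rows_in_memory stands for the first query's results, held in RAM.
    for key, wide_row in rows_in_memory:
        cur.execute("EXECUTE by_key (%s)", (key,))
        matches = cur.fetchall()  # a couple hundred per key
        # ... combine wide_row with matches in the app ...

    cur.execute("DEALLOCATE by_key")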

Francisco Olarte.
