Re: Query hitting empty tables taking 48 minutes

From: Robert Creager <robert(at)logicalchaos(dot)org>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-general(at)postgresql(dot)org, Robert Creager <robert(at)logicalchaos(dot)org>
Subject: Re: Query hitting empty tables taking 48 minutes
Date: 2018-06-07 22:21:55
Message-ID: 0F6EE4A2-A97E-48B4-BF80-BC801238F102@logicalchaos.org
Lists: pgsql-general


Re-sending from a subscribed address (sigh).

On Jun 7, 2018, at 4:18 PM, Robert <domain(at)logicalchaos(dot)org> wrote:

> On Jun 7, 2018, at 2:15 PM, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>
> Robert Creager wrote:
>> I have a system running FreeBSD 11.1-STABLE, PostgreSQL 9.6.8, Java OpenJDK 1.8.0_131,
>> and JDBC driver 9.3-1104-jdbc41, which is exhibiting very bizarre behavior.
>> A query is executing against a couple of tables that have 1 and 0 records in them.
>> ds3.job_entry has 0 rows, ds3.blob has 1 row.
>> If I execute the query manually via command line, it executes fine. There are no other active queries, no locks.
>> The system is idle, in our Dev Test group, so I can leave it this way for a bit of time.
>> The general software setup is in production and I’ve seen nothing like this before.
>> Even a system with 300M ds3.blob entries executes this query fine.
>>
>> Jun 7 17:24:21 blackpearl postgres[10670]: [7737-1] db=tapesystem,user=Administrator,app=[unknown],client=127.0.0.1 LOG: duration: 2903612.206 ms execute fetch from S_2037436/C_2037437: SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = $1))
>>
>> tapesystem=# explain analyze SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'));
>> QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------
>> Nested Loop (cost=0.84..6.89 rows=1 width=77) (actual time=0.044..0.044 rows=0 loops=1)
>> -> Index Scan using job_entry_job_id_idx on job_entry (cost=0.42..2.44 rows=1 width=16) (actual time=0.042..0.042 rows=0 loops=1)
>> Index Cond: (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'::uuid)
>> -> Index Scan using blob_pkey on blob (cost=0.42..4.44 rows=1 width=77) (never executed)
>> Index Cond: (id = job_entry.blob_id)
>> Planning time: 0.388 ms
>> Execution time: 0.118 ms
>>
>>
>> pid client_port runtime query_start datname state query usename
>> 10670 11211 0 years 0 mons 0 days 0 hours 43 mins 28.852273 secs 2018-06-07 17:24:22.026384 tapesystem active SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = $1)) Administrator
>>
>>
>> From the system with 300M ds3.blob entries, which works fine (along with every other system in house):
>>
>> QUERY PLAN
>> Nested Loop (cost=0.57..738052.90 rows=164379 width=75) (actual time=1.001..1947.029 rows=50000 loops=1)
>> -> Seq Scan on job_entry (cost=0.00..10039.50 rows=164379 width=16) (actual time=0.871..56.442 rows=50000 loops=1)
>> Filter: (job_id = 'ef27d2fa-2727-424e-8f44-da9e33a5ca05'::uuid)
>> Rows Removed by Filter: 60001
>> -> Index Scan using blob_pkey on blob (cost=0.57..4.42 rows=1 width=75) (actual time=0.037..0.037 rows=1 loops=50000)
>> Index Cond: (id = job_entry.blob_id)
>> Planning time: 6.330 ms
>> Execution time: 1951.564 ms
>
> The JDBC query is using a cursor since you are using setFetchSize(10000).
>
> I can see two reasons why the plan might be different:
>
> 1. It is a prepared statement that has been executed often enough
> for the generic plan to kick in, and the generic plan is bad.

Tom brought that up; see my response to him. It is a bad plan, but on tables with almost no entries.
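For reference, the custom-vs-generic plan switch can be reproduced by hand (a sketch against the same tables; on 9.6 the planner may switch to a cached generic plan after the fifth custom-plan execution):

```sql
-- Sketch: reproducing the plan-cache switchover manually.
-- On 9.6, the first five EXECUTEs use custom plans; from the
-- sixth on, the planner may cache and reuse a generic plan.
PREPARE q(uuid) AS
  SELECT * FROM ds3.blob
  WHERE EXISTS (SELECT * FROM ds3.job_entry
                WHERE blob_id = ds3.blob.id AND job_id = $1);

EXPLAIN EXECUTE q('b51357cd-e07a-4c87-a50b-999c347a5c71');
-- Repeat the EXECUTE; if the plan changes after the fifth run,
-- the generic plan is what the JDBC session is getting.
```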

> 2. PostgreSQL chooses a fast startup plan because a cursor is used,
> and that plan performs much worse.

The parameterized plan Tom had me look at starts with sequential scans, which would be bad on the job_entry table (topping out at ~9M entries in the worst case) and horrendous on the blob table (open-ended size; we test with 300M entries on one of our systems).

> To see if 1. causes the problem, you could set
> m_statement.setPrepareThreshold(0);
> and see if that makes a difference.

If I can keep this machine and spin some code, I could try this. And if this is a JDBC/app problem, I'm guessing restarting the app would 'fix' it.
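For the record, in case spinning code is a hassle: pgJDBC also exposes this as a connection-URL parameter, where a value of 0 disables server-side prepared statements entirely (worth double-checking against the docs for the 9.3-1104 driver we ship):

```
jdbc:postgresql://127.0.0.1/tapesystem?prepareThreshold=0
```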

> To test if 2. has an influence, try setting cursor_tuple_fraction to 1.0
> and see if it makes a difference.
>
> You can capture the execution plan of the bad statement by using auto_explain,
> that would certainly shed more light on the problem.

I’ll look at enabling that setting. I can’t do it easily without either restarting postgres (by changing the postgresql.conf file) or spinning new code (possible, but not ideal).
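For what it's worth, here is a sketch of how both suggestions might be applied without a postmaster restart (assumes superuser access; auto_explain only logs queries in sessions it is loaded into, so the session preload is what lets it catch the JDBC backend):

```sql
-- Sketch (superuser): enable auto_explain for new sessions
-- without restarting the postmaster.
LOAD 'auto_explain';  -- makes the auto_explain.* settings known here
ALTER SYSTEM SET session_preload_libraries = 'auto_explain';
ALTER SYSTEM SET auto_explain.log_min_duration = '1s';
ALTER SYSTEM SET auto_explain.log_analyze = on;
-- Laurenz's other suggestion, also reloadable on the fly:
ALTER SYSTEM SET cursor_tuple_fraction = 1.0;
SELECT pg_reload_conf();  -- the app must reconnect to pick up the preload
```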

Best,
Robert
