Re: updating statistics on slow running query

From: Eric Ramirez <eric(dot)ramirez(dot)sv(at)gmail(dot)com>
To: desmodemone <desmodemone(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: updating statistics on slow running query
Date: 2014-11-10 22:52:18
Message-ID: CA+_68f6r4Yn9TbgvZ3ci4oDzGLmGzp9BGswdK0+q05epDrYg-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Matteo,
Thanks for your suggestions, I just run some test with ILIKE and LIKE, and
ILIKE is consistently slower so I think I will keep the Lower functions.
As per your suggestion, I have switched indexes to use GIN type index,
they seem to build/read a bit faster, still the Recheck task continues to
happen in the query plan though. I have removed the Gender column from the
query since is not relevant in my tests. With all this playing around it
looks like the stats are now a bit more accurate.
The query went down to 9 seconds, ideally I would like to get to execute in
2 seconds..., any thoughts on what else I could try?
Thanks again,
Eric

=# explain (analyse on,buffers on)select T.form_id from
TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND
LOWER(T.household_member_last_name) LIKE LOWER('%tu%') group by T.form_id;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------
HashAggregate (cost=557677.27..561360.83 rows=368356 width=8) (actual
time=10172.672..10410.068 rows=786669 loops=1)
Buffers: shared hit=304998
-> Bitmap Heap Scan on tar_mvw_targeting_record t
(cost=80048.06..552677.27 rows=2000002 width=8) (actual
time=2481.418..9564.280 rows
=999933 loops=1)
Recheck Cond: ((status)::text <> 'ANULLED'::text)
Filter: (lower((household_member_last_name)::text) ~~ '%tu%'::text)
Rows Removed by Filter: 9000079
Buffers: shared hit=304998
-> Bitmap Index Scan on tar_mvw_targeting_record_lower_idx4
(cost=0.00..79548.06 rows=10000012 width=0) (actual time=2375.399..2
375.399 rows=10000012 loops=1)
Buffers: shared hit=7369
Total runtime: 10475.240 ms

On Mon, Nov 10, 2014 at 1:57 PM, desmodemone <desmodemone(at)gmail(dot)com> wrote:

>
>
> 2014-11-10 18:43 GMT+01:00 Eric Ramirez <eric(dot)ramirez(dot)sv(at)gmail(dot)com>:
>
>>
>> Hi,
>> I have created a sample database with test data to help benchmark our
>> application. The database has ten million records, and is running on a
>> dedicated server(postgres 9.3) with 8GB of RAM. Our queries are pretty
>> slow with this amount of data and is my job to get them to run to at
>> acceptable speed. First thing that I notice was that the planner's row
>> estimates are off by a large number or records (millions) I have updated
>> the statistics target but didn't seem to make a difference. The relevant
>> output follows.
>> Am I looking in the wrong place, something else I should be trying?
>> Thanks in advance for your comments/suggestions,
>> Eric.
>>
>>
>> =# show work_mem;
>> work_mem
>> ----------
>> 1GB
>> (1 row)
>> =# show effective_cache_size;
>> effective_cache_size
>> ----------------------
>> 5GB
>> (1 row)
>>
>> =#ALTER TABLE TAR_MVW_TARGETING_RECORD ALTER COLUMN
>> household_member_first_name SET STATISTICS 5000;
>> =# vacuum analyse TAR_MVW_TARGETING_RECORD;
>>
>> =# \d tar_mvw_targeting_record;
>> Table "public.tar_mvw_targeting_record"
>> Column | Type | Modifiers
>> -----------------------------+-----------------------+-----------
>> household_member_id | bigint |
>> form_id | bigint |
>> status | character varying(64) |
>> gender | character varying(64) |
>> household_member_first_name | character varying(64) |
>> household_member_last_name | character varying(64) |
>>
>> Indexes:
>> "tar_mvw_targeting_record_form_id_household_member_id_idx" UNIQUE,
>> btree (form_id, household_member_id)
>> "tar_mvw_targeting_record_lower_idx" gist
>> (lower(household_member_first_name::text) extensions.gist_trgm_ops)
>> WHERE status::text <> 'ANULLED'::text
>> "tar_mvw_targeting_record_lower_idx1" gist
>> (lower(household_member_last_name::text) extensions.gist_trgm_ops)
>> WHERE status::text <> 'ANULLED'::text
>>
>>
>> =# explain (analyse on,buffers on)select T.form_id from
>> TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND
>> LOWER(T.household_member_last_name) LIKE LOWER('%tu%') AND
>> T.gender='FEMALE' group by T.form_id;
>>
>> QUERY PLAN
>>
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------
>> -------------------------------
>> HashAggregate (cost=450994.35..452834.96 rows=184061 width=8) (actual
>> time=11932.959..12061.206 rows=442453 loops=1)
>> Buffers: shared hit=307404 read=109743
>> -> Bitmap Heap Scan on tar_mvw_targeting_record t
>> (cost=110866.33..448495.37 rows=999592 width=8) (actual
>> time=3577.301..11629.132 row
>> s=500373 loops=1)
>> Recheck Cond: ((lower((household_member_last_name)::text) ~~
>> '%tu%'::text) AND ((status)::text <> 'ANULLED'::text))
>> Rows Removed by Index Recheck: 9000079
>> Filter: ((gender)::text = 'FEMALE'::text)
>> Rows Removed by Filter: 499560
>> Buffers: shared hit=307404 read=109743
>> -> Bitmap Index Scan on tar_mvw_targeting_record_lower_idx1
>> (cost=0.00..110616.43 rows=2000002 width=0) (actual time=3471.142..3
>> 471.142 rows=10000012 loops=1)
>> Index Cond: (lower((household_member_last_name)::text) ~~
>> '%tu%'::text)
>> Buffers: shared hit=36583 read=82935
>> Total runtime: 12092.059 ms
>> (12 rows)
>>
>> Time: 12093.107 ms
>>
>> p.s. this plan was ran three times, first time took 74 seconds.
>>
>>
>>
> Hello Eric,
> did you try with gin index instead ? so you could
> avoid, if possible, the recheck condition (almost the gin index is not
> lossy ), further if you always use a predicate like "gender=" , you could
> think to partition the indexes based on that predicate (where status NOT IN
> ('ANULLED') and gender='FEMALE', in the other case it wil be where status
> NOT IN ('ANULLED') and gender='MALE' ) . Moreover you could avoid also the
> "lower" operator and try use directly the ilike , instead of "like".
>
> CREATE INDEX tar_mvw_targeting_record_idx02 ON tar_mvw_targeting_record USING gin ( status gin_trgm_ops) where status NOT IN ('ANULLED') and gender='FEMALE' ;
> CREATE INDEX tar_mvw_targeting_record_idx03 ON tar_mvw_targeting_record USING gin ( status gin_trgm_ops) where status NOT IN ('ANULLED') and gender='MALE' ;
>
>
> explain (analyse on,buffers on) select T.form_id from
> TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND
> T.household_member_last_name ilike LOWER('%tu%') AND T.gender='FEMALE'
> group by T.form_id;
>
>
> I hope it works
>
> have a nice day
>
>
> --
> Matteo Durighetto
>
> - - - - - - - - - - - - - - - - - - - - - - -
>
> Italian PostgreSQL User Group <http://www.itpug.org/index.it.html>
> Italian Community for Geographic Free/Open-Source Software
> <http://www.gfoss.it>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2014-11-10 23:51:12 Re: Performance bug in prepared statement binding in 9.2?
Previous Message Alvaro Herrera 2014-11-10 21:40:01 Re: Lock pileup causes server to stall