Quick Links

Re: wildcard text filter switched to boolean column, performance is way worse

From:	Mike Broers <mbroers(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: wildcard text filter switched to boolean column, performance is way worse
Date:	2015-07-07 16:10:16
Message-ID:	CAB9893h90mR4-HFLFp+k=HCDWFxvrbxs08vDN37k82D+gSDHAA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Thanks, very informative! I'll experiment with work_mem settings and report
back.

On Tue, Jul 7, 2015 at 11:02 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Mike Broers <mbroers(at)gmail(dot)com> writes:
> > I had a query that was filtering with a wildcard search of a text field
> for
> > %SUCCESS%. The query took about 5 seconds and was running often so I
> wanted
> > to improve it. I suggested that the engineers include a new boolean
> column
> > for successful status. They implemented the requested field, but the
> query
> > that filters on that new column runs very long (i kill it after letting
> it
> > run for about an hour). Can someone help me understand why that is the
> > case and how to resolve it?
>
> It's hashing the subplan output in the first case and not the second:
>
> > Seq Scan on lead (cost=130951.81..158059.21 rows=139957 width=369)
> (actual
> > time=4699.619..4699.869 rows=1 loops=1)
> > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (hashed SubPlan 3))))
> ^^^^^^^^^^^^^^^^
> vs
>
> > Seq Scan on lead (cost=85775.78..9005687281.12 rows=139957 width=369)
> > Filter: ((NOT (hashed SubPlan 1)) AND (("ReferenceNumber")::text <>
> > ''::text) AND ((NOT (hashed SubPlan 2)) OR (NOT (SubPlan 3))))
> ^^^^^^^^^
>
> Presumably, the new more-accurate rows count causes the planner to realize
> that the hash table will exceed work_mem so it doesn't choose to hash ...
> but for your situation, you'd rather it did, because what you're getting
> instead is a Materialize node that spills to disk (again, because the data
> involved exceeds work_mem) and that's a killer for this query. You should
> be able to get back the old behavior if you raise work_mem enough.
>
> Another idea you might think about is changing the OR'd IN conditions
> to a single IN over a UNION ALL of the subselects. I'm not really sure if
> that would produce a better plan, but it's worth trying if it wouldn't
> require too much app-side contortion.
>
> regards, tom lane
>

In response to

Re: wildcard text filter switched to boolean column, performance is way worse at 2015-07-07 16:02:49 from Tom Lane

Responses

Re: wildcard text filter switched to boolean column, performance is way worse at 2015-07-07 16:28:17 from Mike Broers

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Vitalii Tymchyshyn	2015-07-07 16:27:43	Re: New server: SSD/RAID recommendations?
Previous Message	Tom Lane	2015-07-07 16:02:49	Re: wildcard text filter switched to boolean column, performance is way worse