Re: Poor performance on simple queries compared to sql server express

From: "Adam Ma'ruf" <adam(dot)maruf(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Poor performance on simple queries compared to sql server express
Date: 2013-08-27 04:06:39
Message-ID: CAOy5j_P73Eb2q_XS+kHEGyWNSXMyKk0dkqgdXHCpjeF4CtPjvw@mail.gmail.com
Lists: pgsql-performance

Hi,

Thanks for the response. I reran the query, but first ran the statement you
provided and set work_mem to 2GB. It ended up taking 133s, and
GroupAggregate was still used.
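
Concretely, the session looked roughly like this (the EXPLAIN ANALYZE
wrapper is just how I captured the timing; the query is the one from my
earlier message):

set enable_sort = false;   -- the statement you suggested
set work_mem = '2GB';      -- lowered from 4GB

EXPLAIN ANALYZE
SELECT quebec_four
     , sierra
     , SUM(dollaramount) AS dollaramount
FROM alpha_quebec_echo
GROUP BY quebec_four
       , sierra;
-- still ~133s, still GroupAggregate with the sort spilling to disk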

Here are the values you asked for:
# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0 # measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
#effective_cache_size = 6000MB
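
All of those lines are still commented out, so the planner should be on its
defaults. If it's easier, something like this (using the standard
pg_settings view) shows the values actually in effect:

SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'enable%'
   OR name LIKE '%cost%'
   OR name = 'effective_cache_size';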

The output of "select * from pg_settings" is large... should I attach it as
a separate file? (Not sure if that's allowed on this mailing list.)
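
In the meantime, a trimmed version listing only the settings that differ
from their built-in defaults would be much smaller, e.g.:

SELECT name, setting
FROM pg_settings
WHERE source <> 'default';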

The data is ~2.5GB and I can't think of anywhere I could upload it, but I
can provide the columns and data types; it's a subset of the public data
from usaspending.gov.

column_name                                data_type          ordinal  nullable?
idx                                        integer                  1  YES
obligatedamount                            double precision         2  YES
baseandexercisedoptionsvalue               double precision         3  YES
baseandalloptionsvalue                     double precision         4  YES
maj_fund_agency_cat                        character varying        5  YES
contractingofficeagencyid                  character varying        6  YES
contractingofficeid                        character varying        7  YES
fundingrequestingagencyid                  character varying        8  YES
fundingrequestingofficeid                  character varying        9  YES
signeddate                                 date                    10  YES
effectivedate                              date                    11  YES
currentcompletiondate                      date                    12  YES
ultimatecompletiondate                     date                    13  YES
lastdatetoorder                            character varying       14  YES
typeofcontractpricing                      character varying       15  YES
multiyearcontract                          character varying       16  YES
vendorname                                 character varying       17  YES
dunsnumber                                 character varying       18  YES
parentdunsnumber                           character varying       19  YES
psc_cat                                    character varying       20  YES
productorservicecode                       character varying       21  YES
principalnaicscode                         character varying       22  YES
piid                                       character varying       23  YES
modnumber                                  character varying       24  YES
fiscal_year                                character varying       25  YES
idvpiid                                    character varying       26  YES
extentcompeted                             character varying       27  YES
numberofoffersreceived                     double precision        28  YES
competitiveprocedures                      character varying       29  YES
solicitationprocedures                     character varying       30  YES
evaluatedpreference                        character varying       31  YES
firm8aflag                                 character varying       32  YES
sdbflag                                    character varying       33  YES
issbacertifiedsmalldisadvantagedbusiness   character varying       34  YES
womenownedflag                             character varying       35  YES
veteranownedflag                           character varying       36  YES
minorityownedbusinessflag                  character varying       37  YES
data_source                                text                    38  YES
psc_cd                                     character varying       39  YES
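
For reference, a listing like the one above can be pulled straight from the
catalog with something along these lines (standard information_schema):

SELECT column_name, data_type, ordinal_position, is_nullable
FROM information_schema.columns
WHERE table_name = 'alpha_quebec_echo'
ORDER BY ordinal_position;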

On Mon, Aug 26, 2013 at 9:40 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 26 August 2013, 15:02, Adam Ma'ruf wrote:
> > Sure
> >
> > I just upgraded to 9.2.4. The query is:
> > SELECT quebec_four
> > , sierra
> > , SUM(dollaramount) as dollaramount
> > FROM alpha_quebec_echo
> > GROUP BY quebec_four
> > , sierra
> >
> > alpha_quebec_echo has 5,409,743 rows and 39 columns. Quebec_four and
> > sierra are both varchar, dollar amount is a floating point field. It has
> > no indexes (but neither did the mssql express table). Any other details
> > you need?
> >
> > Thanks,
> > A
>
> Hi,
>
> It's quite clear why the query is so slow - the plan is using on-disk sort
> with ~5M rows, and that's consuming a lot of time (almost 120 seconds).
>
> I'm wondering why it chose the sort in the first place. I'd have expected
> it to choose a hash aggregate, which does not require sorted input.
>
> Can you try running "set enable_sort = false" and then explain of the
> query?
>
> If that does not change the plan to "HashAggregate" instead of
> "GroupAggregate", please check and post values of enable_* and cost_*
> variables.
>
> Another question is why it's doing the sort on disk and not in memory. The
> explain you've posted shows it requires ~430MB on disk, and in my
> experience it usually requires ~3x that much to do the in-memory sort.
>
> I see you've set work_mem=4GB, is that correct? Can you try with a lower
> value - say, 1 or 2GB? I'm not sure how this works on Windows, though.
> Maybe there's some other limit (and SQL Server is not hitting it, because
> it's a native Windows application).
>
> Can you prepare a testcase (table structure + data) and post it somewhere?
> Or at least the structure, if it's not possible to share the data.
>
> Also, output from "select * from pg_settings" would be helpful.
>
> Tomas
