Re: query performance with hstore vs. non-hstore

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "Huang, Suya" <Suya(dot)Huang(at)au(dot)experian(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query performance with hstore vs. non-hstore
Date: 2014-09-01 06:21:58
Message-ID: CAFj8pRD+R7=Oka-e_Bq0qnErwUu6TPYcT9gLPWPehkt=3_-LDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi

In this use case hstore should not help .. there is relative high overhead
related with unpacking hstore -- so classic schema is better.

Hstore should not to replace well normalized schema - it should be a
replace for some semi normalized structures as EAV.

Hstore can have some profit from TOAST .. comprimation, less system data
overhead, but this advantage started from some length of data. You should
to see this benefit on table size. When table with HStore is less than
without, then there is benefit of Hstore. Last benefit of Hstore are
indexes over tuple (key, value) .. but you don't use it.

Regards

Pavel

2014-09-01 8:10 GMT+02:00 Huang, Suya <Suya(dot)Huang(at)au(dot)experian(dot)com>:

> Hi ,
>
>
>
> I’m tweaking table layout to get better performance of query. One table
> doesn’t use hstore but expand all metrics of cha_type to different rows.
> The other table has hstore for metrics column as cha_type->metrics so it
> has less records than the first one.
>
>
>
> I would be expecting the query on seconds table has better performance
> than the first one. However, it’s not the case at all. I’m wondering if
> there’s something wrong with my execution plan? With the hstore table, the
> optimizer has totally wrong estimation on row counts at hash aggregate
> stage and it takes 34 seconds on hash-join,25 seconds on hash-aggregate, 10
> seconds on sort. However, with non-hstore table, it takes 17 seconds on
> hash join, 18 seconds on hashaggregate and 2 seconds on sort.
>
>
>
> Can someone help me to explain why this is happening? And is there a way
> to fine-tune the query?
>
>
>
> Table structure
>
>
>
> dev=# \d+ weekly_non_hstore
>
> Table "test.weekly_non_hstore"
>
> Column | Type | Modifiers | Storage | Stats target |
> Description
>
>
> ----------+------------------------+-----------+----------+--------------+-------------
>
> date | date | | plain | |
>
> ref_id | character varying(256) | | extended | |
>
> cha_typel | text | | extended | |
>
> visits | double precision | | plain | |
>
> pages | double precision | | plain | |
>
> duration | double precision | | plain | |
>
> Has OIDs: no
>
> Tablespace: "tbs_data"
>
>
>
> dev=# \d+ weekly_hstore
>
> Table "test.weekly_hstore"
>
> Column | Type | Modifiers | Storage | Stats target |
> Description
>
>
> ----------+------------------------+-----------+----------+--------------+-------------
>
> date | date | | plain | |
>
> ref_id | character varying(256) | | extended | |
>
> visits | hstore | | extended | |
>
> pages | hstore | | extended | |
>
> duration | hstore | | extended | |
>
> Has OIDs: no
>
> Tablespace: "tbs_data"
>
>
>
> dev=# select count(*) from weekly_non_hstore;
>
> count
>
> ----------
>
> 71818882
>
> (1 row)
>
>
>
>
>
> dev=# select count(*) from weekly_hstore;
>
> count
>
> ---------
>
> 1292314
>
> (1 row)
>
>
>
>
>
> Query
>
> dev=# explain analyze select cha_type,sum(visits) from weekly_non_hstore
> a join seg1 b on a.ref_id=b.ref_id group by cha_type order by sum(visits)
> desc;
>
>
> QUERY PLAN
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Sort (cost=3674073.37..3674431.16 rows=143115 width=27) (actual
> time=47520.637..47969.658 rows=3639539 loops=1)
>
> Sort Key: (sum(a.visits))
>
> Sort Method: quicksort Memory: 391723kB
>
> -> HashAggregate (cost=3660386.70..3661817.85 rows=143115 width=27)
> (actual time=43655.637..44989.202 rows=3639539 loops=1)
>
> -> Hash Join (cost=12029.58..3301286.54 rows=71820032 width=27)
> (actual time=209.789..26477.652 rows=36962761 loops=1)
>
> Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>
> -> Seq Scan on weekly_non_hstore a (cost=0.00..1852856.32
> rows=71820032 width=75) (actual time=0.053..8858.594 rows=71818882 loops=1)
>
> -> Hash (cost=7382.59..7382.59 rows=371759 width=47)
> (actual time=209.189..209.189 rows=371759 loops=1)
>
> Buckets: 65536 Batches: 1 Memory Usage: 28951kB
>
> -> Seq Scan on seg1 b (cost=0.00..7382.59
> rows=371759 width=47) (actual time=0.014..64.695 rows=371759 loops=1)
>
> Total runtime: 48172.405 ms
>
> (11 rows)
>
>
>
> Time: 48173.569 ms
>
>
>
> dev=# explain analyze select cha_type, sum(visits) from (select
> (each(visits)).key as cha_type,(each(visits)).value::numeric as visits from
> weekly_hstore a join seg1 b on a.ref_id=b.ref_id )foo group by cha_type
> order by sum(visits) desc;
>
> QUERY
> PLAN
>
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Sort (cost=7599039.89..7599040.39 rows=200 width=64) (actual
> time=70424.561..70986.202 rows=3639539 loops=1)
>
> Sort Key: (sum((((each(a.visits)).value)::numeric)))
>
> Sort Method: quicksort Memory: 394779kB
>
> -> HashAggregate (cost=7599030.24..7599032.24 rows=200 width=64)
> (actual time=59267.120..60502.647 rows=3639539 loops=1)
>
> -> Hash Join (cost=12029.58..2022645.24 rows=371759000
> width=184) (actual time=186.140..34619.879 rows=36962761 loops=1)
>
> Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>
> -> Seq Scan on weekly_hstore a (cost=0.00..133321.14
> rows=1292314 width=230) (actual time=0.107..416.741 rows=1292314 loops=1)
>
> -> Hash (cost=7382.59..7382.59 rows=371759 width=47)
> (actual time=185.742..185.742 rows=371759 loops=1)
>
> Buckets: 65536 Batches: 1 Memory Usage: 28951kB
>
> -> Seq Scan on seg1 b (cost=0.00..7382.59
> rows=371759 width=47) (actual time=0.016..62.123 rows=371759 loops=1)
>
> Total runtime: 71177.675 ms
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Huang, Suya 2014-09-01 06:54:34 Re: query performance with hstore vs. non-hstore
Previous Message Huang, Suya 2014-09-01 06:10:35 query performance with hstore vs. non-hstore