Re: query performance with hstore vs. non-hstore

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "Huang, Suya" <Suya(dot)Huang(at)au(dot)experian(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query performance with hstore vs. non-hstore
Date: 2014-09-01 07:06:42
Message-ID: CAFj8pRBVq5XJr4ouJoh96pTaSwJThn7dWt7e6UG4N8cr89H0rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2014-09-01 8:54 GMT+02:00 Huang, Suya <Suya(dot)Huang(at)au(dot)experian(dot)com>:

> Thank you Pavel.
>
>
>
> The cost of unpacking hstore comparing to non-hstore could be calculated
> by:
>
> Seq scan on hstore table + hash join with seg1 table:
>
> Hstore: 416.741+ 34619.879 =~34 seconds
>
> Non-hstore: 8858.594 +26477.652 =~ 34 seconds
>
>
>
> The subsequent hash-aggregate and sort operation should be working on the
> unpacked hstore rows which has same row counts as non-hstore table.
> however, timing on those operations actually makes the big difference.
>

>
> I don’t quite get why…
>

These values can be messy -- timing in EXPLAIN ANALYZE has relative big
impact but different for some methods

try to watch complete time for EXPLAIN (ANALYZE, TIMING OFF)

>
>
> Thanks,
>
> Suya
>
>
>
> *From:* Pavel Stehule [mailto:pavel(dot)stehule(at)gmail(dot)com]
> *Sent:* Monday, September 01, 2014 4:22 PM
> *To:* Huang, Suya
> *Cc:* pgsql-performance(at)postgresql(dot)org
> *Subject:* Re: [PERFORM] query performance with hstore vs. non-hstore
>
>
>
> Hi
>
> In this use case hstore should not help .. there is relative high overhead
> related with unpacking hstore -- so classic schema is better.
>
> Hstore should not to replace well normalized schema - it should be a
> replace for some semi normalized structures as EAV.
>
> Hstore can have some profit from TOAST .. comprimation, less system data
> overhead, but this advantage started from some length of data. You should
> to see this benefit on table size. When table with HStore is less than
> without, then there is benefit of Hstore. Last benefit of Hstore are
> indexes over tuple (key, value) .. but you don't use it.
>
> Regards
>
> Pavel
>
>
>
> 2014-09-01 8:10 GMT+02:00 Huang, Suya <Suya(dot)Huang(at)au(dot)experian(dot)com>:
>
> Hi ,
>
>
>
> I’m tweaking table layout to get better performance of query. One table
> doesn’t use hstore but expand all metrics of cha_type to different rows.
> The other table has hstore for metrics column as cha_type->metrics so it
> has less records than the first one.
>
>
>
> I would be expecting the query on seconds table has better performance
> than the first one. However, it’s not the case at all. I’m wondering if
> there’s something wrong with my execution plan? With the hstore table, the
> optimizer has totally wrong estimation on row counts at hash aggregate
> stage and it takes 34 seconds on hash-join,25 seconds on hash-aggregate, 10
> seconds on sort. However, with non-hstore table, it takes 17 seconds on
> hash join, 18 seconds on hashaggregate and 2 seconds on sort.
>
>
>
> Can someone help me to explain why this is happening? And is there a way
> to fine-tune the query?
>
>
>
> Table structure
>
>
>
> dev=# \d+ weekly_non_hstore
>
> Table "test.weekly_non_hstore"
>
> Column | Type | Modifiers | Storage | Stats target |
> Description
>
>
> ----------+------------------------+-----------+----------+--------------+-------------
>
> date | date | | plain | |
>
> ref_id | character varying(256) | | extended | |
>
> cha_typel | text | | extended | |
>
> visits | double precision | | plain | |
>
> pages | double precision | | plain | |
>
> duration | double precision | | plain | |
>
> Has OIDs: no
>
> Tablespace: "tbs_data"
>
>
>
> dev=# \d+ weekly_hstore
>
> Table "test.weekly_hstore"
>
> Column | Type | Modifiers | Storage | Stats target |
> Description
>
>
> ----------+------------------------+-----------+----------+--------------+-------------
>
> date | date | | plain | |
>
> ref_id | character varying(256) | | extended | |
>
> visits | hstore | | extended | |
>
> pages | hstore | | extended | |
>
> duration | hstore | | extended | |
>
> Has OIDs: no
>
> Tablespace: "tbs_data"
>
>
>
> dev=# select count(*) from weekly_non_hstore;
>
> count
>
> ----------
>
> 71818882
>
> (1 row)
>
>
>
>
>
> dev=# select count(*) from weekly_hstore;
>
> count
>
> ---------
>
> 1292314
>
> (1 row)
>
>
>
>
>
> Query
>
> dev=# explain analyze select cha_type,sum(visits) from weekly_non_hstore
> a join seg1 b on a.ref_id=b.ref_id group by cha_type order by sum(visits)
> desc;
>
>
> QUERY PLAN
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Sort (cost=3674073.37..3674431.16 rows=143115 width=27) (actual
> time=47520.637..47969.658 rows=3639539 loops=1)
>
> Sort Key: (sum(a.visits))
>
> Sort Method: quicksort Memory: 391723kB
>
> -> HashAggregate (cost=3660386.70..3661817.85 rows=143115 width=27)
> (actual time=43655.637..44989.202 rows=3639539 loops=1)
>
> -> Hash Join (cost=12029.58..3301286.54 rows=71820032 width=27)
> (actual time=209.789..26477.652 rows=36962761 loops=1)
>
> Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>
> -> Seq Scan on weekly_non_hstore a (cost=0.00..1852856.32
> rows=71820032 width=75) (actual time=0.053..8858.594 rows=71818882 loops=1)
>
> -> Hash (cost=7382.59..7382.59 rows=371759 width=47)
> (actual time=209.189..209.189 rows=371759 loops=1)
>
> Buckets: 65536 Batches: 1 Memory Usage: 28951kB
>
> -> Seq Scan on seg1 b (cost=0.00..7382.59
> rows=371759 width=47) (actual time=0.014..64.695 rows=371759 loops=1)
>
> Total runtime: 48172.405 ms
>
> (11 rows)
>
>
>
> Time: 48173.569 ms
>
>
>
> dev=# explain analyze select cha_type, sum(visits) from (select
> (each(visits)).key as cha_type,(each(visits)).value::numeric as visits from
> weekly_hstore a join seg1 b on a.ref_id=b.ref_id )foo group by cha_type
> order by sum(visits) desc;
>
> QUERY
> PLAN
>
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Sort (cost=7599039.89..7599040.39 rows=200 width=64) (actual
> time=70424.561..70986.202 rows=3639539 loops=1)
>
> Sort Key: (sum((((each(a.visits)).value)::numeric)))
>
> Sort Method: quicksort Memory: 394779kB
>
> -> HashAggregate (cost=7599030.24..7599032.24 rows=200 width=64)
> (actual time=59267.120..60502.647 rows=3639539 loops=1)
>
> -> Hash Join (cost=12029.58..2022645.24 rows=371759000
> width=184) (actual time=186.140..34619.879 rows=36962761 loops=1)
>
> Hash Cond: ((a.ref_id)::text = (b.ref_id)::text)
>
> -> Seq Scan on weekly_hstore a (cost=0.00..133321.14
> rows=1292314 width=230) (actual time=0.107..416.741 rows=1292314 loops=1)
>
> -> Hash (cost=7382.59..7382.59 rows=371759 width=47)
> (actual time=185.742..185.742 rows=371759 loops=1)
>
> Buckets: 65536 Batches: 1 Memory Usage: 28951kB
>
> -> Seq Scan on seg1 b (cost=0.00..7382.59
> rows=371759 width=47) (actual time=0.016..62.123 rows=371759 loops=1)
>
> Total runtime: 71177.675 ms
>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Huang, Suya 2014-09-02 00:53:27 Re: query performance with hstore vs. non-hstore
Previous Message Huang, Suya 2014-09-01 06:54:34 Re: query performance with hstore vs. non-hstore