Re: Queries on very big table

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Job <Job(at)colliniconsulting(dot)it>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Queries on very big table
Date: 2017-01-06 20:34:26
Message-ID: CAHyXU0zXuw01aSVpTyx0KiKdKg60UZNJ920n7Q0pX_FNaNHQVw@mail.gmail.com
Lists: pgsql-general

On Mon, Jan 2, 2017 at 5:23 AM, Job <Job(at)colliniconsulting(dot)it> wrote:
> Hello guys, and a very good new year to everybody!
>
> We are now running some queries and statistics on a very big table (about 180 million records).
> The table is partitioned by day (roughly 3 GB of data per partition/day).
> We use PostgreSQL 9.6.1.
>
> I am experiencing quite significant slowdowns on queries.
> I manually ran a "vacuum full" and a "reindex" on every partition in order to reclaim free space and reorder records.
>
> I have a BRIN index on the timestamp and a btree index on another field.
>
> Starting with a simple query: explain analyze select count(domain) from webtraffic_archive:
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Finalize Aggregate (cost=3220451.94..3220451.95 rows=1 width=8) (actual time=36912.624..36912.624 rows=1 loops=1)
> -> Gather (cost=3220451.52..3220451.93 rows=4 width=8) (actual time=36911.600..36912.614 rows=5 loops=1)
> Workers Planned: 4
> Workers Launched: 4
> -> Partial Aggregate (cost=3219451.52..3219451.53 rows=1 width=8) (actual time=36906.804..36906.804 rows=1 loops=5)
> -> Append (cost=0.00..3094635.41 rows=49926443 width=0) (actual time=4.716..31331.229 rows=39853988 loops=5)
> -> Parallel Seq Scan on webtraffic_archive (cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_01_01 (cost=0.00..10.47 rows=47 width=0) (actual time=0.000..0.000 rows=0 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_16 (cost=0.00..213728.26 rows=3498026 width=0) (actual time=4.713..2703.458 rows=2798421 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_17 (cost=0.00..201379.39 rows=3247739 width=0) (actual time=6.334..2364.726 rows=2598191 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_18 (cost=0.00..176248.86 rows=2824986 width=0) (actual time=7.437..2014.812 rows=2259989 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_19 (cost=0.00..177493.33 rows=2866433 width=0) (actual time=9.951..2145.958 rows=2293146 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_20 (cost=0.00..120271.83 rows=1960883 width=0) (actual time=0.011..372.092 rows=1568706 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_21 (cost=0.00..276391.94 rows=4485294 width=0) (actual time=5.386..3111.589 rows=3588235 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_22 (cost=0.00..287611.68 rows=4630668 width=0) (actual time=6.598..3335.834 rows=3704535 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_23 (cost=0.00..249047.61 rows=4014361 width=0) (actual time=7.206..2628.884 rows=3211489 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_24 (cost=0.00..192008.70 rows=3097370 width=0) (actual time=9.870..1882.826 rows=2477896 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_25 (cost=0.00..87385.16 rows=1405616 width=0) (actual time=0.018..427.248 rows=1124493 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_26 (cost=0.00..88262.80 rows=1436080 width=0) (actual time=0.014..277.327 rows=1148864 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_27 (cost=0.00..222607.43 rows=3557243 width=0) (actual time=8.497..1232.210 rows=2845795 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_28 (cost=0.00..210414.76 rows=3365676 width=0) (actual time=0.033..548.878 rows=2692541 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_29 (cost=0.00..185065.72 rows=2955872 width=0) (actual time=0.031..498.079 rows=2364697 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_30 (cost=0.00..149139.55 rows=2382656 width=0) (actual time=0.011..501.351 rows=1906124 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2016_12_31 (cost=0.00..166991.89 rows=2664288 width=0) (actual time=0.041..437.631 rows=2131431 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2017_01_01 (cost=0.00..79197.29 rows=1260930 width=0) (actual time=0.018..254.124 rows=1008744 loops=5)
> -> Parallel Seq Scan on webtraffic_archive_day_2017_01_02 (cost=0.00..11378.74 rows=272274 width=0) (actual time=0.017..34.352 rows=130691 loops=5)
> Planning time: 313.907 ms
> Execution time: 36941.700 ms
>
> Other more complex queries are slower.
>
> How can I improve it?
> The number of records could grow to as many as 1,000 million (one billion).
> Do I need a third-party tool for big data?
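
An unqualified count(domain) has to visit every partition, so the BRIN index never comes into play. If the reporting queries can be bounded by time, restricting on the partition key lets 9.6 exclude most child tables and, inside the ones that remain, possibly use the BRIN index. A minimal sketch, assuming the BRIN-indexed timestamp column is simply called "ts" (the real column name isn't shown in the post):

    -- assumption: "ts" is the BRIN-indexed timestamp / partition-key column
    EXPLAIN ANALYZE
    SELECT count(domain)
      FROM webtraffic_archive
     WHERE ts >= '2016-12-25'
       AND ts <  '2016-12-26';

With constraint_exclusion at its default of 'partition' and CHECK constraints on the children, only webtraffic_archive_day_2016_12_25 should be scanned.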

What storage do you have?
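
Separately, if an exact whole-table count isn't strictly needed, the planner's own statistics give a fast approximation. A sketch that sums reltuples over the child partitions (inheritance partitioning, as in 9.6):

    -- approximate row count from catalog statistics
    SELECT sum(c.reltuples)::bigint AS approx_rows
      FROM pg_class c
      JOIN pg_inherits i ON i.inhrelid = c.oid
     WHERE i.inhparent = 'webtraffic_archive'::regclass;

Note this approximates total rows rather than count(domain) (non-null domain values), and it is only as fresh as the last analyze/autovacuum.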

merlin
