Re: strange sql behavior

From: Yu Nie <nieyu04(at)gmail(dot)com>
To: melvin6925 <melvin6925(at)gmail(dot)com>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: strange sql behavior
Date: 2016-02-01 20:01:02
Message-ID: CAPQGz4Juzxq4Y9fnuqO0FQXNKZTBVZHV7Vf4t2G6jTf6-u2H1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks, Bill and Melvin!

Just some quick note/answers before I absorb all the information provided
by Bill.

1. I don't expect many users running queries against the tables, especially
for the small table - since I just created it this morning, and nobody know
about it except myself.

2. The setting in the config:

shared_buffers = 512MB # min 128kB
work_mem = 128MB # min 64kB

3. I am running a Windows 7 with 24 GB RAM. and my postgresal is 9.4.

4. here is the query I ran:

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze
,
round( current_setting('autovacuum_vacuum_threshold')::integer +
current_setting('autovacuum_vacuum_scale_factor')::numeric * C.reltuples)
AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('data2013_01w', 'data2011_01')
ORDER by 1, 2;

I copied the result from PGAdmin directly, here it is again.:

public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6925(at)gmail(dot)com> wrote:

> Thanks Bill.
> Also, it's very important to include the headers with the queries!
>
> Marco,
> There is no top secret information that is requested, so please do not
> edit the output.
>
>
>
> Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
> -------- Original message --------
> From: Bill Moran <wmoran(at)potentialtech(dot)com>
> Date: 2/1/2016 14:41 (GMT-05:00)
> To: Yu Nie <nieyu04(at)gmail(dot)com>
> Cc: Melvin Davidson <melvin6925(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] strange sql behavior
>
>
> Came a little late to the thread, see many comments inline below:
>
> On Mon, 1 Feb 2016 13:16:13 -0600
> Yu Nie <nieyu04(at)gmail(dot)com> wrote:
>
> > Thanks a lot for your reply. I ran the query you suggested and here are
> > the results
> >
> > Large table: "public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19
> > 17:31:08-06";156847423
> > Small table: "public";"data2013_01w";300786444;0;0;"";"";"2016-02-01
> > 08:57:24-06";"2016-02-01 04:01:04-06";60157336
>
> You didn't do Melvin's query correctly. He specified 11 columns to
> select, but you only show 10. Since you don't show the query you
> actually ran, we don't know which of the numeric columns is missing.
>
> More information inline below:
>
> >
> > On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6925(at)gmail(dot)com>
> > wrote:
> >
> > > One thing to look at is the last time both tables were
> vacuumed/analyzed.
> > >
> > > SELECT n.nspname,
> > > s.relname,
> > > c.reltuples::bigint,
> > > n_tup_ins,
> > > n_tup_upd,
> > > n_tup_del,
> > > date_trunc('second', last_vacuum) as last_vacuum,
> > > date_trunc('second', last_autovacuum) as last_autovacuum,
> > > date_trunc('second', last_analyze) as last_analyze,
> > > date_trunc('second', last_autoanalyze) as last_autoanalyze
> > > ,
> > > round( current_setting('autovacuum_vacuum_threshold')::integer +
> > > current_setting('autovacuum_vacuum_scale_factor')::numeric *
> C.reltuples)
> > > AS av_threshold
> > > FROM pg_stat_all_tables s
> > > JOIN pg_class c ON c.oid = s.relid
> > > JOIN pg_namespace n ON (n.oid = c.relnamespace)
> > > WHERE s.relname NOT LIKE 'pg_%'
> > > AND s.relname NOT LIKE 'sql_%'
> > > AND s.relname IN ' "your_small_table", "your_large_table"'
> > > ORDER by 1, 2;
> > >
> > >
> > > Also, please confirm the indexes for both tables are using the same
> method
> > > (btree?).
> > >
> > > On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <nieyu04(at)gmail(dot)com> wrote:
> > >
> > >> Hi there,
> > >>
> > >> Recently I am working with a large amount of taxis GIS data and had
> > >> encountered some weird performance issues. I am hoping someone in
> this
> > >> community can help me figure it out.
> > >>
> > >> The taxi data were loaded in 5 minute block into a table. I have two
> > >> separate such tables, one stores a month of data with about 700
> million
> > >> rows, another stores about 10 days of data with about 300 million
> rows.
> > >> The two tables have the exactly same schema and indexes. There are two
> > >> indexes: one on taxiid (text), and the other on the time stamp (date
> > >> time). In order to process the data, I need to get all points for a
> single
> > >> taxis; to do that, I use something like:
> > >> select * from table1 where taxiid = 'SZB00S41' order by time;
> > >> What puzzled me greatly is that this query runs consistently much
> faster
> > >> for the large table than for the small table, which seems to
> contradict
> > >> with intuition. At the end of message you may find explain (analyze
> > >> buffer) results of two particular queries for the same taxiid (one
> for each
> > >> table). You can see that it took much longer (more than 20 times) to
> get
> > >> 20k rows from the small table than to get 44 k rows from the large
> table.
> > >> Interestingly it seems that the planner does expect about 1/3 work
> for the
> > >> small table query - yet for some reason, it took much longer to fetch
> the
> > >> rows from the small table. Why there is such a huge performance
> between
> > >> the two seemingly identical queries executed on two different tables?
> > >>
> > >> Is is because the data on the second table is on some mysteriously
> > >> "broken part" of the disk? what else could explain such a bizarre
> > >> behavior? Your help is greatly appreciated.
> > >>
> > >> The above behavior is consistent through all queries. Another issue
> I
> > >> identified is that for the large table, the query can use the shared
> buffer
> > >> more effectively. For example, after you query one taxiid and
> immediately
> > >> following that query run the same query for another taxi whose id
> ranks
> > >> right behind the first id, then shared hit buffers would be quite
> high (and
> > >> the query would run much faster); this however never works for the
> small
> > >> table.
> > >>
> > >> Thanks a lot!
> > >>
> > >> Best, Marco
> > >>
> > >>
> > >> Results for the small table: it took 141 seconds to finish. The
> planning
> > >> time is 85256.31
> > >>
> > >> "Sort (cost=85201.05..85256.31 rows=22101 width=55) (actual
> > >> time=141419.499..141420.025 rows=20288 loops=1)"
> > >> " Sort Key: "time""
> > >> " Sort Method: quicksort Memory: 3622kB"
> > >> " Buffers: shared hit=92 read=19816"
> > >> " -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27
> > >> rows=22101 width=55) (actual time=50.762..141374.777 rows=20288
> loops=1)"
> > >> " Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
> > >> " Heap Blocks: exact=19826"
> > >> " Buffers: shared hit=92 read=19816"
>
> ^^
> Note that despite this table being smaller, Postgres had to read 19816
> blocks from disk, which is 2.5x more than the larger table.
>
> > >> " -> Bitmap Index Scan on data2013_01w_ixtaxiid
> > >> (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053
> > >> rows=20288 loops=1)"
> > >> " Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
> > >> " Buffers: shared hit=4 read=78"
> > >> "Planning time: 0.144 ms"
> > >> "Execution time: 141421.154 ms"
> > >>
> > >> Results for the large table: it took 5 seconds to finish. The
> planning
> > >> time is 252077.10
> > >> "Sort (cost=251913.32..252077.10 rows=65512 width=55) (actual
> > >> time=5038.571..5039.765 rows=44204 loops=1)"
> > >> " Sort Key: "time""
> > >> " Sort Method: quicksort Memory: 7753kB"
> > >> " Buffers: shared hit=2 read=7543"
> > >> " -> Bitmap Heap Scan on data2011_01 (cost=1520.29..246672.53
> > >> rows=65512 width=55) (actual time=36.935..5017.463 rows=44204
> loops=1)"
> > >> " Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
> > >> " Heap Blocks: exact=7372"
> > >> " Buffers: shared hit=2 read=7543"
> > >> " -> Bitmap Index Scan on data2011_01_ixtaxiid
> > >> (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792
> > >> rows=44204 loops=1)"
>
> ^^
> Note that the larger table took LONGER to do the index work than the
> smaller table, which probably means there's nothing wrong with your
> disks or anything ... that's the behavior I would expect.
>
> > >> " Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
> > >> " Buffers: shared hit=2 read=171"
> > >> "Planning time: 0.127 ms"
> > >> "Execution time: 5042.134 ms"
>
> So, what I'm seeing, is that Postgres is able to figure out _which_ rows
> to fetch faster on the small table than the large table, which is what
> you would expect, since a smaller index should be faster than a large one.
>
> However, when it goes to actually fetch the row data, it takes
> significantly longer on the small table, despite the fact that it's
> only fetching 1/3 as many rows. It is, however, doing 2.5x as many
> disk reads to get those rows: For the large table, it reads 61MB from
> disk, but it reads 160MB to get all the data for the smaller table.
>
> How the data was inserted into each table could lead to similar data
> being clustered on common pages on the large table, while it's spread
> across many more pages on the small table.
>
> That still doesn't explain it all, though. 2.5x the disk
> activity normally wouldn't equate to 28x the time required. Unless
> you're disks are horrifically slow? Does this server have a lot of
> other activity against the disks? I.e. are other people running
> queries that you would have to contend with, or is the server a VM
> sharing storage with other VMs, or even a combined use server that
> has to share disk access with (for example) a web or mail server
> as well? Is the performance difference consistently ~28x?
>
> Other things: what is shared_buffers set to? The queries would seem
> to indicate that this server has less than 1M of those two tables
> cached in memory at the time you ran those queries, which seems to
> suggest that either you've got shared_buffers set very low, or that
> there are a lot of other tables that other queries are accessing at
> the time you're running these. Perhaps installing pg_buffercache to
> have a look at what's using your shared_buffers would be helpful.
>
> --
> Bill Moran
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message melvin6925 2016-02-01 20:12:16 Re: strange sql behavior
Previous Message David G. Johnston 2016-02-01 19:56:05 Re: Check constraints and function volatility categories