From: | Mark Liberman <mliberman(at)mixedsignals(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache |
Date: | 2006-01-06 02:15:36 |
Message-ID: | 200601051815.36181.mliberman@mixedsignals.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
> "Mark Liberman" <mliberman(at)mixedsignals(dot)com> wrote
>
> > First run, after a night of inactivity:
> >
> > -> Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=313.468..313.468 rows=11082
> > loops=1)
> > Index Cond: (file_id = 137271)
> > Total runtime: 313.643 ms
> >
> > Second run, after that:
> >
> > -> Bitmap Index Scan on
> > 1min_events_file_id_begin_idx (cost=0.00..37.85 rows=3670 width=0)
> > (actual time=2.106..2.106 rows=11082 loops=1)
> > Index Cond: (file_id = 137271)
> > Total runtime: 2.276 ms
>
> It is clear that the first query takes longer time because of the IO time
> of index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106). I am afraid
> currently there is no easy solution for this situation, unless you could
> predicate which part of relation/index your query will use, then you can
> preload or "warm-up" cache for it.
>
> Regards,
> Qingqing
Thanks Qingqing,
this actually helped me determine that the compound index,
1min_events_file_id_begin_idx, is not the proper index to use as it is based
on file_id and begin_time - the later of which is not involved in the where
clause. It is only using that index to "filter" out the listed file_id.
Now, my follow-up question / assumption. I am assuming that the IO time is
so long on that index because it has to read the entire index (for that
file_id) into memory (because it cannot just scan the rows with a certain
date range because we are not using begin_time in the where clause).
But, if I replaced that compound index with the proper compound index of
file_id / end_time, it would give similar performance results to the scan on
1min_events_end_idx (which was < 1 ms). E.g. the latest rows that were
updated are more likely to be in the cache - and it is smart enough to only
read the index rows that it needs.
Alternatively, I could create a single index on file_id (and rely upon the new
bitmap scan capabilities in 1.2). But, I fear that, although this will be
smaller than the erroneous compound index on file_id / begin_time, it will
still display the same behavior in that it will need to read all rows from
that index for the appropriate file_id - and since the data goes back every
minute for 60 days, that IO might be large.
Obviously, I will be testing this - but it might take a few days, as I haven't
figure out how to simulate the "period of inactivity" to get the data flushed
out of the cache ... so I have to run this each morning. But, any
confirmation / corrections to my assumptions are greatly appreciated. E.g. is
the compound index the way to go, or the solo index on file_id?
Thanks,
Mark
From | Date | Subject | |
---|---|---|---|
Next Message | David Lang | 2006-01-06 02:50:22 | Re: Help in avoiding a query 'Warm-Up' period/shared buffer |
Previous Message | Jim C. Nasby | 2006-01-06 01:08:22 | Re: improving write performance for logging |