Re: How to measure query time - with warm up and cached data

From: Neto pr <netoprbr9(at)gmail(dot)com>
To: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to measure query time - with warm up and cached data
Date: 2018-01-22 00:54:18
Message-ID: CA+TZvYJpiLDZ88rd0gp6WviX_ti1NRcDjG3xw6OfL4+_DBS40A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2018-01-21 13:53 GMT-08:00 Peter J. Holzer <hjp-pgsql(at)hjp(dot)at>:

> On 2018-01-21 12:45:54 -0800, Neto pr wrote:
> > I need to know the actual execution time of a query, but considering
> that the
> > data is already cached. I also need to make sure that cached data from
> other
> > queries is cleared.
> > I believe that in order to know the real time of a query it will be
> necessary
> > to "warm up" the data to be inserted in cache.
> >
> > Below are the steps suggested by a DBA for me:
> >
> > Step 1- run ANALYZE on all tables involved before the test;
> > Step 2- restart the DBMS (to clear the DBMS cache);
> > Step 3- erase the S.O. cache;
>
> Did you mean "OS cache" (operating system cache)?
>
>
Yes, Operating System cache... S.O. = Sistema Operacional in portuguese,
it was a translation error!!

To restart the DBMS and clear the cache of O.S. I execute this commands in
linux Debian8.

/etc/init.d/pgsql stop
sync

echo "clear cache !!!!!!"

echo 3 > /proc/sys/vm/drop_caches
/etc/init.d/pgsql start

> Step 4- execute at least 5 times the same query.
> >
> > After the actual execution time of the query, it would have to take the
> time of
> > the query that is in the "median" among all.
>
> If you do this, clearing the caches before the tests will probably have
> little
> effekt. The first query will fill the cache with the data needed for
> your query (possibly evicting other data) and the next 4 will work on
> the cached data.

Yes, I believe that the first execution can be discarded, because the data
is accommodating in the cache ... the ideal is considered only the others
after the first one.

> Whether the cache was empty or full before the first
> query will make little difference to the median, because the first query
> will almost certainly be discarded as an outlier.
>
> Flushing out caches is very useful if you want to measure performance
> without caches (e.g. if you want to determine what the performance
> impact of a server reboot is).
>
>
> > Example:
> >
> > Execution 1: 07m 58s
> > Execution 2: 14m 51s
> > Execution 3: 17m 59s
> > Execution 4: 17m 55s
> > Execution 5: 17m 07s
>
> Are these real measurements or did you make them up? They look weird.
> Normally the first run is by far the slowest, then the others are very
> similar, sometimes with a slight improvement (especially between the 2nd
> and 3rd). But in your case it is just the opposite.
>

Yes, they are real information from TPC-H query 9.
I can not understand why in several tests I have done here, the first
execution is executed faster, even without indexes, and theoretically
without cache.

If someone wants to see the execution plans and other information the
worksheet with results is at the following link:
https://sites.google.com/site/eletrolareshop/repository/Result_80gb-SSD-10_exec_v4.ods

I thought it was because my CPU was working with variance .. but I
configured the BIOS it as " OS Control" and in " Performance" CPU mode in
Linux Debian8. See below:
-----------------------------------------------------------

user1(at)hp110deb8:~/Desktop$ cpufreq-info | grep 'current CPU fr'
current CPU frequency is 2.80 GHz.
current CPU frequency is 2.80 GHz.
current CPU frequency is 2.80 GHz.
current CPU frequency is 2.80 GHz.
--------------------------------------

Apparently the processor is not working variably now.
Any idea why the first execution can be faster in many cases?

Best Regards
Neto

> > [cleardot]
>
> Sending Webbugs to a mailinglist?
>
> hp
>
> --
> _ | Peter J. Holzer | we build much bigger, better disasters now
> |_|_) | | because we have much more sophisticated
> | | | hjp(at)hjp(dot)at | management tools.
> __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Paquier 2018-01-22 01:07:44 Re: Best non-networked front end for postgresql
Previous Message Peter J. Holzer 2018-01-21 21:53:04 Re: How to measure query time - with warm up and cached data