Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Jerry Regan <jerry(dot)regan(at)concertoglobalresources(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results
Date: 2017-07-03 23:21:46
Message-ID: 71a59bcf-000a-c56c-2748-fc6c66b29038@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 07/02/2017 10:33 AM, Jerry Regan wrote:
> For reasons beyond my control, I am using Postgresql 9.4 on a MacBookPro
> (development system). I use pgadminIII and psql for clients (I tried and
> didn’t like the pgadmin4 UI; pg_dump, pg_restore also seem to be dumbed
> down).
>
> My question:
> I have some performance test results in table cor_duration_report. One
> column, c_entered_ion is of type timestamptz, another, c_scenario, is of
> type text. I want to calculate the difference between succeeding
> c_entered_ion rows to learn the rate at which entry events occur. In
> cor_duration_report, c_entered_ion columns are NOT in ascending sort order.
>
> For a first attempt, I created another table cor_temp_gap as:
>
> CREATE TABLE cor_temp_gap
> (
> c_id serial NOT NULL,
> c_entered_ion timestamp with time zone NOT NULL,
> c_scenario text NOT NULL
> )
> WITH (
> OIDS=FALSE
> );
>
> and loaded it with:
>
> INSERT into cor_temp_gap (c_entered_ion, c_scenario) SELECT
> c_entered_ion, c_scenario from cor_duration_report order by
> c_entered_ion;
>
> The c_id column is loaded with the default value - the next sequence value.
>
> I then generated my report with:
>
> select count( gap ) as gaps, sum(gap) as sum,
> mode() within group (order by gap) as mode,
> percentile_disc(0.5) within group (order by gap) as median,
> avg( gap::integer ) as mean,
> min( gap ) as min,
> max( gap ) as max
> from ( select extract( epoch from ( f.c_entered_ion -
> s.c_entered_ion)::interval) * 1000 as gap
> from cor_temp_gap s, cor_temp_gap f
> where s.c_scenario = '20170628tc04'
> and s.c_id+1 = f.c_id ) vals;
>
>
> This seems to give me the results I want:
>
> gaps | sum | mode | median | mean
> | min | max
> ------+---------+------+--------+-----------------------+-----+--------
> 307412 | 6872207 | 1 | 8 | 22.3550381897908995
> | 0 | 10846
>
>
> The min value of zero is accurate. The mode value of 1 is reasonable, as
> is the median value of 8. Using a totally different method, the mean
> value is accurate, as is gaps (there are 307,413 rows in the table).
>
> I do know enough sql to believe my cor_temp_gap table could probably be
> replace by a ‘WITH SELECT….’
>
> I attempted this:
>
> with cor_entry_time as ( select nextval('cor_temp_select_c_id_seq')
> as c_id, c_entered_ion, c_scenario
> from cor_duration_report where c_scenario = '20170628tc04' order by
> c_entered_ion )
> select count( gap ) as gaps,
> sum(gap::integer) as sum,
> mode() within group (order by gap) as mode,
> percentile_disc(0.5) within group (order by gap) as median,
> avg( gap::integer ) as mean,
> min( gap::integer ) as min,
> max( gap::integer ) as max
> from ( select extract( epoch from ( f.c_entered_ion -
> s.c_entered_ion)::interval) * 1000 as gap
> from cor_entry_time s, cor_entry_time f
> where s.c_id+1 = f.c_id ) vals;

I used this site to reformat the above:

http://sqlformat.darold.net/

WITH cor_entry_time AS (
SELECT
nextval('cor_temp_select_c_id_seq') AS c_id,
c_entered_ion,
c_scenario
FROM
cor_duration_report
WHERE
c_scenario = '20170628tc04'
ORDER BY
c_entered_ion
)
SELECT
count(gap) AS gaps,
sum(gap::INTEGER) AS SUM,
MODE ()
WITHIN
GROUP (
ORDER BY
gap) AS MODE,
percentile_disc (0.5)
WITHIN
GROUP (
ORDER BY
gap) AS median,
avg(gap::INTEGER) AS mean,
min(gap::INTEGER) AS MIN,
max(gap::INTEGER) AS MAX
FROM (
SELECT
extract(EPOCH
FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
cor_entry_time s,
cor_entry_time f
WHERE
s.c_id + 1 = f.c_id) vals;

Still have not figured out everything that is going on above, but it
gave me a fighting chance:)

>
>
> which returned:
>
> gaps | sum | mode | median | mean
> | min | max
> --------+---------+------+--------+---------------------+----------+-------
> 307412 | 6867802 | 0 | 461 | 22.3407088857949592 |
> -6871881 | 59791
>
> The results should match but obviously they don’t. the ‘mode’, ‘median’,
> ‘min’ and ‘max’ columns are clearly different. The ‘sum’ and ‘mean’
> columns are similar but are also different. Only ‘gaps’ is the same.
> There should be no negative numbers at all, assuming my c_entered_ion
> column is in ascending order. Wouldn’t the 'order by c_entered_ion’ in
> the ‘WITH’ select do that?

I believe you are going to have to move the ORDER BY to outside the
WITH. If I am following correctly:

FROM (
SELECT
extract(EPOCH
FROM (f.c_entered_ion - s.c_entered_ion)::interval) * 1000 AS gap
FROM
cor_entry_time s,
cor_entry_time f
WHERE
s.c_id + 1 = f.c_id
ORDER BY
s.c__entered_ion
) vals;

You can remove the:

ORDER BY
c_entered_ion

in the WITH.

>
> To me, the differences between loading ‘cor_temp_gaps’ and my ‘WITH’
> select should not cause the difference. The differences are in how
> ‘c_id’ is generated and the fact that selecting only ‘c_scenario =
> ‘20170628tc04’ has been moved from calculating the interval to the
> ‘WITH’ select. I have also tried the ‘WITH SELECT’ approach without
> moving that test and received the same results.
>
> My suspicion is that in the failing approach, my sequence is being
> assigned before the sort whereas when I load ‘cor_temp_gap’, and c_id
> defaults to a sequence then c_id is generated AFTER c_entered_ion is put
> in sort order.
>
> If my suspicion is right, how do accomplish the same thing in the ‘WITH
> SELECT’ case?
>
> If it is wrong, what am I doing wrong?
>
> Thanks for any insights you may be able to provide!
>
> /s/jr
> Consultant
> Concerto GR
> Mobile: 612.208.6601
>
> Concerto - a composition for orchestra and a soloist
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jerry Regan 2017-07-04 00:20:02 Re: Using 'WITH SELECT' Results Do Not Match 'SELECT FROM <table>' Results
Previous Message rajan 2017-07-03 17:39:57 Re: have trouble understanding xmin and xmax with update operations from two different sessions