| From: | m_lists(at)yahoo(dot)it |
|---|---|
| To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
| Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: Performance problem with low correlation data |
| Date: | 2009-07-10 07:18:16 |
| Message-ID: | 903539.75439.qm@web24615.mail.ird.yahoo.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
> > testinsert contains t values between '2009-08-01' and '2009-08-09', and ne_id
> from 1 to 20000. But only 800 out of 20000 ne_id have to be read; there's no
> need for a table scan!
> > I guess this is a reflection of the poor "correlation" on ne_id; but, as I
> said, I don't really think ne_id is so bad correlated.
> > In fact, doing a "select ne_id, t from testinsert limit 100000" I can see
> that data is laid out pretty much by "ne_id, t", grouped by day (that is, same
> ne_id for one day, then next ne_id and so on until next day).
> > How is the "correlation" calculated? Can someone explain to me why, after the
> procedure above,correlation is so low???
>
> Did you run ANALYZE after the procedure above?
Yes I did; the correlation on that column stays low.
Of course, I didn't expect a correlation = 1, since data is layed out (pretty much) like this:
(ne_id1) (t1 day1)
(ne_id1) (t2 day1)
...
(ne_id1) (tn day1)
(ne_id2) (t1 day1)
(ne_id2) (t2 day1)
...
(ne_id2) (tn day1)
...
(pretty much all the ne_ids)
(ne_id1) (t1 day2)
(ne_id1) (t2 day2)
...
(ne_id1) (tn day2)
(ne_id2) (t1 day2)
(ne_id2) (t2 day2)
...
(ne_id2) (tn day2)
... and so on
so I ne_id is not strictly incrementing, but it is pretty much the same (sequencially) for a whole whole day...
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Simon Riggs | 2009-07-10 09:20:46 | Re: Overhead of union versus union all |
| Previous Message | Bruce Momjian | 2009-07-10 03:15:17 | Re: Overhead of union versus union all |