From: | Victor Yegorov <vyegorov(at)gmail(dot)com> |
---|---|
To: | Adarsh Sharma <eddy(dot)adarsh(at)gmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Join Bad Performance on different data types |
Date: | 2014-03-05 10:23:36 |
Message-ID: | CAGnEbogBaLbxfQPOhy_FUc8ZuLuM5Xtd+SJO3D5vygeyfRW7tw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
2014-03-05 6:53 GMT+02:00 Adarsh Sharma <eddy(dot)adarsh(at)gmail(dot)com>:
> Anyhow working on finding the root cause.
I would do the following:
1. Check your `default_statistics_target`, like:
SHOW default_statistics_target;
2. Your previously posted `EXPLAIN ANALYZE` with `set enable_seqscan=off`
shows
a significant skew in the estiamted number of rows. I would really bump
up the
`default_statistics_target` higher (modify your `postgresql.conf` and
reload),
up to 500, re-analyzed all tables and checked the performance again.
If you don't want to do it for the whole cluster, then increase for
individual columns.
I think increasing it up to 500 (or higher) for the columns containing
arrays will be
beneficial in any case:
ALTER TABLE graph5 ALTER detail_id SET STATISTICS 1000;
ALTER TABLE graph3 ALTER id_list SET STATISTICS 1000;
ALTER TABLE graph2 ALTER targeting_ids SET STATISTICS 1000;
Of course, you need to `ANALYZE` (or `VACUUM ANALYZE`) the tables after
this change.
3. Next, I've reformatted your query the following way (easier for me):
SELECT DISTINCT
glt.id || ':' || gtt.name as id_type,
glt.name,latitude,longitude,radius,
latitude || ',' || longitude as latlon
FROM graph5 td JOIN graph6 gtt ON gtt.id=td.entity_type_id JOIN
graph2 gcr ON gcr.id=ANY(td.detail_id) JOIN graph3 gtd ON gtd.id=gcr.t_ids
JOIN graph1 glt ON glt.id=ANY(gtd.id_list);
I've ordered joins (well, tables) in the way they depend on each other.
I would try out to move the first join into the CTE in order to isolate
it's results
and help optimizer to better estiamte the rows. Something like this:
WITH first AS (
SELECT td.*,gtt.*
FROM graph5 td
JOIN graph6 gtt ON gtt.id=td.entity_type_id
)
SELECT DISTINCT
glt.id || ':' || first.name as id_type,
glt.name,latitude,longitude,radius,
latitude || ',' || longitude as latlon
FROM first
JOIN graph2 gcr ON gcr.id=ANY(first.detail_id)
JOIN graph3 gtd ON gtd.id=gcr.t_ids
JOIN graph1 glt ON glt.id=ANY(gtd.id_list)
4. Try disabling materialization, like `set enable_material=off`.
--
Victor Y. Yegorov
From | Date | Subject | |
---|---|---|---|
Next Message | Israel Brewster | 2014-03-05 18:22:08 | Mysterious DB reset |
Previous Message | Raschick, Hartmut | 2014-03-05 10:15:27 | too-may-open-files log file entries when vauuming under solaris |