From: Vladimir Ryabtsev <greatvovan(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Why Postgres doesn't use TID scan?
Date: 2018-12-18 01:16:08
Message-ID: CAMqTPq=hNg0GYFU0X+xmuKy8R2ARk1+A_uQpS+Mnf71MYpBKzg@mail.gmail.com
Lists: pgsql-performance
I want to clean a large log table in chunks, so I wrote this query:

delete from categorization.log
where ctid in (
    select ctid from categorization.log
    where timestamp < now() - interval '2 month'
    limit 1000
)
But I am getting the following weird plan:

[Plan 1]
Delete on log  (cost=74988058.17..77101421.77 rows=211334860 width=36)
  ->  Merge Semi Join  (cost=74988058.17..77101421.77 rows=211334860 width=36)
        Merge Cond: (log.ctid = "ANY_subquery".ctid)
        ->  Sort  (cost=74987967.33..76044641.63 rows=422669720 width=6)
              Sort Key: log.ctid
              ->  Seq Scan on log  (cost=0.00..8651368.20 rows=422669720 width=6)
        ->  Sort  (cost=90.83..93.33 rows=1000 width=36)
              Sort Key: "ANY_subquery".ctid
              ->  Subquery Scan on "ANY_subquery"  (cost=0.00..41.00 rows=1000 width=36)
                    ->  Limit  (cost=0.00..31.00 rows=1000 width=6)
                          ->  Seq Scan on log log_1  (cost=0.00..11821391.10 rows=381284367 width=6)
                                Filter: ("timestamp" < (now() - '2 mons'::interval))
And it takes forever to complete (with any number in LIMIT from 1 to 1000).
However, if I extract the CTIDs manually:

select array_agg(ctid) from (
    select ctid from s.log
    where timestamp < now() - interval '2 month'
    limit 5
) v
and substitute the result into the DELETE query, it does a basic TID scan
and completes in just milliseconds:

explain
delete from s.log
where ctid = any('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[])
[Plan 2]
Delete on log  (cost=0.01..20.06 rows=5 width=6)
  ->  Tid Scan on log  (cost=0.01..20.06 rows=5 width=6)
        TID Cond: (ctid = ANY ('{"(3020560,1)","(3020560,2)","(3020560,3)","(3020560,4)","(3020560,5)"}'::tid[]))
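For what it's worth, the manual extract-then-substitute workaround above can be scripted server-side. The sketch below is my own illustration (names like "victims" and the batch size of 1000 are arbitrary): it materializes one batch of ctids into a PL/pgSQL array variable, then deletes with ctid = ANY(...). Because the array reaches the DELETE as a parameter rather than a subquery, the planner should be able to use a TID scan as in Plan 2.

```sql
-- Hypothetical sketch: delete old rows in batches via ctid = ANY(array).
DO $$
DECLARE
    victims tid[];
BEGIN
    LOOP
        -- Materialize one batch of target ctids into an array variable.
        SELECT array_agg(ctid) INTO victims
        FROM (
            SELECT ctid
            FROM s.log
            WHERE "timestamp" < now() - interval '2 month'
            LIMIT 1000
        ) batch;

        EXIT WHEN victims IS NULL;  -- no matching rows left

        -- The array arrives as a parameter, so this can plan as a TID scan.
        DELETE FROM s.log WHERE ctid = ANY (victims);
    END LOOP;
END
$$;
```

Caveats: on PostgreSQL 10 the whole DO block runs in a single transaction, so the chunking does not limit transaction size (on 11+ a procedure with COMMIT per batch could fix that), and ctids are only stable in the absence of concurrent updates and VACUUM.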
In case the table's definition helps:
CREATE TABLE s.log
(
article_id bigint NOT NULL,
topic_id integer NOT NULL,
weight double precision NOT NULL,
cat_system character varying(50) NOT NULL,
lang character varying(5) NOT NULL,
is_final boolean NOT NULL,
comment character varying(50),
"timestamp" timestamp without time zone DEFAULT now()
)
Number of rows: ~423M
n_live_tup = 422426725
last_vacuum = 2018-10-22
Postgres version(): PostgreSQL 10.3 (Ubuntu 10.3-1.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
Why does this query want to use a Seq Scan and Sort on a 423M-row table?
How can I fix this (i.e., reduce it to Plan 2)?
--
Vlad
Next message: Tom Lane, 2018-12-18 01:40:37, "Re: Why Postgres doesn't use TID scan?"
Previous message: Mariel Cherkassky, 2018-12-16 11:58:45, "Re: pgbench results arent accurate"