Re: Incorrect number of rows inserted into partitioned table

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Роман Осипов <osipovromanvladimirovich(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Incorrect number of rows inserted into partitioned table
Date: 2023-04-10 22:04:39
Message-ID: 23169.1681164279@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

=?UTF-8?B?0KDQvtC80LDQvSDQntGB0LjQv9C+0LI=?= <osipovromanvladimirovich(at)gmail(dot)com> writes:
> There is an insertion not of the amount specified in *limit*,, but a little
> more or less than it.

I think you're getting burnt by the fact that a rule is a macro,
combined with the fact that your query is underspecified:

insert into book_inherit_test select * from book_test limit 1000000;

This doesn't constrain *which* 1000000 rows of book_test get inserted.
If we EXPLAIN it we get:

=# explain insert into book_inherit_test select * from book_test limit 1000000;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Insert on book_inherit_test (cost=0.00..32353.00 rows=0 width=0)
-> Subquery Scan on "*SELECT*" (cost=0.00..32353.00 rows=617176 width=31)
Filter: ((("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951)) IS NOT TRUE)
-> Limit (cost=0.00..17353.00 rows=1000000 width=31)
-> Seq Scan on book_test (cost=0.00..34706.00 rows=2000000 width=31)

Insert on book_inherit_test_before_1950 (cost=0.00..32353.00 rows=0 width=0)
-> Subquery Scan on "*SELECT*" (cost=0.00..32353.00 rows=382824 width=31)
Filter: (("*SELECT*".releaseyear >= 1900) AND ("*SELECT*".releaseyear < 1951))
-> Limit (cost=0.00..17353.00 rows=1000000 width=31)
-> Seq Scan on book_test (cost=0.00..34706.00 rows=2000000 width=31)
(11 rows)

from which we can see that the insertions into book_inherit_test and
those into book_inherit_test_before_1950 are driven off completely
separate scans of book_test. And that table is big enough that
the synchronize_seqscans feature kicks in, meaning that indeed this
will scan two different million-row subsets of book_test, producing
results fundamentally unlike what you expected.

I get stable results after disabling synchronize_seqscans, but
a more correct way would be to add ORDER BY to fully determine which
rows of book_test are considered.

Advice: don't use a RULE for this sort of thing. If you really want to,
you can get the same effects more predictably with a trigger.
But as David said, the whole thing looks like a poor reimplementation
of partitioning.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Роман Осипов 2023-04-11 05:48:23 Re: Incorrect number of rows inserted into partitioned table
Previous Message Роман Осипов 2023-04-10 21:58:15 Re: Incorrect number of rows inserted into partitioned table