Re: json datatype and table bloat?

From: Gregory Haase <haaseg(at)onefreevoice(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: John R Pierce <pierce(at)hogranch(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: json datatype and table bloat?
Date: 2013-11-01 18:18:38
Message-ID: CAHA6QFRADbCMhyJKnrQxmyzVNa1kDmy8E2F0qTUJ2PsGJeXnMg@mail.gmail.com
Lists: pgsql-general

I spent some more time on this today, and I realized that the issue isn't
that there are records in the toast table. The issue is that there are NO
records in the toast table. Apparently, all the json values we are inserting
are too small to get toasted.
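
A quick way to confirm that (a sketch; the table and column names below are
placeholders for the real stats table) is to look at the stored size of the
values, since anything comfortably under the default TOAST threshold of
roughly 2 kB stays inline in the heap:

-- how large are the stored json values, actually?
select avg(pg_column_size(json_data)) as avg_bytes,
       max(pg_column_size(json_data)) as max_bytes
from your_stats_table;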

I set up a separate benchmark locally:
create table test_json_stats
(
test_json_stats_id serial,
json_data json,
insert_timestamp timestamp default now() not null
);

I then created a file called "json_bench" with the following:
BEGIN;
insert into test_json_stats (json_data) values ('{"counters": [ {
"first":"1","second":"2"}, { "first":"3","second":"4"}, {
"first":"5","second":"6"}, { "first":"7","second":"8"}, {
"first":"9","second":"10"}, { "first":"11","second":"12"}, {
"first":"13","second":"14"}, { "first":"15","second":"16"}, {
"first":"17","second":"18"}, { "first":"19","second":"20"}, {
"first":"21","second":"22"}, { "first":"23","second":"24"}, {
"first":"25","second":"26"}, { "first":"27","second":"28"}, {
"first":"29","second":"30"}, { "first":"31","second":"32"}, {
"first":"33","second":"34"}, { "first":"35","second":"36"}, {
"first":"37","second":"38"}, { "first":"39","second":"40"}, {
"first":"41","second":"42"}, { "first":"43","second":"44"}, {
"first":"45","second":"46"} ] }');
END;

Then ran pgbench:
pgbench -c 5 -t 2000000 -f json_bench greg

vacuum verbose shows the test_json_stats table has over a million pages and
the toast table exists with zero pages:
INFO: vacuuming "public.test_json_stats"
INFO: "test_json_stats": found 0 removable, 0 nonremovable row versions in
0 out of 1010011 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO: vacuuming "pg_toast.pg_toast_51822"
INFO: index "pg_toast_51822_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_51822": found 0 removable, 0 nonremovable row versions in
0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
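
To double-check that from the catalogs rather than the VACUUM output, the
toast relation's size can also be read directly (a small sketch using the
benchmark table above):

select c.relname,
       pg_relation_size(c.oid) as heap_bytes,
       pg_relation_size(c.reltoastrelid) as toast_bytes
from pg_class c
where c.relname = 'test_json_stats';

toast_bytes should come back as 0 here, consistent with the VACUUM VERBOSE
output.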

The json_data column is not accounted for in pg_stats:
select attname, null_frac, avg_width from pg_stats where tablename =
'test_json_stats';
      attname       | null_frac | avg_width
--------------------+-----------+-----------
 test_json_stats_id |         0 |         4
 insert_timestamp   |         0 |         8
(2 rows)
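
Digging one level deeper (a sketch against the underlying catalog), the json
column appears to have no row in pg_statistic at all, which is why it also
disappears from the pg_stats view:

select a.attname, s.stanullfrac, s.stawidth
from pg_attribute a
left join pg_statistic s
  on s.starelid = a.attrelid and s.staattnum = a.attnum
where a.attrelid = 'test_json_stats'::regclass
  and a.attnum > 0
  and not a.attisdropped;

The json_data row should show NULLs for stanullfrac and stawidth, i.e.
ANALYZE stored nothing for it.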

So I'm not sure I'd actually qualify this as a "bug", but it appears that
there is currently no way to get stats on a json data type.

I subsequently inserted a very large json value into the table, which consumed
2 pages in pg_toast_51822, but there still doesn't appear to be any way to
get stats on the column.
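
As a stopgap for bloat estimation, the missing figures could presumably be
approximated by sampling the column directly instead of relying on pg_stats
(a sketch; unlike the collected statistics this scans the table, so it will
be slow on big relations):

select avg(case when json_data is null then 1 else 0 end) as null_frac,
       avg(pg_column_size(json_data)) as approx_avg_width
from test_json_stats;

Plugging those numbers into the wiki query's expected-size arithmetic by hand
would at least account for the inline json data, though not for anything that
does land in the toast table.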

Greg Haase

On Tue, Oct 29, 2013 at 2:55 PM, Gregory Haase <haaseg(at)onefreevoice(dot)com> wrote:

> Tom is correct: Vacuum verbose shows that there is an associated toast
> table. Neither the check_postgres.pl script nor the query on
> http://wiki.postgresql.org/wiki/Show_database_bloat appears to take this
> into consideration. Both rely on null_frac and avg_width from pg_stats to
> estimate how big the table should be. I'm not sure how you would factor the
> toast table into that estimate.
>
> -G
>
>
> On Tue, Oct 29, 2013 at 2:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Gregory Haase <haaseg(at)onefreevoice(dot)com> writes:
>> > I've isolated the problem to the json field not showing up in pg_stats,
>> > which affects the calculation of the avg row size in the bloat query.
>>
>> > I'm not sure if this is a json issue or some other kind of issue.
>>
>> Possibly your "bloat query" is failing to consider the toast table
>> associated with this table? If the json values are large they'd
>> mostly be in the toast table not the main table.
>>
>> (It's unfortunate that VACUUM FULL doesn't tell you about what's
>> in the toast table. I'd try just VACUUM VERBOSE here, without the
>> FULL, to get more info.)
>>
>> regards, tom lane
>>
>
>
