From: Pierre Ducroquet <pierre(dot)ducroquet(at)people-doc(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL not reusing free space in table ?
Date: 2017-01-06 16:00:12
Message-ID: 4898200.U00KjWIJPS@laptop-pierred
Lists: pgsql-general
On Friday, January 6, 2017 9:00:06 AM CET Merlin Moncure wrote:
> On Fri, Jan 6, 2017 at 4:09 AM, Pierre Ducroquet
>
> <pierre(dot)ducroquet(at)people-doc(dot)com> wrote:
> > Hi
> >
> > I am running PostgreSQL 9.4 and have run into the following issue.
> > On a huge table, I have to remove the content (set it to '') of a column
> > that makes up 99% of the weight of the table. Since production cannot be
> > stopped, I am unable to do a VACUUM FULL or any direct rewrite of the
> > table, so I tried the following trick to drop the content while
> > gradually reducing the table weight. (I re-indented it for this email)
> >
> > ppr=500
> > for i in `seq 0 60` ; do
> >     psql $MYDB -c "UPDATE $TABLE
> >         SET content = ''::bytea
> >         WHERE ctid = ANY(ARRAY(
> >             SELECT ('(' || p.i || ',' || s.i || ')')::tid
> >             FROM generate_series($(($i * $ppr)), $((($i + 1) * $ppr))) p(i),
> >                  generate_series(0, 2048) AS s(i)
> >         ));"
> >     psql $MYDB -c "VACUUM $TABLE;"
> > done
> >
> > This reduced my table from 1200MB down to 240MB, 29000 pages, while I
> > expected the final result to be even lower. I did a copy of the table,
> > and it needed only 30MB, 3800 pages, so there was still something wrong.
> > I did a simple (and slow) query to get the number of rows per page on
> > both tables. The new one had slightly more rows per page, but that was
> > in no way enough to justify such a gap. Then I saw that after page 2080
> > the table had, for the following 27000 pages, between 0 and 5 rows per
> > page. So vacuum could not truncate the table because of these gaps.
> > I figured that, after a vacuum, rows updated at the end of the table
> > should be written back at the beginning of the free space, and thus,
> > after updating the rows of the last 27000 pages, the table could be
> > reduced to the optimal size.
> > But when I ran the following, I was very surprised (NB: I know this
> > query does a full scan, I won't do it on a 100GB table, I promise)
> >
> > VACUUM VERBOSE $TABLE;
> >
> > SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> > => 29 rows
> > UPDATE $TABLE SET content = ''::bytea WHERE ctid > '(29000,0)';
> > => 29 rows updated
> > SELECT count(*) FROM $TABLE WHERE ctid > '(29000,0)';
> > => 29 rows again!
> >
> > So instead of filling huge gaps (I've got for instance an 80 pages gap,
> > from id 2082 to id 2160), Pg put the data back at the end of the table.
> >
> >
> > I'm quite puzzled by this situation. I believed PostgreSQL would use the
> > free space to put the new data, so my update loop would give me a clean,
> > «packed» table.
> >
> > What behaviour did I miss here ? How can I get PostgreSQL to use that
> > free space without falling back to a VACUUM FULL ? (And without using
> > tools like pg_repack if possible, because they do not seem necessary in
> > this situation.)
> hm. Maybe HOT is involved? Does your column have an index on it?
>
No, the column is a non-indexed bytea.
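If I understand the behaviour correctly, even without an index an UPDATE first tries to place the new row version on the same page as the old one, which would keep the rows at the end of the table instead of filling the earlier gaps. As a sketch (with 'my_table' standing in for the real table name), the statistics views and the pg_freespacemap extension can show whether that is what happened:

```sql
-- If most updates are counted as HOT updates, the new row versions were
-- kept on the same page rather than moved into earlier free space.
SELECT n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = 'my_table';

-- The pg_freespacemap extension shows where the tracked free space is.
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
SELECT blkno, avail
  FROM pg_freespacemap('my_table')
 WHERE avail > 0
 ORDER BY blkno
 LIMIT 20;
```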