From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Rick Otten <rottenwindfish(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: dsa_allocate() faliure |
Date: | 2018-01-29 20:52:43 |
Message-ID: | CAEepm=0Q5P2jM9hdZ6vkoKKzXce-9Oi9GtCdWVPBYumC1G7+mw@mail.gmail.com |
Lists: | pgsql-hackers pgsql-performance |
On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Rick Otten <rottenwindfish(at)gmail(dot)com> writes:
>> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> avoid these errors:
>
>> $ psql -f failing_query.sql
>> psql:failing_query.sql:46: ERROR: dsa_allocate could not find 7 free pages
>> CONTEXT: parallel worker
>
> Hmm. There's only one place in the source code that emits that message
> text:
>
>     /*
>      * Ask the free page manager for a run of pages. This should always
>      * succeed, since both get_best_segment and make_new_segment should
>      * only return a non-NULL pointer if it actually contains enough
>      * contiguous freespace. If it does fail, something in our backend
>      * private state is out of whack, so use FATAL to kill the process.
>      */
>     if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>         elog(FATAL,
>              "dsa_allocate could not find %zu free pages", npages);
>
> Now maybe that comment is being unreasonably optimistic, but it sure
> appears that this is supposed to be a can't-happen case, in which case
> you've found a bug.
This is probably the bug fixed here:
https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org
That fix was back-patched, so 10.2 will contain it. The bug was not
in dsa.c itself, but in the parallel query code, which mixed up DSA
areas and corrupted them. The problem comes up when the query plan has
multiple Gather nodes (and a particular execution pattern) -- is that
the case here, in the EXPLAIN output? That seems plausible given the
description of a 50-branch UNION. The only workaround until 10.2
would be to set max_parallel_workers_per_gather to 0, which prevents
parallelism completely for this query.
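In case it helps, here is a rough shell sketch of both steps: checking a saved plan for more than one Gather node, and running the script with parallelism turned off for that session only. The helper names and the plan.txt file are made up for illustration, and the pattern assumes plain text-format EXPLAIN output as psql prints it.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of PostgreSQL.

# Count Gather and Gather Merge nodes in a file holding text-format
# EXPLAIN output (for example, saved from psql into plan.txt).
count_gather_nodes() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -cE '^ *(-> +)?Gather( Merge)? +\(' "$1" || true
}

# Run a SQL script with parallel query disabled for that session only.
# PGOPTIONS is the libpq environment variable for per-connection server options.
run_without_parallelism() {
    PGOPTIONS="-c max_parallel_workers_per_gather=0" psql -f "$1"
}
```

A result greater than 1 from `count_gather_nodes plan.txt` would match the plan shape described above, and `run_without_parallelism failing_query.sql` applies the workaround without touching postgresql.conf.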
--
Thomas Munro
http://www.enterprisedb.com