From: | Thom Brown <thom(at)linux(dot)com> |
---|---|
To: | Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Index-only scans for GIST |
Date: | 2014-10-27 11:33:50 |
Message-ID: | CAA-aLv5cArNV6PtAfkqpro3prV0Y4_r1L47Yie2KMK_C=5K2Kg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 18 August 2014 09:05, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:
>
>> 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>:
>>
>> * I'm getting two regression failures with this (opr_sanity and join).
>>
>>>
>>>
>> opr_sanity failure is corrected.
>> But there is remain question with join.
>> I check the latest version of my github repo and there's no fail in join
>> regression test
>> All 145 tests passed.
>> To tell the truth, I don't understand which changes could led to this
>> failure.
>> Could you show me regression diffs?
>>
>
> Sure, here you go. It seems like a change in a plan. At a quick glance it
> seems harmless: the new plan is identical except that the left and right
> side of a join have been reversed. But I don't understand either why this
> patch would change that, so it needs to be investigated.
>
> * The regression test queries that use LIMIT are not guaranteed to always
>>
>>> return the same rows, hence they're not very good regression test cases.
>>> I'd suggest using more restricting WHERE clauses, so that each query only
>>> returns a handful of rows.
>>>
>>
>> Thank you for comment, I rewrote wrong queries. But could you explain why
>> LIMIT queries may return different results? Is it happens because of
>> different query optimization?
>>
>
> Imagine that you have a table with two rows, A and B. If you run a query
> like "SELECT * FROM table LIMIT 1", the system can legally return either
> row A or B, because there's no ORDER BY.
>
> Now, admittedly you have a similar problem even without the LIMIT - the
> system can legally return the rows in either order - but it's less of an
> issue because at least you can quickly see from the diff that the result
> set is in fact the same, the rows are just in different order. You could
> fix that by adding an ORDER BY to all test queries, but we haven't done
> that in the regression suite because then we would not have any test
> coverage for cases where you don't have an ORDER BY. As a compromise, test
> cases are usually written without an ORDER BY, but if e.g. the buildfarm
> starts failing because of differences in the result set order across
> platforms, then we add an ORDER BY to make it stable.
>
> * I think it's leaking memory, in GIST scan context. I tested this with a
>>
>>> variant of the regression tests:
>>>
>>> insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
>>> 0.05*i)),
>>> point(0.05*i, 0.05*i) FROM generate_series(0,
>>> 10000000) as i;
>>> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
>>>
>>> set enable_seqscan=off;
>>> set enable_bitmapscan=off;
>>>
>>> explain analyze select p from gist_tbl where p <@ box(point(0,0),
>>> point(9999999,9999999)) and length(p::text) < 10;
>>>
>>> while the final query runs, 'top' shows constantly increasing memory
>>> usage.
>>>
>>
>> I don't think it's memory leak. After some increasing, memory using remain
>> the same. It works similar without using indexonlyscan.
>>
>
> No, it's definitely a leak caused by the patch. Test with the attached
> patch, which prints out to the server log the amount of memory used by the
> GiST scan memory context every 10000 rows. It clearly shows increasing
> memory usage all the way to the end of the query.
>
> It's cleaned up at the end of the query, but that's not good enough
> because for a large query you might accumulate gigabytes of leaked memory
> until the query has finished. If you (manually) apply the same patch to git
> master, you'll see that the memory usage stays consistent and small.
>
Hi Anastasia,
Do you have time to address the issues brought up in Heikki's review? It
would be good if we could your work into PostgreSQL 9.5.
Thanks
Thom
From | Date | Subject | |
---|---|---|---|
Next Message | Thom Brown | 2014-10-27 11:37:17 | Re: KNN searches support for SP-GiST [GSOC'14] |
Previous Message | Stephen Frost | 2014-10-27 11:27:38 | Re: Directory/File Access Permissions for COPY and Generic File Access Functions |