From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz> |
Subject: | Re: TABLESAMPLE patch |
Date: | 2015-01-10 19:59:15 |
Message-ID: | 54B18493.5030303@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 09/01/15 09:27, Michael Paquier wrote:
>
> Some comments about the 1st patch:
> 1) Nitpicking:
> + * Block sampling routines shared by ANALYZE and TABLESAMPLE.
> TABLESAMPLE is not added yet. You may as well mention simplify this
> description with something like "Sampling routines for relation
> blocks".
>
Changed.
> 2) file_fdw and postgres_fdw do not compile correctly as they still
> use anl_random_fract. This function is still mentioned in vacuum.h as
> well.
Gah, didn't notice this, fixed. And also since the Vitter's reservoir
sampling methods are used by other component than just analyze, I moved
those to sampling.c/h.
>
> 3) Not really an issue of this patch, but I'd think that this comment
> should be reworked:
> + * BlockSampler is used for stage one of our new two-stage tuple
> + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
> + * "Large DB"). It selects a random sample of samplesize blocks out of
> + * the nblocks blocks in the table. If the table has less than
> + * samplesize blocks, all blocks are selected.
>
I changed the wording slightly, it's still not too great though.
> 4) As a refactoring patch, why is the function providing a random
> value changed? Shouldn't sample_random_fract be consistent with
> anl_random_fract?
>
Yeah I needed that for TABLESAMPLE and it should not really affect the
randomness but you are correct it should be part of second patch of the
patch-set.
> 5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
> RAND48_SEED_2 instead of being hardcoded?
> Regards,
>
Removed this part from the first patch as it's not needed there anymore.
In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.
In addition to the above changes I added test for cursors and test for
the issue with random generator I mentioned above. Also fixed some typos
in comments and function name. And finally I added note to docs saying
that same REPEATABLE might produce different results in subsequent queries.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-separate-block-sampling-functions-v2.patch | text/x-diff | 21.9 KB |
0002-tablesample-v5.patch | text/x-diff | 98.9 KB |
0003-tablesample-ddl-v2.patch | text/x-diff | 47.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Christoph Berg | 2015-01-10 21:09:28 | Re: libpq 9.4 requires /etc/passwd? |
Previous Message | Tom Lane | 2015-01-10 19:53:13 | Sloppy SSPI error reporting code |