Re: analyze foreign tables

From: richard coleman <rcoleman(dot)ascentgl(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: analyze foreign tables
Date: 2023-08-03 16:16:14
Message-ID: CAGA3vBtr=LjXtugfL5352PiRe0vrUm+jPpZTN=u8NT48BLf3yA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-hackers pgadmin-support pgsql-admin

Jeff,

In my experience the overhead is directly related to the size and the
complexity of the tables in the query.

A simple query only referencing a small < 5M table with only a primary key,
it isn't noticeable.

A typical complicated query referencing numerous tables ranging in sizes
from <5M to > 2.5T each with a primary key and 0 - 30 indices (per table),
it can add 30 seconds to many minutes to each run of the query. When some
of these queries are re-run constantly it becomes untenable. In one case,
manually running analyze on the foreign tables and then running the query
it returns in about 30 sec. Setting use_remote_estimate = true made each
run return in about 5 minutes.

Unfortunately, manually running analyze on each foreign table in the schema
(500+ tables, ranging from < 1M to > 3T) takes more than a day to
complete. On the server hosting the tables, the auto analyze is running
constantly, as expected. All of the clusters are sitting at PostgreSQL 15.

Hence my desire to find a more performant, less resource intensive way to
pass the continuously updated statistics of these tables to the other
PostgreSQL clusters holding the foreign table pointers to them.

I know it's anecdotal, but I hope it helps anyway.

rik.

On Thu, Aug 3, 2023 at 10:19 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Tue, Aug 1, 2023 at 9:47 AM richard coleman <
> rcoleman(dot)ascentgl(at)gmail(dot)com> wrote:
>
>>
>> use_remote_estimate isn't really a solution as it adds way too much
>> overhead and processing time to every query run.
>>
>
> Maybe this is the thing which should be addressed. Can you quantify what
> you see here? How much overhead is being added for each query? Is this
> principally processing time, or network latency?
>
>
> Since these tables are being continuously analyzed in the database that
>> hosts the data, is there some way that they statistics could be easily
>> passed through the foreign server mechanism to the remote database that's
>> calling the query?
>>
>
> Since FDW can cross version boundaries, it is hard to see how this would
> work. Maybe something could be done for the special case of where the
> versions match. I think collations/encoding would be a problem, though.
>
>
>> What I am hoping for is either:
>>
>> 2. add the ability to automatically run analyze on foreign tables just as
>> they are currently run on local tables.
>>
>
> That wouldn't work because communication is always initiated on the wrong
> side. But it should be fairly easy to script something outside of the
> database which would connect to both, and poll the "foreign"
> pg_stat_all_tables.last_autovacuum and initiate a local ANALYZE for each
> table which was recently autoanalyzed on the foreign side.
>
> Cheers,
>
> Jeff
>

In response to

Browse pgadmin-hackers by date

  From Date Subject
Next Message Pravesh Sharma 2023-08-04 06:27:41 [pgadmin-org/pgadmin4] 00011b: Fixed issue where cycle syntax was not added in SQ...
Previous Message Jeff Janes 2023-08-03 14:19:19 Re: analyze foreign tables

Browse pgadmin-support by date

  From Date Subject
Next Message Frank Gunseor 2023-08-03 23:13:15
Previous Message Jeff Janes 2023-08-03 14:19:19 Re: analyze foreign tables

Browse pgsql-admin by date

  From Date Subject
Next Message Sushil Shirodkar 2023-08-03 17:20:43 Re. Backup
Previous Message Jeff Janes 2023-08-03 14:19:19 Re: analyze foreign tables