Re: ETL - sql orchestrator is stuck when there is not sleep() between queries

From: Allan Barrielle <allan(dot)barrielle(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org, Pierrick Martin <pierrick(at)mart1(dot)dev>
Subject: Re: ETL - sql orchestrator is stuck when there is not sleep() between queries
Date: 2021-07-08 13:49:12
Message-ID: CANrBv8rNm1bUcpPyUW0DsPyxdSCfTdDDAU3tUhSMB1YKcdSFQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

> Is it true that the SELECTs have no joins in them ?

Yes there is a lot of LEFT JOIN.

> When it doesn't work, you could check SELECT * FROM pg_stat_activity, and
>SELECT pg_blocking_pids(pid), * FROM pg_locks, to see what's going on.

I can't see any blocking queries blocking pg_locks, pg_blocking_pids.

> It'd be very useful to get "explain analyze" for a working query and for a
> stuck query. It sound like the stuck query never finishes, so maybe the
second
> part is impossible (?)

We run an explain analysis and we see some very interesting stuff going on.
It seems without explicitly adding a `ANALYZE`, the query has a cost of
over billions, so the query is not stuck but took forever.
When I run the same scripts with an ANALYZE right before running the query,
the query is exec is 50secondes and the cost is normal

Explain analyze WITHOUT ANALYZE https://explain.depesz.com/s/RaSr
Explain analyze same query WITH ANALYZE BEFORE
https://explain.depesz.com/s/tYVl

The configuration is tuned by aws aurora, but this issue happens also with
a default config.

allow_system_table_mods,off
application_name,DataGrip 2021.1.3
archive_command,(disabled)
archive_mode,off
archive_timeout,5min
array_nulls,on
authentication_timeout,1min
autovacuum,on
autovacuum_analyze_scale_factor,0.05
autovacuum_analyze_threshold,50
autovacuum_freeze_max_age,200000000
autovacuum_max_workers,12
autovacuum_multixact_freeze_max_age,400000000
autovacuum_naptime,5s
autovacuum_vacuum_cost_delay,1ms
autovacuum_vacuum_cost_limit,1200
autovacuum_vacuum_scale_factor,0.1
autovacuum_vacuum_threshold,50
autovacuum_work_mem,-1
backend_flush_after,0
backslash_quote,safe_encoding
bgwriter_delay,200ms
bgwriter_flush_after,0
bgwriter_lru_maxpages,100
bgwriter_lru_multiplier,2
bonjour,off
bytea_output,hex
check_function_bodies,on
checkpoint_completion_target,0.9
checkpoint_flush_after,0
checkpoint_timeout,15min
checkpoint_warning,30s
client_encoding,UTF8
client_min_messages,notice
commit_delay,0
commit_siblings,5
constraint_exclusion,partition
cpu_index_tuple_cost,0.005
cpu_operator_cost,0.0025
cpu_tuple_cost,0.01
cursor_tuple_fraction,0.1
DateStyle,"ISO, MDY"
db_user_namespace,off
deadlock_timeout,1s
debug_pretty_print,on
debug_print_parse,off
debug_print_plan,off
debug_print_rewritten,off
default_statistics_target,500
default_text_search_config,pg_catalog.simple
default_transaction_deferrable,off
default_transaction_isolation,read committed
default_transaction_read_only,off
dynamic_library_path,$libdir
effective_cache_size,4GB
effective_io_concurrency,600
enable_bitmapscan,on
enable_gathermerge,on
enable_hashagg,on
enable_hashjoin,on
enable_indexonlyscan,on
enable_indexscan,on
enable_material,on
enable_mergejoin,on
enable_nestloop,on
enable_parallel_append,on
enable_parallel_hash,on
enable_partition_pruning,on
enable_partitionwise_aggregate,off
enable_partitionwise_join,off
enable_seqscan,on
enable_sort,on
enable_tidscan,on
escape_string_warning,on
event_source,PostgreSQL
exit_on_error,off
extra_float_digits,3
force_parallel_mode,off
from_collapse_limit,8
fsync,off
full_page_writes,off
geqo,on
geqo_effort,5
geqo_generations,0
geqo_pool_size,0
geqo_seed,0
geqo_selection_bias,2
geqo_threshold,12
gin_fuzzy_search_limit,0
gin_pending_list_limit,4MB
hot_standby,off
hot_standby_feedback,on
huge_pages,try
idle_in_transaction_session_timeout,25min
ignore_checksum_failure,off
ignore_system_indexes,off
IntervalStyle,postgres
jit,off
jit_above_cost,100000
jit_debugging_support,off
jit_dump_bitcode,off
jit_expressions,on
jit_inline_above_cost,500000
jit_optimize_above_cost,500000
jit_profiling_support,off
jit_provider,llvmjit
jit_tuple_deforming,on
join_collapse_limit,8
lc_monetary,C
lc_numeric,C
lc_time,C
listen_addresses,*
lock_timeout,0
lo_compat_privileges,off
maintenance_work_mem,2GB
max_connections,100
max_files_per_process,1000
max_locks_per_transaction,256
max_logical_replication_workers,4
max_parallel_maintenance_workers,12
max_parallel_workers,12
max_parallel_workers_per_gather,6
max_pred_locks_per_page,2
max_pred_locks_per_relation,-2
max_pred_locks_per_transaction,64
max_prepared_transactions,0
max_replication_slots,10
max_stack_depth,6MB
max_standby_archive_delay,30s
max_standby_streaming_delay,14s
max_sync_workers_per_subscription,2
max_wal_senders,0
max_wal_size,8GB
max_worker_processes,12
min_parallel_index_scan_size,512kB
min_parallel_table_scan_size,8MB
min_wal_size,2GB
old_snapshot_threshold,-1
operator_precedence_warning,off
parallel_leader_participation,off
parallel_setup_cost,1000
parallel_tuple_cost,0.1
password_encryption,md5
port,5432
post_auth_delay,0
pre_auth_delay,0
quote_all_identifiers,off
random_page_cost,1.1
restart_after_crash,on
row_security,on
search_path,public
seq_page_cost,1
session_replication_role,origin
shared_buffers,1GB
standard_conforming_strings,on
statement_timeout,0
superuser_reserved_connections,3
synchronize_seqscans,on
synchronous_commit,on
syslog_facility,local0
syslog_ident,postgres
syslog_sequence_numbers,on
syslog_split_messages,on
tcp_keepalives_count,9
tcp_keepalives_idle,7200
tcp_keepalives_interval,75
temp_buffers,8MB
temp_file_limit,-1
TimeZone,UTC
trace_notify,off
trace_recovery_messages,log
trace_sort,off
track_activities,on
track_activity_query_size,4kB
track_commit_timestamp,off
track_counts,on
track_functions,none
track_io_timing,off
transform_null_equals,off
update_process_title,on
vacuum_cleanup_index_scale_factor,0.1
vacuum_cost_delay,0
vacuum_cost_limit,200
vacuum_cost_page_dirty,20
vacuum_cost_page_hit,1
vacuum_cost_page_miss,0
vacuum_defer_cleanup_age,0
vacuum_freeze_min_age,50000000
vacuum_freeze_table_age,150000000
vacuum_multixact_freeze_min_age,5000000
vacuum_multixact_freeze_table_age,150000000
wal_buffers,16MB
wal_compression,off
wal_level,minimal
wal_log_hints,off
wal_receiver_status_interval,10s
wal_receiver_timeout,30s
wal_retrieve_retry_interval,5s
wal_sender_timeout,1min
wal_sync_method,fdatasync
wal_writer_delay,200ms
wal_writer_flush_after,1MB
work_mem,2GB
xmlbinary,base64
xmloption,content
zero_damaged_pages,off

On Thu, Jul 8, 2021 at 2:33 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Thu, Jul 08, 2021 at 01:00:28AM +0200, Allan Barrielle wrote:
> > All is fine, and can work great.
> > But sometimes, some queries that used to take about 20 secs to complete
> can
> > suddenly end in 5mins.
> > Important all queries have the same shape -> CREATE TABLE SELECT AS *(a
> bit
> > of transform) FROM TABLE). No update, nothing, it’s dead simple.
> >
> > Nothing works, it’s very random, some query won’t simply work ( even
> after
> > hours ).
>
> When it doesn't work, you could check SELECT * FROM pg_stat_activity, and
> SELECT pg_blocking_pids(pid), * FROM pg_locks, to see what's going on.
>
> > Important all queries have the same shape -> CREATE TABLE SELECT AS *(a
> bit
> > of transform) FROM TABLE). No update, nothing, it’s dead simple.
> > We are just trying to copy a table from schema1, to schema2, to schema3
> and
> > finally schema3. That’s it.
>
> Is it true that the SELECTs have no joins in them ?
>
> Did this ever work better or differently under different versions of
> postgres ?
>
> > Why does the query never end even after hours ? Why there is no log about
> > where the query is stuck.
>
> Please send your nondefault config.
> https://wiki.postgresql.org/wiki/Server_Configuration
>
> Also enable logging (I just added this to the wiki).
> https://wiki.postgresql.org/wiki/Slow_Query_Questions#Enable_Logging
>
> It'd be very useful to get "explain analyze" for a working query and for a
> stuck query. It sound like the stuck query never finishes, so maybe the
> second
> part is impossible (?)
>
> But it'd be good to get at least "explain" output. You'd have to edit
> your sql
> script to run an "explain" before each query, and run it, logging the
> ouput,
> until you capture the plan for a stuck query. Save the output and send
> here,
> along with the query plan for a working query.
>
> --
> Justin
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Allan Barrielle 2021-07-08 13:51:48 Re: ETL - sql orchestrator is stuck when there is not sleep() between queries
Previous Message Imre Samu 2021-07-08 13:47:15 Re: ETL - sql orchestrator is stuck when there is not sleep() between queries