Re: Unexpected expensive index scan

From: Jake Nielsen <jake(dot)k(dot)nielsen(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Unexpected expensive index scan
Date: 2016-09-28 00:21:40
Message-ID: CAP3LSG7PfEfaK5V6rT7rgP1MjLED-wnqGY5Qtp5Gx0s64L16tg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Herp, forgot to include the query:

SELECT * FROM SyncerEvent WHERE ID > 12468 AND propogatorId NOT IN
('"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"') AND conflicted != 1 AND
userId = '57dc984f1c87461c0967e228' ORDER BY ID LIMIT 4000;^

On Tue, Sep 27, 2016 at 5:02 PM, Jake Nielsen <jake(dot)k(dot)nielsen(at)gmail(dot)com>
wrote:

> I've got a query that takes a surprisingly long time to run, and I'm
> having a really rough time trying to figure it out.
>
> Before I get started, here are the specifics of the situation:
>
> Here is the table that I'm working with (apologies for spammy indices,
> I've been throwing shit at the wall)
>
> Table "public.syncerevent"
>
> Column | Type | Modifiers
>
>
> --------------+---------+-----------------------------------
> -----------------------
>
> id | bigint | not null default nextval('syncerevent_id_seq'::
> regclass)
>
> userid | text |
>
> event | text |
>
> eventid | text |
>
> originatorid | text |
>
> propogatorid | text |
>
> kwargs | text |
>
> conflicted | integer |
>
> Indexes:
>
> "syncerevent_pkey" PRIMARY KEY, btree (id)
>
> "syncereventidindex" UNIQUE, btree (eventid)
>
> "anothersyncereventidindex" btree (userid)
>
> "anothersyncereventidindexwithascending" btree (userid, id)
>
> "asdfasdgasdf" btree (userid, id DESC)
>
> "syncereventuseridhashindex" hash (userid)
>
> To provide some context, as per the wiki,
> there are 3,290,600 rows in this table.
> It gets added to frequently, but never deleted from.
> The "kwargs" column often contains mid-size JSON strings (roughly 30K
> characters on average)
> As of right now, the table has 53 users in it. About 20% of those have a
> negligible number of events, but the rest of the users have a fairly even
> smattering.
>
> EXPLAIN (ANALYZE, BUFFERS) says:
>
>
> QUERY PLAN
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> --------------------------------------
>
> Limit (cost=0.43..1218.57 rows=4000 width=615) (actual
> time=3352.390..3403.572 rows=4000 loops=1)
>
> Buffers: shared hit=120244 read=160198
>
> -> Index Scan using syncerevent_pkey on syncerevent
> (cost=0.43..388147.29 rows=1274560 width=615) (actual
> time=3352.386..3383.100 rows=4000 loops=1)
>
> Index Cond: (id > 12468)
>
> Filter: ((propogatorid <> '"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"'::text)
> AND (conflicted <> 1) AND (userid = '57dc984f1c87461c0967e228'::text))
>
> Rows Removed by Filter: 1685801
>
> Buffers: shared hit=120244 read=160198
>
> Planning time: 0.833 ms
>
> Execution time: 3407.633 ms
>
> (9 rows)
>
>
> The postgres verison is: PostgreSQL 9.5.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
>
>
> This query has gotten slower over time.
>
> The postgres server is running on a db.m3.medium RDS instance on Amazon.
>
> (3.75GB of ram)
>
> (~3 GHz processor, single core)
>
> I ran VACUUM, and ANALYZEd this table just prior to running the EXPLAIN
> command.
>
> Here are the server settings:
>
> name | current_setting
> | source
>
>
>
>
> application_name | psql
> | client
>
> archive_command | /etc/rds/dbbin/pgscripts/rds_wal_archive
> %p | configuration file
>
> archive_mode | on
> | configuration file
>
> archive_timeout | 5min
> | configuration file
>
> autovacuum_analyze_scale_factor | 0.05
> | configuration file
>
> autovacuum_naptime | 30s
> | configuration file
>
> autovacuum_vacuum_scale_factor | 0.1
> | configuration file
>
> checkpoint_completion_target | 0.9
> | configuration file
>
> client_encoding | UTF8
> | client
>
> effective_cache_size | 1818912kB
> | configuration file
>
> fsync | on
> | configuration file
>
> full_page_writes | on
> | configuration file
>
> hot_standby | off
> | configuration file
>
> listen_addresses | *
> | command line
>
> lo_compat_privileges | off
> | configuration file
>
> log_checkpoints | on
> | configuration file
>
> log_directory | /rdsdbdata/log/error
>
> Sorry for the formatting, I'm not sure of the best way to format this data
> on a mailing list.
>
>
> If it matters/interests you, here is my underlying confusion:
>
> From some internet sleuthing, I've decided that having a table per user
> (which would totally make this problem a non-issue) isn't a great idea.
> Because there is a file per table, having a table per user would not scale.
> My next thought was partial indexes (which would also totally help), but
> since there is also a table per index, this really doesn't side-step the
> problem. My rough mental model says: If there exists a way that a
> table-per-user scheme would make this more efficient, then there should
> also exist an index that could achieve the same effect (or close enough to
> not matter). I would think that "userid = '57dc984f1c87461c0967e228'" could
> utilize at least one of the two indexes on the userId column, but clearly
> I'm not understanding something.
>
> Any help in making this query more efficient would be greatly appreciated,
> and any conceptual insights would be extra awesome.
>
> Thanks for reading.
>
> -Jake
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mike Sofen 2016-09-28 00:41:55 Re: Unexpected expensive index scan
Previous Message Jake Nielsen 2016-09-28 00:02:43 Unexpected expensive index scan