Re: MongoDB 3.2 beating Postgres 9.5.1?

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Oleg Bartunov <obartunov(at)gmail(dot)com>
Cc: Paul Jones <pbj(at)cmicdo(dot)com>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: MongoDB 3.2 beating Postgres 9.5.1?
Date: 2016-03-14 07:31:36
Message-ID: CA+q6zcWkSwQcub_pTQw5stCRVF0HxbN7c6k+oe1SgBkY_scSuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi, Paul

I agree with Oleg, EDB benchmarks are strange sometimes. I did the same
benchmarks several months ago. I never noticed the cache influence back
then, so I tried to reproduce your situation now (on a 5*10^6 records
although). I started to play with db cache (using `echo 3 >
/proc/sys/vm/drop_cache`), and I see difference in time execution for two
subsequent queries, but `explain` info are almost identical, e.g. `shared
hit & read`:

```
benchmark=# explain (buffers, analyze, verbose) select data from
json_tables where data @> '{"name": "AC3 Case Red"}';
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.json_tables (cost=102.74..19001.47 rows=4999
width=1257) (actual time=740.556..215956.655 rows=454546 loops=1)
Output: data
Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
Rows Removed by Index Recheck: 2114606
Heap Blocks: exact=31624 lossy=422922
Buffers: shared hit=1371 read=455551
-> Bitmap Index Scan on json_tables_idx (cost=0.00..101.49 rows=4999
width=0) (actual time=731.010..731.010 rows=454547 loops=1)
Index Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
Buffers: shared hit=1371 read=1005
Planning time: 6.352 ms
Execution time: 216075.830 ms
(11 rows)

benchmark=# explain (buffers, analyze, verbose) select data from
json_tables where data @> '{"name": "AC3 Case Red"}';
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.json_tables (cost=102.74..19001.47 rows=4999
width=1257) (actual time=222.476..10692.703 rows=454546 loops=1)
Output: data
Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
Rows Removed by Index Recheck: 2114606
Heap Blocks: exact=31624 lossy=422922
Buffers: shared hit=1371 read=455551
-> Bitmap Index Scan on json_tables_idx (cost=0.00..101.49 rows=4999
width=0) (actual time=214.736..214.736 rows=454547 loops=1)
Index Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
Buffers: shared hit=1371 read=1005
Planning time: 0.089 ms
Execution time: 10767.739 ms
(11 rows)
```

But I see almost the same execution time from mongodb `explain` (216075ms
for pg and 177784ms for mongo, which isn't so much I think):

```
DBQuery.shellBatchSize = 10000000000; db.json_tables.find({"name": "AC3
Case Red"}).explain(true)
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "benchmark.json_tables",
"indexFilterSet" : false,
"parsedQuery" : {
"name" : {
"$eq" : "AC3 Case Red"
}
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"AC3 Case Red\", \"AC3
Case Red\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 454546,
"executionTimeMillis" : 177784,
"totalKeysExamined" : 454546,
"totalDocsExamined" : 454546,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 454546,
"executionTimeMillisEstimate" : 175590,
"works" : 454547,
"advanced" : 454546,
"needTime" : 0,
"needYield" : 0,
"saveState" : 8638,
"restoreState" : 8638,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 454546,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 454546,
"executionTimeMillisEstimate" : 700,
"works" : 454547,
"advanced" : 454546,
"needTime" : 0,
"needYield" : 0,
"saveState" : 8638,
"restoreState" : 8638,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"name" : 1
},
"indexName" : "name_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"name" : [
"[\"AC3 Case Red\", \"AC3
Case Red\"]"
]
},
"keysExamined" : 454546,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "ip-172-30-0-236",
"port" : 27017,
"version" : "3.2.4",
"gitVersion" : "e2ee9ffcf9f5a94fad76802e28cc978718bb7a30"
},
"ok" : 1
}
```

I not missed anything, am I right? Are you sure that it took much more time
for PostgreSQL?
Besides, everything is fine for queries with more small results (while the
query {"name": "AC3 Case Red"} is almost 1/10 of entire dataset):

```
=# insert into json_tables values('{"name": "test name"}'::jsonb);

=# explain (buffers, analyze, verbose) select data from json_tables where
data @> '{"name": "test name"}';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.json_tables (cost=62.75..18965.16 rows=5000
width=1257) (actual time=0.020..0.021 rows=1 loops=1)
Output: data
Recheck Cond: (json_tables.data @> '{"name": "test name"}'::jsonb)
Heap Blocks: exact=1
Buffers: shared hit=5
-> Bitmap Index Scan on json_tables_idx (cost=0.00..61.50 rows=5000
width=0) (actual time=0.011..0.011 rows=1 loops=1)
Index Cond: (json_tables.data @> '{"name": "test name"}'::jsonb)
Buffers: shared hit=4
Planning time: 1.164 ms
Execution time: 0.045 ms
(10 rows)
```

As far as I know there isn't much to do about caching. I don't know if it's
appropriate, but you can manually warm-up the cache (something like `cat
/var/lib/postgresql/9.5/main/base/*/* > /dev/null`).

On 14 March 2016 at 00:30, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:

>
> On Mar 11, 2016 4:40 PM, "Paul Jones" <pbj(at)cmicdo(dot)com> wrote:
> >
> > I have been running the EDB benchmark that compares Postgres and MongoDB.
> > I believe EDB ran it against PG 9.4 and Mongo 2.6. I am running it
> > against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000
> > JSON records generated by the benchmark. It looks like Mongo is winning,
> > and apparently because of its cache management.
>
> Dmitry was working on the same benchmarks. I think edb benchmark is broken
> by design. Better, use ycsb benchmarks. I hope, Dmitry will share his
> results.
>
> >
> > The first queries on both run in ~30 min. And, once PG fills its cache,
> > it whips Mongo on repeats of the *same* query (vmstat shows no disk
> > reads for PG).
> >
> > However, when different query on the same table is issued to both,
> > vmstat shows that PG has to read the *entire* table again, and it takes
> > ~30 min. Mongo does a lot of reads initially but after about 5 minutes,
> > it stops reading and completes the query, most likely because it is
> > using its cache very effectively.
> >
> > Host: Virtual Machine
> > 4 CPUs
> > 16 Gb RAM
> > 200 Gb Disk
> > RHEL 6.6
> >
> > PG: 9.5.1 compiled from source
> > shared_buffers = 7GB
> > effectve_cache_size = 12GB
> >
> > Mongo: 3.2 installed with RPM from Mongo
> >
> > In PG, I created the table by:
> >
> > CREATE TABLE json_tables
> > (
> > data JSONB
> > );
> >
> > After loading, it creates the index:
> >
> > CREATE INDEX json_tables_idx ON json_tables USING GIN (data
> jsonb_path_ops);
> >
> > After a lot of experimentation, I discovered that the benchmark was not
> > using PG's index, so I modified the four queries to be:
> >
> > SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}';
> > SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service
> Basic Plan"}';
> > SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';
> > SELECT data FROM json_tables WHERE data @> '{"type": "service"}';
> >
> > Here are two consecutive explain analyze for PG, for the same query.
> > No functional difference in the plans that I can tell, but the effect
> > of PG's cache on the second is dramatic.
> >
> > If anyone has ideas on how I can get PG to more effectively use the cache
> > for subsequent queries, I would love to hear them.
> >
> > -------
> >
> > benchmark=# explain analyze select data from json_tables where data @>
> '{"name": "AC3 Case Red"}';
> >
> > QUERY PLAN
> >
> >
> >
> -----------------------------------------------------------------------------------------------------------------------------------------
> >
> > Bitmap Heap Scan on json_tables (cost=113.50..37914.64 rows=10000
> width=1261)
> > (actual time=2157.118..1259550.327 rows=909091 loops=1)
> > Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
> > Rows Removed by Index Recheck: 4360296
> > Heap Blocks: exact=37031 lossy=872059
> > -> Bitmap Index Scan on json_tables_idx (cost=0.00..111.00
> rows=10000 width =0) (actual time=2141.250..2141.250 rows=909091 loops=1)
> > Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
> > Planning time: 291.932 ms
> > Execution time: 1259886.920 ms
> > (8 rows)
> >
> > Time: 1261191.844 ms
> >
> > benchmark=# explain analyze select data from json_tables where data @>
> '{"name": "AC3 Case Red"}';
> > QUERY PLAN
> >
> >
> ---------------------------------------------------------------------------------------------------------------------------------------
> >
> > Bitmap Heap Scan on json_tables (cost=113.50..37914.64 rows=10000
> width=1261) (actual time=779.261..29815.262 rows=909091 loops=1)
> > Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
> > Rows Removed by Index Recheck: 4360296
> > Heap Blocks: exact=37031 lossy=872059
> > -> Bitmap Index Scan on json_tables_idx (cost=0.00..111.00
> rows=10000 width =0) (actual time=769.081..769.081 rows=909091 loops=1)
> > Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
> > Planning time: 33.967 ms
> > Execution time: 29869.381 ms
> >
> > (8 rows)
> >
> > Time: 29987.122 ms
> >
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Paquier 2016-03-14 08:04:31 Re: MongoDB 3.2 beating Postgres 9.5.1?
Previous Message Tom Smith 2016-03-14 03:13:06 Re: retrieve grouping sets/rollup rows