Re: Slow query with big tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Mike Sofen <msofen(at)runbox(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow query with big tables
Date: 2016-08-27 16:36:56
Message-ID: 6955.1472315816@sss.pgh.pa.us
Lists: pgsql-performance

Craig James <cjames(at)emolecules(dot)com> writes:
> Straight hash-table indexes (which Postgres doesn't use) have O(1) access
> time. The amount of data has no effect on the access time.

This is wishful thinking --- once you have enough data, O(1) goes out the
window. For example, a hash index is certainly not going to continue to
scale linearly once you reach its maximum possible number of buckets
(2^N for N-bit hashes, and remember you can't get very many useful hash
bits out of small objects like integers). But even before that, large
numbers of buckets put enough stress on your storage system that you will
see some not very O(1)-ish behavior, just because too little of the index
fits in whatever cache and RAM you have. Any storage hierarchy is
ultimately going to impose O(log N) access costs; that's the way they're
built.
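The bucket-count ceiling above can be sketched numerically. This is an illustrative model only, not the behavior of Postgres's actual hash access method (which splits buckets dynamically): assume a hash that yields only N useful bits, so at most 2^N buckets exist. Once the row count passes the bucket count, the average chain a lookup must scan grows linearly with table size, and the "O(1)" promise is gone.

```python
N_BITS = 8                 # assumption: only 8 useful hash bits
N_BUCKETS = 2 ** N_BITS    # 256 buckets is the hard ceiling

def avg_chain_length(n_rows: int) -> float:
    """Average number of entries a lookup must scan per bucket."""
    buckets = [0] * N_BUCKETS
    for key in range(n_rows):
        buckets[hash(key) % N_BUCKETS] += 1
    return sum(buckets) / N_BUCKETS

# rows == buckets: about one entry per bucket, lookups stay cheap
print(avg_chain_length(256))      # 1.0
# 1000x the rows with the same buckets: chains are 1000x longer
print(avg_chain_length(256_000))  # 1000.0
```

The same arithmetic applies before the ceiling, too: once the live portion of the index outgrows cache, each extra level of the storage hierarchy adds its own access cost, which is where the O(log N) floor comes from.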

I think it's fairly pointless to discuss such matters in the abstract.
If you want to make useful engineering tradeoffs you have to talk about
specific data sets and available hardware.

regards, tom lane
