Re: How to get good performance for very large lists/sets?

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Richard Frith-Macdonald <richard(dot)frith-macdonald(at)brainstorm(dot)co(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to get good performance for very large lists/sets?
Date: 2014-10-07 06:49:33
Message-ID: 397F595E-DC1D-4E5F-BEF5-476CA50F63FA@gmail.com
Lists: pgsql-general

On 06 Oct 2014, at 10:02, Richard Frith-Macdonald <richard(dot)frith-macdonald(at)brainstorm(dot)co(dot)uk> wrote:

> I'm wondering if anyone can help with advice on how to manage large lists/sets of items in a postgresql database.
>
> I have a database which uses multiple lists of items roughly like this:
>
> CREATE TABLE List (
> ID SERIAL,
> Name VARCHAR ....
> );
>
> and a table containing individual entries in the lists:
>
> CREATE TABLE ListEntry (
> ListID INT, /* Reference the List table */
> ItemID INT /* References an Item table */
> ) ;
> CREATE UNIQUE INDEX ListEntryIDX ON ListEntry(ListID, ItemID);

Don’t you have any PKs? A UNIQUE INDEX is not the same as a PK; a PK does not allow NULLs, for example.
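As a sketch of what I mean (the REFERENCES targets are assumptions based on the comments in your schema), a composite PK gives you NOT NULL and uniqueness in one declaration:

```sql
CREATE TABLE ListEntry (
    ListID INT NOT NULL REFERENCES List (ID),  -- PK columns are implicitly NOT NULL
    ItemID INT NOT NULL REFERENCES Item (ID),
    PRIMARY KEY (ListID, ItemID)               -- backed by a unique index automatically
);
```

With a bare UNIQUE INDEX, rows with a NULL in ListID or ItemID would still be accepted, and NULLs don't compare equal, so duplicates with NULLs can slip in.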

For that matter, I’d ditch the serial column in List - it contributes to a larger index size, which decreases the chances that the index will fit in memory and makes it less attractive to the query planner. IMHO, natural keys are to be preferred here over surrogate keys. That is assuming that List.Name is supposed to be unique.
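Concretely, that would look something like this (a sketch, assuming List.Name is indeed unique; column names are illustrative):

```sql
CREATE TABLE List (
    Name VARCHAR PRIMARY KEY
    -- any other per-list columns go here
);

CREATE TABLE ListEntry (
    ListName VARCHAR NOT NULL REFERENCES List (Name),
    ItemID   INT     NOT NULL,
    PRIMARY KEY (ListName, ItemID)
);
```

The trade-off is index width: a VARCHAR key is wider than an INT, but you save a join on most lookups.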

> Now, there are thousands of lists, many with millions of entries, and items are added to and removed from lists in an unpredictable way (in response to our customer's actions, not under our control). Lists are also created by customer actions.

> I think that server won't use index-only scans because, even in cases where a particular list has not had any recent changes, the ListEntry table will almost always have had some change (for one of the other lists) since its last vacuum.
> Perhaps creating multiple ListEntry tables (one for each list) would allow better performance; but that would be thousands (possibly tens of thousands) of tables, and allowing new tables to be created by our clients might conflict with things like nightly backups.
>
> Is there a better way to manage list/set membership for many thousands of sets and many millions of items?

Another benefit of using natural keys is that you don’t need to fetch the actual List entries - the Names are right there in your ListEntry table. You only need to look records up in the List table when you want their details (columns other than Name).
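To illustrate (the ListName column is from the natural-key variant sketched above, not your current schema):

```sql
-- With the surrogate key, fetching a named list's items requires a join:
SELECT e.ItemID
FROM ListEntry e
JOIN List l ON l.ID = e.ListID
WHERE l.Name = 'some list';

-- With the natural key, the name is already in ListEntry:
SELECT ItemID
FROM ListEntry
WHERE ListName = 'some list';
```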

A possible drawback in this case is that the PK index on ListEntry would probably be larger, since a VARCHAR Name is wider than an INT ID.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
