Re: people who buy A, also buy C, D, E

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org, dan(at)langille(dot)org
Subject: Re: people who buy A, also buy C, D, E
Date: 2005-06-25 14:00:53
Message-ID: 42BD6395.8070301@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 6/24/2005 11:35 PM, Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
>> The goal of my query is: given a book, what did other people who
>> bought this book also buy? I plan the list the 5 most popular such
>> books.

This sounds very much like one of the congestion problems given in the
TPC-W benchmark. You might want to take a look at some of the published
full disclosure reports or the PHP TPC-W implementation on pgfoundry to
get some hints.

Jan

>
> I've been playing with this a little bit, and I don't think you are
> going to get better than you already have. Certainly, the caching
> won't work either as any insert into the watch_list_element has
> the potential to change a very large number of pre-compiled lists.
> However, there are some minor optimizations that can be made to
> speed up the existing query quite a bit. One note first: the LIMIT
> should be 6 not 5 if you really want the five other "books" and the
> book itself will more than likely appear in the list. Picking it
> out is something the client app can do.
>
> * Make sure the tables are freshly analyzed. Might want to bump
> up the default stats a bit too.
>
> * Looks like you already have indexes on the watch_list_element
> table. The watch_list_element_element_id index could be broken
> into multiple conditional indexes, but your explain shows this
> would not really gain us much:
>
> actual time=37.957..41.789
>
> * One big gain would be to cluster the table on watch_list_id:
>
> CREATE INDEX watch_index ON watch_list_element (watch_list_id);
> CLUSTER watch_index ON watch_list_element;
>
> I got about a 25% speedup on my queries by doing this. YMMV, as I
> don't know enough about your conditions to do more than make an
> approximate test database. But it should help this query out.
>
> * Finally, you should upgrade if at all possible. Going from
> 7.4.7 to 8.0.1 gave me a 10% speed increase, while going from
> 8.0.1 to 8.1.0 (e.g. the upcoming version) gave me an additional
> 25% speed boost, mostly due to the new bitmap stuff. So, making
> the jump to 8.0.1 will be good practice for the 8.1.0 jump, right? :)
>
> Overall, I was able to get the query to go about a third faster
> than when I started. Hope this helps.
>
> - --
> Greg Sabino Mullane greg(at)turnstep(dot)com
> PGP Key: 0x14964AC8 200506242328
> http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
>
> -----BEGIN PGP SIGNATURE-----
>
> iD8DBQFCvNCrvJuQZxSWSsgRAmkDAJ44z/Ei27HuEBqx/htmCRHJZWi8VQCfV2mm
> upeE0p3z4h11NJzl5aOqCkc=
> =LVqI
> -----END PGP SIGNATURE-----
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Marc G. Fournier 2005-06-26 03:24:53 using 'zoneinfo' to manipulate dates ...
Previous Message Greg Sabino Mullane 2005-06-25 06:40:21 Re: Grouping Too Closely