From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | Brendan Duddridge <brendan(at)clickspace(dot)com> |
Cc: | PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: OT: Data structure design question: How do they count |
Date: | 2006-04-10 09:23:48 |
Message-ID: | 443A2424.4020702@archonet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Brendan Duddridge wrote:
>
> Now, initially I thought they would just pre-compute these counts, but
> the problem is, when you click on any of the above attribute values,
> they reduce the remaining possible set of matching products (and set of
> possible remaining attributes and attribute values) by the amount
> displayed next to the attribute value selected. You can click on any
> combination of attribute values to filter down the remaining set of
> matching products, so there's a large combination of paths you can take
> to arrive at a set of products you might be interested in.
>
> Do you think they are pre-computed? Or do you think they might use a
> query similar to the following?:
Pre-computed almost certainly, but at what level of granularity? And
with application-level caching?
> select pav.attribute_value_id, count(p.product_id)
> from product_attribute_value pav,
> attribute a,
> product p
> where a.attribute_id in (some set of attribute ids) and
> pav.product_id = p.product_id and
> pav.attribute_id = a.attribute_id and p.product_id in
> (select product_id
> from category_product
> where category_id = some category id) and
> p.is_active = 'true'
> group by pav.attribute_value_id;
>
> It would seem to me that although the above query suggests a normalized
> database structure, that joining with 3 tables plus a 4th table in the
> sub-query with an IN qualifier and grouping to get the product counts
> would take a VERY long time, especially on a possible result set of
> 1,260,658 products.
Hmm - I'm not sure I'd say this was necessarily normalised. In the
example you gave there were three definite types of attribute:
1. Price range (< 20, 20-50, ...)
2. Product type (lighting, rugs, ...)
3. Store (art.com, homeannex, ...)
Your example discards this type information.
I'm also not sure it lets store A sell widgets for 19.99 and B for 25.99
So - let's look at how we might break this down into simple relations:
product_types (product_id, prod_type, prod_subtype)
product_availability (product_id, store_id, price_range)
and so on for each set of parameters.
Then, if PG isn't calculating fast enough I'd be tempted to throw in a
summary table:
product_counts(store_id, price_range, prod_type, prod_subtype, ...,
num_products)
Then total over this for the top-level queries.
I'd also cache common top-level queries at the applicaton level anyway.
--
Richard Huxton
Archonet Ltd
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2006-04-10 09:26:49 | Re: pg 8.1.3, AIX, huge box, painfully slow. |
Previous Message | Richard Huxton | 2006-04-10 09:06:41 | Re: pg 8.1.3, AIX, huge box, painfully slow. |