From: | C F <tacnaboyz(at)yahoo(dot)com> |
---|---|
To: | Richard Huxton <dev(at)archonet(dot)com>, pgsql-sql(at)postgresql(dot)org |
Subject: | Re: CASE returning multiple values (was SQL Help) |
Date: | 2003-05-30 18:01:13 |
Message-ID: | 20030530180113.47091.qmail@web20405.mail.yahoo.com |
Lists: | pgsql-sql |
I was afraid someone was going to ask that :)
Okay, I'll do my best at explaining where I'm coming from....
I'm working on a mapping application that is user-configurable. What this means (as it pertains to this discussion) is that, through a configuration file, the user is able to define the rules that the application will use to determine which geometries *and* attributes to pull from PG at various scales. These 'rules' that the user defines also contain other specifics such as how to symbolize the geometry, how to label the geometry, etc. All of these parameters can either be hard-coded by the user in the configuration file, or the user can define an expression that will be used to pull them dynamically from the database. On top of all of this, we have two more levels of queries.
So... at the very top level, the user can define an expression that will determine that everything queried from this table will match these criteria... this is my WHERE clause.
Then below this level, various rules can be defined... each rule can have another definition that evaluates into a SQL expression. Now, I could make each rule an entirely separate query, but for one, they all share the exact same top level WHERE clause, and two, there could potentially be many many rules which I would think would cause severe performance issues. Let me give you an example...
Let's say we're mapping cities of the United States based on population... In other words, I want to symbolize the cities on the map based on population (larger symbol for larger populations, smaller symbol for smaller populations, etc). I also want to show the city names of the larger cities *only*. So, what the client application (client to PostgreSQL) needs is: the city location, which rules evaluate to true, and the city names of those larger cities (defined by a rule).
We have a table of cities of the world. So the top level filter (that all rules will share) is, "COUNTRY = 'USA'".
Rule 1 says that cities with a population over 1,000,000 will have a large symbol and be labeled with the city name. So the sql could look like this...
select longitude, latitude, city_name from city where country = 'USA' and population > 1000000;
... seems easy enough, but remember we can have an infinite number of rules (not really infinite, but you get the point). So....
Rule 2 says that cities with a population under 1,000,000 will have a small symbol (note, we do not care about the city name here). So, by itself, the SQL could look like this...
select longitude, latitude from city where country = 'USA' and population < 1000000;
Okay, for this simple example, I would have no problem doing two different queries (this example is extremely simplified compared to what is possible/likely). But what if the user wanted to give a different symbol for every population in 100,000 increments? If our range of populations was 100,000 to 5,000,000 that would be 50 queries! Not only would it be 50 queries, but it would be 50 queries using a nearly identical WHERE clause. So I thought it would be more efficient to combine the queries into something like the following...
select
longitude,
latitude,
(case when population > 1000000 then true else false end) as rule1,
(case when population > 1000000 then city_name end) as label1,
(case when population < 1000000 then true else false end) as rule2
from
city
where
country = 'USA'
;
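Since the rules come from a configuration file, the combined query could be generated rather than written by hand. What follows is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, the `RULES` structure, the `build_query` helper, the `_label` column naming, and the sample rows are all hypothetical, and it emits 1/0 flags instead of true/false. The rule expressions are interpolated directly into the SQL, which is only reasonable if the config file is trusted.

```python
import sqlite3

# Hypothetical rule definitions, as they might be read from the config file:
# (rule name, SQL test expression, optional column to return as a label).
RULES = [
    ("rule1", "population > 1000000", "city_name"),
    ("rule2", "population < 1000000", None),
]

def build_query(rules, top_level_filter):
    """Combine all rules into one SELECT sharing a single WHERE clause."""
    columns = ["longitude", "latitude"]
    for name, test, label in rules:
        # One flag column per rule (1 when the rule matches, else 0).
        columns.append(f"(case when {test} then 1 else 0 end) as {name}")
        if label is not None:
            # The label value is returned only for rows the rule matches.
            columns.append(f"(case when {test} then {label} end) as {name}_label")
    return (
        f"select {', '.join(columns)} from city "
        f"where {top_level_filter} order by city_name"
    )

# Made-up sample data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table city (city_name text, country text, "
    "population integer, longitude real, latitude real)"
)
conn.executemany(
    "insert into city values (?, ?, ?, ?, ?)",
    [
        ("Chicago", "USA", 2900000, -87.6, 41.9),
        ("Tacoma", "USA", 200000, -122.4, 47.3),
        ("Toronto", "Canada", 2500000, -79.4, 43.7),
    ],
)

sql = build_query(RULES, "country = 'USA'")
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Adding a rule then means adding one entry to the list; the top-level filter is still applied once, in a single pass over the table.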
I could have the SQL return only the boolean values for the rules plus all city names, and let the application simply discard the names it doesn't need, but that seems wasteful for very large result sets (and again, this is overly simplified; we could have many such columns full of unnecessary data being returned). And by the way, that query cannot be written as something like...
(case when population > 1000000 then 'rule1' when population < 1000000 then 'rule2' end) as rules
... because the rules are NOT mutually exclusive; there can be many positives.
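To make the problem concrete, here is a small sketch, again using Python's built-in sqlite3 as a stand-in for PostgreSQL. The table contents and the two overlapping thresholds are assumptions chosen for illustration. A single CASE stops at the first branch that matches, so the second rule is never reported, while one column per rule reports both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table city (city_name text, population integer)")
# Made-up row: a city of 2,000,000 satisfies both hypothetical rules below.
conn.execute("insert into city values ('Bigtown', 2000000)")

# A single CASE returns only the first matching branch.
single_case = conn.execute(
    "select case when population > 1000000 then 'rule1' "
    "when population > 500000 then 'rule2' end from city"
).fetchone()

# One column per rule keeps every positive (1 means the rule matched).
separate_flags = conn.execute(
    "select population > 1000000, population > 500000 from city"
).fetchone()

print(single_case, separate_flags)
```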
Anyway, hopefully I didn't leave anything important out. It sounds like there's no obvious way to avoid multiple evaluations of the test expressions. The rules are relatively static once the config files are read in, so I could conceivably create stored procedures with a bunch of IF statements at that time. However, I'm not sure if in PG there is a way to dynamically populate the resulting recordset on the fly. I can think of 10 different ways to accomplish what I'm trying to do, but hopefully someone has some ideas on which would perform best.
Sorry if it's information overload, but you tried to answer my questions, so I thought I should at least try to answer yours :)
Any thoughts much appreciated.
You could write a set returning function, but you'd just end up doing the same
thing. Can you explain what it is you're trying to achieve - real
fields/schemas etc?
--
Richard Huxton