Re: Select where id in (LARGE LIST) ?

From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Jasper Potts <jasper(at)jasperpotts(dot)com>
Cc: Postgres JDBC <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Select where id in (LARGE LIST) ?
Date: 2005-08-02 09:52:17
Message-ID: 1122976336.2837.220.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

In this case I wouldn't go with a boolean flag, setting it on a row will
actually insert a new row in your table (that's what the updates really
do). Then you're better off with the range approach, and keep your
ranges in memory. I have no idea how big the query can get, I guess it
should be somewhere in the documentation at some chapter dealing with
limitations...

Cheers,
Csaba.

On Tue, 2005-08-02 at 11:23, Jasper Potts wrote:
> We are working on a desktop(Swing) application, postgresql is running
> embedded in the application and is hence single user. We have a
> scrollable view of grid of icons much like explorer. The data set is a
> list of item ids resulting from a complex query that can take 30
> seconds. This data set of the query results are stored in table in the
> database. The user then needs to be able to browse(scroll) through the
> list of items and make a selection. Then they can perform operations on
> that selection. Those operations require a way of the selection being
> used in a database query. The options are:
>
> (1) Store the selection on memory on the client.
> (2) Store the selection as a column in the query results table
> (boolean maybe)
> (3) Hybrid solution where the selection is in memory and in the database.
>
> The problem with (1) is how to transfer the selection to the db if it
> gets large. This is where the chunking comes in.
> The problem with the pure selection in the database route(2) is the
> simple selection operations like clicking on an item to select only it.
> Are too slow.
> (3) could be good but is complex to implement with Threading/Locking etc.
>
> My current plan is to go with (1) but change the storage model for the
> selection. The first idea was just a set of ids of selected items,
> problem is if that is 100,000 then is hard to transfer to the WHERE part
> of a query. The new idea is to store it as a list of ranges eg.
> (26-32,143-198,10922-10923) this could then be sent as a "WHERE (row
> >=26 and row <=32) or (row >=143 and row <=198) or (row >=10922 and row
> <=10923)". As the user has to choose each range by hand it is unlikely
> there could be more than 100 ranges.
>
> The big question here is whats the max length of a query sent though
> JDBC to Postgresql???
>
> Many Thanks
>
> Jasper
>
> Csaba Nagy wrote:
>
> >I don't quite understand what you're doing here, but smells to me for
> >something which shouldn't be interactive in the first place. If some
> >query expectedly exceeds a few seconds, we make the operation
> >asynchronous, i.e. the user starts it, gets a page saying the thing is
> >currently being done, and then he receives some kind of notification
> >when the thing is finished, either on the web page (by periodically
> >reloading) or by email for really long lasting things.
> >Now the chunking method does have an overhead indeed, but it has a set
> >of good properties too: it's easily interruptible, it won't bug your
> >database down (it will let some other things go too between the chunks),
> >and you won't get connection timeouts for really long operations... it's
> >just more manageable in many respects.
> >And if you do have some long lasting things, make sure you won't let the
> >user hit reload 50 times... an asynchronous way of doing it will help in
> >this regard too, cause then you know what the user did and reload will
> >only show the status.
> >
> >HTH,
> >Csaba.
> >
> >
> >
> >On Fri, 2005-07-29 at 21:18, Jasper Potts wrote:
> >
> >
> >>I have been trying the chunk method and it wins in cases where the
> >>number of ids is not too high. I was comparing:
> >>
> >>(1) joining the main table with a table that had a column with ids and a
> >>boolean column for selected, both id columns have indexes.
> >>(2) select over main table where id in (...) done in 100s with prepared
> >>statement and results accumulated in java
> >>
> >>with a select that did a sum() of another column the results were:
> >>
> >>No. of Selected Items | Join Select Time in sec | Chunk Select
> >>Time in sec
> >>30 | 0.4
> >>| 0.007
> >>4000 | 0.5
> >>| 0.24
> >>30000 | 0.7 |
> >>1.12
> >>
> >>All of these were with 30,000 rows in main table from JDBC.
> >>
> >>These results don't take into account the time it takes to clear then
> >>populate the select table with selection data from client. At the moment
> >>this can take 1-2 seconds which is far from interactive for the user.
> >>Which is why I am looking for a better method.
> >>
> >>Any other suggestions? :-)
> >>
> >>I would like to push the data size up to a million, which means the user
> >>could in theory select a million rows and apply operation. This is going
> >>to get very slow with the chunk method.
> >>
> >>Many Thanks
> >>
> >>Jasper
> >>
> >>Csaba Nagy wrote:
> >>
> >>
> >>
> >>>Jasper,
> >>>
> >>>You can chunk your operation. That means to only use 100 entries in one
> >>>run, and repeat it until all ids were processed. Use a prepared
> >>>statement, that will save you some overhead. The last chunk will have
> >>>less entries than the parameter placeholders, so you will have to build
> >>>a special last statement, or to set the superfluous parameters to null
> >>>or to one of the values from the last chunk, depends on what kind of
> >>>query you have. We do all our data import/export this way, and it works
> >>>fine.
> >>>
> >>>Cheers,
> >>>Csaba.
> >>>
> >>>
> >>>On Fri, 2005-07-29 at 17:30, Jasper Potts wrote:
> >>>
> >>>
> >>>
> >>>
> >>>>I am working on a gui application which has a list of ids of selected
> >>>>items. To perform an operation on the selected items I do a
> >>>>"select/update .. where id in(...)". There seems to be a limit of
> >>>>100-200 items in the list. Is there a way of doing this with large
> >>>>lists, maybe 10,000, 100,000 long?
> >>>>
> >>>>The best solution I have so far is to create a selection table and write
> >>>>the select out to that and perform a join but the 100,000 inserts are slow.
> >>>>
> >>>>The ids are int8(long), so not talking large amounts of data, couple Mb
> >>>>at most. Database and client running on same machine over localhost.
> >>>>
> >>>>Many Thanks
> >>>>
> >>>>Jasper
> >>>>
> >>>>---------------------------(end of broadcast)---------------------------
> >>>>TIP 9: In versions below 8.0, the planner will ignore your desire to
> >>>> choose an index scan if your joining column's datatypes do not
> >>>> match
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 6: explain analyze is your friend
> >>
> >>
> >
> >
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

In response to

Browse pgsql-jdbc by date

  From Date Subject
Next Message Jasper Potts 2005-08-02 14:00:31 Re: Select where id in (LARGE LIST) ?
Previous Message Oliver Jowett 2005-08-02 09:49:07 Re: Select where id in (LARGE LIST) ?