From: "Glen Parker" <glenebob(at)nwlink(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: be smarter about i/o patterns in index scan
Date: 2004-05-19 21:06:44
Message-ID: AJEKKAIECKNMBCEKADJPKEACCFAA.glenebob@nwlink.com
Lists: pgsql-hackers
> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Tom Lane
> > Or:
> > 2. Sort AND COPY TID's into physical order.
> > 3. Read heap in phy. order, match results to un-sorted TID list.
> > That sounds quite cheap.
>
> No, it sounds like handwaving. What's your "match results" step, and
You got me :-)
> does it take less than O(N^2) time? How do you get to *return* the
Less than O(N^2)??? Geez I think so!
> tuples in index order, without having read them all before you return
> the first one? (Which is really what the problem is with the sort
> alternative, at least for fast-start cases such as the LIMIT 1 example.)
(you'll have to pardon me for thinking as I type... and being too long-winded
:-)
It takes O(1) time per tuple: each entry in the IO-ordered list carries the
position of its TID in the index-ordered list, so matching a fetched tuple
back is a single lookup.
If you have a hard maximum of TID's you'll grab from the index (which sounds
right), you could store the TID list in a vector. The IO-order-sorted list
could be a singly-linked-list OR a vector, but either way, each entry would
have to point back to an offset in the index-ordered list.
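A minimal sketch of that two-list scheme in C (all names here are mine, not PostgreSQL's; a real TID is an ItemPointerData with a different layout):

```c
#include <stdlib.h>

/* Illustrative stand-in for a heap TID: (block, offset).  PostgreSQL's
 * real ItemPointerData is packed differently. */
typedef struct { unsigned block; unsigned offset; } Tid;

/* One entry of the IO-ordered list: the TID plus the position of that
 * TID in the index-ordered vector, so a fetched tuple can be matched
 * back to its index-order slot in O(1). */
typedef struct { Tid tid; int index_pos; } IoEntry;

/* Order entries by physical heap location: block first, then offset. */
static int cmp_physical(const void *a, const void *b)
{
    const IoEntry *x = a, *y = b;
    if (x->tid.block != y->tid.block)
        return x->tid.block < y->tid.block ? -1 : 1;
    if (x->tid.offset != y->tid.offset)
        return x->tid.offset < y->tid.offset ? -1 : 1;
    return 0;
}

/* Copy the index-ordered TIDs and sort the copy into physical order,
 * remembering each TID's original index-order position. */
static void build_io_list(const Tid *index_order, int n, IoEntry *io_order)
{
    for (int i = 0; i < n; i++) {
        io_order[i].tid = index_order[i];
        io_order[i].index_pos = i;
    }
    qsort(io_order, n, sizeof(IoEntry), cmp_physical);
}
```

The back-reference is an array offset rather than a pointer, which is the vector variant described above.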
The index-ordered list would be (sizeof(TID) * max_TID_count) bytes, and the
IO-ordered list would be ((sizeof(TID) + sizeof(int)) * max_TID_count) bytes.
If a TID is 8 bytes and we're going to fetch 2048 index entries per
iteration, that gets us (*counting on fingers*) 16K (for index-ordered list)
+ 24K for the IO-ordered list.
The index-ordered list can also be a singly-linked list, in which case the
mapping would be by pointer. If both lists are single-linked lists, then
the size overhead (for a fully realized 2048 entry scan) would be:
((sizeof(TID) + sizeof(void*)) * max_TID_count) + ((sizeof(TID) + 2 *
sizeof(void*)) * max_TID_count) = (*more counting on fingers*) ... 24K + 32K
= 56K (assuming 4-byte pointers).
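The finger-counting can be checked with the sizes assumed in this mail (8-byte TIDs, 4-byte ints and pointers, 2048-entry batches); a throwaway sanity check:

```c
/* Sizes assumed in the text: 8-byte TID, 4-byte int and pointer
 * (32-bit era), 2048 index entries fetched per batch. */
enum { TID_SIZE = 8, INT_SIZE = 4, PTR_SIZE = 4, BATCH = 2048 };

/* Vector scheme: index-ordered TID vector plus an IO-ordered vector of
 * (TID, index-offset) pairs. */
static int vector_scheme_bytes(void)
{
    return TID_SIZE * BATCH                  /* 16K index-ordered */
         + (TID_SIZE + INT_SIZE) * BATCH;    /* 24K IO-ordered    */
}

/* Linked-list scheme: both lists singly linked, and each IO-ordered
 * node also carries a back-pointer into the index-ordered list. */
static int linked_scheme_bytes(void)
{
    return (TID_SIZE + PTR_SIZE) * BATCH         /* 24K index-ordered */
         + (TID_SIZE + 2 * PTR_SIZE) * BATCH;    /* 32K IO-ordered    */
}
```

That reproduces the 16K + 24K and 24K + 32K = 56K figures above.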
Hmm, the vector style would be crazy expensive for small queries. The
linked-list approach sounds pretty good though. I assume no one thinks
doing this would be completely free :-) And, if you did the IO-ordered list
as a vector, you'd save the linked-list overhead, and you would know exactly
how big to make it, so it wouldn't have to hurt you on small queries.
2048 entries seems pretty darn generous actually.
> How do you get to *return* the
> tuples in index order, without having read them all before you return
Hmm, the worst-case scenario would be to scan the heap in exactly-reverse
order of index order. With a 2048 entry batch size, you'd have a fair amount
of overhead on a LIMIT 1 :-)
Unless the index scanner could be given a maximum tuple count, rather than
just 'knowing' it from a GUC value. Would this dirty up the plan tree
beyond all recognition?
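Putting the batching idea together (a hedged sketch with invented names; fetch_tuple() stands in for a real heap read): sort each batch's TIDs into physical order for the I/O, but store each result at its index-order slot, so the caller still gets tuples back in index order and a LIMIT 1 pays at most one batch of overhead.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative only: a TID plus its position in index order. */
typedef struct { unsigned block; unsigned offset; int index_pos; } Entry;

/* Order by physical heap location: block first, then offset. */
static int cmp_physical(const void *a, const void *b)
{
    const Entry *x = a, *y = b;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    return (int)x->offset - (int)y->offset;
}

/* Stand-in for a real heap fetch; returns a fake tuple value. */
static int fetch_tuple(unsigned block, unsigned offset)
{
    return (int)(block * 100 + offset);
}

/* Process one batch: sort a copy of the TIDs into physical order,
 * fetch the heap in that order, and place each result at its
 * index-order slot so results come back in index order. */
static void scan_batch(const Entry *index_order, int n, int *out)
{
    Entry *phys = malloc(n * sizeof(Entry));
    memcpy(phys, index_order, n * sizeof(Entry));
    qsort(phys, n, sizeof(Entry), cmp_physical);
    for (int i = 0; i < n; i++)   /* heap reads happen in physical order */
        out[phys[i].index_pos] = fetch_tuple(phys[i].block, phys[i].offset);
    free(phys);
}
```

A smaller batch size (down to 1, as suggested below) shrinks the fast-start penalty at the cost of less I/O ordering.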
> What happens when you run out of memory for the list of TIDs ... er,
> make that two lists of TIDs?
Now you really got me. Hand waving: the same thing you do when you run out
of memory anywhere else :-) By configuring the server to fetch index
entries in 1-entry batches, you'd be right back to the overhead (or very
nearly so) of the current implementation, so it would only hurt people who
chose to be hurt by it :-)
-Glen