The task of the planner/optimizer is to create an optimal execution plan. It first combines all possible ways of scanning and joining the relations that appear in a query. All the created paths lead to the same result and it's the task of the optimizer to estimate the cost of executing each path and find out which one is the cheapest.
The planner/optimizer decides which plans should be generated based upon the types of indexes defined on the relations appearing in a query. There is always the possibility of performing a sequential scan on a relation, so a plan using only sequential scans is always created. Assume an index is defined on a relation (for example a B-tree index) and a query contains the restriction relation.attribute OPR constant. If relation.attribute happens to match the key of the B-tree index and OPR is anything but '<>' another plan is created using the B-tree index to scan the relation. If there are further indexes present and the restrictions in the query happen to match a key of an index further plans will be considered.
After all feasible plans have been found for scanning single relations, plans for joining relations are created. The planner/optimizer considers only joins between every two relations for which there exists a corresponding join clause (i.e. for which a restriction like where rel1.attr1=rel2.attr2 exists) in the where qualification. All possible plans are generated for every join pair considered by the planner/optimizer. The three possible join strategies are:
nested iteration join: The right relation is scanned once for every tuple found in the left relation. This strategy is easy to implement but can be very time consuming.
merge sort join: Each relation is sorted on the join attributes before the join starts. Then the two relations are merged together taking into account that both relations are ordered on the join attributes. This kind of join is more attractive because every relation has to be scanned only once.
hash join: the right relation is first hashed on its join attributes. Next the left relation is scanned and the appropriate values of every tuple found are used as hash keys to locate the tuples in the right relation.
Here we will give a little description of the nodes appearing in the plan. Figure \ref{plan} shows the plan produced for the query in example \ref{simple_select}.
The top node of the plan is a MergeJoin node that has two successors, one attached to the field lefttree and the second attached to the field righttree. Each of the subnodes represents one relation of the join. As mentioned above a merge sort join requires each relation to be sorted. That's why we find a Sort node in each subplan. The additional qualification given in the query (s.sno > 2) is pushed down as far as possible and is attached to the qpqual field of the leaf SeqScan node of the corresponding subplan.
The list attached to the field mergeclauses of the MergeJoin node contains information about the join attributes. The values 65000 and 65001 for the varno fields in the VAR nodes appearing in the mergeclauses list (and also in the targetlist) mean that not the tuples of the current node should be considered but the tuples of the next "deeper" nodes (i.e. the top nodes of the subplans) should be used instead.
Note that every Sort and SeqScan node appearing in figure \ref{plan} has got a targetlist but because there was not enough space only the one for the MergeJoin node could be drawn.
Another task performed by the planner/optimizer is fixing the operator ids in the Expr and Oper nodes. As mentioned earlier, PostgreSQL supports a variety of different data types and even user defined types can be used. To be able to maintain the huge amount of functions and operators it is necessary to store them in a system table. Each function and operator gets a unique operator id. According to the types of the attributes used within the qualifications etc., the appropriate operator ids have to be used.