| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Raúl Gutiérrez Sánchez <raul(at)laeff(dot)esa(dot)es> | 
| Cc: | PgSql-SQL Mailing List <pgsql-sql(at)postgresql(dot)org> | 
| Subject: | Re: Speed depending of Join Order. | 
| Date: | 2003-01-22 20:24:57 | 
| Message-ID: | 23778.1043267097@sss.pgh.pa.us | 
| Lists: | pgsql-sql | 
Raúl Gutiérrez Sánchez <raul(at)laeff(dot)esa(dot)es> writes:
> Note that the only difference is the order of the join elements. Using
> version 7.2.2, which I have been using until now, the time spent in
> both of them was the same, using the right indexes. However, using
> version 7.3.1, which I installed recently, the results of the explain
> are the following:
It turns out that the problem was inaccuracy in some recently-added code
that tries to account for the possibility that a mergejoin won't scan
all the way to the end.  Your sample data had only one possible value
for the mis_id and mod_mis_id columns; this boundary case exposed the
fact that the code was testing for "x < max" where it should be testing
"x <= max".  Coupled with a lack of sanity-checking, the bogus
calculation affected the estimated costs in an asymmetrical way.  This
is why the choice of a bad plan only occurred in one case out of two.
I've applied the attached patch for 7.3.2. Thanks for the report!
regards, tom lane
*** src/backend/optimizer/path/costsize.c.orig	Wed Sep  4 16:31:20 2002
--- src/backend/optimizer/path/costsize.c	Wed Jan 22 15:10:20 2003
***************
*** 645,652 ****
  		innerscansel = firstclause->left_mergescansel;
  	}
  
! 	outer_rows = outer_path->parent->rows * outerscansel;
! 	inner_rows = inner_path->parent->rows * innerscansel;
  
  	/* cost of source data */
  
--- 645,666 ----
  		innerscansel = firstclause->left_mergescansel;
  	}
  
! 	/* convert selectivity to row count; must scan at least one row */
! 
! 	outer_rows = ceil(outer_path->parent->rows * outerscansel);
! 	if (outer_rows < 1)
! 		outer_rows = 1;
! 	inner_rows = ceil(inner_path->parent->rows * innerscansel);
! 	if (inner_rows < 1)
! 		inner_rows = 1;
! 
! 	/*
! 	 * Readjust scan selectivities to account for above rounding.  This is
! 	 * normally an insignificant effect, but when there are only a few rows
! 	 * in the inputs, failing to do this makes for a large percentage error.
! 	 */
! 	outerscansel = outer_rows / outer_path->parent->rows;
! 	innerscansel = inner_rows / inner_path->parent->rows;
  
  	/* cost of source data */
  
*** src/backend/utils/adt/selfuncs.c.orig	Fri Oct 18 22:56:16 2002
--- src/backend/utils/adt/selfuncs.c	Wed Jan 22 15:12:05 2003
***************
*** 1740,1746 ****
  				rsortop,
  				ltop,
  				gtop,
! 				revltop;
  	Datum		leftmax,
  				rightmax;
  	double		selec;
--- 1740,1748 ----
  				rsortop,
  				ltop,
  				gtop,
! 				leop,
! 				revgtop,
! 				revleop;
  	Datum		leftmax,
  				rightmax;
  	double		selec;
***************
*** 1779,1813 ****
  	/* Look up the "left < right" and "left > right" operators */
  	op_mergejoin_crossops(opno, &ltop, &gtop, NULL, NULL);
  
! 	/* Look up the "right < left" operator */
! 	revltop = get_commutator(gtop);
! 	if (!OidIsValid(revltop))
! 		return;					/* shouldn't happen */
  
  	/*
  	 * Now, the fraction of the left variable that will be scanned is the
  	 * fraction that's <= the right-side maximum value.  But only believe
  	 * non-default estimates, else stick with our 1.0.
  	 */
! 	selec = scalarineqsel(root, ltop, false, left,
  						  rightmax, right->vartype);
  	if (selec != DEFAULT_INEQ_SEL)
  		*leftscan = selec;
  
  	/* And similarly for the right variable. */
! 	selec = scalarineqsel(root, revltop, false, right,
  						  leftmax, left->vartype);
  	if (selec != DEFAULT_INEQ_SEL)
  		*rightscan = selec;
  
  	/*
  	 * Only one of the two fractions can really be less than 1.0; believe
! 	 * the smaller estimate and reset the other one to exactly 1.0.
  	 */
  	if (*leftscan > *rightscan)
  		*leftscan = 1.0;
! 	else
  		*rightscan = 1.0;
  }
  
  /*
--- 1781,1829 ----
  	/* Look up the "left < right" and "left > right" operators */
  	op_mergejoin_crossops(opno, &ltop, &gtop, NULL, NULL);
  
! 	/* Look up the "left <= right" operator */
! 	leop = get_negator(gtop);
! 	if (!OidIsValid(leop))
! 		return;					/* insufficient info in catalogs */
! 
! 	/* Look up the "right > left" operator */
! 	revgtop = get_commutator(ltop);
! 	if (!OidIsValid(revgtop))
! 		return;					/* insufficient info in catalogs */
! 
! 	/* Look up the "right <= left" operator */
! 	revleop = get_negator(revgtop);
! 	if (!OidIsValid(revleop))
! 		return;					/* insufficient info in catalogs */
  
  	/*
  	 * Now, the fraction of the left variable that will be scanned is the
  	 * fraction that's <= the right-side maximum value.  But only believe
  	 * non-default estimates, else stick with our 1.0.
  	 */
! 	selec = scalarineqsel(root, leop, false, left,
  						  rightmax, right->vartype);
  	if (selec != DEFAULT_INEQ_SEL)
  		*leftscan = selec;
  
  	/* And similarly for the right variable. */
! 	selec = scalarineqsel(root, revleop, false, right,
  						  leftmax, left->vartype);
  	if (selec != DEFAULT_INEQ_SEL)
  		*rightscan = selec;
  
  	/*
  	 * Only one of the two fractions can really be less than 1.0; believe
! 	 * the smaller estimate and reset the other one to exactly 1.0.  If we
! 	 * get exactly equal estimates (as can easily happen with self-joins),
! 	 * believe neither.
  	 */
  	if (*leftscan > *rightscan)
  		*leftscan = 1.0;
! 	else if (*leftscan < *rightscan)
  		*rightscan = 1.0;
+ 	else
+ 		*leftscan = *rightscan = 1.0;
  }
  
  /*