big joins not converging

From: Dan Ancona <da(at)vizbang(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: big joins not converging
Date: 2011-03-10 21:25:24
Message-ID: D1D5E528-DAEB-4E4F-A3E6-EA2A95340D1A@vizbang.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi postgressers -

As part of my work with voter file data, I pretty regularly have to
join one large-ish (over 500k rows) table to another. Sometimes this
is via a text field (countyname) + integer (voter id). I've noticed
sometimes this converges and sometimes it doesn't, seemingly
regardless of how I index things. So I'm looking for general thoughts
on the joining of large tables, but also running into a specific issue
with the following slightly different query:

This one is between two tables that are a 754k row list of voters and
a 445k row list of property owners. (I'm trying to find records where
the owner name matches the voter name at the same address.) I have
btree single column indices built on all the relevant fields, and
multicolumn indices built across all the columns I'm matching. The
full schemas of both tables are below. The machine is an older-ish (3
years ago) dual-core pentium w/ 4GB RAM running FreeBSD, more details
below.

This is the query I've come up with so far:

explain analyze
update vanalameda set ownerflag = 'exact'
from aralameda where
vanalameda.streetno ~~ aralameda.streetnum and
vanalameda.streetname ~~ aralameda.streetname and
vanalameda.lastname ~~ aralameda.ownername and
vanalameda.firstname ~~ aralameda.ownername;

If I include the analyze, this didn't complete after running
overnight. If I drop the analyze and just explain, I get this:

"Nested Loop (cost=46690.74..15384448712.74 rows=204 width=204)"
" Join Filter: (((vanalameda.streetno)::text ~~
(aralameda.streetnum)::text) AND ((vanalameda.streetname)::text ~~
(aralameda.streetname)::text) AND ((vanalameda.lastname)::text ~~
(aralameda.ownername)::text) AND ((vanalameda.firstname)::text ~~
(aralameda.ownername)::text))"
" -> Seq Scan on vanalameda (cost=0.00..26597.80 rows=734780
width=204)"
" -> Materialize (cost=46690.74..58735.87 rows=444613 width=113)"
" -> Seq Scan on aralameda (cost=0.00..38647.13 rows=444613
width=113)"

One general question: does the width of the tables (i.e. the numbers
of columns not being joined and the size of those fields) matter? The
tables do have a lot of extra columns that I could slice out.

Thanks so much!

Dan

System:
client: pgadmin III, Mac OS

server:
select version();
PostgreSQL 8.3.7 on i386-portbld-freebsd7.2, compiled by GCC cc (GCC)
4.2.1 20070719 [FreeBSD]
(installed from freebsd package system, default configuration)

%sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: i386
hw.model: Genuine Intel(R) CPU 2160 @ 1.80GHz
hw.ncpu: 2
hw.machine_arch: i386

w/ 4GB RAM, 1 1GB disk, no RAID.

Here's the tables...

Table "public.aralameda"
Column | Type | Modifiers
-----------------+-----------------------+-----------
dt000o039001010 | character varying(13) |
o3901010 | character varying(15) |
dt17 | character varying(2) |
dt046 | character varying(3) |
streetnum | character varying(10) |
streetname | character varying(50) |
unitnum | character varying(10) |
city | character varying(30) |
zip | character varying(5) |
unk3 | character varying(1) |
crap1 | character varying(12) |
crap2 | character varying(12) |
crap3 | character varying(12) |
crap4 | character varying(12) |
crap5 | character varying(12) |
crap6 | character varying(12) |
crap7 | character varying(12) |
crap8 | character varying(12) |
crap9 | character varying(12) |
crap10 | character varying(12) |
dt2009 | character varying(4) |
dt066114 | character varying(6) |
crap11 | character varying(8) |
crap12 | character varying(8) |
ownername | character varying(50) |
careofname | character varying(50) |
unk4 | character varying(1) |
maddr1 | character varying(60) |
munitnum | character varying(10) |
mcitystate | character varying(30) |
mzip | character varying(5) |
mplus4 | character varying(4) |
dt40 | character varying(2) |
dt4 | character varying(1) |
crap13 | character varying(8) |
d | character varying(1) |
dt0500 | character varying(4) |
unk6 | character varying(1) |
crap14 | character varying(8) |
unk7 | character varying(1) |
Indexes:
"arall" btree (streetnum, streetname, ownername)
"aroname" btree (ownername)
"arstreetname" btree (streetname)
"arstreetnum" btree (streetnum)

Table "public.vanalameda"
Column | Type | Modifiers
---------------+-----------------------+-----------
vanid | character varying(8) |
lastname | character varying(25) |
firstname | character varying(16) |
middlename | character varying(16) |
suffix | character varying(3) |
streetno | character varying(5) |
streetnohalf | character varying(3) |
streetprefix | character varying(2) |
streetname | character varying(24) |
streettype | character varying(4) |
streetsuffix | character varying(2) |
apttype | character varying(4) |
aptno | character varying(8) |
city | character varying(13) |
state | character varying(2) |
zip5 | character varying(5) |
zip4 | character varying(4) |
vaddress | character varying(33) |
maddress | character varying(41) |
mcity | character varying(25) |
mstate | character varying(2) |
mzip5 | character varying(5) |
mzip4 | character varying(4) |
mstreetno | character varying(6) |
mstreetnohalf | character varying(9) |
mstreetprefix | character varying(2) |
mstreetname | character varying(40) |
mstreettype | character varying(4) |
mstreetsuffix | character varying(2) |
mapttype | character varying(4) |
maptno | character varying(13) |
dob | character varying(10) |
countyfileid | character varying(7) |
countyid | character varying(3) |
affno | character varying(12) |
ownerflag | character varying(20) |
Indexes:
"vanall" btree (streetno, streetname, lastname, firstname)
"vanfname" btree (firstname)
"vanlname" btree (lastname)
"vanstreetname" btree (streetname)
"vanstreetno" btree (streetno)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Steve Atkins 2011-03-10 22:13:02 Re: big joins not converging
Previous Message Merlin Moncure 2011-03-10 21:14:50 Re: unexpected stable function behavior