Re: General guidance if there is an in dadabase solution or should stay as excel vba solution.

From: Henry Drexler <alonup8tb(at)gmail(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: General guidance if there is an in dadabase solution or should stay as excel vba solution.
Date: 2011-09-19 15:25:03
Message-ID: CAAtgU9SUthm5pTU1TnXrs0Wq-LykPOtv2vXZG8o5mWBtG723Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

excellent - thank you again.

On Mon, Sep 19, 2011 at 11:05 AM, David Johnston <polobo(at)yahoo(dot)com> wrote:

> You can probably do this without plpgsql through liberal use of CTEs (WITH)
> and sub-queries.
>
> Also look at arrayed types for "saving" matches and filtering out already
> tested pairs.
>
> David J.
>
>
> On Sep 19, 2011, at 10:37, Henry Drexler <alonup8tb(at)gmail(dot)com> wrote:
>
> Thanks you that is the kind of suggestion I was looking for - I will look
> into plpgsql.
>
> Yes, there are several optimizations in it - though due to the actual data
> the first few characters cannot be tested. Some of the actual optimizations
> are only to reach out to the surrounding 100 rows and to skip numbers in the
> characters.
>
> On Mon, Sep 19, 2011 at 10:17 AM, David Johnston < <polobo(at)yahoo(dot)com>
> polobo(at)yahoo(dot)com> wrote:
>
>> Look at this module for the actual comparison algorithms (found in
>> Appendix F)****
>>
>> ** **
>>
>> “fuzzystrmatch”****
>>
>> ** **
>>
>> Performance would be my only concern but you have that issue either way.
>> With “plpgsql” you can do most things in the database you could do in VBA.
>> Whether you want to bog the DB down with a processor intensive process like
>> this is another question to consider.****
>>
>> ** **
>>
>> I am hoping you are putting in limits such as requiring that the first
>> character (or even first partial word) are equal before even checking for an
>> off-by-one error. With the “Levenshtein” algorithm you’d be looking for a
>> value of “1” to match your current behavior.****
>>
>> ** **
>>
>> In short, what you are doing (given your specification below) in VBA is
>> also doable in PostgreSQL.****
>>
>> ** **
>>
>> David J.****
>>
>> ** **
>>
>> ** **
>>
>> *From:* <pgsql-general-owner(at)postgresql(dot)org>
>> pgsql-general-owner(at)postgresql(dot)org [mailto:<pgsql-general-owner(at)postgresql(dot)org>
>> pgsql-general-owner(at)postgresql(dot)org] *On Behalf Of *Henry Drexler
>> *Sent:* Monday, September 19, 2011 9:10 AM
>> *To:* pgsql-general
>> *Subject:* [GENERAL] General guidance if there is an in dadabase solution
>> or should stay as excel vba solution.****
>>
>> ** **
>>
>> I have no problem doing this in excel vba, though as the list grows larger
>> obviously excel has row limits.****
>>
>> ** **
>>
>> ** **
>>
>> What is being done:****
>>
>> There is a column of data imported into the db - they are
>> just text strings, there are about 80,000 rows of them. The goal is to do a
>> single character elimination to find matches.****
>>
>> ** **
>>
>> so for instance the data is a bunch of rows of this:****
>>
>> ** **
>>
>> hello there****
>>
>> what is your name****
>>
>> happy birthday****
>>
>> we are winner****
>>
>> we are winners****
>>
>> we like the sky****
>>
>> task to do****
>>
>> tasks to do****
>>
>> ** **
>>
>> so for the above in excel I created a macro that will remove one character
>> and compare and do this for each character of each text string.****
>>
>> ** **
>>
>> The final product:****
>>
>> ** **
>>
>> hello there****
>>
>> what is your name****
>>
>> happy birthday****
>>
>> we are winner we are winners****
>>
>> we are winners we are winner****
>>
>> we like the sky****
>>
>> task to do tasks to do****
>>
>> tasks to do task to do****
>>
>> ** **
>>
>> ** **
>>
>> so you can see that it found the matches with being one character off.***
>> *
>>
>> ** **
>>
>> ** **
>>
>> Is this something best done outside of the db and in excel as I am doing
>> or is it possible to do it in db?****
>>
>> ** **
>>
>> Note I am not looking for someone to give a whole solution - just if they
>> know it can be done let me know the direction so I can research it and
>> figure it out.****
>>
>> ** **
>>
>> Any advice is welcome.****
>>
>> ** **
>>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Szymon Guz 2011-09-19 16:00:42 Re: postgis and pgpool
Previous Message David Johnston 2011-09-19 15:05:00 Re: General guidance if there is an in dadabase solution or should stay as excel vba solution.