Re: [HACKERS] Slow count(*) again...

From: david(at)lang(dot)hm
To: Віталій Тимчишин <tivv00(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 07:32:47
Message-ID: alpine.DEB.2.00.1102032327430.8162@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Fri, 4 Feb 2011, ??????? ???????? wrote:

> 2011/2/3 <david(at)lang(dot)hm>
>
>>
>> If the table is not large enough to fit in ram, then it will compete for
>> I/O, and the user will have to wait.
>>
>> what I'm proposing is that as the records are created, the process doing
>> the creation makes copies of the records (either all of them, or some of
>> them if not all are needed for the analysis, possibly via shareing memory
>> with the analysis process), this would be synchronous with the load, not
>> asynchronous.
>>
>> this would take zero I/O bandwidth, it would take up some ram, memory
>> bandwidth, and cpu time, but a load of a large table like this is I/O
>> contrained.
>>
>> it would not make sense for this to be the default, but as an option it
>> should save a significant amount of time.
>>
>> I am making the assumption that an Analyze run only has to go over the data
>> once (a seqential scan of the table if it's >> ram for example) and gathers
>> stats as it goes.
>>
>> with the current code, this is a completely separate process that knows
>> nothing about the load, so if you kick it off when you start the load, it
>> makes a pass over the table (competing for I/O), finishes, you continue to
>> update the table, so it makes another pass, etc. As you say, this is a bad
>> thing to do. I am saying to have an option that ties the two togeather,
>> essentially making the data feed into the Analyze run be a fork of the data
>> comeing out of the insert run going to disk. So the Analyze run doesn't do
>> any I/O and isn't going to complete until the insert is complete. At which
>> time it will have seen one copy of the entire table.
>>
> Actually that are two different problems. The one is to make analyze more
> automatic to make select right after insert more clever by providing
> statistics to it.
> Another is to make it take less IO resources.
> I dont like for it to be embedded into insert (unless the threshold can be
> determined before inserts starts). Simply because it is more CPU/memory that
> will slow down each insert. And if you will add knob, that is disabled by
> default, this will be no more good than manual analyze.

if it can happen during the copy instead of being a step after the copy it
will speed things up. things like the existing parallel restore could use
this instead ofneeding a separate pass. so I don't think that having to
turn it on manually makes it useless, any more than the fact that you have
to explicity disable fsync makes that disabling feature useless (and the
two features would be likely to be used togeather)

when a copy command is issued, I assume that there is some indication of
how much data is going to follow. I know that it's not just 'insert
everything until the TCP connection terminates' because that would give
you no way of knowing if the copy got everything in or was interrupted
part way through. think about what happens with ftp if the connection
drops, you get a partial file 'successfully' as there is no size provided,
but with HTTP you get a known-bad transfer that you can abort or resume.

David Lang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Віталій Тимчишин 2011-02-04 07:39:38 Re: [HACKERS] Slow count(*) again...
Previous Message Віталій Тимчишин 2011-02-04 07:24:20 Re: [HACKERS] Slow count(*) again...

Browse pgsql-performance by date

  From Date Subject
Next Message Віталій Тимчишин 2011-02-04 07:39:38 Re: [HACKERS] Slow count(*) again...
Previous Message Віталій Тимчишин 2011-02-04 07:24:20 Re: [HACKERS] Slow count(*) again...