Re: How to do faster DML

From: "Peter J(dot) Holzer" <hjp-pgsql(at)hjp(dot)at>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to do faster DML
Date: 2024-02-13 14:47:04
Message-ID: 20240213144704.areoyvswhsyfhk36@hjp.at
Lists: pgsql-general

On 2024-02-12 11:28:41 -0500, Greg Sabino Mullane wrote:
> On Mon, Feb 12, 2024 at 1:50 AM veem v <veema0000@gmail.com> wrote:
>
> So we were thinking that adding many columns to a table should be fine
> in Postgres (we have a use case in which the total number of columns
> may go up to ~500+). But then, accessing a column towards the end of a
> row is going to take more time than accessing a column at the start of
> the row. As you mentioned, accessing the 100th column may take 4 to 5
> times as long as accessing the first column. So, is it advisable here
> to take a similar approach of breaking the table into two if the total
> number of columns for a table reaches a certain threshold?
>
>
> I'm not sure of what Peter was testing exactly to get those 4-5x figures,

Sorry, I should have included my benchmark code (it's short enough - see
below).

What I was actually timing was

select count(*) from t_postgresql_column_bench where v{i} = 'a'

for various i.

> but I presume that is column access time,

That was the goal. Of course there is always some overhead, but I
figured that by merely counting rows where a column equals a constant,
the overhead is minimal, or at least the same for every column.

> which would not mean a direct effect on your total query time of 4-5x.

Right. In any real application the column access time is only a part of
the total processing time, and probably a small one, so the effect on
total processing time is correspondingly smaller.
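
The split veem asks about would simply be two tables sharing a primary
key, with the rarely used columns moved out of the main table. A
minimal sketch of the idea (table and column names are made up):

    create table t_hot (
        id bigint primary key,
        v0 text,    -- frequently accessed columns stay here
        v1 text
    );

    create table t_cold (
        id bigint primary key references t_hot (id),
        v400 text,  -- rarely accessed columns move here
        v401 text
    );

    -- Queries touching only the hot columns never step over the cold
    -- ones; queries needing both join on the primary key:
    select h.v0, c.v400
    from t_hot h join t_cold c using (id);

Whether that is worth an extra join on every query that needs a cold
column is something you'd have to measure.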

hp

------------------------------------------------------------------------
#!/usr/bin/python3

import random
import time

import psycopg2

n_cols = 100
n_rows = 100000

db = psycopg2.connect("")
csr = db.cursor()

# Create a table with n_cols text columns v0 .. v{n_cols-1}.
csr.execute("drop table if exists t_postgresql_column_bench")
q = "create table t_postgresql_column_bench ("
q += ", ".join(f"v{i} text" for i in range(n_cols))
q += ")"
csr.execute(q)

# Fill it with n_rows rows of random lowercase letters ('a' .. 'z').
q = "insert into t_postgresql_column_bench values("
q += ", ".join("%s" for i in range(n_cols))
q += ")"
for j in range(n_rows):
    v = [chr(random.randint(96 + 1, 96 + 26)) for i in range(n_cols)]
    csr.execute(q, v)
db.commit()

# Run the identical count(*) query against each column in turn and
# print the elapsed time; only the column position varies between runs.
for i in range(n_cols):
    q = f"select count(*) from t_postgresql_column_bench where v{i} = 'a'"
    t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    csr.execute(q)
    r = csr.fetchall()
    print(r)
    t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    print(i, t1 - t0)
db.commit()
------------------------------------------------------------------------

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
