mongoDB VS PostgreSQL dml performance use python (pymongo & py-postgresql)

前面测试了mongodb和postgresql的插入性能对比, 参考如下 : 

1. http://blog.163.com/digoal@126/blog/static/16387704020151435825593/

2. http://blog.163.com/digoal@126/blog/static/1638770402015142858224/

3. http://blog.163.com/digoal@126/blog/static/16387704020151210840303/

本文将测试对比一下select, update, 以及select, insert, update混合场景的性能.

同样使用并行8个线程测试.

(因为使用的驱动问题, python测试结果性能较差, mongo改用motor驱动的异步调用后, 性能提升明显, 测试结果仅供参考, 如果是PG的话, 建议使用pgbench测试)

mongoDB : 

更新测试 :

# vi test.py
import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
for i in range(0,1000000):
  collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.update({'id':random.randrange(0,1000000)}, {'$set': {'age': random.randrange(0,1000000)}})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果379秒 : 

[root@localhost ~]# python test.py
0
TID:0 1423070907.7699816
TID:1 1423070907.770696
TID:2 1423070907.7744105
TID:3 1423070907.7760801
TID:4 1423070907.779555
TID:7 1423070907.78037
TID:6 1423070907.7860947
TID:5 1423070907.78793
TID:6 1423071285.5971715
377.81107687950134
TID:7 1423071286.6500263
378.8696563243866
TID:0 1423071286.9464445
379.17646288871765
TID:1 1423071287.1227949
379.352098941803
TID:3 1423071287.1230247
379.3469445705414
TID:5 1423071287.2262568
379.4383268356323
TID:4 1423071287.2609653
379.4814102649689
TID:2 1423071287.4058232
379.6314127445221

查询测试 : 

import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
# db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
# for i in range(0,1000000):
#   collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
# collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.find_one({'id': random.randrange(0,1000000)})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果361秒 : 

[root@localhost ~]# python test.py
2000000
TID:2 1423079875.4572093
TID:3 1423079875.4576375
TID:1 1423079875.4596934
TID:0 1423079875.4600854
TID:4 1423079875.4589622
TID:5 1423079875.4653761
TID:6 1423079875.463017
TID:7 1423079875.4694664
TID:7 1423080235.7239776
360.2545111179352
TID:3 1423080236.109339
360.6517014503479
TID:4 1423080236.1194305
360.66046833992004
TID:1 1423080236.260948
360.8012545108795
TID:2 1423080236.5218844
361.06467509269714
TID:5 1423080236.6404896
361.17511343955994
TID:0 1423080236.6446981
361.1846127510071
TID:6 1423080236.6607506
361.1977336406708

更新, 插入, 查询综合测试 : 

import threading
import time
import pymongo
import random

c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
db = c.test_database
# db.drop_collection('test_collection')
collection = db.test_collection
print(collection.count())
# for i in range(0,1000000):
#   collection.insert({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
# collection.create_index("id")

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    c=pymongo.MongoClient('/tmp/mongodb-5281.sock')
    db = c.test_database
    collection = db.test_collection
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      collection.insert({'id':random.randrange(1000001,2000000), 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'})
      collection.update({'id':random.randrange(0,1000000)}, {'$set': {'age': random.randrange(0,1000000)}})
      collection.find_one({'id': random.randrange(0,1000000)})
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果1150秒 : 

[root@localhost ~]# python test.py
2000000
TID:0 1423080359.006871
TID:1 1423080359.0083587
TID:2 1423080359.009925
TID:4 1423080359.0124109
TID:3 1423080359.015088
TID:5 1423080359.0179524
TID:6 1423080359.0209677
TID:7 1423080359.0235417
TID:4 1423081508.11507
1149.1026592254639
TID:7 1423081508.1888452
1149.1653034687042
TID:2 1423081508.2641344
1149.2542095184326
TID:1 1423081508.3973265
1149.3889677524567
TID:0 1423081508.5400703
1149.5331993103027
TID:6 1423081508.594207
1149.573239326477
TID:5 1423081509.0279126
1150.0099601745605
TID:3 1423081509.0943637
1150.0792756080627

PostgreSQL : 

更新测试 :

import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
db.execute("drop table if exists tt")
db.execute("create table tt(id int, username name, age int2, email text, qq text)")
ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
for i in range(0,1000000):
  ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres",
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    upd = db.prepare("update tt set age=$1 where id=$2")
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      upd(random.randrange(0,100), random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果244秒 : 

TID:0 1423072792.0481002
TID:1 1423072792.050467
TID:3 1423072792.0514963
TID:2 1423072792.051693
TID:5 1423072792.059382
TID:4 1423072792.0605848
TID:7 1423072792.0643597
TID:6 1423072792.0657377
TID:2 1423073034.8827112
242.8310182094574
TID:5 1423073035.024978
242.96559596061707
TID:4 1423073035.2550452
243.1944603919983
TID:7 1423073035.5245414
243.46018171310425
TID:1 1423073036.0639975
244.0135304927826
TID:3 1423073036.3519847
244.30048847198486
TID:0 1423073036.5292883
244.48118805885315
TID:6 1423073036.5383787
244.47264099121094

查询测试 : 

import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
# db.execute("drop table if exists tt")
# db.execute("create table tt(id int, username name, age int2, email text, qq text)")
# ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
# for i in range(0,1000000):
#   ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
# db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres",
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    sel = db.prepare("select * from tt where id=$1 limit 1")
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      sel(random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果438秒 : 

postgres@localhost-> python test.py
[(1000000,)]
TID:2 1423081634.6041436
TID:1 1423081634.6072564
TID:5 1423081634.6098883
TID:0 1423081634.6110475
TID:6 1423081634.611464
TID:7 1423081634.6147678
TID:3 1423081634.617597
TID:4 1423081634.6184704
TID:7 1423082070.8112974
436.1965296268463
TID:4 1423082071.5796437
436.96117329597473
TID:5 1423082071.6695313
437.0596430301666
TID:0 1423082071.8521369
437.24108934402466
TID:1 1423082072.5634701
437.95621371269226
TID:2 1423082072.678791
438.0746474266052
TID:3 1423082072.9825838
438.3649866580963
TID:6 1423082072.9963892
438.3849251270294

更新, 插入, 查询综合测试 : 

import threading
import time
import postgresql
import random

conn = { "user": "postgres",
         "database": "postgres",
         "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
       }

db = postgresql.open(**conn)
# db.execute("drop table if exists tt")
# db.execute("create table tt(id int, username name, age int2, email text, qq text)")
# ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
# for i in range(0,1000000):
#   ins(i,'digoal.zhou',32,'digoal@126.com','276732431')
# db.execute("create index idx_tt_id on tt(id)")
print(db.query("select count(1) as a from tt"))

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    conn = { "user": "postgres",
             "database": "postgres",
             "unix": "/data01/pgdata/pg_root/.s.PGSQL.1921"
           }

    db = postgresql.open(**conn)
    ins = db.prepare("insert into tt values($1,$2,$3,$4,$5)")
    upd = db.prepare("update tt set age=$1 where id=$2")
    sel = db.prepare("select * from tt where id=$1")

    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))
    for i in range(0,125000):
      ins(random.randrange(1000001,2000000),'digoal.zhou',32,'digoal@126.com','276732431')
      upd(random.randrange(0,100), random.randrange(0,1000000))
      sel(random.randrange(0,1000000))
    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

测试结果938秒 : 

postgres@localhost-> python test.py
[(1000000,)]
TID:0 1423083626.888068
TID:2 1423083626.8912995
TID:1 1423083626.8920445
TID:3 1423083626.893638
TID:4 1423083626.8974612
TID:6 1423083626.9039218
TID:5 1423083626.9061637
TID:7 1423083626.908666

TID:5 1423084561.3804135
934.4742498397827
TID:6 1423084563.4344044
936.5304825305939
TID:2 1423084564.1677904
937.2764909267426
TID:0 1423084564.5768228
937.6887547969818
TID:4 1423084564.839536
937.9420747756958
TID:1 1423084564.9242597
938.0322151184082
TID:7 1423084565.0638845
938.1552186012268
TID:3 1423084565.345857
938.4522190093994

PostgreSQL使用pgbench的测试结果 : 

# 初始化数据
psql
truncate tt;
insert into tt select generate_series(1,1000000), 'digoal.zhou',32,'digoal@126.com','276732431';

更新

postgres@localhost-> vi test.sql
\setrandom v_id 0 1000000
\setrandom v_age 0 100
update tt set age=:v_age where id=:v_id;

测试结果32秒 : 

postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 31631.177435 (including connections establishing)
tps = 31637.366682 (excluding connections establishing)
statement latencies in milliseconds:
        0.002963        \setrandom v_id 0 1000000
        0.000671        \setrandom v_age 0 100
        0.232106        update tt set age=:v_age where id=:v_id;

查询

postgres@localhost-> vi test.sql
\setrandom v_id 0 1000000
select * from tt where id=:v_id;

测试结果15秒 : 

postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 66487.929382 (including connections establishing)
tps = 66514.422913 (excluding connections establishing)
statement latencies in milliseconds:
        0.002007        \setrandom v_id 0 1000000
        0.104391        select * from tt where id=:v_id;

插入, 更新, 查询

postgres@localhost-> vi test.sql
\setrandom v_newid 1000001 2000000
\setrandom v_id 0 1000000
\setrandom v_age 0 100
insert into tt values(:v_newid, 'digoal.zhou',32,'digoal@126.com','276732431');
update tt set age=:v_age where id=:v_id;
select * from tt where id=:v_id;

测试结果耗时69秒 : 

postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -t 125000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
number of transactions per client: 125000
number of transactions actually processed: 1000000/1000000
tps = 14429.720005 (including connections establishing)
tps = 14431.006796 (excluding connections establishing)
statement latencies in milliseconds:
        0.003031        \setrandom v_newid 1000001 2000000
        0.000816        \setrandom v_id 0 1000000
        0.000733        \setrandom v_age 0 100
        0.150249        insert into tt values(:v_newid, 'digoal.zhou',32,'digoal@126.com','276732431');
        0.180603        update tt set age=:v_age where id=:v_id;
        0.190850        select * from tt where id=:v_id;

[小结]

python 测试结果 : 

mongoDB 

插入100W记录耗时 - 364秒

更新100W记录耗时 - 379秒

100W记录索引检索100W次耗时 - 361秒

综合测试, 插入100W记录, 更新100W记录, 查询100W次耗时 - 1150秒

PostgreSQL

插入100W记录耗时 - 226秒.

更新100W记录耗时 - 244秒.

100W记录索引检索100W次耗时 - 438秒.

综合测试, 插入100W记录, 更新100W记录, 查询100W次耗时 - 938秒.

PostgreSQL 使用pgbench测试结果 : 

插入100W记录耗时 - 16秒

更新100W记录耗时 - 32秒.

100W记录索引检索100W次耗时 - 15秒.

综合测试, 插入100W记录, 更新100W记录, 查询100W次耗时 - 69秒.

[参考]
1. http://blog.163.com/digoal@126/blog/static/16387704020151435825593/

2. http://blog.163.com/digoal@126/blog/static/1638770402015142858224/

3. http://blog.163.com/digoal@126/blog/static/16387704020151210840303/

4. http://api.mongodb.org/python/current/api/pymongo/connection.html

5. http://pythonhosted.org//py-postgresql/driver.html#queries

6. https://docs.python.org/2/library/random.html#module-random

时间: 2024-07-28 13:14:21

mongoDB VS PostgreSQL dml performance use python (pymongo & py-postgresql)的相关文章

Psycopg v2.4发布 Python语言的PostgreSQL数据库接口

Psycopg 是 Python 语言的 PostgreSQL 数据库接口. 它的主要优势在于完全支持Python DB API 2.0,以及安全的多线程支持.它适用于随时创建.销毁大量游标的.和产生大量并发INSERT.UPDATE操作的多线程数据库应用.Psycopg包内含 ZPsycopgDA,一个Zope数据库接口. 特性: ●对libpq v3协议的完全支持: ●COPY TO/COPY FROM语句: ●对所有基本Python类型的支持:字符串(支持Unicode).整型.长整型.浮

name-如何用python web.py web service 做 多重参数的 服务?

问题描述 如何用python web.py web service 做 多重参数的 服务? 大家好.我用了python 的 web.py做了一个 web service install web.py cd webpy 编辑python web service. #!/usr/bin/env python urls = ('/title_matching2','title_matching2') app = web.application(urls,globals()) class title_m

setup py-如何用python setup.py打包多个.py文件到同一个包下面

问题描述 如何用python setup.py打包多个.py文件到同一个包下面 网上提供的一个最简单的例子是:只有一个test.py需要打包时 在setup.py中写上如下内容: coding: utf-8 from distutils.core import setup setup(name="MyTest", version="1.0", description="test", author="ChaoMa", autho

关于django执行python manage.py syncdb命令却缺少自己所需要建的models.py内的表

问题描述 关于django执行python manage.py syncdb命令却缺少自己所需要建的models.py内的表 在django项目中的settings.py文件中的_app代码,数据库没问题,可以连接上,但是:python manage.py syncdb命令却无法生成自己models.py内所定义的实体类表. INSTALLED_APPS = ( 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.cont

python __init__.py

python __init__.py,可以理解为配置引用的配置文件.   如何在你的代码中引用一个.py文件呢,常见的情况是,事先写好一个.py文 件,在另一个文件中需要import时,将事先写好的.py文件拷贝 到当前目录,或者是在sys.path中增加事先写好的.py文件所在的目录,然后import. 这样的做法,对于少数文件是可行的,但如果程序数目很 多,层级很复杂,就很吃力了. 有没有办法,像Java的Package一样,将多个.py文件组织起来,以便在外部统一调用,和在内部互相调用呢?

PostgreSQL介绍以及如何开发框架中使用PostgreSQL数据库

最近准备下PostgreSQL数据库开发的相关知识,本文把总结的PPT内容通过博客记录分享,本随笔的主要内容是介绍PostgreSQL数据库的基础信息,以及如何在我们的开发框架中使用PostgreSQL数据库,希望大家多多提意见. 1.PostgreSQL数据库介绍 PostgreSQL是以加州大学伯克利分校计算机系开发的 POSTGRES,现在已经更名为PostgreSQL.它具有很多不错的特点: 开源:PostgreSQL 是一个自由的对象-关系数据库服务器(数据库管理系统),它在灵活的 B

test luasql's postgresql driver performance (not better than pgbench)

测试数据库性能的方法很多, 例如使用pgbench, sysbench. 对PostgreSQL而言, pgbench是性能损失最小的一种测试工具, 本文将使用lua以及luasql驱动测试一下, 我们对比一下使用lua和pgbench测试的结果, 看看lua会带来多少性能损失. 本文测试环境 :  Lua 5.2.3 CentOS 6.4 x64 PostgreSQL 9.3.1 CPU model name : Intel(R) Xeon(R) CPU E5504 @ 2.00GHz ste

test luadbi's postgresql driver performance (vs pgbench)

测试机环境 :  Lua 5.1.5 CentOS 5.7 x64 PostgreSQL 9.3.2 CPU model name : Intel(R) Xeon(R) CPU E5440 @ 2.83GHz stepping : 6 cpu MHz : 2833.435 Luadbi测试结果 :  [root@db-172-16-3-33 luadbi]# taskset -c 2 /opt/lua5.1/bin/lua Lua 5.1.5 Copyright (C) 1994-2012 Lu

Python : __init__.py的作用

一.__init__.py的作用:         在使用PyDev插件,在Eclipse创建package时,会自动在package所在的目录自动创建一个__init__.py文件,且文件内容为空.这个文件有什么有呢?        __init__.py的作用有如下几点:        1. 相当于class中的def __init__(self):函数,用来初始化模块.        2. 把所在目录当作一个package处理        3. from-import 语句导入子包时需要