使用pymongo驱动连接mongodb插入100W记录, 需要364秒. 性能较差, 原因是pymongo未使用异步接口.
motor驱动使用异步接口, 性能有所提升.
安装motor驱动, 可以使用pip快速安装, 解决依赖问题. pip是一个脚本, 可以自动从PyPI安装module. 如下
# cat /usr/local/bin/pip3.4
#!/usr/local/bin/python3.4
# -*- coding: utf-8 -*-
import re
import sys
from pip import main
if __name__ == '__main__':
sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
sys.exit(main())
安装motor
[root@localhost bin]# pip3.4 install motor
或
[root@localhost bin]# pip3.4 install motor --upgrade
motor的使用手册如下
http://motor.readthedocs.org/en/stable/index.html
http://motor.readthedocs.org/en/stable/tutorial.html
修改测试脚本 :
使用批量写入.
[root@localhost ~]# cat t.py
import motor
import tornado
import time
import threading
c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection
@motor.gen.coroutine
def do_count():
n = yield collection.count()
print(str(n) + ' documents in collection')
tornado.ioloop.IOLoop.current().run_sync(do_count)
def my_callback(result, error):
tornado.ioloop.IOLoop.instance().stop()
print(time.time())
class n_t(threading.Thread): #The timer class is derived from the class threading.Thread
def __init__(self, num):
threading.Thread.__init__(self)
self.thread_num = num
def run(self): #Overwrite run() method, put what you want the thread do here
start_t = time.time()
print("TID:" + str(self.thread_num) + " " + str(start_t))
collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(0,100000)), callback=my_callback)
tornado.ioloop.IOLoop.instance().start()
stop_t = time.time()
print("TID:" + str(self.thread_num) + " " + str(stop_t))
print(stop_t-start_t)
def test():
t_names = dict()
for i in range(0,8):
t_names[i] = n_t(i)
t_names[i].start()
return
if __name__ == '__main__':
test()
报了一堆错误, 除了0号线程未报错, 其他线程都报类似以下错误 :
Exception in thread Thread-5:
Traceback (most recent call last):
File "/usr/local/lib/python3.4/threading.py", line 921, in _bootstrap_inner
self.run()
File "t.py", line 33, in run
tornado.ioloop.IOLoop.instance().start()
File "/usr/local/lib/python3.4/site-packages/tornado/ioloop.py", line 704, in start
raise RuntimeError("IOLoop is already running")
RuntimeError: IOLoop is already running
ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/tornado/iostream.py", line 508, in wrapper
return callback(*args)
File "/usr/local/lib/python3.4/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.4/site-packages/motor/__init__.py", line 138, in callback
child_gr.switch(result)
greenlet.error: cannot switch to a different thread
改为单线程执行, 一次批量1000条 :
[root@localhost ~]# cat t.py
import motor
import tornado
import time
import threading
c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection
@motor.gen.coroutine
def do_count():
n = yield collection.count()
print(str(n) + ' documents in collection')
tornado.ioloop.IOLoop.current().run_sync(do_count)
def my_callback(result, error):
tornado.ioloop.IOLoop.instance().stop()
print(time.time())
class n_t(threading.Thread): #The timer class is derived from the class threading.Thread
def __init__(self, num):
threading.Thread.__init__(self)
self.thread_num = num
def run(self): #Overwrite run() method, put what you want the thread do here
start_t = time.time()
print("TID:" + str(self.thread_num) + " " + str(start_t))
all = 1000000
pice = 1000
for i in range(1,int(all/pice)+1):
if i == 1:
s=0
e=pice
else:
s=pice*(i-1)
e=pice*i
collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(s,e)), callback=my_callback)
tornado.ioloop.IOLoop.instance().start()
stop_t = time.time()
print("TID:" + str(self.thread_num) + " " + str(stop_t))
print(stop_t-start_t)
tornado.ioloop.IOLoop.current().run_sync(do_count)
def test():
t_names = dict()
for i in range(0,1):
t_names[i] = n_t(i)
t_names[i].start()
return
if __name__ == '__main__':
test()
插入耗时38秒 :
[root@localhost ~]# python t.py
0 documents in collection
1423138288.7702804
TID:0 1423138288.7712061
TID:0 1423138326.7712593
38.00005316734314
1000000 documents in collection
以上相比PostgreSQL 使用pgbench 单步提交测试16秒还有差距.
开16个一起测, 220秒插入1600W.
不删除collection: 注释
#db.drop_collection('test_collection')
mongo 127.0.0.1:5281/test_database
use test_database
db.test_collection.drop()
# vi t.py
import motor
import tornado
import time
import threading
c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
#db.drop_collection('test_collection')
collection = db.test_collection
@motor.gen.coroutine
def do_count():
n = yield collection.count()
print(str(n) + ' documents in collection')
tornado.ioloop.IOLoop.current().run_sync(do_count)
def my_callback(result, error):
tornado.ioloop.IOLoop.instance().stop()
start_t = time.time()
print(str(start_t))
all = 1000000
pice = 1000
for i in range(1,int(all/pice)+1):
if i == 1:
s=0
e=pice
else:
s=pice*(i-1)
e=pice*i
collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(s,e)), callback=my_callback)
tornado.ioloop.IOLoop.instance().start()
stop_t = time.time()
print(str(stop_t))
print(stop_t-start_t)
tornado.ioloop.IOLoop.current().run_sync(do_count)
# vi test.sh
for ((m=1;m<17;m++))
do
python /root/t.py &
done
测试结果 :
[root@localhost ~]# . ./test.sh
[root@localhost ~]# 0 documents in collection
1423138761.355559
TID:0 1423138761.3678153
0 documents in collection
0 documents in collection
1423138761.4109175
1423138761.4113982
TID:0 1423138761.4338183
TID:0 1423138761.4408538
1255 documents in collection
1127 documents in collection
1423138761.4774377
1423138761.4775264
2383 documents in collection
1423138761.4832406
TID:0 1423138761.483632
TID:0 1423138761.4897928
TID:0 1423138761.5128014
6506 documents in collection
1423138761.5683684
8037 documents in collection
1423138761.58344
TID:0 1423138761.58417
7550 documents in collection
1423138761.5864558
8932 documents in collection
1423138761.5868194
TID:0 1423138761.5982714
TID:0 1423138761.5984921
TID:0 1423138761.613552
11407 documents in collection
1423138761.620963
TID:0 1423138761.623182
15278 documents in collection
1423138761.6682127
TID:0 1423138761.6716254
17677 documents in collection
17677 documents in collection
17677 documents in collection
1423138761.713706
1423138761.7137444
17677 documents in collection
1423138761.7138374
1423138761.7139668
TID:0 1423138761.7142007
TID:0 1423138761.714226
TID:0 1423138761.714233
TID:0 1423138761.714348
TID:0 1423138980.550259
218.92707705497742
15900545 documents in collection
TID:0 1423138980.9974186
219.56360030174255
15932365 documents in collection
TID:0 1423138981.2562747
219.54192662239075
15951346 documents in collection
TID:0 1423138981.4110239
219.81253170967102
15961596 documents in collection
TID:0 1423138981.4463546
219.96272253990173
15964366 documents in collection
TID:0 1423138981.4915378
219.90736770629883
15967637 documents in collection
TID:0 1423138981.540123
220.17230772972107
15970868 documents in collection
TID:0 1423138981.6574204
220.05914902687073
15979686 documents in collection
TID:0 1423138981.681502
220.00987672805786
15981430 documents in collection
TID:0 1423138981.7540042
220.03977823257446
15985686 documents in collection
TID:0 1423138981.792348
220.3514940738678
15988377 documents in collection
TID:0 1423138981.8279915
220.315190076828
15991149 documents in collection
TID:0 1423138981.8720205
220.38222765922546
15994000 documents in collection
TID:0 1423138981.874455
220.26090288162231
15994000 documents in collection
TID:0 1423138981.9668252
220.25259232521057
15998000 documents in collection
TID:0 1423138982.0496628
220.33546209335327
16000000 documents in collection
[1] Done /usr/local/bin/python3 /root/t.py
[2] Done /usr/local/bin/python3 /root/t.py
[3] Done /usr/local/bin/python3 /root/t.py
[4] Done /usr/local/bin/python3 /root/t.py
[5] Done /usr/local/bin/python3 /root/t.py
[6] Done /usr/local/bin/python3 /root/t.py
[7] Done /usr/local/bin/python3 /root/t.py
[8] Done /usr/local/bin/python3 /root/t.py
[9] Done /usr/local/bin/python3 /root/t.py
[10] Done /usr/local/bin/python3 /root/t.py
[11] Done /usr/local/bin/python3 /root/t.py
[12] Done /usr/local/bin/python3 /root/t.py
[13] Done /usr/local/bin/python3 /root/t.py
[14] Done /usr/local/bin/python3 /root/t.py
[15]- Done /usr/local/bin/python3 /root/t.py
[16]+ Done /usr/local/bin/python3 /root/t.py
将PostgreSQL改为批量提交测试结果42秒插入1600W .
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -t 50000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
number of transactions per client: 50000
number of transactions actually processed: 800000/800000
tps = 18835.837413 (including connections establishing)
tps = 18840.926458 (excluding connections establishing)
statement latencies in milliseconds:
0.842201 insert into tt values(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431');
postgres@localhost-> psql
psql (9.3.5)
Type "help" for help.
postgres=# select count(*) from tt;
count
----------
16000000
(1 row)
postgres=# select 16000000/20/18835.837413;
?column?
---------------------
42.4722289993786537
(1 row)
[参考]
http://motor.readthedocs.org/en/stable/index.html
http://motor.readthedocs.org/en/stable/tutorial.html
时间: 2024-09-20 14:54:56