Motor: Asynchronous Python driver for Tornado and MongoDB

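Inserting 1 million records into MongoDB through the pymongo driver took 364 seconds. Performance was poor because pymongo does not use an asynchronous interface.

The motor driver uses an asynchronous interface, and performance improves.

Install the motor driver with pip, which installs quickly and resolves dependencies. pip is a script that automatically installs modules from PyPI, as shown below.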

# cat /usr/local/bin/pip3.4
#!/usr/local/bin/python3.4

# -*- coding: utf-8 -*-
import re
import sys

from pip import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
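
As a small illustration of what the `re.sub` line in the wrapper above does, the sketch below applies the same pattern to a few launcher names. The helper name `clean_program_name` is made up for this example and is not part of pip.

```python
import re

# Same pattern as in the pip wrapper script above.
SUFFIX = re.compile(r'(-script\.pyw|\.exe)?$')

def clean_program_name(argv0):
    """Strip a trailing '-script.pyw' or '.exe' from a launcher name."""
    return SUFFIX.sub('', argv0)

for name in ('pip3.4', 'pip3.4.exe', 'pip3.4-script.pyw'):
    print(name, '->', clean_program_name(name))
```

On Linux the name has no such suffix, so the substitution leaves `sys.argv[0]` unchanged; the pattern only matters for Windows launchers.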

Install motor

[root@localhost bin]# pip3.4 install motor

or
[root@localhost bin]# pip3.4 install motor --upgrade

The motor documentation is here:

http://motor.readthedocs.org/en/stable/index.html

http://motor.readthedocs.org/en/stable/tutorial.html

Modify the test script to use bulk writes:

[root@localhost ~]# cat t.py
import motor
import tornado
import time
import threading

c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection

@motor.gen.coroutine
def do_count():
  n = yield collection.count()
  print(str(n) + ' documents in collection')

tornado.ioloop.IOLoop.current().run_sync(do_count)

def my_callback(result, error):
  tornado.ioloop.IOLoop.instance().stop()

print(time.time())

class n_t(threading.Thread):   # n_t is derived from threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): # Override run() with the work the thread should do
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))

    collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(0,100000)), callback=my_callback)
    tornado.ioloop.IOLoop.instance().start()

    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

This raised a pile of errors. Apart from thread 0, every thread reported errors like the following:

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/threading.py", line 921, in _bootstrap_inner
    self.run()
  File "t.py", line 33, in run
    tornado.ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python3.4/site-packages/tornado/ioloop.py", line 704, in start
    raise RuntimeError("IOLoop is already running")
RuntimeError: IOLoop is already running

ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/tornado/iostream.py", line 508, in wrapper
    return callback(*args)
  File "/usr/local/lib/python3.4/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/motor/__init__.py", line 138, in callback
    child_gr.switch(result)
greenlet.error: cannot switch to a different thread
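
Both errors come from sharing one event loop across threads. In the Tornado release used here, `IOLoop.instance()` returns a single process-wide loop, so once thread 0 has started it, every other thread's `start()` fails with "IOLoop is already running". The greenlet error appears because this Motor release drives its I/O with greenlets (see `child_gr.switch` in the traceback), and a greenlet cannot be switched into from another thread. The sketch below reproduces the first failure using only the standard library's asyncio. It is an analogy, not Tornado's own code, and the function name is made up for illustration.

```python
import asyncio
import threading

def start_shared_loop_twice():
    """Run one event loop in a thread, then try to run it again from another."""
    loop = asyncio.new_event_loop()
    started = threading.Event()
    errors = []

    def first():
        # Signal the other thread once the loop is actually running.
        loop.call_soon(started.set)
        loop.run_forever()

    def second():
        started.wait()
        try:
            loop.run_forever()  # the loop is already running in the first thread
        except RuntimeError as exc:
            errors.append(str(exc))

    t1 = threading.Thread(target=first)
    t2 = threading.Thread(target=second)
    t1.start()
    t2.start()
    t2.join()
    # Stop the loop from outside its own thread, then clean up.
    loop.call_soon_threadsafe(loop.stop)
    t1.join()
    loop.close()
    return errors

if __name__ == '__main__':
    print(start_shared_loop_twice())
```

The usual ways around this are one loop per thread, or, as in the rest of this post, one loop per process.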

Switch to single-threaded execution, inserting 1000 documents per batch:

[root@localhost ~]# cat t.py
import motor
import tornado
import time
import threading

c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection

@motor.gen.coroutine
def do_count():
  n = yield collection.count()
  print(str(n) + ' documents in collection')

tornado.ioloop.IOLoop.current().run_sync(do_count)

def my_callback(result, error):
  tornado.ioloop.IOLoop.instance().stop()

print(time.time())

class n_t(threading.Thread):   # n_t is derived from threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): # Override run() with the work the thread should do
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))

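    total = 1000000   # documents to insert
    batch = 1000      # documents per bulk insert
    for b in range(total // batch):
      s = batch * b         # first id of this batch
      e = batch * (b + 1)   # one past the last id of this batch
      # Queue one bulk insert, then run the IOLoop until my_callback stops it
      collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(s,e)), callback=my_callback)
      tornado.ioloop.IOLoop.instance().start()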

    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)
    tornado.ioloop.IOLoop.current().run_sync(do_count)

def test():
  t_names = dict()
  for i in range(0,1):
    t_names[i] = n_t(i)
    t_names[i].start()
  return

if __name__ == '__main__':
  test()
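
The script above starts and stops the IOLoop around every batch, so only one batch is ever in flight. As a rough sketch of an alternative, the coroutine below keeps a few batches in flight at once. It assumes a collection object whose `insert_many` is awaitable, as Motor's asyncio collections provide in newer releases; the script above instead uses the older callback-style `insert`. The names `make_doc`, `insert_in_batches` and `max_in_flight` are invented for this example.

```python
import asyncio

def make_doc(i):
    # Same document shape as in the test script above.
    return {'id': i, 'username': 'digoal.zhou', 'age': 32,
            'email': 'digoal@126.com', 'qq': '276732431'}

async def insert_in_batches(collection, total, batch_size, max_in_flight=4):
    """Insert `total` documents in batches, a few batches in flight at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def insert_one_batch(start):
        stop = min(start + batch_size, total)
        async with gate:
            # Build the batch only once a slot is free, to bound memory use.
            await collection.insert_many([make_doc(i) for i in range(start, stop)])
        return stop - start

    counts = await asyncio.gather(
        *(insert_one_batch(s) for s in range(0, total, batch_size)))
    return sum(counts)
```

With Motor this would be awaited from inside a running asyncio loop, passing the collection created there. Whether it beats the 38 seconds measured here was not tested.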

The insert took 38 seconds:

[root@localhost ~]# python t.py
0 documents in collection
1423138288.7702804
TID:0 1423138288.7712061
TID:0 1423138326.7712593
38.00005316734314
1000000 documents in collection

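This still trails PostgreSQL, where the pgbench single-statement commit test took 16 seconds.

Running 16 processes at once inserted 16 million documents in 220 seconds.

Do not drop the collection inside the script: comment out the drop_collection line shown next, and drop the collection once by hand from the mongo shell instead.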
#db.drop_collection('test_collection')
mongo 127.0.0.1:5281/test_database
use test_database
db.test_collection.drop()

# vi t.py
import motor
import tornado
import time
import threading

c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
#db.drop_collection('test_collection')
collection = db.test_collection

@motor.gen.coroutine
def do_count():
  n = yield collection.count()
  print(str(n) + ' documents in collection')

tornado.ioloop.IOLoop.current().run_sync(do_count)

def my_callback(result, error):
  tornado.ioloop.IOLoop.instance().stop()

start_t = time.time()
print(str(start_t))

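total = 1000000   # documents to insert per process
batch = 1000      # documents per bulk insert
for b in range(total // batch):
  s = batch * b         # first id of this batch
  e = batch * (b + 1)   # one past the last id of this batch
  # Queue one bulk insert, then run the IOLoop until my_callback stops it
  collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(s,e)), callback=my_callback)
  tornado.ioloop.IOLoop.instance().start()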

stop_t = time.time()
print(str(stop_t))
print(stop_t-start_t)
tornado.ioloop.IOLoop.current().run_sync(do_count)

# vi test.sh
for ((m=1;m<17;m++))
do
  python /root/t.py &
done
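
The shell loop above backgrounds 16 copies of the script and returns immediately. For comparison, a minimal Python launcher that starts several copies of a command and waits for all of them might look like the sketch below. `run_parallel` is a hypothetical helper, and the command shown is a placeholder, not the real test script.

```python
import subprocess
import sys

def run_parallel(cmd, copies):
    """Start `copies` processes running `cmd`, wait for all, return exit codes."""
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    return [p.wait() for p in procs]

if __name__ == '__main__':
    # Placeholder command; for the test in this post it would be
    # [sys.executable, '/root/t.py'] with copies=16.
    codes = run_parallel([sys.executable, '-c', 'print("worker done")'], 4)
    print(codes)
```

Waiting on every process makes it easy to time the whole run from a single place.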

Test results:
[root@localhost ~]# . ./test.sh
[root@localhost ~]# 0 documents in collection
1423138761.355559
TID:0 1423138761.3678153
0 documents in collection
0 documents in collection
1423138761.4109175
1423138761.4113982
TID:0 1423138761.4338183
TID:0 1423138761.4408538
1255 documents in collection
1127 documents in collection
1423138761.4774377
1423138761.4775264
2383 documents in collection
1423138761.4832406
TID:0 1423138761.483632
TID:0 1423138761.4897928
TID:0 1423138761.5128014
6506 documents in collection
1423138761.5683684
8037 documents in collection
1423138761.58344
TID:0 1423138761.58417
7550 documents in collection
1423138761.5864558
8932 documents in collection
1423138761.5868194
TID:0 1423138761.5982714
TID:0 1423138761.5984921
TID:0 1423138761.613552
11407 documents in collection
1423138761.620963
TID:0 1423138761.623182
15278 documents in collection
1423138761.6682127
TID:0 1423138761.6716254
17677 documents in collection
17677 documents in collection
17677 documents in collection
1423138761.713706
1423138761.7137444
17677 documents in collection
1423138761.7138374
1423138761.7139668
TID:0 1423138761.7142007
TID:0 1423138761.714226
TID:0 1423138761.714233
TID:0 1423138761.714348
TID:0 1423138980.550259
218.92707705497742
15900545 documents in collection
TID:0 1423138980.9974186
219.56360030174255
15932365 documents in collection
TID:0 1423138981.2562747
219.54192662239075
15951346 documents in collection
TID:0 1423138981.4110239
219.81253170967102
15961596 documents in collection
TID:0 1423138981.4463546
219.96272253990173
15964366 documents in collection
TID:0 1423138981.4915378
219.90736770629883
15967637 documents in collection
TID:0 1423138981.540123
220.17230772972107
15970868 documents in collection
TID:0 1423138981.6574204
220.05914902687073
15979686 documents in collection
TID:0 1423138981.681502
220.00987672805786
15981430 documents in collection
TID:0 1423138981.7540042
220.03977823257446
15985686 documents in collection
TID:0 1423138981.792348
220.3514940738678
15988377 documents in collection
TID:0 1423138981.8279915
220.315190076828
15991149 documents in collection
TID:0 1423138981.8720205
220.38222765922546
15994000 documents in collection
TID:0 1423138981.874455
220.26090288162231
15994000 documents in collection
TID:0 1423138981.9668252
220.25259232521057
15998000 documents in collection
TID:0 1423138982.0496628
220.33546209335327
16000000 documents in collection

[1]   Done                    /usr/local/bin/python3 /root/t.py
[2]   Done                    /usr/local/bin/python3 /root/t.py
[3]   Done                    /usr/local/bin/python3 /root/t.py
[4]   Done                    /usr/local/bin/python3 /root/t.py
[5]   Done                    /usr/local/bin/python3 /root/t.py
[6]   Done                    /usr/local/bin/python3 /root/t.py
[7]   Done                    /usr/local/bin/python3 /root/t.py
[8]   Done                    /usr/local/bin/python3 /root/t.py
[9]   Done                    /usr/local/bin/python3 /root/t.py
[10]   Done                    /usr/local/bin/python3 /root/t.py
[11]   Done                    /usr/local/bin/python3 /root/t.py
[12]   Done                    /usr/local/bin/python3 /root/t.py
[13]   Done                    /usr/local/bin/python3 /root/t.py
[14]   Done                    /usr/local/bin/python3 /root/t.py
[15]-  Done                    /usr/local/bin/python3 /root/t.py
[16]+  Done                    /usr/local/bin/python3 /root/t.py

With PostgreSQL switched to batched inserts (20 rows per statement), the test inserted 16 million rows in 42 seconds.

postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -t 50000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
number of transactions per client: 50000
number of transactions actually processed: 800000/800000
tps = 18835.837413 (including connections establishing)
tps = 18840.926458 (excluding connections establishing)
statement latencies in milliseconds:
        0.842201        insert into tt values(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431');
postgres@localhost-> psql
psql (9.3.5)
Type "help" for help.

postgres=# select count(*) from tt;
  count
----------
 16000000
(1 row)
postgres=# select 16000000/20/18835.837413;
      ?column?
---------------------
 42.4722289993786537
(1 row)
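
To put the timings quoted in this post on one scale, the short calculation below converts each run to rows per second. It uses only figures that appear above. The labels are informal and the comparison is rough, since the runs differ in batch size and client count.

```python
# Figures quoted in this post: (label, rows inserted, elapsed seconds).
runs = [
    ('pymongo, 1 process', 1000000, 364),
    ('motor, 1 process, batches of 1000', 1000000, 38),
    ('motor, 16 processes', 16000000, 220),
    # pgbench reports tps; each transaction inserts 20 rows.
    ('PostgreSQL pgbench, 16 clients', 16000000, 16000000 / 20 / 18835.837413),
]

def rows_per_second(rows, seconds):
    return rows / seconds

for label, rows, seconds in runs:
    print('%-36s %10.0f rows/s' % (label, rows_per_second(rows, seconds)))
```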

[References]

http://motor.readthedocs.org/en/stable/index.html

http://motor.readthedocs.org/en/stable/tutorial.html

Time: 2024-09-20 14:54:56
