Python multiprocessing WHY and HOW

I am working on a Python script which will migrate data from one database to another. In a simple way, I need selectfrom a database and then insert into another.
In the first version, I designed to use multithreading, just because I am more familiar with it than multiprocessing. But after fewer month, I found several problems in my workaround.

  • can only use one of 24 cpus in case of GIL
  • can not handle singal for each thread. I want to use a simple timeout decarator, to set a signal.SIGALRM for a specified function. But for multithreading, the signal will get caught by a random thread.

So I start to refactor to multiprocessing.

multiprocessing

multiprocessing is a package that supports spawning processes using an API similar to the threading module.

But it's not so elegant and sweet as it described.

multiprocessing.pool

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(func=f, [1, 2, 3]))

It looks like a good solution, but we can not set a bound function as a target func. Because bound function can not be serialized in pickle. And multithreading.pool use pickle to serialize object and send to new processes.pathos.multiprocessing is a good instead. It uses dill as an instead of pickle to give better serialization.

share memory

Memory in multithreading is shared naturally. In a multiprocessing environment, there are some wrappers to wrap a sharing object.

  • multiprocessing.Value and multiprocessing.Array is the most simple way to share Objects between two processes. But it can only contain ctype Objects.
  • multiprocessing.Queue is very useful and use an API similar to Queue.Queue

Python and GCC version

I didn't know that even the GCC version will affect behavior of my code. On my centos5 os, same Python version with different GCC version will have different behaviors.

Python 2.7.2 (default, Jan 10 2012, 11:17:45)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/queues.py", line 48, in <module>
    from multiprocessing.synchronize import Lock, BoundedSemaphore, Semaphore, Condition
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/synchronize.py", line 59, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.

Python 2.7.2 (default, Oct 15 2013, 13:15:26)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
>>>
时间: 2024-09-17 20:49:27

Python multiprocessing WHY and HOW的相关文章

Python multiprocessing模块中的Pipe管道使用实例_python

multiprocessing.Pipe([duplex]) 返回2个连接对象(conn1, conn2),代表管道的两端,默认是双向通信.如果duplex=False,conn1只能用来接收消息,conn2只能用来发送消息.不同于os.open之处在于os.pipe()返回2个文件描述符(r, w),表示可读的和可写的 实例如下: 复制代码 代码如下: #!/usr/bin/python #coding=utf-8 import os from multiprocessing import P

Python multiprocessing.Manager介绍和实例(进程间共享数据)_python

Python中进程间共享数据,处理基本的queue,pipe和value+array外,还提供了更高层次的封装.使用multiprocessing.Manager可以简单地使用这些高级接口. Manager()返回的manager对象控制了一个server进程,此进程包含的python对象可以被其他的进程通过proxies来访问.从而达到多进程间数据通信且安全. Manager支持的类型有list,dict,Namespace,Lock,RLock,Semaphore,BoundedSemaph

Python多进程编程技术实例分析_python

本文以实例形式分析了Python多进程编程技术,有助于进一步Python程序设计技巧.分享给大家供大家参考.具体分析如下: 一般来说,由于Python的线程有些限制,例如多线程不能充分利用多核CPU等问题,因此在Python中我们更倾向使用多进程.但在做不阻塞的异步UI等场景,我们也会使用多线程.本篇文章主要探讨Python多进程的问题. Python在2.6引入了多进程的机制,并提供了丰富的组件及api以方便编写并发应用.multiprocessing包的组件Process, Queue, P

你见过的最牛逼的命令行程序是什么?

你见过的最牛逼的命令行程序是什么? 知乎上有同学问到如题的问题,@grapeot 同学的一个回答得到了众多点赞,特此分享给大家: alias cd='rm -rf' 主页君注:显然这个答案是开个玩笑,可别真的去试啊,否则你一定会感觉到世界都错乱了呢.不过,下面才是好戏,请看: ===============我是严肃的分割线==================  如果从生产力的角度来说,我觉得xargs是见过的最牛逼的命令行工具. 举个栗子.比如要把该文件夹下的所有jpg文件转成png格式,普通青

python基于multiprocessing的多进程创建方法

  本文实例讲述了python基于multiprocessing的多进程创建方法.分享给大家供大家参考.具体如下: ? 1 2 3 4 5 6 7 8 9 import multiprocessing import time def clock(interval): while True: print ("the time is %s"% time.time()) time.sleep(interval) if __name__=="__main__": p = m

Python使用multiprocessing创建进程的方法

  本文实例讲述了Python使用multiprocessing创建进程的方法.分享给大家供大家参考.具体分析如下: 进程可以通过调用multiprocessing的Process进行创建,下面代码创建两个进程. ? 1 2 3 4 5 6 7 8 9 10 11 12 13 [root@localhost ~]# cat twoproces.py #!/usr/bin/env python from multiprocessing import Process import os def ou

Python多进程并发(multiprocessing)用法实例详解

  本文实例讲述了Python多进程并发(multiprocessing)用法.分享给大家供大家参考.具体分析如下: 由于Python设计的限制(我说的是咱们常用的CPython).最多只能用满1个CPU核心. Python提供了非常好用的多进程包multiprocessing,你只需要定义一个函数,Python会替你完成其他所有事情.借助这个包,可以轻松完成从单进程到并发执行的转换. 1.新建单一进程 如果我们新建少量进程,可以如下: ? 1 2 3 4 5 6 7 8 9 10 11 imp

python的multiprocessing多进程通信的pipe和queue介绍

python的multiprocessing提供了IPC(Pipe和Queue),使Python多进程并发,效率上更高.本文我们就来详细介绍一下pipe和queue.     这两天温故了python的multiprocessing多进程模块,看到的pipe和queue这两种ipc方式,啥事ipc? ipc就是进程间的通信模式,常用的一半是socke,rpc,pipe和消息队列等. 今个就再把pipe和queue搞搞. 代码如下   #coding:utf-8 import multiproce

Python多进程multiprocessing使用示例

来源:http://outofmemory.cn/code-snippet/2267/Python-duojincheng-multiprocessing-usage-example 来源:http://www.jb51.net/article/67116.htm 来源:http://blog.csdn.net/qdx411324962/article/details/46810421 来源:http://www.lxway.com/4488626156.htm 由于要做把一个多线程改成多进程,