Using PyMongo with Multiprocessing

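A MongoClient instance must not be copied from a parent process to a child process. PyMongo is not fork-safe, and a client inherited across fork() can deadlock in the child, so the parent process and each child process must create their own MongoClient instance.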

For example:

import multiprocessing
import pymongo

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Incorrect example:

client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
    db = client.mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Running it produces the following warning:

UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking.

1. Example

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import time
from tqdm import tqdm
from multiprocessing import Pool
from bson.binary import Binary
from pymongo import MongoClient

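# connect=False defers connecting until the first operation,
# which here happens inside each worker process.
client = MongoClient('192.168.1.101:27017', connect=False)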
coll = client['tedt_db']['test_coll']

def convertImgtoBSON(imgfile):
    with open(imgfile, 'rb') as binary_file:
        data = binary_file.read()
    return Binary(data, 0)

def create_object(imgfile):
    bson_content = convertImgtoBSON(imgfile)

    doc = {}
    doc['img'] = bson_content
    try:
        coll.insert_one(doc)
    except Exception:
        # Report the failure to the parent instead of raising in the worker.
        return 'fail'

    return 'success'

if __name__ == '__main__':
    # Placeholder: fill in the paths of the image files to insert.
    imgfiles = []
    start = time.time()
    with Pool(processes=36) as pool:
        rets = [pool.apply_async(create_object, args=(imgfile,)) for imgfile in imgfiles]
        for ret in tqdm(rets):
            status = ret.get()
            if status == 'fail':
                print('[INFO]', status)
    print("[INFO]timecost: ", time.time() - start)
    print('[INFO]Done.')

Related

[1] - The efficient way of using multiprocessing with pymongo - 2018.11.21

Last modification: May 17th, 2021 at 09:32 am