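I built a test platform with Django + Celery. One requirement is to run a task that processes anywhere from tens of thousands to over a million images, so I use Celery for task scheduling and execution. The problem is that a single run takes about half an hour to process 100,000 images. I want to shorten the run time, so I would like to use multiprocessing plus coroutines inside a single Celery task, but I could not find any examples online of using multiprocessing or coroutines inside Celery, and I could not find it covered in the official documentation either.
I tried using multiprocessing directly inside a Celery task, and it raises an error at run time; the same function works when called without Celery. Here is the code (tasks.py):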

import time

from celery import Celery
from multiprocessing import Pool

app = Celery('tasks', broker='redis://:docserver123456!@172.17.10.175:6379/3')


def func(msg):
    print("*msg: ", msg)
    time.sleep(3)
    print("*end")


@app.task
def add():
    p = Pool(5)
    for i in range(10):
        msg = f'hello str({i})'
        p.apply_async(func, (msg,))
    p.close()
    p.join()
    print('all done')

Here is the run output. Calling the function directly succeeds, while running it through Celery fails:

[root@localhost ce_test]# python
Python 3.7.8 (default, Jan 26 2021, 15:45:27)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tasks import add
>>> add()
*msg:  hello str(0)
*msg:  hello str(1)
*msg:  hello str(2)
*msg:  hello str(3)
*msg:  hello str(4)
*end
*end
*end
*msg:  hello str(5)
*end
*msg:  hello str(6)
*msg:  hello str(7)
*end
*msg:  hello str(8)
*msg:  hello str(9)
*end
*end
*end
*end
*end
all done
>>> add.delay()
<AsyncResult: f0bbd061-53b6-44a9-8087-ad08fed4401c>

Here is the error output from the Celery worker:

[root@localhost ce_test]# celery -A tasks worker -l info
/usr/local/python3/lib/python3.7/site-packages/celery/platforms.py:801: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  uid=uid, euid=euid, gid=gid, egid=egid,

 -------------- celery@localhost.localdomain v4.4.7 (cliffs)
--- ***** -----
-- ******* ---- Linux-3.10.0-1160.11.1.el7.x86_64-x86_64-with-centos-7.9.2009-Core 2021-03-18 15:45:41
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app:         tasks:0x7fae4a6f1590
- ** ---------- .> transport:   redis://:**@172.17.10.175:6379/3
- ** ---------- .> results:     disabled://
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery


[tasks]
  . tasks.add

[2021-03-18 15:45:41,779: INFO/MainProcess] Connected to redis://:**@172.17.10.175:6379/3
[2021-03-18 15:45:41,797: INFO/MainProcess] mingle: searching for neighbors
[2021-03-18 15:45:42,895: INFO/MainProcess] mingle: all alone
[2021-03-18 15:45:42,920: INFO/MainProcess] celery@localhost.localdomain ready.
[2021-03-18 15:49:02,260: INFO/MainProcess] Received task: tasks.add[f0bbd061-53b6-44a9-8087-ad08fed4401c]
[2021-03-18 15:49:02,271: ERROR/ForkPoolWorker-1] Task tasks.add[f0bbd061-53b6-44a9-8087-ad08fed4401c] raised unexpected: AssertionError('daemonic processes are not allowed to have children')
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/python3/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/ce_test/tasks.py", line 17, in add
    p = Pool(5)
  File "/usr/local/python3/lib/python3.7/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/usr/local/python3/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
    self._repopulate_pool()
  File "/usr/local/python3/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
    w.start()
  File "/usr/local/python3/lib/python3.7/multiprocessing/process.py", line 110, in start
    'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
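
For reference, the same assertion can apparently be reproduced without Celery at all, which suggests the failure comes from a multiprocessing rule about daemonic processes rather than from the task code itself. The sketch below is only an illustration under that assumption: it starts a daemonic process by hand and tries to create a pool inside it. It assumes Linux (the "fork" start method), and the names `try_nested_pool` and `reproduce` are made up for this example; it does not show how Celery starts its workers internally.

```python
import multiprocessing as mp


def try_nested_pool(conn):
    """Runs inside a daemonic process and tries to start a pool of its own."""
    try:
        pool = mp.get_context("fork").Pool(2)
    except AssertionError as exc:
        # multiprocessing refuses to start children from a daemonic process
        conn.send(str(exc))
    else:
        pool.terminate()
        conn.send("pool started")
    finally:
        conn.close()


def reproduce():
    """Start a daemonic process and return whatever message it reports back."""
    ctx = mp.get_context("fork")  # assumption: Linux, as in the post above
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=try_nested_pool, args=(child_conn,), daemon=True)
    proc.start()
    # Wait up to 10 seconds so a crashed child cannot block us forever
    message = parent_conn.recv() if parent_conn.poll(10) else None
    proc.join()
    return message


if __name__ == "__main__":
    print(reproduce())
```

If this prints the same message as the traceback above, the restriction applies to any daemonic process that tries to spawn children, not only to Celery workers.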

Could anyone help me figure this out? 😀

