From Novice to Architect: Pitfalls of Packaging Machine Learning Libraries with pyinstaller

易寒 · July 16, 2019 · 1620 views

Reference: Recipe Multiprocessing

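Background

The pyinstaller-based binary packaging approach I investigated earlier has now reached the rollout stage; the earlier write-up is "利用 pyinstaller 打包 python 项目发布到线上" (Using pyinstaller to package a Python project and release it to production). That experiment targeted a very simple web service with few dependencies on other packages. The project being rolled out this time uses many machine learning libraries, so the rollout was somewhat more troublesome.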

Problems

  • pyd files not being bundled
  • .so files not being bundled
  • Conflict between multiprocessing and pyinstaller

Let's go through them one by one.

pyd files not being bundled

pipenv run pyinstaller -F main.py -n scscore

The build succeeds and generates a spec file, but running the resulting program fails:

[doctorq@gz-inf-development01 scscore]$ ./dist/scscore
/tmp/_MEINtWbir/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    from src.route import load_route
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/route.py", line 6, in <module>
    from src.view.forecast_view.feature_importance_view import FeatureImportanceView
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/view/forecast_view/feature_importance_view.py", line 7, in <module>
    from src.importance.feature_importance import FeatureImportance
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/importance/feature_importance.py", line 7, in <module>
    from src.forecasting.trainer import Trainer
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/forecasting/trainer.py", line 13, in <module>
    from src.Models.collect_models import ModelCollector
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/Models/collect_models.py", line 7, in <module>
    from src.Models.statistic_model.ARIMA import ARIMA
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/Models/statistic_model/ARIMA.py", line 2, in <module>
    from pmdarima.arima import auto_arima
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/__init__.py", line 29, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/arima/__init__.py", line 6, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/arima/arima.py", line 10, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/__init__.py", line 36, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/cluster/__init__.py", line 20, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/cluster/unsupervised.py", line 16, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/pairwise.py", line 32, in <module>
  File "sklearn/metrics/pairwise_fast.pyx", line 1, in init sklearn.metrics.pairwise_fast
ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'
[6625] Failed to execute script main

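The missing modules are Python extension modules compiled from C/C++ (.pyd files on Windows, .cpython-*.so files on Linux, as in the listing below). Imports made from inside compiled code are invisible to pyinstaller's import analysis, so these modules need extra handling: add them one by one to the hiddenimports list in scscore.spec. I added every file whose name contains the keyword cpython under each of the affected libraries.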

[doctorq@gz-inf-development01 utils]$ ll|grep cpython
-rwxrwxr-x 1 doctorq doctorq 221256 7月  16 15:41 arrayfuncs.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 426280 7月  16 15:41 _cython_blas.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 238344 7月  16 15:41 fast_dict.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  95824 7月  16 15:41 graph_shortest_path.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  28512 7月  16 15:41 lgamma.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 179592 7月  16 15:41 _logistic_sigmoid.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  88856 7月  16 15:41 murmurhash.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  99864 7月  16 15:41 _random.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 140992 7月  16 15:41 seq_dataset.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 643648 7月  16 15:41 sparsefuncs_fast.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  62880 7月  16 15:41 weight_vector.cpython-37m-x86_64-linux-gnu.so
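
Turning a listing like the one above into dotted module names by hand is tedious, so the step can be scripted. The following is a minimal sketch, not the method used in this post: the function name extension_module_names is hypothetical, and it assumes that compiled modules are either .pyd files or .so files whose name contains ".cpython-", the same keyword used in the grep above.

```python
import os


def extension_module_names(package_dir, package_name):
    """Return sorted dotted names of compiled extension modules under package_dir.

    Hypothetical helper: a file counts as an extension module if it ends with
    ".pyd", or ends with ".so" and contains ".cpython-" in its name.
    """
    names = []
    for root, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(root, package_dir)
        # Build the dotted prefix for the current sub-package.
        if rel == ".":
            prefix = package_name
        else:
            prefix = package_name + "." + rel.replace(os.sep, ".")
        for filename in files:
            is_cpython_so = filename.endswith(".so") and ".cpython-" in filename
            if is_cpython_so or filename.endswith(".pyd"):
                # "_cython_blas.cpython-37m-x86_64-linux-gnu.so" -> "_cython_blas"
                stem = filename.split(".", 1)[0]
                names.append(prefix + "." + stem)
    return sorted(names)


if __name__ == "__main__":
    import sklearn  # assumes scikit-learn is installed in the build environment

    pkg_dir = os.path.dirname(sklearn.__file__)
    for name in extension_module_names(pkg_dir, "sklearn"):
        print(repr(name) + ",")
```

The printed lines can be pasted into hiddenimports. Plain shared libraries such as libxgboost.so are skipped on purpose, since they are not importable Python modules.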

The spec file after adding them looks like this:

hiddenimports=['cython','sklearn','sklearn.utils._cython_blas','statsmodels','statsmodels.tsa',
'statsmodels.tsa.statespace._kalman_smoother',
'statsmodels.tsa.statespace._representation',
'statsmodels.tsa.statespace._simulation_smoother',
'statsmodels.tsa.statespace._statespace',
'statsmodels.tsa.statespace._tools',
'statsmodels.tsa.statespace._filters._conventional',
'statsmodels.tsa.statespace._filters._inversions',
'statsmodels.tsa.statespace._filters._univariate',
'statsmodels.tsa.statespace._smoothers._alternative',
'statsmodels.tsa.statespace._smoothers._classical',
'statsmodels.tsa.statespace._smoothers._conventional',
'statsmodels.tsa.statespace._smoothers._univariate',
'sklearn.neighbors.typedefs',
'sklearn.neighbors.quad_tree',
'sklearn.neighbors.ball_tree',
'sklearn.neighbors.dist_metrics',
'sklearn.neighbors.kd_tree',
'sklearn.tree._utils',
'sklearn.tree._criterion',
'sklearn.tree._splitter',
],
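
A long hand-written list like this is easy to get subtly wrong: adjacent string literals in Python concatenate silently when a comma is missing, which yields one name that does not exist. As a rough sanity check before rebuilding, each entry can be resolved in the build environment. This is only a sketch; the helper name missing_modules is hypothetical and the sample list is a short excerpt.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            # find_spec imports parent packages, then looks up the submodule.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Excerpt only; paste the full hiddenimports list here.
    hidden = [
        "sklearn.utils._cython_blas",
        "statsmodels.tsa.statespace._tools",
    ]
    for name in missing_modules(hidden):
        print("cannot resolve:", name)
```

Run it with the same interpreter that runs pyinstaller (for example through pipenv run) so that it sees the same site-packages.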

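Then we rebuild. The compiled modules of this kind are now all bundled, but running the binary fails with a new error:

> pipenv run pyinstaller scscore.spec # build from the spec file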
> dist/scscore

  File "site-packages/xgboost/__init__.py", line 11, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/xgboost/core.py", line 161, in <module>
  File "site-packages/xgboost/core.py", line 123, in _load_lib
  File "site-packages/xgboost/libpath.py", line 48, in find_lib_path
xgboost.libpath.XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
List of candidates:
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/../../lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/./lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
[32970] Failed to execute script main

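.so files not being bundled

The error above comes from xgboost's shared library (libxgboost.so) not being included in the bundle. The fix is to add a new file named hook-xgboost.py under $(pipenv --venv)/lib/python3.7/site-packages/PyInstaller/hooks. The file name must match exactly. Its contents are: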

from PyInstaller.utils.hooks import collect_all

datas, binaries, hiddenimports = collect_all("xgboost")
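
A hook placed inside the installed PyInstaller package is lost whenever the virtualenv is recreated. One alternative, sketched here under the assumption of a project-local hooks directory (the directory name and the helper write_local_hook are hypothetical), is to keep the same hook file inside the project:

```python
from pathlib import Path

# Same hook body as above, parameterized by package name.
HOOK_TEMPLATE = (
    "from PyInstaller.utils.hooks import collect_all\n"
    "\n"
    'datas, binaries, hiddenimports = collect_all("{package}")\n'
)


def write_local_hook(hooks_dir, package):
    """Write hook-<package>.py into hooks_dir and return its path."""
    hooks_path = Path(hooks_dir)
    hooks_path.mkdir(parents=True, exist_ok=True)
    hook_file = hooks_path / f"hook-{package}.py"
    hook_file.write_text(HOOK_TEMPLATE.format(package=package), encoding="utf-8")
    return hook_file


if __name__ == "__main__":
    print(write_local_hook("hooks", "xgboost"))
```

For pyinstaller to pick the directory up, it would then be listed as hookspath=['hooks'] in the Analysis(...) call of scscore.spec, or passed as --additional-hooks-dir hooks when building directly from main.py.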

Then run the build again:

> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore

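Running the binary now, the program keeps launching itself over and over and never stops.

This is caused by a bug in the joblib library; see the article "Pyinstaller exe keeps opening itself". Downgrading joblib to 0.11 is enough to fix it.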

> pipenv install joblib==0.11
> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore

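Done.

To keep the program from filling up /tmp, the following setting in the EXE(...) call of the spec file makes it unpack its temporary files somewhere else: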

runtime_tmpdir='/home/doctorq/python-dev/scscore/tmp',
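
To confirm that the setting took effect, the running program can report where it was unpacked. In a one-file build the pyinstaller bootloader exposes that directory as sys._MEIPASS. The helper name extraction_dir below is hypothetical, and this is only a sketch of the check:

```python
import sys


def extraction_dir():
    """Return the directory pyinstaller unpacked into, or None when not frozen."""
    # sys._MEIPASS exists only inside a pyinstaller bundle.
    return getattr(sys, "_MEIPASS", None)


if __name__ == "__main__":
    print("unpacked to:", extraction_dir())
```

Called early in main.py, this should print a _MEI... directory under the configured runtime_tmpdir rather than under /tmp.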