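Installing the pyspider crawler in a PyCharm virtual environment on Ubuntu 16.04 (Python 3.12 interpreter)
Part 1: Installation steps
Step 1. Run the following commands in the system Terminal to install the required component PhantomJS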
$ wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
$ sudo tar -xvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/
$ sudo ln -s /usr/local/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs
$ phantomjs --version
Step 2. Run the following commands in the system Terminal to install the other dependencies
$ sudo apt update
$ sudo apt install libcurl4-openssl-dev backports
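Before moving on, it can help to confirm that the tools from steps 1 and 2 are actually reachable. The following is a minimal sketch, not part of pyspider: the helper name find_missing_tools and the REQUIRED_TOOLS tuple are hypothetical, and it assumes both phantomjs and curl-config are expected on PATH.

```python
import shutil

# Tools named in steps 1 and 2: phantomjs is linked into /usr/local/bin,
# and curl-config ships with libcurl4-openssl-dev.
REQUIRED_TOOLS = ("phantomjs", "curl-config")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found on PATH.")
```

An empty result suggests the later pip install should at least get past the curl-config check.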
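Step 3. Python 3.12 removed ssl.match_hostname, so the old tornado release that pyspider depends on falls back to the backports.ssl_match_hostname module, and pyspider has not been adapted to this change. Run the following command in the PyCharm Terminal to install the module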
$ pip install backports.ssl_match_hostname
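Step 4 (optional). pyspider has compatibility problems when run under Python 3.12. The attachment is the final archive with these problems already fixed [ubuntu16.04(python3.12解释器)下pyspider兼容性修复完后的.tar.gz]; it can be unpacked and used directly. Then continue with step 6 (prerequisite: the unpatched release must first have been force-installed with pip install --force-reinstall pyspider). If you would rather patch the source yourself step by step, skip this step and follow "Step 5. Compatibility fixes"
Click to download the archive: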
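Step 5. Compatibility fixes. pyspider has compatibility problems under Python 3.12 because Python 3.12 has removed or changed several legacy modules and pyspider has not been adapted to these changes. Make the following modifications
5.1 Compatibility issue 1: async is a reserved keyword since Python 3.7, so it can no longer be used as a variable or parameter name. Rename async in the source to another identifier,
e.g. change async to mark_async at the following locations
pyspider/run.py: lines 231, 245 (two occurrences), 365
pyspider/webui/app.py: line 95
pyspider/fetcher/tornado_fetcher.py: lines 81, 89 (two occurrences), 95, 117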
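The effect of the rename can be checked on a single line before editing the files. This is only a sketch: rename_async and is_valid_python are hypothetical helpers, and the sample line is modelled on the signature shown in the traceback for run.py line 231. A plain regular expression also rewrites string literals, comments and genuine async def statements, so it is meant for the listed lines only, not for whole files.

```python
import keyword
import re

# Matches `async` only as a standalone word, so names such as
# `async_mode` or `is_async` are left alone.
_ASYNC_WORD = re.compile(r"\basync\b")


def rename_async(source_line, new_name="mark_async"):
    """Replace the standalone identifier `async` in one line of source text."""
    return _ASYNC_WORD.sub(new_name, source_line)


def is_valid_python(source):
    """Return True if `source` compiles on the running interpreter."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


old_line = "def fetcher(async=True, get_object=False, no_input=False): pass"
new_line = rename_async(old_line)

print(keyword.iskeyword("async"))   # True on Python 3.7 and later
print(is_valid_python(old_line))    # False on Python 3.7 and later
print(is_valid_python(new_line))    # True
```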
5.2 Compatibility issue 2: several legacy modules have been removed or changed, and pyspider has not been adapted to these changes. Patch the pyspider code by hand as follows
a). Fix the UserDict and Mapping imports. collections.UserDict imports fine on Python 3 but is a concrete dict wrapper rather than a mixin, so the abstract base class collections.abc.Mapping is the closer replacement:
In pyspider/libs/counter.py, change the code at line 14:
try:
    from UserDict import DictMixin
except ImportError:
    from collections import Mapping as DictMixin
to:
try:
    from UserDict import DictMixin
except ImportError:
    from collections.abc import Mapping as DictMixin
In pyspider/scheduler/task_queue.py, change the code at line 12
try:
    from UserDict import DictMixin
except ImportError:
    from collections import Mapping as DictMixin
to:
try:
    from UserDict import DictMixin
except ImportError:
    from collections.abc import Mapping as DictMixin
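A quick way to see that a class built on the fallback base still behaves like a dictionary is sketched below. It assumes collections.abc.Mapping ends up as the base class on Python 3; TickCounter is a hypothetical class used only for this illustration and is not taken from pyspider.

```python
try:
    from UserDict import DictMixin  # Python 2 only
except ImportError:
    from collections.abc import Mapping as DictMixin  # Python 3.3+


class TickCounter(DictMixin):
    """Hypothetical read-only mapping, used only to exercise the base class."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


counter = TickCounter({"success": 3, "failed": 1})
# get(), keys() and the `in` operator are supplied by the Mapping base class.
print(counter["success"], len(counter), "failed" in counter, counter.get("retry", 0))
```

Only the three abstract methods are written by hand; the rest of the read-only dictionary interface comes from the base class.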
b). Fix the missing imp module (imp was removed in Python 3.12):
In pyspider/processor/project_module.py, change the code at line 11:
import imp
to:
import importlib.util
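Changing the import alone does not cover code that still goes through the imp name. If your copy of project_module.py contains calls such as imp.new_module(name) (an assumption about the installed source, so check the file), types.ModuleType(name) is the standard-library replacement. The sketch below shows the idea; load_source and the module name demo_project are hypothetical and only serve the demonstration.

```python
import sys
import types


def new_module(name):
    """Create an empty module object, as imp.new_module(name) used to."""
    return types.ModuleType(name)


def load_source(name, source):
    """Build a module from a source string and register it in sys.modules."""
    module = new_module(name)
    exec(compile(source, "<%s>" % name, "exec"), module.__dict__)
    sys.modules[name] = module
    return module


demo = load_source("demo_project", "GREETING = 'hello'\n")
print(demo.__name__, demo.GREETING)
```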
c). Fix the MutableMapping import:
The tornado library fails when it references MutableMapping. In tornado/httputil.py, change the code at line 106:
class HTTPHeaders(collections.MutableMapping):
to:
class HTTPHeaders(collections.abc.MutableMapping):
5.3 Compatibility issue 3: the fractions.gcd function was removed in Python 3.9 (the fractions module itself still exists), and pyspider has not been adapted to this change. Replace fractions.gcd with math.gcd as follows
a). In pyspider/libs/base_handler.py, add the following on the blank line 12
import math
b). In pyspider/libs/base_handler.py, change line 115
min_tick = fractions.gcd(min_tick, each.tick)
to:
min_tick = math.gcd(min_tick, each.tick)
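The patched line folds the tick intervals into their greatest common divisor one at a time. A small standalone sketch of the same computation follows; min_tick_interval and the sample intervals are hypothetical and not taken from base_handler.py.

```python
import math
from functools import reduce


def min_tick_interval(ticks):
    """Return the greatest common divisor of all tick intervals (0 if empty)."""
    # math.gcd(0, n) == n, so 0 is a neutral starting value.
    return reduce(math.gcd, ticks, 0)


print(min_tick_interval([60, 90, 150]))  # 30
```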
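5.4 Compatibility issue 4: the Flask version is incompatible with pyspider. pyspider was developed against an old Flask release, and newer releases (Flask 2.3+) removed the before_first_request decorator, so the following change is needed
In pyspider/webui/debug.py, change the code at line 64
@app.before_first_request
def enable_projects_import():
    sys.meta_path.append(ProjectFinder(app.config['projectdb']))
to:
@app.before_request
def enable_projects_import():
    # Flask itself defines app._got_first_request, so a separate flag name is used here
    if not getattr(app, '_projects_import_enabled', False):
        app._projects_import_enabled = True
        sys.meta_path.append(ProjectFinder(app.config['projectdb']))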
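The replacement hook relies on a "run only on the first call" guard. The same idea can be sketched without Flask, which makes it easy to test in isolation. RunOnce is a hypothetical helper and the loop merely stands in for incoming requests; nothing here is Flask or pyspider API.

```python
import threading


class RunOnce:
    """Wrap a callable so that only the first call executes it."""

    def __init__(self, func):
        self._func = func
        self._done = False
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # The lock keeps two concurrent first requests from both running func.
        with self._lock:
            if self._done:
                return None
            self._done = True
        return self._func(*args, **kwargs)


calls = []


@RunOnce
def enable_projects_import():
    calls.append("registered")


for _ in range(3):
    enable_projects_import()  # stands in for three incoming requests

print(calls)  # ['registered']
```

The lock matters only when the web server handles requests in several threads; with a single thread the plain flag is enough.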
5.5 Compatibility issue 5: pyspider's WebDAV module has compatibility problems in the Python 3.12 environment.
a) The WsgiDAV release installed in this environment appears to expect the snake_case method get_resource_inst, while ScriptProvider only implements the older getResourceInst, so it does not implement every abstract method its base class requires. Change it as follows
In pyspider/webui/webdav.py, change the code at line 165
class ScriptProvider(DAVProvider):
    def __init__(self, app):
        super(ScriptProvider, self).__init__()
        self.app = app
    def __repr__(self):
        return "pyspiderScriptProvider"
    def getResourceInst(self, path, environ):
        path = os.path.normpath(path).replace('\\', '/')
        if path in ('/', '.', ''):
            path = '/'
            return RootCollection(path, environ, self.app)
        else:
            return ScriptResource(path, environ, self.app)
to:
class ScriptProvider(DAVProvider):
    def __init__(self, app):
        super(ScriptProvider, self).__init__()
        self.app = app
    def __repr__(self):
        return "pyspiderScriptProvider"
    def getResourceInst(self, path, environ):
        path = os.path.normpath(path).replace('\\', '/')
        if path in ('/', '.', ''):
            path = '/'
            return RootCollection(path, environ, self.app)
        else:
            return ScriptResource(path, environ, self.app)
    # Add the missing abstract method; it forwards to the existing
    # implementation so both names return the same resources
    def get_resource_inst(self, path, environ):
        return self.getResourceInst(path, environ)
b) The WsgiDAV configuration format has changed: the domaincontroller option is deprecated and http_authenticator.domain_controller must be used instead. Change it as follows
In pyspider/webui/webdav.py, change the code at line 207
config = DEFAULT_CONFIG.copy()
config.update({
    'mount_path': '/dav',
    'provider_mapping': {
        '/': ScriptProvider(app)
    },
    'domaincontroller': NeedAuthController(app),
    'verbose': 1 if app.debug else 0,
    'dir_browser': {'davmount': False,
                    'enable': True,
                    'msmount': False,
                    'response_trailer': ''},
})
dav_app = WsgiDAVApp(config)
to:
config = DEFAULT_CONFIG.copy()
config.update({
    'mount_path': '/dav',
    'provider_mapping': {
        '/': ScriptProvider(app)
    },
    # Updated authentication configuration
    "http_authenticator": {
        "domain_controller": NeedAuthController,  # moved under http_authenticator
        "accept_basic": True,
        "accept_digest": False,
        "default_to_digest": False,
    },
    'verbose': 1 if app.debug else 0,
    'dir_browser': {'davmount': False,
                    'enable': True,
                    'msmount': False,
                    'response_trailer': ''},
})
dav_app = WsgiDAVApp(config)
c) The interface of the WsgiDAV authentication controller has changed and is no longer compatible. Change it as follows
In pyspider/webui/webdav.py, change the code at line 186:
class NeedAuthController(object):
    def __init__(self, app):
        self.app = app
    def getDomainRealm(self, inputRelativeURL, environ):
        return 'need auth'
    def requireAuthentication(self, realmname, environ):
        return self.app.config.get('need_auth', False)
    def isRealmUser(self, realmname, username, environ):
        return username == self.app.config.get('webui_username')
    def getRealmUserPassword(self, realmname, username, environ):
        return self.app.config.get('webui_password')
    def authDomainUser(self, realmname, username, password, environ):
        return username == self.app.config.get('webui_username') \
            and password == self.app.config.get('webui_password')
to:
class NeedAuthController(object):
    def __init__(self, app, config=None):
        self.app = app
        # Accept the extra config argument so the class matches the way WsgiDAV instantiates it
        if config is not None:
            self.config = config
        else:
            # If no config was passed, try to read it from app
            self.config = app.config.get("http_authenticator", {})
    def getDomainRealm(self, inputRelativeURL, environ):
        return 'need auth'
    def requireAuthentication(self, realmname, environ):
        return self.app.config.get('need_auth', False)
    def isRealmUser(self, realmname, username, environ):
        return username == self.app.config.get('webui_username')
    def getRealmUserPassword(self, realmname, username, environ):
        return self.app.config.get('webui_password')
    def authDomainUser(self, realmname, username, password, environ):
        return username == self.app.config.get('webui_username') \
            and password == self.app.config.get('webui_password')
    # Add the interface methods WsgiDAV expects, forwarding to the original methods
    def get_domain_realm(self, input_path, environ):
        return self.getDomainRealm(input_path, environ)
    def basic_auth_user(self, realm, user_name, password, environ):
        return self.authDomainUser(realm, user_name, password, environ)
    def supports_http_digest_auth(self):
        return False  # digest authentication is not supported
    def is_share_anonymous(self, share_path):
        """Check whether the given share path allows anonymous access."""
        # If authentication is not required, every share allows anonymous access
        return not self.app.config.get('need_auth', False)
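5.6 Compatibility issue 6: the Werkzeug version is incompatible with pyspider. Werkzeug moved DispatcherMiddleware into the separate werkzeug.middleware.dispatcher module in 0.15 and removed the old werkzeug.wsgi location in 1.0, so the old import fails with any current Werkzeug release, independently of the Python version.
pyspider still uses the old import. Change it as follows
In pyspider/webui/app.py, change the code at line 64
from werkzeug.wsgi import DispatcherMiddleware
to:
from werkzeug.middleware.dispatcher import DispatcherMiddleware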
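If the patched file should keep working with both old and new library layouts, an import fallback is one option. The helper below is a hypothetical sketch (import_first is not part of pyspider or Werkzeug); it is demonstrated with standard-library names so that it runs without Werkzeug installed.

```python
import importlib


def import_first(*candidates):
    """Return the first importable attribute from 'module:attribute' strings."""
    errors = []
    for candidate in candidates:
        module_name, _, attr_name = candidate.partition(":")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            errors.append("%s: %s" % (candidate, exc))
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Demonstrated with standard-library names so the sketch runs without Werkzeug.
Mapping = import_first("collections:Mapping", "collections.abc:Mapping")
print(Mapping)
```

For the case in this step the call would be import_first("werkzeug.middleware.dispatcher:DispatcherMiddleware", "werkzeug.wsgi:DispatcherMiddleware"), trying the new location first.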
Step 6. Run the following command in the PyCharm Terminal to install pyspider
$ pip install pyspider
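Step 7. Run the following command in the PyCharm Terminal to verify that pyspider is installed. The output shows that the installed pyspider, the latest release, is version 0.3.10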
$ pyspider --version
pyspider, version 0.3.10
Step 8. Run pyspider in the PyCharm Terminal to start the pyspider web console. Output like the following means pyspider started successfully and is listening on port 5000. Open http://localhost:5000/ in a browser to reach the pyspider web console and start crawling.
$ pyspider
phantomjs fetcher running on port 25555
[I 250515 15:08:09 result_worker:49] result_worker starting...
[I 250515 15:08:10 processor:211] processor starting...
[I 250515 15:08:10 tornado_fetcher:638] fetcher starting...
[I 250515 15:08:10 scheduler:647] scheduler starting...
[I 250515 15:08:10 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 250515 15:08:10 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 250515 15:08:10 app:76] webui running on 0.0.0.0:5000
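Step 9. If running the pyspider command fails with Error: Could not create web server listening on port 25555, port 25555 is already in use. Free the port and repeat step 8
Solution: use lsof -i :25555 to find the PID that occupies the port, then release it with kill -9 <PID>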
$ lsof -i :25555
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
phantomjs 16615 wanghu 7u IPv4 278700 0t0 TCP *:25555 (LISTEN)
$ kill -9 16615
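The port check can also be done from Python before starting pyspider. This is a minimal sketch: is_port_in_use is a hypothetical helper, and it assumes the services listen on the local machine, with 25555 and 5000 being the ports that appear in the startup log.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (25555, 5000):  # phantomjs fetcher and webui ports
        state = "in use" if is_port_in_use(port) else "free"
        print("port %d is %s" % (port, state))
```

Unlike lsof, this does not report which process holds the port, so it only tells you whether the kill step is needed at all.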
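Part 2: Pitfalls hit with pyspider and their solutions:
1. Installing pyspider fails with the ConfigurationError: Could not run curl-config error shown below. The system lacks the curl-devel or libcurl development libraries that pycurl needs in order to compile.
Solution: install the libcurl4-openssl-dev package (Ubuntu/Debian) with the following command; it provides the curl-config tool and the header files needed to compile pycurl
sudo apt install libcurl4-openssl-dev
Error details: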
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [33 lines of output]
Traceback (most recent call last):
File "<string>", line 230, in configure_unix
File "/usr/local/lib/python3.12/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/lib/python3.12/subprocess.py", line 1950, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'curl-config'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 143, in get_requires_for_build_wheel
return hook(config_settings)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-6by6cyoh/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 331, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-6by6cyoh/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-6by6cyoh/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 512, in run_setup
super().run_setup(setup_script=setup_script)
File "/tmp/pip-build-env-6by6cyoh/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 1016, in <module>
File "<string>", line 676, in get_extension
File "<string>", line 93, in __init__
File "<string>", line 235, in configure_unix
ConfigurationError: Could not run curl-config: [Errno 2] No such file or directory: 'curl-config'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
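2. After installation, running the pyspider command in the Terminal fails with the error below, because async is a reserved keyword since Python 3.7.
Solution: rename async to another identifier, e.g. change async to mark_async at the following locations
pyspider/run.py: lines 231, 245 (two occurrences), 365
pyspider/webui/app.py: line 95
pyspider/fetcher/tornado_fetcher.py: lines 81, 89 (two occurrences), 95, 117
Error details: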
$ pyspider
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 5, in <module>
from pyspider.run import main
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 231
async=True, get_object=False, no_input=False):
^^^^^
SyntaxError: invalid syntax
3. Running the pyspider command again fails with the errors below, because pyspider has compatibility problems under Python 3.12: Python 3.12 has removed or changed several legacy modules and pyspider has not been adapted to these changes.
Solution: patch the pyspider code by hand as follows
a). Fix the UserDict and Mapping imports. collections.UserDict imports fine on Python 3 but is a concrete dict wrapper rather than a mixin, so the abstract base class collections.abc.Mapping is the closer replacement:
In pyspider/libs/counter.py, change the code at line 14:
try:
    from UserDict import DictMixin
except ImportError:
    from collections import Mapping as DictMixin
to:
try:
    from UserDict import DictMixin
except ImportError:
    from collections.abc import Mapping as DictMixin
In pyspider/scheduler/task_queue.py, change the code at line 12
try:
    from UserDict import DictMixin
except ImportError:
    from collections import Mapping as DictMixin
to:
try:
    from UserDict import DictMixin
except ImportError:
    from collections.abc import Mapping as DictMixin
b). Fix the missing imp module (imp was removed in Python 3.12):
In pyspider/processor/project_module.py, change the code at line 11:
import imp
to:
import importlib.util
c). Fix the MutableMapping import:
The tornado library fails when it references MutableMapping. In tornado/httputil.py, change the code at line 106:
class HTTPHeaders(collections.MutableMapping):
to:
class HTTPHeaders(collections.abc.MutableMapping):
Error details:
$ pyspider
[W 250515 11:30:54 run:413] phantomjs not found, continue running without it.
[I 250515 11:30:56 result_worker:49] result_worker starting...
Process Process-5:
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/counter.py", line 14, in <module>
from UserDict import DictMixin
ModuleNotFoundError: No module named 'UserDict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 192, in scheduler
Scheduler = load_cls(None, None, scheduler_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/scheduler/__init__.py", line 1, in <module>
from .scheduler import Scheduler, OneScheduler, ThreadBaseScheduler # NOQA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/scheduler/scheduler.py", line 19, in <module>
from pyspider.libs import counter, utils
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/counter.py", line 16, in <module>
from collections import Mapping as DictMixin
ImportError: cannot import name 'Mapping' from 'collections' (/usr/local/lib/python3.12/collections/__init__.py)
Process Process-4:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 236, in fetcher
Fetcher = load_cls(None, None, fetcher_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
from .tornado_fetcher import Fetcher
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/tornado_fetcher.py", line 21, in <module>
import tornado.httputil
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/httputil.py", line 106, in <module>
class HTTPHeaders(collections.MutableMapping):
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'collections' has no attribute 'MutableMapping'
Process Process-3:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 273, in processor
Processor = load_cls(None, None, processor_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/processor/__init__.py", line 1, in <module>
from .processor import ProcessorResult, Processor
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/processor/processor.py", line 20, in <module>
from .project_module import ProjectManager, ProjectFinder
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/processor/project_module.py", line 11, in <module>
import imp
ModuleNotFoundError: No module named 'imp'
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 333, in webui
app = load_cls(None, None, webui_instance)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/__init__.py", line 8, in <module>
from . import app, index, debug, task, result, login
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/app.py", line 17, in <module>
from pyspider.fetcher import tornado_fetcher
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
from .tornado_fetcher import Fetcher
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/tornado_fetcher.py", line 21, in <module>
import tornado.httputil
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/httputil.py", line 106, in <module>
class HTTPHeaders(collections.MutableMapping):
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'collections' has no attribute 'MutableMapping'
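4. Running the pyspider command again fails with import backports.ssl_match_hostname ModuleNotFoundError: No module named 'backports', as shown below. Under Python 3.12 the tornado library that pyspider depends on needs the backports.ssl_match_hostname module, and pyspider has not been adapted to this change.
Solution:
The missing backports.ssl_match_hostname module can be installed by running the following command in the PyCharm Terminal
$ pip install backports.ssl_match_hostname
Note: Error: Could not create web server listening on port 25555 is a port-in-use problem; deal with it after all the other problems are solved
Error details: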
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 12:54:49 processor:211] processor starting...
Process Process-4:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 236, in fetcher
Fetcher = load_cls(None, None, fetcher_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
from .tornado_fetcher import Fetcher
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/tornado_fetcher.py", line 31, in <module>
from tornado.simple_httpclient import SimpleAsyncHTTPClient
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/simple_httpclient.py", line 8, in <module>
from tornado.http1connection import HTTP1Connection, HTTP1ConnectionParameters
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/http1connection.py", line 30, in <module>
from tornado import iostream
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/iostream.py", line 40, in <module>
from tornado.netutil import ssl_wrap_socket, ssl_match_hostname, SSLCertificateError, _client_ssl_defaults, _server_ssl_defaults
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/netutil.py", line 56, in <module>
import backports.ssl_match_hostname
ModuleNotFoundError: No module named 'backports'
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 333, in webui
app = load_cls(None, None, webui_instance)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/__init__.py", line 8, in <module>
from . import app, index, debug, task, result, login
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/app.py", line 17, in <module>
from pyspider.fetcher import tornado_fetcher
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/__init__.py", line 1, in <module>
from .tornado_fetcher import Fetcher
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/fetcher/tornado_fetcher.py", line 31, in <module>
from tornado.simple_httpclient import SimpleAsyncHTTPClient
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/simple_httpclient.py", line 8, in <module>
from tornado.http1connection import HTTP1Connection, HTTP1ConnectionParameters
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/http1connection.py", line 30, in <module>
from tornado import iostream
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/iostream.py", line 40, in <module>
from tornado.netutil import ssl_wrap_socket, ssl_match_hostname, SSLCertificateError, _client_ssl_defaults, _server_ssl_defaults
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/tornado/netutil.py", line 56, in <module>
import backports.ssl_match_hostname
ModuleNotFoundError: No module named 'backports'
5. Run the pyspider command again. It now fails with AttributeError: module 'fractions' has no attribute 'gcd'. The fractions module itself still exists; what is gone is the function fractions.gcd, which was deprecated in Python 3.5 and removed in Python 3.9 in favor of math.gcd. pyspider has not been updated for this change.
Solution:
a). In pyspider/libs/base_handler.py, add the following import on the blank line 12:
import math
b). In the same file, change line 115 from:
min_tick = fractions.gcd(min_tick, each.tick)
to:
min_tick = math.gcd(min_tick, each.tick)
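As a rough illustration of why math.gcd works as a drop-in replacement here, the sketch below reduces a list of tick intervals to their greatest common divisor. The function name min_tick and the sample intervals are made up for this example; this is not pyspider's actual code.

```python
import math
from functools import reduce


def min_tick(ticks):
    """Return the greatest common divisor of all tick intervals (0 if empty)."""
    # math.gcd(0, n) == n, so 0 is a neutral starting value for the reduction.
    return reduce(math.gcd, ticks, 0)


print(min_tick([60, 90, 150]))  # 30
```

math.gcd takes integers, which matches tick intervals expressed in whole seconds.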
Error details:
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 333, in webui
app = load_cls(None, None, webui_instance)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/__init__.py", line 8, in <module>
from . import app, index, debug, task, result, login
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/debug.py", line 22, in <module>
from pyspider.libs import utils, sample_handler, dataurl
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/sample_handler.py", line 9, in <module>
class Handler(BaseHandler):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/base_handler.py", line 115, in __new__
min_tick = fractions.gcd(min_tick, each.tick)
^^^^^^^^^^^^^
AttributeError: module 'fractions' has no attribute 'gcd'
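6. Run the pyspider command again. It now fails with a Flask compatibility error: AttributeError: 'QuitableFlask' object has no attribute 'before_first_request'. pyspider was written against an old Flask release, and the before_first_request decorator was deprecated in Flask 2.2 and removed in Flask 2.3.
Solution:
In pyspider/webui/debug.py, change the code at line 64 from:
@app.before_first_request
def enable_projects_import():
    sys.meta_path.append(ProjectFinder(app.config['projectdb']))
to:
@app.before_request
def enable_projects_import():
    # Use a dedicated flag. Flask already defines app._got_first_request itself
    # (the error message even suggests that name), so a hasattr() check on it
    # would always succeed and the hook body would never run.
    if not getattr(app, '_projects_import_enabled', False):
        app._projects_import_enabled = True
        sys.meta_path.append(ProjectFinder(app.config['projectdb']))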
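The run-once behaviour that before_first_request used to provide can be sketched without Flask. The example below is a minimal, framework-independent illustration; the decorator name run_once and the calls list are assumptions made for this sketch, and the loop merely stands in for incoming requests.

```python
import threading


def run_once(func):
    """Wrap func so that only the first call executes it; later calls do nothing."""
    lock = threading.Lock()
    done = False

    def wrapper(*args, **kwargs):
        nonlocal done
        with lock:
            if done:
                return None
            done = True
        return func(*args, **kwargs)

    return wrapper


calls = []


@run_once
def enable_projects_import():
    calls.append("enabled")


for _ in range(3):  # simulate three incoming requests
    enable_projects_import()

print(calls)  # ['enabled']
```

The lock only matters when requests are served from several threads; for a single-threaded server a plain boolean flag is enough.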
Error details:
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 13:42:09 result_worker:49] result_worker starting...
Error: Could not create web server listening on port 25555
[I 250515 13:42:09 processor:211] processor starting...
[I 250515 13:42:09 tornado_fetcher:638] fetcher starting...
Error: Could not create web server listening on port 25555
[I 250515 13:42:09 scheduler:647] scheduler starting...
[I 250515 13:42:10 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
Error: Could not create web server listening on port 25555
[I 250515 13:42:10 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 333, in webui
app = load_cls(None, None, webui_instance)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 48, in load_cls
return utils.load_object(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/libs/utils.py", line 369, in load_object
module = __import__(module_name, globals(), locals(), [object_name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/__init__.py", line 8, in <module>
from . import app, index, debug, task, result, login
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/debug.py", line 64, in <module>
@app.before_first_request
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'QuitableFlask' object has no attribute 'before_first_request'. Did you mean: '_got_first_request'?
Error: Could not create web server listening on port 25555
7. Run the pyspider command again. It now fails with TypeError: Can't instantiate abstract class ScriptProvider without an implementation for abstract method 'get_resource_inst'. The cause is an incompatibility in pyspider's WebDAV module: the installed WsgiDAV release declares the abstract method under the snake_case name get_resource_inst, while pyspider's ScriptProvider only implements the older camelCase getResourceInst, so the class does not implement every abstract method its base class requires. The abstract-class check itself is not new in Python 3.12.
Solution:
In pyspider/webui/webdav.py, change the class at line 165 from:
class ScriptProvider(DAVProvider):
    def __init__(self, app):
        super(ScriptProvider, self).__init__()
        self.app = app
    def __repr__(self):
        return "pyspiderScriptProvider"
    def getResourceInst(self, path, environ):
        path = os.path.normpath(path).replace('\\', '/')
        if path in ('/', '.', ''):
            path = '/'
            return RootCollection(path, environ, self.app)
        else:
            return ScriptResource(path, environ, self.app)
to:
class ScriptProvider(DAVProvider):
    def __init__(self, app):
        super(ScriptProvider, self).__init__()
        self.app = app
    def __repr__(self):
        return "pyspiderScriptProvider"
    def getResourceInst(self, path, environ):
        path = os.path.normpath(path).replace('\\', '/')
        if path in ('/', '.', ''):
            path = '/'
            return RootCollection(path, environ, self.app)
        else:
            return ScriptResource(path, environ, self.app)
    # Add the missing abstract method by delegating to the legacy camelCase
    # method, so the root path still maps to RootCollection and the
    # constructor arguments stay in the same order as above.
    def get_resource_inst(self, path, environ):
        return self.getResourceInst(path, environ)
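The mechanism behind this error can be reproduced with the standard library alone. The following sketch assumes a hypothetical base class Provider that declares get_resource_inst as abstract; the classes LegacyProvider and FixedProvider are invented for illustration and are not WsgiDAV or pyspider classes.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical base class that declares the snake_case method as abstract."""

    @abstractmethod
    def get_resource_inst(self, path, environ):
        ...


class LegacyProvider(Provider):
    """Implements only the old camelCase name, so the class stays abstract."""

    def getResourceInst(self, path, environ):
        return ("resource", path)


class FixedProvider(LegacyProvider):
    """Adds the snake_case method by delegating to the legacy one."""

    def get_resource_inst(self, path, environ):
        return self.getResourceInst(path, environ)


try:
    LegacyProvider()
except TypeError as exc:
    print("cannot instantiate:", exc)

print(FixedProvider().get_resource_inst("/demo.py", {}))  # ('resource', '/demo.py')
```

Delegating rather than duplicating the body keeps a single place where paths are resolved.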
Error details:
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 14:05:24 result_worker:49] result_worker starting...
Error: Could not create web server listening on port 25555
[I 250515 14:05:24 processor:211] processor starting...
Error: Could not create web server listening on port 25555
[I 250515 14:05:25 scheduler:647] scheduler starting...
[I 250515 14:05:25 tornado_fetcher:638] fetcher starting...
[I 250515 14:05:25 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 250515 14:05:25 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Error: Could not create web server listening on port 25555
[I 250515 14:05:25 app:84] webui exiting...
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
Error: Could not create web server listening on port 25555
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 384, in webui
app.run(host=host, port=port)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/app.py", line 59, in run
from .webdav import dav_app
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/webdav.py", line 207, in <module>
'/': ScriptProvider(app)
^^^^^^^^^^^^^^^^^^^
TypeError: Can't instantiate abstract class ScriptProvider without an implementation for abstract method 'get_resource_inst'
8. Run the pyspider command again. It now fails with ValueError: Invalid configuration: - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead. The WsgiDAV configuration format has changed: the domaincontroller option is deprecated and must be replaced with http_authenticator.domain_controller.
Solution:
In pyspider/webui/webdav.py, change the configuration block at line 207 from:
config = DEFAULT_CONFIG.copy()
config.update({
    'mount_path': '/dav',
    'provider_mapping': {
        '/': ScriptProvider(app)
    },
    'domaincontroller': NeedAuthController(app),
    'verbose': 1 if app.debug else 0,
    'dir_browser': {'davmount': False,
                    'enable': True,
                    'msmount': False,
                    'response_trailer': ''},
})
dav_app = WsgiDAVApp(config)
to:
config = DEFAULT_CONFIG.copy()
config.update({
    'mount_path': '/dav',
    'provider_mapping': {
        '/': ScriptProvider(app)
    },
    # Updated authentication settings
    "http_authenticator": {
        # Moved under http_authenticator; note that the class itself is passed here, not an instance
        "domain_controller": NeedAuthController,
        "accept_basic": True,
        "accept_digest": False,
        "default_to_digest": False,
    },
    'verbose': 1 if app.debug else 0,
    'dir_browser': {'davmount': False,
                    'enable': True,
                    'msmount': False,
                    'response_trailer': ''},
})
dav_app = WsgiDAVApp(config)
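The key move in this change, relocating a flat option into a nested section, can be sketched as a small dictionary migration. The helper name migrate_auth_config and the sample values are assumptions for illustration only; the sketch works on plain data and does not call WsgiDAV.

```python
import copy


def migrate_auth_config(legacy):
    """Return a copy of legacy with 'domaincontroller' moved under 'http_authenticator'."""
    config = copy.deepcopy(legacy)
    if "domaincontroller" in config:
        controller = config.pop("domaincontroller")
        auth = config.setdefault("http_authenticator", {})
        # Do not overwrite a value that is already set in the new location.
        auth.setdefault("domain_controller", controller)
    return config


old = {"mount_path": "/dav", "domaincontroller": "MyController", "verbose": 0}
new = migrate_auth_config(old)
print(new)
# {'mount_path': '/dav', 'verbose': 0, 'http_authenticator': {'domain_controller': 'MyController'}}
```

The input dictionary is left untouched, so the old and new layouts can be compared side by side.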
Error details:
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 14:14:01 result_worker:49] result_worker starting...
Error: Could not create web server listening on port 25555
[I 250515 14:14:01 processor:211] processor starting...
[I 250515 14:14:01 scheduler:647] scheduler starting...
Error: Could not create web server listening on port 25555
[I 250515 14:14:01 tornado_fetcher:638] fetcher starting...
[I 250515 14:14:01 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 250515 14:14:01 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 14:14:01 app:84] webui exiting...
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 384, in webui
app.run(host=host, port=port)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/app.py", line 59, in run
from .webdav import dav_app
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/webdav.py", line 220, in <module>
dav_app = WsgiDAVApp(config)
^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/wsgidav/wsgidav_app.py", line 155, in __init__
_check_config(config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/wsgidav/wsgidav_app.py", line 129, in _check_config
raise ValueError("Invalid configuration:\n - " + "\n - ".join(errors))
ValueError: Invalid configuration:
- Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.
9. Run the pyspider command again. It now fails with "TypeError: NeedAuthController.__init__() takes 2 positional arguments but 3 were given". The WsgiDAV domain controller interface has changed: as the traceback shows, WsgiDAV now instantiates the controller itself as dc(wsgidav_app, config), and it expects snake_case method names.
Solution:
In pyspider/webui/webdav.py, change the class at line 186 from:
class NeedAuthController(object):
    def __init__(self, app):
        self.app = app
    def getDomainRealm(self, inputRelativeURL, environ):
        return 'need auth'
    def requireAuthentication(self, realmname, environ):
        return self.app.config.get('need_auth', False)
    def isRealmUser(self, realmname, username, environ):
        return username == self.app.config.get('webui_username')
    def getRealmUserPassword(self, realmname, username, environ):
        return self.app.config.get('webui_password')
    def authDomainUser(self, realmname, username, password, environ):
        return username == self.app.config.get('webui_username') \
            and password == self.app.config.get('webui_password')
to:
class NeedAuthController(object):
    def __init__(self, wsgidav_app, config=None):
        # WsgiDAV calls dc(wsgidav_app, config), so the first argument is the
        # WsgiDAV application, not the Flask app. Keep reading settings from the
        # module-level Flask app (the same one passed to ScriptProvider(app)).
        self.app = app
        self.wsgidav_app = wsgidav_app
        # Accept the extra config argument so the class matches how WsgiDAV instantiates it
        if config is not None:
            self.config = config
        else:
            # If no config is given, fall back to the config held by the WsgiDAV application
            self.config = wsgidav_app.config.get("http_authenticator", {})
    def getDomainRealm(self, inputRelativeURL, environ):
        return 'need auth'
    def requireAuthentication(self, realmname, environ):
        return self.app.config.get('need_auth', False)
    def isRealmUser(self, realmname, username, environ):
        return username == self.app.config.get('webui_username')
    def getRealmUserPassword(self, realmname, username, environ):
        return self.app.config.get('webui_password')
    def authDomainUser(self, realmname, username, password, environ):
        return username == self.app.config.get('webui_username') \
            and password == self.app.config.get('webui_password')
    # Add the interface methods WsgiDAV expects, forwarding to the existing methods
    def get_domain_realm(self, input_path, environ):
        return self.getDomainRealm(input_path, environ)
    def basic_auth_user(self, realm, user_name, password, environ):
        return self.authDomainUser(realm, user_name, password, environ)
    def supports_http_digest_auth(self):
        return False  # digest authentication is not supported
    def is_share_anonymous(self, share_path):
        """Return whether the given share path allows anonymous access."""
        # If authentication is not required, every share allows anonymous access
        return not self.app.config.get('need_auth', False)
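The TypeError comes from a change in who constructs the controller. The sketch below imitates that contract in plain Python: make_controller is a hypothetical stand-in for a framework factory that instantiates the class with two arguments, and both controller classes are invented for this example rather than taken from WsgiDAV.

```python
class LegacyController:
    """Old-style controller: built by the caller with a single argument."""

    def __init__(self, app):
        self.app = app


class AdaptedController:
    """New-style controller: accepts (framework_app, config) from the framework."""

    def __init__(self, framework_app, config=None):
        self.framework_app = framework_app
        self.config = config if config is not None else {}


def make_controller(controller_cls, framework_app, config):
    # Hypothetical factory: the framework, not the user, creates the instance.
    return controller_cls(framework_app, config)


try:
    make_controller(LegacyController, "framework-app", {})
except TypeError as exc:
    print("legacy controller rejected:", exc)

controller = make_controller(AdaptedController, "framework-app", {"accept_basic": True})
print("adapted controller config:", controller.config)
```

Because the framework supplies the first argument, any application-specific state has to reach the controller some other way, for example through a module-level object or the config dictionary.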
Error details:
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 14:42:21 result_worker:49] result_worker starting...
Error: Could not create web server listening on port 25555
[I 250515 14:42:21 processor:211] processor starting...
[I 250515 14:42:21 tornado_fetcher:638] fetcher starting...
Error: Could not create web server listening on port 25555
[I 250515 14:42:21 scheduler:647] scheduler starting...
[I 250515 14:42:21 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 250515 14:42:21 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Error: Could not create web server listening on port 25555
[I 250515 14:42:21 app:84] webui exiting...
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 384, in webui
app.run(host=host, port=port)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/app.py", line 59, in run
from .webdav import dav_app
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/webdav.py", line 226, in <module>
dav_app = WsgiDAVApp(config)
^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/wsgidav/wsgidav_app.py", line 257, in __init__
app = mw(self, self.application, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/wsgidav/http_authenticator.py", line 140, in __init__
dc = make_domain_controller(wsgidav_app, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/wsgidav/http_authenticator.py", line 111, in make_domain_controller
dc = dc(wsgidav_app, config)
^^^^^^^^^^^^^^^^^^^^^^^
TypeError: NeedAuthController.__init__() takes 2 positional arguments but 3 were given
Error: Could not create web server listening on port 25555
10. Run the pyspider command again. It now fails with ImportError: cannot import name 'DispatcherMiddleware' from 'werkzeug.wsgi'. The installed Werkzeug release is incompatible with pyspider, and the Python version is not the cause: Werkzeug 0.15 moved DispatcherMiddleware into the separate werkzeug.middleware.dispatcher module, and Werkzeug 1.0 removed the old werkzeug.wsgi import path that pyspider still uses.
Solution:
In pyspider/webui/app.py, change line 64 from:
from werkzeug.wsgi import DispatcherMiddleware
to:
from werkzeug.middleware.dispatcher import DispatcherMiddleware
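When the same code has to run against both old and new library releases, a relocated name can be looked up in several places in order. The helper import_first below is a hypothetical utility written for this sketch; it is demonstrated with a standard-library class that was relocated in a similar way, so it runs without Werkzeug installed.

```python
import importlib


def import_first(candidates):
    """Return the first attribute found among (module_name, attribute_name) pairs."""
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# New location first, legacy location as a fallback.
Mapping = import_first([
    ("collections.abc", "Mapping"),
    ("collections", "Mapping"),
])
print(Mapping)
```

For the case above, the candidate list would be ("werkzeug.middleware.dispatcher", "DispatcherMiddleware") followed by ("werkzeug.wsgi", "DispatcherMiddleware").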
Error details:
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 14:58:59 result_worker:49] result_worker starting...
Error: Could not create web server listening on port 25555
[I 250515 14:58:59 processor:211] processor starting...
Error: Could not create web server listening on port 25555
[I 250515 14:58:59 scheduler:647] scheduler starting...
[I 250515 14:58:59 tornado_fetcher:638] fetcher starting...
[I 250515 14:58:59 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 250515 14:58:59 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 14:59:00 app:84] webui exiting...
Traceback (most recent call last):
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/bin/pyspider", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 754, in main
cli()
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1442, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1363, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1808, in invoke
rv = super().invoke(ctx)
^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 1226, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 165, in cli
ctx.invoke(all)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 497, in all
ctx.invoke(webui, **webui_config)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/core.py", line 794, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/run.py", line 384, in webui
app.run(host=host, port=port)
File "/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/pyspider/webui/app.py", line 64, in run
from werkzeug.wsgi import DispatcherMiddleware
ImportError: cannot import name 'DispatcherMiddleware' from 'werkzeug.wsgi' (/home/wanghu/PycharmProjects/PythonProject/getHarmonyVideoDatas/.venv/lib/python3.12/site-packages/werkzeug/wsgi.py)
Error: Could not create web server listening on port 25555
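11. Running the pyspider command again fails with "Error: Could not create web server listening on port 25555". The cause is that port 25555 is already in use (in this case by a phantomjs process left over from an earlier run), so the port has to be freed before running pyspider again.
Solution: run lsof -i :25555 to find the PID that is holding the port, free it with kill -9 <PID>, then run the pyspider command again. Output like the following means pyspider started successfully and the web UI is listening on port 5000. You can then open http://localhost:5000/ in a browser to reach the PySpider web console and start crawling.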
$ lsof -i :25555
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
phantomjs 16615 wanghu 7u IPv4 278700 0t0 TCP *:25555 (LISTEN)
(.venv) wanghu@td-1:~/PycharmProjects/PythonProject/getHarmonyVideoDatas$ kill -9 16615
(.venv) wanghu@td-1:~/PycharmProjects/PythonProject/getHarmonyVideoDatas$ pyspider
phantomjs fetcher running on port 25555
[I 250515 15:08:09 result_worker:49] result_worker starting...
[I 250515 15:08:10 processor:211] processor starting...
[I 250515 15:08:10 tornado_fetcher:638] fetcher starting...
[I 250515 15:08:10 scheduler:647] scheduler starting...
[I 250515 15:08:10 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 250515 15:08:10 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[I 250515 15:08:10 app:76] webui running on 0.0.0.0:5000
Error details:
$ pyspider
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
Error: Could not create web server listening on port 25555
[I 250515 15:06:48 result_worker:49] result_worker starting...
Error: Could not create web server listening on port 25555
[I 250515 15:06:49 processor:211] processor starting...
[I 250515 15:06:49 tornado_fetcher:638] fetcher starting...
[I 250515 15:06:49 scheduler:647] scheduler starting...
[I 250515 15:06:49 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
Error: Could not create web server listening on port 25555
[I 250515 15:06:49 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
Error: Could not create web server listening on port 25555
[I 250515 15:06:49 app:76] webui running on 0.0.0.0:5000
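The port check in this step can also be done from Python before launching pyspider. The following is a minimal sketch, not part of pyspider: the helper name port_in_use is hypothetical, and it assumes the service listens on the local machine. It only reports whether a TCP port is already taken; finding and killing the owning process is still done with lsof and kill as described above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting TCP connections on host:port.

    Hypothetical helper for illustration only; it is not part of pyspider.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. a listener exists.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 25555 is the phantomjs fetcher port and 5000 the web UI port seen in the logs above.
    for port in (25555, 5000):
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

If port 25555 is reported as in use before pyspider has been started, a stale phantomjs process is the likely owner, which matches the lsof output shown in this step.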