Usage example
from http.server import HTTPServer, BaseHTTPRequestHandler

IP = '127.0.0.1'
PORT = 8000


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()
        message = "Hello, World!"
        self.wfile.write(bytes(message, "utf8"))


with HTTPServer((IP, PORT), Handler) as httpd:
    print("serving at port", PORT)
    httpd.serve_forever()
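The code above is the simplest possible HTTP server built on the built-in http.server module; it can handle HTTP GET requests.
Python's built-in HTTP server lives mainly in two source files: socketserver.py and http/server.py. socketserver.py wraps socket communication in Server classes and reserves a hook for user-defined request handling; http/server.py builds a further layer on top of it, and the HTTP wrapper is the part that gets used most.
Starting from the opening example, I read through the code (Python 3.10.1) and sketched the structure below. The diagrams are drawn casually and follow no particular convention; they are only meant to make the explanation more concrete.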
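To make the division of labor concrete, here is a minimal sketch that uses socketserver alone, with no HTTP involved. The handler class UpperCaseHandler and the helper echo_once() are names I made up for illustration; only TCPServer, StreamRequestHandler, and handle_request() come from the standard library.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: reads one line and sends it back upper-cased."""

    def handle(self):
        # rfile and wfile are prepared by StreamRequestHandler.setup().
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def echo_once(payload: bytes) -> bytes:
    """Start a throwaway server, send one line, and return the reply."""
    # Port 0 asks the OS for any free port.
    with socketserver.TCPServer(("127.0.0.1", 0), UpperCaseHandler) as server:
        # handle_request() serves exactly one request and then returns.
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.create_connection(server.server_address, timeout=5) as conn:
            conn.sendall(payload)
            reply = conn.makefile("rb").readline()
        worker.join()
    return reply


if __name__ == "__main__":
    print(echo_once(b"hello\n"))
```

The server class owns the socket; everything protocol-specific sits in the handler, which is the same split http/server.py relies on.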
Question 1: What does it take to implement an HTTP server?
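Start with Figure 1. The left column, headed by BaseServer, lists classes, from the parent class at the top down to the subclasses. The right column, headed by serve_forever(), lists methods, forming a call chain that goes deeper from top to bottom.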
        Parent class to subclass                          Main flow
           +----------------+                       +------------------+
           |                |                       |                  |
           |   BaseServer   +---------------------->| serve_forever()  |
           |                |                       |                  |
           +--------+-------+                       +---------+--------+
                    |                                         |
                    |                                         |
                    |                                         |
                    V                                         V
           +----------------+                   +----------------------------+
           |                |                   |                            |
           |   TCPServer    |                   | _handle_request_noblock()  |
           |                |                   |                            |
           +--------+-------+                   +-------------+--------------+
                    |                                         |
        +-----------+------------+                            |
        |                        |                            |
        V                        V                            V
+----------------+      +----------------+          +------------------+
|                |      |                |          |                  |
|   HTTPServer   |      |   UDPServer    |          | process_request()|
|                |      |                |          |                  |
+----------------+      +----------------+          +---------+--------+
                                                              |
                                                              |
                                                              |
                                                              V
                                                    +------------------+
                                                    |                  |
                                                    | finish_request() |
                                                    |                  |
                                                    +------------------+
Figure 1
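The example uses the HTTPServer class, which, as the name says, is an HTTP server. Following the inheritance chain, HTTPServer is a subclass of TCPServer, which matches the fact that HTTP messages travel over TCP. The HTTPServer class itself contains very little. Here is the code: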
class HTTPServer(socketserver.TCPServer):

    allow_reuse_address = 1    # Seems to make sense in testing environment

    def server_bind(self):
        """Override server_bind to store the server name."""
        socketserver.TCPServer.server_bind(self)
        host, port = self.server_address[:2]
        self.server_name = socket.getfqdn(host)
        self.server_port = port
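A small sketch of what server_bind() leaves behind on the instance. It binds to port 0 so the OS picks a free port; the variable names are mine, and the value of server_name depends on how the local machine resolves the address.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The constructor calls server_bind(), so the attributes exist right away.
with HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler) as httpd:
    bound_host, bound_port = httpd.server_address[:2]
    server_name = httpd.server_name   # result of socket.getfqdn(host)
    server_port = httpd.server_port   # the port the OS actually assigned
    print(server_name, server_port)
```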
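Thanks to the hooks reserved by its parent class, the TCPServer source only has to take a TCP socket through the bind, listen, accept, and close steps (the subclass UDPServer follows the same pattern).
The class to focus on is BaseServer, which implements the core flow of handling a network request. The entry-point method serve_forever() from the opening example is implemented in this class. I added a few simple comments to the source: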
def serve_forever(self, poll_interval=0.5):
    """Handle one request at a time until shutdown.

    Polls for shutdown every poll_interval seconds. Ignores
    self.timeout. If you need to do periodic tasks, do them in
    another thread.
    """
    self.__is_shut_down.clear()
    try:
        # XXX: Consider using another file descriptor or connecting to the
        # socket to wake this up instead of polling. Polling reduces our
        # responsiveness to a shutdown request and wastes cpu at all other
        # times.
        with _ServerSelector() as selector:
            # Register the server's descriptor and watch for read events.
            selector.register(self, selectors.EVENT_READ)

            while not self.__shutdown_request:
                # The poll_interval timeout avoids blocking for long;
                # combined with the while loop, this amounts to polling.
                ready = selector.select(poll_interval)
                # bpo-35017: shutdown() called during select(), exit immediately.
                if self.__shutdown_request:
                    break
                if ready:
                    # A request arrived and the read event is ready: handle it.
                    self._handle_request_noblock()

                self.service_actions()
    finally:
        self.__shutdown_request = False
        self.__is_shut_down.set()
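Because the loop only checks the shutdown flag between polls, stopping the server means calling shutdown() from a different thread than the one running serve_forever(). The sketch below assumes a made-up HelloHandler and helper fetch_then_shutdown(); the port is chosen by the OS.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed body."""

    def do_GET(self):
        body = b"Hello, World!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_then_shutdown() -> bytes:
    """Serve in a background thread, fetch one page, then stop the loop."""
    with HTTPServer(("127.0.0.1", 0), HelloHandler) as httpd:
        worker = threading.Thread(
            target=httpd.serve_forever, kwargs={"poll_interval": 0.1}
        )
        worker.start()
        try:
            conn = http.client.HTTPConnection("127.0.0.1", httpd.server_port, timeout=5)
            conn.request("GET", "/")
            body = conn.getresponse().read()
            conn.close()
        finally:
            httpd.shutdown()  # sets the flag, then waits for the loop to exit
            worker.join()
    return body


if __name__ == "__main__":
    print(fetch_then_shutdown())
```

A smaller poll_interval makes shutdown() return sooner at the cost of more frequent wake-ups, which is the trade-off the XXX comment in the source points at.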
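As _handle_request_noblock() shows, handling one network request boils down to verify_request(), process_request(), and shutdown_request(), plus a little exception handling, so the flow is easy to follow. finish_request() creates an instance of RequestHandlerClass; this is the user-defined RequestHandler, which is passed in and stored in BaseServer's __init__(). The source is below and is fairly easy to read: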
def _handle_request_noblock(self):
    """Handle one request, without blocking.

    I assume that selector.select() has returned that the socket is
    readable before this function was called, so there should be no risk of
    blocking in get_request().
    """
    try:
        request, client_address = self.get_request()
    except OSError:
        return
    # The request-handling flow starts here.
    if self.verify_request(request, client_address):
        try:
            self.process_request(request, client_address)
        except Exception:
            self.handle_error(request, client_address)
            self.shutdown_request(request)
        except:
            self.shutdown_request(request)
            raise
    else:
        self.shutdown_request(request)

def process_request(self, request, client_address):
    """Call finish_request.

    Overridden by ForkingMixIn and ThreadingMixIn.
    """
    self.finish_request(request, client_address)
    self.shutdown_request(request)

def finish_request(self, request, client_address):
    """Finish one request by instantiating RequestHandlerClass."""
    self.RequestHandlerClass(request, client_address, self)

def shutdown_request(self, request):
    """Called to shutdown and close an individual request."""
    self.close_request(request)
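The verify_request() hook is the first branch in that flow, so overriding it is enough to turn clients away before any handler is created. A minimal sketch, assuming a made-up PickyServer class with an illustrative allow_clients switch (neither is part of socketserver):

```python
import socket
import socketserver
import threading


class GreetingHandler(socketserver.StreamRequestHandler):
    def handle(self):
        self.wfile.write(b"welcome\n")


class PickyServer(socketserver.TCPServer):
    """Hypothetical server that can refuse clients in verify_request()."""

    allow_clients = True  # illustrative switch, not a socketserver attribute

    def verify_request(self, request, client_address):
        # Returning False sends the flow straight to shutdown_request().
        return self.allow_clients


def talk_to_server(allow: bool) -> bytes:
    """Connect once and return whatever the server sends before closing."""
    with PickyServer(("127.0.0.1", 0), GreetingHandler) as server:
        server.allow_clients = allow
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.create_connection(server.server_address, timeout=5) as conn:
            data = conn.recv(1024)
        worker.join()
    return data


if __name__ == "__main__":
    print(talk_to_server(True), talk_to_server(False))
```

When the request is refused, the handler never runs and the client simply sees the connection close.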
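Summary: an HTTP server needs a TCP socket implementation, and the handling of a network request can be abstracted roughly as verify_request(), process_request(), and shutdown_request(). To support user-defined request handling, it also has to reserve hooks for extension. And of course, to handle the HTTP protocol it must be able to parse HTTP messages, which the next section explores.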
Question 2: How is Python's built-in HTTP server implemented?
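The previous section walked through how the built-in server handles one network request, which amounts to the running flow of the HTTP server. That partly answers this section's question, but one detail is missing: it does not show where HTTP messages are parsed. The built-in HTTP server decouples HTTP parsing from the server and puts it in a separate class, BaseHTTPRequestHandler. The same split leaves users free to implement parsing for any application-layer protocol. See Figure 2 below: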
                              +----------------------+        +----------------+
                              |                      |        |                |
                              |  BaseRequestHandler  +------->|   __init__()   |
                              |                      |        |                |
                              +-----------+----------+        +----------------+
                                          |
                                          |
                                          |                   +----------------+
                                          |                   |                |
                                          V              +--->|    setup()     |
                              +----------------------+   |    |                |
                              |                      |   |    +----------------+
                              | StreamRequestHandler +---+
                              |                      |   |
                              +-----------+----------+   |    +----------------+
                                          |              |    |                |
                                          |              +--->|    finish()    |
                                          V                   |                |
+------------------------+    +----------------------+        +----------------+
|                        |    |                      |
|SimpleHTTPRequestHandler|<---+BaseHTTPRequestHandler|
|                        |    |                      |
+------------------------+    +-----------+----------+
                                          |
                                          |
                                          |
                                          V
                                +------------------+
                                |                  |
                                |     handle()     |
                                |                  |
                                +---------+--------+          +----------------+
                                          |                   |                |
                                          |              +--->| parse_request()|
                                          |              |    |                |
                                          V              |    +----------------+
                              +----------------------+   |
                              |                      |   |
                              | handle_one_request() +---+
                              |                      |   |    +----------------+
                              +----------------------+   |    |                |
                                                         +--->|    do_XXX()    |
                                                              |                |
                                                              +----------------+
Figure 2
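In Figure 2, every name with parentheses is a method and every name without them is a class; as before, classes run from parent at the top to subclass further down. To maximize code reuse, the __init__() of the parent class BaseRequestHandler fixes the workflow: setup(), handle(), and finish() are called in that order. setup() and finish() are implemented in the subclass StreamRequestHandler. Finally, BaseHTTPRequestHandler implements HTTP parsing and uses the HTTP method to decide which user-defined do_XXX() method to call, such as do_GET() or do_POST(). The code is as follows: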
class BaseRequestHandler:
    """Base class for request handler classes.
    ......
    """

    def __init__(self, request, client_address, server):
        self.request = request
        self.client_address = client_address
        self.server = server
        self.setup()
        try:
            self.handle()
        finally:
            self.finish()

    def setup(self):
        pass

    def handle(self):
        pass

    def finish(self):
        pass


class StreamRequestHandler(BaseRequestHandler):
    """Define self.rfile and self.wfile for stream sockets."""

    # code omitted

    def setup(self):
        # Set the connection timeout, Nagle's algorithm, and the read/write buffers.
        self.connection = self.request
        if self.timeout is not None:
            self.connection.settimeout(self.timeout)
        if self.disable_nagle_algorithm:
            self.connection.setsockopt(socket.IPPROTO_TCP,
                                       socket.TCP_NODELAY, True)
        self.rfile = self.connection.makefile('rb', self.rbufsize)
        if self.wbufsize == 0:
            self.wfile = _SocketWriter(self.connection)
        else:
            self.wfile = self.connection.makefile('wb', self.wbufsize)

    def finish(self):
        if not self.wfile.closed:
            try:
                self.wfile.flush()
            except socket.error:
                # A final socket error may have occurred here, such as
                # the local error ECONNABORTED.
                pass
        self.wfile.close()
        self.rfile.close()
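Since the constructor itself drives the three hooks, the call order can be observed without any server or socket. In this sketch TracingHandler and its calls attribute are names I invented for illustration; the request and server arguments are dummies, which works here only because BaseRequestHandler never touches them.

```python
import socketserver


class TracingHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler that records the order in which its hooks run."""

    def setup(self):
        self.calls = ["setup"]

    def handle(self):
        self.calls.append("handle")

    def finish(self):
        self.calls.append("finish")


# Instantiating the class is all it takes: __init__() calls the hooks in order.
handler = TracingHandler(request=None, client_address=("127.0.0.1", 0), server=None)
print(handler.calls)  # ['setup', 'handle', 'finish']
```

This is also why finish_request() in BaseServer does nothing more than instantiate RequestHandlerClass.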
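HTTP parsing happens in parse_request(). The method is too long to paste here, but it works roughly as follows:
- Parse the HTTP version and check that it is supported (requests with version 2.0 or higher are rejected)
- Get the HTTP method
- Parse the HTTP headers
Once parsing is done, the user-defined method matching the HTTP method is called, and the request is finished.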
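The parse-then-dispatch idea can be sketched in a few lines. This is a simplified stand-in, not the library's code: parse_request_line() and TinyDispatcher are hypothetical names, header parsing is left out, and the real parse_request() also handles two-word HTTP/0.9-style requests and keep-alive decisions.

```python
def parse_request_line(raw: bytes):
    """Simplified stand-in for parse_request(): returns (method, path, version).

    Raises ValueError on malformed input.
    """
    words = raw.decode("iso-8859-1").rstrip("\r\n").split()
    if len(words) != 3:
        raise ValueError("bad request syntax")
    method, path, version = words
    if not version.startswith("HTTP/"):
        raise ValueError("bad request version")
    # "HTTP/1.1" -> (1, 1); anything that is not two integers raises ValueError.
    major, minor = (int(part) for part in version[5:].split("."))
    if (major, minor) >= (2, 0):
        raise ValueError("HTTP version not supported")
    return method, path, version


class TinyDispatcher:
    """Hypothetical class showing the 'do_' + method name lookup."""

    def do_GET(self, path):
        return f"GET handled for {path}"

    def dispatch(self, raw: bytes):
        method, path, _version = parse_request_line(raw)
        handler = getattr(self, "do_" + method, None)
        if handler is None:
            return f"501 Unsupported method ({method!r})"
        return handler(path)


if __name__ == "__main__":
    print(TinyDispatcher().dispatch(b"GET /index.html HTTP/1.1\r\n"))
```

Looking the method up by name is what lets a subclass add support for a new HTTP method just by defining another do_XXX().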
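Summary
Python's built-in HTTP server is implemented concisely and its feature set is relatively simple. If you want to build an HTTP server from scratch and follow Python's design, it should have these elements:
- TCP socket communication
- HTTP message parsing
- A way to call a user-defined RequestHandler (the design needs an extension point)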