LLM inference in C/C++
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Nano vLLM
An AI coding agent built for the terminal.