PyPI · vllm
vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk, the prefill phase completes faster, and the speedup is visible in the TTFT (Time to First Token). These timing differences…
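To make the timing signal concrete, the following is a minimal toy sketch of chunk-level prefix caching. It is not vLLM's implementation: `ToyPrefixCache`, `BLOCK_SIZE`, and the token values are hypothetical names chosen for illustration, and the number of tokens that must be computed during prefill stands in for TTFT. The assumption is only that a cached prefix chunk lets the engine skip work for that chunk.

```python
from typing import List, Set, Tuple

BLOCK_SIZE = 4  # hypothetical chunk size in tokens, chosen for illustration


class ToyPrefixCache:
    """Toy model of chunk-level prefix caching (not vLLM's actual code)."""

    def __init__(self) -> None:
        # Each entry is the full token prefix ending at a chunk boundary.
        self._cached: Set[Tuple[int, ...]] = set()

    def prefill(self, tokens: List[int]) -> int:
        """Process a prompt; return how many tokens had to be computed.

        The returned count is a stand-in for prefill latency (TTFT).
        """
        computed = 0
        full = len(tokens) - len(tokens) % BLOCK_SIZE
        for end in range(BLOCK_SIZE, full + 1, BLOCK_SIZE):
            prefix = tuple(tokens[:end])
            if prefix in self._cached:
                continue  # cache hit: this chunk costs nothing
            computed += BLOCK_SIZE
            self._cached.add(prefix)
        # A trailing partial chunk is never cached, so it is always computed.
        computed += len(tokens) - full
        return computed


cache = ToyPrefixCache()

# Another user's prompt is processed first and populates the cache.
victim_prompt = list(range(100, 112))  # 12 tokens = 3 chunks
cost_victim = cache.prefill(victim_prompt)

# A later prompt sharing the first two chunks needs far less work
# than a prompt of the same length that shares nothing.
probe_shared = victim_prompt[:8] + [1, 2, 3, 4]
probe_unrelated = [9] * 12
cost_shared = cache.prefill(probe_shared)
cost_unrelated = cache.prefill(probe_unrelated)

print(cost_victim, cost_shared, cost_unrelated)  # prints "12 4 12"
```

In this sketch, the gap between `cost_shared` and `cost_unrelated` plays the role of the TTFT difference described above: a requester who can measure it learns whether the start of their prompt matches a prefix that was already processed.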
Sources: CISA KEV (public domain), OSV.dev & GitHub Advisory Database (CC-BY-4.0), FIRST EPSS, NVD/CWE (public domain). Served live from the Stateward advisory database.