Blog
HIGHLIGHTS
- May 19, 2026
- 5 min read
Accelerating Inference on Friendli Dedicated Endpoints with Draft-Model Speculative Decoding

- May 15, 2026
- 5 min read
[What's So Special About DeepSeek V4? Find Out On FriendliAI](https://friendli.ai/blog/deepseek-v4-pro-flash)
DeepSeek-V4
Inference
Dedicated Endpoints

- May 11, 2026
- 3 min read
[FriendliAI Expands to San Francisco to Scale Frontier AI Inference for Open-Weight and Custom Models](https://friendli.ai/blog/friendliai-sf-office)
Expansion
Growth
Scale

- May 7, 2026
- 5 min read
[Gemma-4-31B-it API on FriendliAI: #1 Output Speed & Response Time](https://friendli.ai/blog/gemma-4-31b-it)
Gemma
Inference
Model APIs

- April 29, 2026
- 4 min read
[Scale Beyond GPU Memory Limits with Host KV Cache for Dedicated Endpoints](https://friendli.ai/blog/host-kv-cache-dedicated-endpoints)
KV Cache
Dedicated Endpoints
Long-Context Inference

- April 29, 2026
- 5 min read
[NVIDIA Nemotron™ 3 Nano Omni, Day-0 on FriendliAI: Unified Multimodal Reasoning, at Peak Performance](https://friendli.ai/blog/nvidia-nemotron-3-nano-omni)
NVIDIA
Nemotron

- April 23, 2026
- 2 min read
[Vulnerability Discovery with Open-Weight GLM-5: Frontier Quality at 1/7 the Cost of Closed Models](https://friendli.ai/blog/vulnerability-discovery-glm5)
GLM-5
Vulnerability Discovery
Inference

- April 20, 2026
- 4 min read
[GLM-5.1 on FriendliAI: The Long-Horizon Agentic Engineering Model at Peak Performance](https://friendli.ai/blog/glm-5-1-is-available-on-friendliai)
GLM-5.1
Agentic Coding
Inference

- April 15, 2026
- 8 min read
[FriendliAI Now Supports Anthropic Messages API](https://friendli.ai/blog/friendliai-supports-anthropic-messages-api)
Anthropic
Claude
Inference

- April 14, 2026
- 3 min read
[FriendliAI and Samsung Cloud Platform Forge Strategic Alliance to Power Frontier Model AI Inference on NVIDIA B300 GPUs](https://friendli.ai/blog/friendliai-collaborates-with-samsung-cloud-platform)
Samsung Cloud Platform
NVIDIA
Alliance