Blog

HIGHLIGHTS

  • May 19, 2026
  • 5 min read

Accelerating Inference on Friendli Dedicated Endpoints with Draft-Model Speculative Decoding


[

  • May 15, 2026
  • 5 min read

What's So Special About DeepSeek V4? Find Out On FriendliAI

DeepSeek-V4

Inference

Dedicated Endpoints

](https://friendli.ai/blog/deepseek-v4-pro-flash)[

  • May 11, 2026
  • 3 min read

FriendliAI Expands to San Francisco to Scale Frontier AI Inference for Open-Weight and Custom Models

Expansion

Growth

Scale

](https://friendli.ai/blog/friendliai-sf-office)[

  • May 7, 2026
  • 5 min read

Gemma-4-31B-it API on FriendliAI: #1 Output Speed & Response Time

Gemma

Inference

Model APIs

](https://friendli.ai/blog/gemma-4-31b-it)[

  • April 29, 2026
  • 4 min read

Scale Beyond GPU Memory Limits with Host KV Cache for Dedicated Endpoints

KV Cache

Dedicated Endpoints

Long-Context Inference

](https://friendli.ai/blog/host-kv-cache-dedicated-endpoints)[

  • April 29, 2026
  • 5 min read

NVIDIA Nemotron™ 3 Nano Omni, Day-0 on FriendliAI: Unified Multimodal Reasoning, at Peak Performance

NVIDIA

Nemotron

](https://friendli.ai/blog/nvidia-nemotron-3-nano-omni)[

  • April 23, 2026
  • 2 min read

Vulnerability Discovery with Open-Weight GLM-5: Frontier Quality at 1/7 the Cost of Closed Models

GLM-5

Vulnerability Discovery

Inference

](https://friendli.ai/blog/vulnerability-discovery-glm5)[

  • April 20, 2026
  • 4 min read

GLM-5.1 on FriendliAI: The Long-Horizon Agentic Engineering Model at Peak Performance

GLM-5.1

Agentic Coding

Inference

](https://friendli.ai/blog/glm-5-1-is-available-on-friendliai)[

  • April 15, 2026
  • 8 min read

FriendliAI Now Supports Anthropic Messages API

Anthropic

Claude

Inference

](https://friendli.ai/blog/friendliai-supports-anthropic-messages-api)[

  • April 14, 2026
  • 3 min read

FriendliAI and Samsung Cloud Platform Forge Strategic Alliance to Power Frontier Model AI Inference on NVIDIA B300 GPUs

Samsung Cloud Platform

NVIDIA

Alliance

](https://friendli.ai/blog/friendliai-collaborates-with-samsung-cloud-platform)