Blog | xifan.uno

HIGHLIGHTS

May 19, 2026
5 min read

Accelerating Inference on Friendli Dedicated Endpoints with Draft-Model Speculative Decoding

[

May 15, 2026
5 min read

What's So Special About DeepSeek V4? Find Out On FriendliAI

DeepSeek-V4

Inference

Dedicated Endpoints

](https://friendli.ai/blog/deepseek-v4-pro-flash)[

May 11, 2026
3 min read

FriendliAI Expands to San Francisco to Scale Frontier AI Inference for Open-Weight and Custom Models

Expansion

Growth

Scale

](https://friendli.ai/blog/friendliai-sf-office)[

May 7, 2026
5 min read

Gemma-4-31B-it API on FriendliAI: #1 Output Speed & Response Time

Gemma

Inference

Model APIs

](https://friendli.ai/blog/gemma-4-31b-it)[

April 29, 2026
4 min read

Scale Beyond GPU Memory Limits with Host KV Cache for Dedicated Endpoints

KV Cache

Dedicated Endpoints

Long-Context Inference

](https://friendli.ai/blog/host-kv-cache-dedicated-endpoints)[

April 29, 2026
5 min read

NVIDIA Nemotron™ 3 Nano Omni, Day-0 on FriendliAI: Unified Multimodal Reasoning, at Peak Performance

NVIDIA

Nemotron

](https://friendli.ai/blog/nvidia-nemotron-3-nano-omni)[

April 23, 2026
2 min read

Vulnerability Discovery with Open-Weight GLM-5: Frontier Quality at 1/7 the Cost of Closed Models

GLM-5

Vulnerability Discovery

Inference

](https://friendli.ai/blog/vulnerability-discovery-glm5)[

April 20, 2026
4 min read

GLM-5.1 on FriendliAI: The Long-Horizon Agentic Engineering Model at Peak Performance

GLM-5.1

Agentic Coding

Inference

](https://friendli.ai/blog/glm-5-1-is-available-on-friendliai)[

April 15, 2026
8 min read

FriendliAI Now Supports Anthropic Messages API

Anthropic

Claude

Inference

](https://friendli.ai/blog/friendliai-supports-anthropic-messages-api)[

April 14, 2026
3 min read

FriendliAI and Samsung Cloud Platform Forge Strategic Alliance to Power Frontier Model AI Inference on NVIDIA B300 GPUs

Samsung Cloud Platform

NVIDIA

Alliance

](https://friendli.ai/blog/friendliai-collaborates-with-samsung-cloud-platform)