Blog

Inside the LPU
Deconstructing Groq's Speed
Legacy hardware forces a choice: faster inference with degraded quality, or accurate inference with unacceptable latency. This tradeoff exists because GPU architectures are optimized for training workloads. The LPU, purpose-built hardware for inference, preserves quality while eliminating the architectural bottlenecks that create latency in the first place.
- Apr 09, 2026

Canopy Labs’ Orpheus TTS is live on GroqCloud
- Feb 16, 2026

GroqCloud: Expanding to Meet Demand
- Dec 16, 2025

Advancing the American AI Stack
- Dec 01, 2025

Groq Recognized in 2025 Gartner® Cool Vendor in AI Infrastructure report
- Nov 25, 2025

Introducing MCP Connectors in Beta on GroqCloud
- Oct 29, 2025

Day Zero Support for OpenAI Open Safety Model
- Oct 22, 2025

LLMs Inside the Product: A Practical Field Guide
- Oct 16, 2025

GPT‑OSS Improvements: Prompt Caching & Lower Pricing
- Sep 23, 2025

Introducing Remote MCP Support in Beta on GroqCloud
- Sep 04, 2025

Introducing the Next Generation of Compound on GroqCloud
- Sep 04, 2025

Introducing Kimi K2‑0905 on GroqCloud
- Aug 20, 2025

Introducing Prompt Caching on GroqCloud
- Aug 05, 2025

Day Zero Support for OpenAI Open Models
- Aug 01, 2025

Inside the LPU: Deconstructing Groq’s Speed
- Jul 31, 2025

OpenBench: Reproducible LLM Evals Made Easy
- Jun 16, 2025

Build Faster with Groq + Hugging Face
- Jun 10, 2025

GroqCloud™ Now Supports Qwen3 32B
- Jun 03, 2025

LoRA Fine-Tune Support Now Live on GroqCloud
- May 27, 2025

From Speed to Scale: How Groq Is Optimized for MoE & Other Large Models
- May 16, 2025

How to Build Your Own AI Research Agent with One Groq API Call
- Apr 29, 2025

Official Llama API Now Fastest via Groq Inference
- Apr 15, 2025

Now in Preview: Groq’s First Compound AI System
- Apr 05, 2025

Llama 4 Inference Fast & Affordable – Now Live on GroqCloud
- Mar 26, 2025

Build Fast with Text-to-Speech AI – Dialog Model on Groq