Gemma 4
The Most Powerful Open Model Per Parameter
Designed for advanced reasoning and agentic workflows. Google DeepMind's groundbreaking open-source model family — Apache 2.0 licensed, cutting-edge performance, runs on your own hardware.
Why Gemma 4?
Built on the same research and technology used to create Gemini
Native Multimodality
Process text, images, audio, and video in a single model. Understand complex multi-modal inputs with state-of-the-art performance.
Advanced Reasoning
Exceptional performance on complex reasoning tasks. Solve math problems, analyze code, and handle multi-step logical challenges.
Agentic Capabilities
Built for autonomous workflows. Function calling, tool use, and multi-turn interactions out of the box.
Extended Context
Support for ultra-long context windows up to 128K tokens. Perfect for document analysis and complex conversations.
Efficient Deployment
Multiple sizes from 4B to 27B parameters. Run locally on consumer hardware or scale in the cloud.
Open & Free
Apache 2.0 license. Full commercial use allowed. No strings attached.
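Agentic workflows like the ones described above loop between the model and a set of tools: the model emits a structured tool call, the application executes it, and the result goes back into the conversation. A minimal, model-agnostic sketch of that dispatch step (the tool registry and the simulated model output here are illustrative assumptions, not part of any Gemma API):

```python
import json

# Hypothetical tool registry: name -> callable. In a real agent, these
# tools would also be described to the model via a function-calling schema.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "add": lambda a, b: a + b,
}

def dispatch(tool_call: dict):
    """Execute a model-emitted tool call of the form
    {"name": ..., "arguments": {...}} and return the result."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])

# Simulated model output requesting a tool invocation.
model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(dispatch(json.loads(model_output)))  # 5
```

In a full agent loop, the returned value would be appended to the message history as a tool result and the model queried again.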
Model Variants
Choose the right size for your needs
| Model | Parameters | Context | Modality | Best For |
|---|---|---|---|---|
| Gemma 4 4B | 4B | 32K | Text | Edge devices, mobile apps |
| Gemma 4 12B | 12B | 128K | Text + Vision | General purpose, balanced |
| Gemma 4 27B Flagship | 27B | 128K | Text + Vision + Audio | Complex reasoning, production |
Performance
Comparing with leading open models
Higher is better. Results from official benchmarks.
Get Started in Minutes
Multiple ways to run Gemma 4
Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-27b-it",
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-27b-it")

messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Ollama

```shell
# Install and run with Ollama
ollama run gemma4:27b

# Or pull first, then run
ollama pull gemma4:27b
ollama run gemma4:27b
```
```shell
# Use with the API
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:27b",
  "prompt": "Explain quantum computing in simple terms."
}'
```

Google AI Studio (no installation needed!)
```python
# 1. Go to https://aistudio.google.com/
# 2. Select a Gemma 4 model
# 3. Start prompting

# Or use the API:
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemma-4-27b-it")
response = model.generate_content(
    "Explain quantum computing in simple terms."
)
print(response.text)
```

Use Cases
What can you build with Gemma 4?
AI Assistants
Build intelligent chatbots and virtual assistants with advanced reasoning capabilities.
Code Generation
Generate, analyze, and debug code across multiple programming languages.
Document Analysis
Extract insights from long documents, reports, and multi-page PDFs.
Vision Tasks
Image understanding, visual Q&A, chart analysis, and OCR.
Research & Education
Academic research, tutoring systems, and educational content generation.
Enterprise Applications
Customer support, content moderation, and workflow automation.
Frequently Asked Questions
Can I use Gemma 4 commercially?
Yes! Gemma 4 is released under the Apache 2.0 license, allowing full commercial use without restrictions.
What hardware do I need to run Gemma 4?
Gemma 4 4B can run on consumer GPUs with 8 GB+ of VRAM. The 12B model needs 16 GB+, and the 27B requires 32 GB+ or a multi-GPU setup. Quantized versions reduce these requirements significantly.
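These VRAM figures can be sanity-checked with a back-of-envelope estimate: weight memory is roughly the parameter count times bytes per parameter (2 bytes for bf16, about 0.5 bytes for 4-bit quantization), with activations and the KV cache adding overhead on top. A quick illustration:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GB; runtime overhead
    (activations, KV cache) comes on top of this."""
    return params_billion * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(27, 2.0))  # 54.0 GB in bf16 -> multi-GPU territory
print(weight_memory_gb(27, 0.5))  # 13.5 GB at 4-bit -> fits a 16 GB card
```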
How does Gemma 4 compare to proprietary models?
Gemma 4 27B achieves competitive performance with proprietary models on many benchmarks while being fully open-source and runnable locally.
Can I fine-tune Gemma 4?
Absolutely. Gemma 4 supports LoRA, QLoRA, and full fine-tuning. The open weights allow complete customization for your use case.
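To see why LoRA makes fine-tuning cheap: instead of updating a full d_in × d_out weight matrix, it trains two low-rank factors of shapes (rank, d_in) and (d_out, rank). A quick parameter count (the hidden size below is an illustrative assumption, not Gemma's actual dimension):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter W + B @ A,
    where A is (rank, d_in) and B is (d_out, rank)."""
    return rank * d_in + d_out * rank

d = 4096                             # hypothetical hidden size
full = d * d                         # full fine-tune: 16,777,216 params/layer
lora = lora_params(d, d, rank=16)    # LoRA rank 16:     131,072 params/layer
print(f"reduction: {full // lora}x") # reduction: 128x
```

The same idea underlies QLoRA, which additionally keeps the frozen base weights in 4-bit precision.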
Where can I download the model weights?
Official weights are available on Hugging Face, Kaggle, and Google AI Studio. See the Get Started section above.
Ready to Build with Gemma 4?
Join millions of developers using Gemma