Gemma 4 Installation Tutorial
A step-by-step guide to installing and running Gemma 4 on your machine: from setting up a Python environment to running your first inference. Covers both the Python SDK and Ollama.
Prerequisites
System Requirements
| Component | Requirement |
|---|---|
| OS | Linux, macOS, or Windows (WSL2) |
| Python | 3.9 or higher (3.11 recommended) |
| GPU | NVIDIA with 6 GB+ VRAM (optional but recommended) |
| CUDA | 12.1+ (if using GPU) |
| RAM | 16 GB+ system RAM |
| Disk | 20–60 GB free space per model |
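The requirements above can be checked programmatically. A minimal sketch (the thresholds mirror the table; the helper names are our own):

```python
import shutil
import sys

# Minimums taken from the requirements table above
MIN_PYTHON = (3, 9)
MIN_FREE_DISK_GB = 20

def meets_python_requirement(version_info=sys.version_info):
    """Return True if the interpreter satisfies the 3.9+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON

def free_disk_gb(path="."):
    """Free disk space at `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1e9

if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    print("Enough disk:", free_disk_gb() >= MIN_FREE_DISK_GB)
```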
Hugging Face Account
- Create a free account at huggingface.co
- Visit the model page (e.g. google/gemma-4-E4B-it)
- Click "Access repository" and accept the license agreement
- Generate a read token under Settings → Access Tokens
Model access is free; Google only requires that you accept the license agreement.
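Once you have a token, huggingface_hub will also pick it up from the HF_TOKEN environment variable, so you never have to paste it into scripts. A small helper of our own (not part of the library) that fails early when the variable is missing:

```python
import os

def get_hf_token():
    """Read the Hugging Face token from the HF_TOKEN environment variable.

    huggingface_hub recognizes HF_TOKEN automatically; this helper just
    raises a clear error up front if it has not been exported.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Create a read token at "
            "huggingface.co/settings/tokens and export it first."
        )
    return token
```

Typical use: `export HF_TOKEN=...` in your shell, then any download script can call `get_hf_token()`.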
Step 1 — Set Up the Python Environment
Using Conda (Recommended)
# Create a dedicated Python environment
conda create -n gemma4 python=3.11 -y
conda activate gemma4

Using venv
python -m venv gemma4-env
# Linux/macOS:
source gemma4-env/bin/activate
# Windows:
gemma4-env\Scripts\activate

Step 2 — Install Dependencies
# Core dependencies
pip install -U transformers torch accelerate
# Optional: quantization support
pip install bitsandbytes
# Optional: faster inference
pip install flash-attn --no-build-isolation

For GPU users: PyTorch detects CUDA automatically. Verify with: python -c "import torch; print(torch.cuda.is_available())"
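Later steps load the model in bfloat16 on GPU and fall back to CPU otherwise. A small selection helper, our own sketch (it takes the CUDA flag as an argument so it works whether or not torch is installed; transformers accepts dtype names as strings):

```python
def pick_settings(cuda_available):
    """Map hardware availability to model-loading settings.

    bfloat16 roughly halves memory use on modern NVIDIA GPUs; on CPU we
    fall back to float32, which is slower but universally supported.
    """
    if cuda_available:
        return {"device_map": "auto", "torch_dtype": "bfloat16"}
    return {"device_map": "cpu", "torch_dtype": "float32"}

# Typical use:
#   import torch
#   settings = pick_settings(torch.cuda.is_available())
#   model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **settings)
```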
Step 3 — Log In to Hugging Face
# Install Hugging Face CLI
pip install huggingface_hub
# Authenticate (get your token at huggingface.co/settings/tokens)
huggingface-cli login

Step 4 — Download the Model
Download via Python
# Download the E4B model (recommended for 8-16 GB VRAM)
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="google/gemma-4-E4B-it",
    local_dir="./models/gemma4-e4b"
)

Model Size Reference
| Model | Download Size |
|---|---|
| E2B | ~4.6 GB |
| E4B | ~8.0 GB |
| 31B | ~58 GB |
| 26B A4B | ~48 GB |
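Given the sizes above, you can check which variants your free disk space allows before downloading. A rough sketch (sizes copied from the table; the 20% headroom factor is our own guess to cover temporary files):

```python
# Approximate download sizes from the table above, in GB
MODEL_SIZES_GB = {
    "E2B": 4.6,
    "E4B": 8.0,
    "26B A4B": 48.0,
    "31B": 58.0,
}

def models_that_fit(free_disk_gb, headroom=1.2):
    """Return model names whose download (plus headroom) fits on disk,
    smallest first."""
    return [
        name
        for name, size in sorted(MODEL_SIZES_GB.items(), key=lambda kv: kv[1])
        if size * headroom <= free_disk_gb
    ]
```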
Step 5 — Verify the Installation
import torch
from transformers import AutoProcessor, AutoModelForCausalLM
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM:", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1), "GB")
# Quick load test
processor = AutoProcessor.from_pretrained("google/gemma-4-E4B-it")
print("Processor loaded OK")

Step 6 — Run Your First Inference
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
MODEL_ID = "google/gemma-4-E4B-it"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
messages = [{"role": "user", "content": "Explain what Gemma 4 is in 2 sentences."}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Alternative — Ollama (No Python Required)
Prefer a simpler setup? Ollama handles everything automatically:
# Linux / macOS
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download installer from ollama.com
# Then in terminal:
ollama pull gemma4:e4b
ollama run gemma4:e4b
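Ollama also serves a local HTTP API on port 11434, so you can call the model from any language while the server is running. A minimal Python sketch using only the standard library (assumes the `gemma4:e4b` tag you pulled above):

```python
import json
import urllib.request

def build_generate_request(prompt, model="gemma4:e4b",
                           host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Typical use (with the Ollama server running):
#   with urllib.request.urlopen(build_generate_request("Hello")) as resp:
#       print(json.loads(resp.read())["response"])
```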