Gemma 4 Installation Tutorial

A step-by-step guide to installing and running Gemma 4 on your own machine, from setting up a Python environment to running your first inference. It covers both the Python SDK and Ollama.

Prerequisites

System requirements

OS	Linux, macOS, or Windows (WSL2)
Python	3.9 or higher (3.11 recommended)
GPU	NVIDIA with 6 GB+ VRAM (optional but recommended)
CUDA	12.1+ (if using GPU)
RAM	16 GB+ system RAM
Disk	20–60 GB free space per model
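
The Python and disk rows of the table can be checked with the standard library alone. The following is a minimal sketch, not an official checker: the function name `check_requirements` and its default thresholds (Python 3.9, 20 GB free) are taken from the table above and are otherwise assumptions. It does not check GPU, CUDA, or RAM.

```python
import shutil
import sys


def check_requirements(path=".", min_python=(3, 9), min_free_gb=20.0):
    """Compare this machine against the Python and Disk rows of the table.

    `path` is the directory where models would be stored (hypothetical default).
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }


# Print a short report for the current directory.
for key, value in check_requirements().items():
    print(f"{key}: {value}")
```

Pass `path` as the drive or directory that will hold the model files if it differs from the current directory.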

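Hugging Face account

Create a free account at huggingface.co
Visit the model page (for example, google/gemma-4-E4B-it)
Click "Access repository" and accept the license
Generate a read token under Settings → Access Tokens

Model access is free; Google only requires that you accept the license.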

Step 1: Set up the Python environment

Using Conda (recommended)

# Create a dedicated Python environment
conda create -n gemma4 python=3.11 -y
conda activate gemma4

Using venv

python -m venv gemma4-env
# Linux/macOS:
source gemma4-env/bin/activate
# Windows:
gemma4-env\Scripts\activate
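
Before installing anything, it can help to confirm that the environment is actually active, so that packages do not end up in the system Python. A small sketch, assuming the standard behavior that `conda activate` sets `CONDA_DEFAULT_ENV` and that a venv makes `sys.prefix` differ from `sys.base_prefix`; the helper name `active_environment` is hypothetical.

```python
import os
import sys


def active_environment(environ=None):
    """Return a short label for the active environment, or None if there is none."""
    environ = os.environ if environ is None else environ
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    # Conda's "base" environment is treated as "no dedicated environment".
    if conda_env and conda_env != "base":
        return f"conda:{conda_env}"
    # Inside a venv, sys.prefix points at the venv directory.
    if sys.prefix != sys.base_prefix:
        return f"venv:{sys.prefix}"
    return None


print(active_environment() or "No dedicated environment is active")
```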

Step 2: Install dependencies

# Core dependencies
pip install -U transformers torch accelerate

# Optional: quantization support
pip install bitsandbytes

# Optional: faster inference
pip install flash-attn --no-build-isolation

GPU users: PyTorch uses CUDA automatically when it is available. To check, run: python -c "import torch; print(torch.cuda.is_available())"
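
To see which of the packages from this step actually ended up in the active environment, the standard library's `importlib.metadata` can report their versions without importing them. This is a sketch; the helper name `installed_versions` is an assumption, and the package list simply mirrors the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Core and optional packages from this step.
wanted = ["transformers", "torch", "accelerate", "bitsandbytes", "flash-attn"]
for name, ver in installed_versions(wanted).items():
    print(f"{name}: {ver or 'not installed'}")
```

The optional packages showing "not installed" is expected if you skipped them.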

Step 3: Authenticate with Hugging Face

# Install Hugging Face CLI
pip install huggingface_hub

# Authenticate (get your token at huggingface.co/settings/tokens)
huggingface-cli login
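
On servers or in CI, where an interactive prompt is awkward, the token can instead be supplied through an environment variable. The sketch below assumes the token is stored in `HF_TOKEN` and passes it to `huggingface_hub.login`; the helper names `get_hf_token` and `login_from_env` are hypothetical.

```python
import os


def get_hf_token(environ=None):
    """Return the token from HF_TOKEN with whitespace stripped, or None if unset."""
    environ = os.environ if environ is None else environ
    token = environ.get("HF_TOKEN", "").strip()
    return token or None


def login_from_env():
    """Log in non-interactively; requires huggingface_hub to be installed."""
    token = get_hf_token()
    if token is None:
        raise RuntimeError("Set HF_TOKEN to a read token before calling this")
    from huggingface_hub import login  # imported lazily so the helper above has no dependency
    login(token=token)
```

Call `login_from_env()` once at the start of a script. Avoid hard-coding the token in source files.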

Step 4: Download the model

Download with Python

# Download the E4B model (recommended for 8-16 GB VRAM)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="google/gemma-4-E4B-it",
    local_dir="./models/gemma4-e4b"
)
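
After the download finishes, a quick sanity check on the target directory can catch an interrupted transfer. This sketch assumes the repository follows the common Transformers layout, with a `config.json` and one or more `*.safetensors` weight files at the top level; that layout is an assumption and is not confirmed by this guide, and `looks_complete` is a hypothetical helper.

```python
from pathlib import Path


def looks_complete(model_dir):
    """Heuristic: the directory has a config.json and at least one weight shard."""
    root = Path(model_dir)
    has_config = (root / "config.json").is_file()
    has_weights = any(root.glob("*.safetensors"))
    return has_config and has_weights


# Same local_dir as in the download snippet above.
print(looks_complete("./models/gemma4-e4b"))
```

A result of False does not prove the download failed; it only means the assumed files were not found.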

Model size reference

Model	Download Size
E2B	~4.6 GB
E4B	~8.0 GB
31B	~58 GB
26B A4B	~48 GB
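
The table can be turned into a quick pre-download check against free disk space. In this sketch the sizes are copied from the table above, while the 20% headroom factor and the name `models_that_fit` are assumptions chosen for illustration.

```python
import shutil

# Approximate download sizes in GB, copied from the table above.
DOWNLOAD_SIZE_GB = {"E2B": 4.6, "E4B": 8.0, "31B": 58.0, "26B A4B": 48.0}


def models_that_fit(free_gb, headroom=1.2):
    """List models whose download size, plus headroom, fits in free_gb."""
    return [
        name
        for name, size in DOWNLOAD_SIZE_GB.items()
        if size * headroom <= free_gb
    ]


free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free: {free_gb:.1f} GB ->", models_that_fit(free_gb))
```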

Step 5: Verify the installation

import torch
from transformers import AutoProcessor

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM:", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1), "GB")

# Quick load test (loads only the processor, not the model weights)
processor = AutoProcessor.from_pretrained("google/gemma-4-E4B-it")
print("Processor loaded OK")
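
If PyTorch reports that CUDA is not available, it can be useful to ask the NVIDIA driver directly, which separates a driver problem from a PyTorch build problem. The sketch below shells out to `nvidia-smi` with its CSV query options and returns an empty list when the tool is missing; the helper names are hypothetical, and memory is reported in GiB rather than the GB used above.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_lines(text):
    """Parse 'name, memory_in_MiB' lines into (name, GiB) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        name, _, mem_mib = line.rpartition(",")
        try:
            gpus.append((name.strip(), round(int(mem_mib) / 1024, 1)))
        except ValueError:
            continue  # skip lines where memory is not a plain number
    return gpus


def list_gpus():
    """Return GPUs seen by the driver, or [] if nvidia-smi is absent or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


print(list_gpus())
```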

Step 6: Run your first inference

from transformers import AutoProcessor, AutoModelForCausalLM
import torch

MODEL_ID = "google/gemma-4-E4B-it"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [{"role": "user", "content": "Explain what Gemma 4 is in 2 sentences."}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
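
The slice in the final line is there because `generate` returns the prompt tokens followed by the newly generated ones, so decoding the whole sequence would echo the prompt. The toy example below shows the same slicing on plain lists; the token IDs are made up for illustration and `strip_prompt` is a hypothetical helper.

```python
def strip_prompt(output_ids, prompt_length):
    """Drop the first prompt_length tokens, keeping only the generated ones."""
    return output_ids[prompt_length:]


# Made-up token IDs: four prompt tokens followed by three generated tokens.
prompt_ids = [2, 105, 2364, 107]
output_ids = prompt_ids + [9259, 236764, 1902]

print(strip_prompt(output_ids, len(prompt_ids)))  # [9259, 236764, 1902]
```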

Alternative: Ollama (no Python required)

If you prefer a simpler setup, Ollama handles downloading and running the model for you:

# Linux / macOS
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download installer from ollama.com
# Then in terminal:
ollama pull gemma4:e4b
ollama run gemma4:e4b
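
Once the model is pulled, Ollama also serves a local HTTP API, so the same model can be called from a script if you later want to. The sketch below uses only the standard library and assumes Ollama is listening on its default address, `http://localhost:11434`, with the `/api/generate` endpoint; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address


def build_request(prompt, model="gemma4:e4b"):
    """Build a non-streaming generate request for the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="gemma4:e4b", timeout=120):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the Ollama server running, `print(generate("Explain what Gemma 4 is in 2 sentences."))` sends the same prompt as in Step 6.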