Getting Started
- Visit llama.meta.com and request access to the Llama 4 model weights by agreeing to the community license.
- Download the model weights via the official CLI or through Hugging Face Hub (see the download sketch after this list).
- Run Llama locally with tools like Ollama, vLLM, or llama.cpp for optimized inference on consumer hardware (a vLLM example follows this list).
- Fine-tune the model on your own data using frameworks like Hugging Face Transformers or Axolotl (a fine-tuning sketch follows this list).
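A minimal download sketch using the Hugging Face Hub Python client. The repository id, target directory, and token are placeholders; a gated Llama repository only works with a token from an account that has accepted the license.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id and paths -- substitute the model you were granted access to.
snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",  # example model id, not prescriptive
    local_dir="./llama-weights",                 # where the checkpoint files land
    token="hf_xxx",                              # or set the HF_TOKEN environment variable
)
```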
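A sketch of local inference with vLLM, one of the runtimes mentioned above. The model id is assumed; a path to locally downloaded weights works the same way.

```python
from vllm import LLM, SamplingParams

# Placeholder model id -- any Llama checkpoint you have access to will do.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Generate a completion for a single prompt and print the text.
outputs = llm.generate(["Explain grouped-query attention in one paragraph."], params)
print(outputs[0].outputs[0].text)
```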
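A condensed fine-tuning sketch with Hugging Face Transformers. The model id, data file, and hyperparameters are illustrative placeholders rather than recommended settings; real runs commonly add parameter-efficient methods such as LoRA on top of this.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token       # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder dataset: one training example per line of plain text.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="llama-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```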
Key Features
- Multiple model sizes across the Llama family, from 8B to 405B parameters, enabling deployment from edge devices to data centers.
- Open-weight license allows commercial use, fine-tuning, and redistribution within Meta’s community license terms.
- Reasoning performance competitive with leading closed-source models on math, coding, and general-knowledge benchmarks.
- Extensive ecosystem with support across all major inference frameworks, cloud providers, and fine-tuning tools.
- Multilingual support across dozens of languages with strong performance on non-English benchmarks.
- Optimized for efficiency with grouped-query attention and other architectural improvements for faster inference (a toy GQA sketch follows this list).
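To illustrate the grouped-query attention mentioned above, here is a toy PyTorch sketch, not Llama's actual implementation: several query heads share each key/value head, which shrinks the KV cache and speeds up decoding. All shapes and weights below are made up for the example.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Toy GQA: n_kv_heads < n_q_heads, so each KV head serves a group of query heads."""
    B, T, _ = x.shape
    head_dim = wq.shape[1] // n_q_heads
    q = (x @ wq).view(B, T, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(B, T, n_kv_heads, head_dim).transpose(1, 2)
    # Broadcast each KV head across its group of query heads.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, n_q_heads * head_dim)

# Example: 8 query heads sharing 2 KV heads (a 4x smaller KV cache).
d_model, n_q, n_kv = 64, 8, 2
x = torch.randn(1, 16, d_model)
wq = torch.randn(d_model, d_model)
wk = torch.randn(d_model, n_kv * (d_model // n_q))
wv = torch.randn(d_model, n_kv * (d_model // n_q))
print(grouped_query_attention(x, wq, wk, wv, n_q, n_kv).shape)  # torch.Size([1, 16, 64])
```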
Related Tools
- ChatGPT (AI / Foundation Models & LLMs): OpenAI's AI assistant with GPT-4.1, o3 reasoning, the Codex coding agent, and the most popular AI interface. Freemium; web.
- Claude (AI / Foundation Models & LLMs): Anthropic's frontier AI family, including Opus 4.6, Sonnet 4.6, and Haiku for every use case. Freemium; web.
- DeepSeek (AI / Foundation Models & LLMs): Open-source frontier models at a fraction of the cost; R1 reasoning shook the industry. Open source; web, Git.