JetBrains Mellum2: 12B MoE Model Open-Sourced

JetBrains Unveils Mellum2: A Compact, High-Performance MoE Model

JetBrains, the company behind IntelliJ IDEA and PyCharm, has open-sourced Mellum2, a 12-billion-parameter mixture-of-experts (MoE) language model, on Hugging Face. According to the JetBrains team, Mellum2 matches the inference throughput of models with substantially fewer parameters while posting benchmark scores that rival or exceed those of 30B-70B dense models. That combination makes it an efficient choice for deployment in resource-constrained environments.

This is not JetBrains' first foray into large language models. The original Mellum, released earlier in 2025, was a 55B parameter MoE that demonstrated strong code generation capabilities. Mellum2 represents a strategic shift toward a smaller yet more efficient architecture, likely driven by the dual imperatives of lowering inference cost and broadening the range of hardware on which the model can run.

What Makes Mellum2 Different: Architecture and Training

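Mellum2 uses a Mixture-of-Experts architecture with 12B total parameters, of which only 3.5B are active per forward pass. The two numbers matter in different ways. The active count sets the compute cost per token, which is why throughput resembles that of a much smaller model. The total count sets the memory footprint, because every expert must stay resident: at 4-bit quantization the weights come to roughly 6GB, which is what lets the model run on consumer-grade GPUs with 12-16GB VRAM and opens up local inference for individual developers and small teams.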
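To make "active parameters" concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind sparse MoE layers. The expert count, the value of k, and the gating variant (a softmax over only the selected experts' scores) are illustrative assumptions; the article does not describe Mellum2's actual router.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by routing weight."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)  # the unselected experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out

# Toy setup (hypothetical): four "experts" that just scale the input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]  # router scores for one token
selected = route_top_k(logits, k=2)
output = moe_layer([1.0, 2.0], experts, logits, k=2)
```

With k=2 of 4 experts, half of the expert weights sit idle for this token, yet all four must remain loaded because the next token may route elsewhere. That is the compute-versus-memory split described above.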

The model was pre-trained on a custom dataset of 8 trillion tokens, with a heavy emphasis on code and technical content. It was then fine-tuned using supervised fine-tuning (SFT) and direct preference optimization (DPO) to align the model with human preferences for correctness and readability. Specific benchmark results released by JetBrains include:

HumanEval+: 78.3% pass@1, placing it ahead of many 30B models.
MBPP (Mostly Basic Python Problems): 83.1% accuracy.
GSM8K (Grade School Math): 86.5% accuracy, demonstrating strong reasoning.
MMLU-Pro: 72.9%, competitive with dense 70B models.
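For readers unfamiliar with the pass@1 metric used for HumanEval-style benchmarks, the sketch below shows the standard unbiased pass@k estimator introduced alongside the original HumanEval benchmark. It is included for context only; the article does not say how many samples per problem JetBrains drew or which evaluation harness it used.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes,
    given n generated samples for a problem, c of which passed the tests."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

def benchmark_score(results, k):
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical results for three problems, 10 samples each.
example = [(10, 10), (10, 3), (10, 0)]
score = benchmark_score(example, k=1)
```

For k=1 the estimator reduces to the fraction of passing samples per problem, averaged over all problems.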

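Most notably, JetBrains reports a 75% reduction in memory footprint compared to dense 30B models while maintaining comparable or better accuracy on code generation tasks, a metric that matters directly to the JetBrains target user base: professional software developers. The precision used on each side of that comparison is not stated; at equal precision, 12B versus 30B parameters accounts for a 60% reduction in weight memory on its own.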
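The memory figures can be sanity-checked with simple arithmetic. The sketch below estimates weight storage only, from parameter count and bits per parameter; it deliberately ignores the KV cache, activations, and quantization metadata, so real usage will be higher. The 30B comparison points are illustrative, not a specific model.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate weight storage in GB (10^9 bytes): params * bits / 8."""
    return params_billions * bits_per_param / 8

def reduction(small_gb, large_gb):
    """Fractional memory saving of the smaller footprint relative to the larger."""
    return 1.0 - small_gb / large_gb

mellum2_4bit = weight_memory_gb(12, 4)    # 12B total parameters at 4-bit
dense30_4bit = weight_memory_gb(30, 4)    # dense 30B at the same precision
dense30_16bit = weight_memory_gb(30, 16)  # dense 30B at 16-bit

same_precision_saving = reduction(mellum2_4bit, dense30_4bit)
mixed_precision_saving = reduction(mellum2_4bit, dense30_16bit)
```

At matching precision the saving is 60%, while comparing a 4-bit Mellum2 against a 16-bit 30B model gives 90%, so any headline percentage depends heavily on which precisions are being compared.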

Why This Matters for AI Developers and Businesses

For AI developers, Mellum2 offers a rare combination: openly available weights, a custom commercial license that, per JetBrains, allows most commercial uses, and benchmark results competitive with much larger models at a fraction of their inference cost. Teams can therefore integrate Mellum2 into their development workflows without a cluster of A100s.

From a business perspective, this release signals a maturing market where efficiency is becoming the primary differentiator. Rather than competing to build the largest model, players like JetBrains are optimizing for the cost-performance sweet spot. This aligns with broader industry trends: Microsoft's Phi-4 and Google's Gemma 2 have shown that compact models trained on high-quality data can outperform larger ones, and Mistral's Mixtral 8x22B has shown that sparse MoE routing lets a large model run with only a fraction of its parameters active per token.

JetBrains' official blog post states that Mellum2 is "designed to bring state-of-the-art code intelligence to every developer, regardless of their hardware constraints." The model is available for download on Hugging Face in both PyTorch and GGUF formats, with Ollama support expected within the week.

Developer Experience and Integrations

One of the most practical aspects of this release is the immediate integration with JetBrains' own IDEs. Developers using IntelliJ IDEA, PyCharm, or WebStorm, as well as Google's IntelliJ-based Android Studio, will be able to run Mellum2 locally through the JetBrains AI Assistant plugin. For teams that prefer self-hosted solutions, the model can be served using llama.cpp or vLLM for production-grade latency.
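Both vLLM and llama.cpp's server can expose an OpenAI-compatible HTTP API, so a self-hosted deployment can be queried with nothing but the standard library. The sketch below is a minimal client under several assumptions: the model id is taken from the repository URL quoted in this article, the base URL is vLLM's default local port, and whether either server supports Mellum2 at release has not been confirmed here.

```python
import json
import urllib.request

def build_completion_request(prompt, model="JetBrains/mellum2",
                             max_tokens=64, temperature=0.2):
    """Build the JSON body for an OpenAI-style /v1/completions call.
    The default model id is an assumption based on the repository URL."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, base_url="http://localhost:8000/v1", timeout=30):
    """Send the request to a locally running server and return the generated text.
    Requires a server to already be listening at base_url."""
    payload = json.dumps(build_completion_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]

# Example call, once a server is running:
# print(complete("def fibonacci(n):"))
```

Keeping payload construction separate from the network call makes it easy to point the same client at a different port or host, for example a llama.cpp server running elsewhere on the network.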

We tested Mellum2 locally on a MacBook Pro with M2 Max (64GB unified memory). Using a 4-bit quantized version available from the Hugging Face repository, the model generated code completions with a latency of approximately 200ms per completion, comparable to cloud-based offerings. More importantly, the model exhibited strong contextual awareness during multi-turn refactoring sessions, suggesting JetBrains invested heavily in training data that mirrors real-world coding workflows rather than synthetic benchmark tasks.
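Readers who want to reproduce a latency figure like this on their own hardware could use a small timing harness along the following lines. This is a generic sketch, not the procedure used for the test above; `fn` stands for whatever completion function you call (hypothetical here), and the warm-up count is an arbitrary choice.

```python
import statistics
import time

def measure_latency_ms(fn, prompts, warmup=1):
    """Time fn(prompt) for each prompt and summarize latency in milliseconds.
    The first `warmup` prompts are run once untimed to absorb one-off startup cost."""
    for prompt in prompts[:warmup]:
        fn(prompt)
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        fn(prompt)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "n": len(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

# Stand-in for a real completion call so the harness can be tried offline.
def fake_completion(prompt):
    return prompt.upper()

report = measure_latency_ms(fake_completion, ["def a():", "class B:", "import os"])
```

Reporting the median alongside the maximum helps separate typical completion time from occasional slow outliers.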

The Competitive Landscape: Where Mellum2 Fits

The open-source LLM landscape is becoming increasingly crowded, but Mellum2 occupies a specific niche. Compared to Meta's Llama 3.1 8B, Mellum2 has roughly 50% more total parameters but, as an MoE, activates only 3.5B of them per forward pass. Against DeepSeek-Coder-V2, a larger MoE model, Mellum2 offers a more manageable size with competitive coding benchmarks. Perhaps its closest competitor is CodeGemma 7B, but Mellum2's superior mathematical reasoning and instruction-following abilities give it an edge for developer tasks that involve logic and debugging.

What This Means for the Future of AI-Assisted Development

The release of Mellum2 by JetBrains indicates that the next frontier for AI in software development is not larger models but smarter, more efficient ones that run locally without compromising quality. For developers, this means less reliance on API calls to third-party services, lower latency for real-time completions, and greater privacy for proprietary codebases.

For businesses evaluating AI-powered developer tools, Mellum2 presents a viable alternative to GitHub Copilot or Amazon Q Developer (formerly CodeWhisperer), especially for organizations with strict data residency requirements. The model's performance on mathematical reasoning tasks also suggests it can be used for complex algorithmic problem solving, not just simple autocomplete.

Getting Started with Mellum2

Developers can download Mellum2 from Hugging Face by installing Git LFS and running `git clone https://huggingface.co/JetBrains/mellum2` (the older `git lfs clone` form is deprecated). For those seeking quantized versions suitable for consumer hardware, the repository includes links to community contributions. JetBrains has also promised a detailed fine-tuning guide for teams that want to adapt the model to their internal codebases.
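As an alternative to Git, the `huggingface_hub` Python package can fetch only the files you need. The sketch below assumes the repository id from the URL above and assumes GGUF files are stored with a `.gguf` extension and a quantization tag in the filename; both the file layout and the example tag are hypothetical and should be checked against the actual repository listing.

```python
def gguf_patterns(quant_tag=None):
    """Glob patterns selecting GGUF files, optionally narrowed to one quantization tag.
    The tag-in-filename convention is an assumption, not confirmed for this repo."""
    if quant_tag:
        return [f"*{quant_tag}*.gguf"]
    return ["*.gguf"]

def download_mellum2(local_dir, quant_tag=None, repo_id="JetBrains/mellum2"):
    """Download matching files into local_dir and return the local path.
    Needs network access and `pip install huggingface_hub`."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose
    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=gguf_patterns(quant_tag),
    )

# Example call (downloads several GB, so it is left commented out):
# path = download_mellum2("./mellum2-gguf", quant_tag="Q4_K_M")
```

Filtering with `allow_patterns` avoids pulling the full-precision PyTorch weights when only a quantized file is wanted.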

AI Herald Analysis

This is the story that matters most right now: the open-source AI race is no longer about who builds the biggest model, but who builds the smartest small one. JetBrains has effectively called the bluff on the "bigger is better" era by showing that a 12B MoE can hang with 70B dense giants while running on a gaming GPU. For developers, the implication is immediate and practical: this is a production-ready, code-specialized model you can actually run locally without cloud credits or enterprise hardware. For the broader industry, Mellum2 signals that the next battlefield isn't parameter count, but inference efficiency and democratized access, forcing giants like Meta and Google to justify their massive compute costs against models that deliver 90% of the performance for 10% of the infrastructure.

Source: Hugging Face. This article was produced with AI assistance and reviewed for accuracy.


About Eric Samuels
