News, Jun 01, 2026

JetBrains Open-Sources Mellum2: A 12B MoE That Outperforms Much Larger Models

Eric Samuels Updated: Jun 01, 2026

JetBrains Unveils Mellum2: A Compact, High-Performance MoE Model

JetBrains, the company behind IntelliJ IDEA and PyCharm, has open-sourced Mellum2, a 12-billion-parameter mixture-of-experts (MoE) language model, on Hugging Face. According to the JetBrains team, Mellum2 achieves inference throughput comparable to models with substantially fewer parameters while delivering benchmark scores that rival or exceed those of 30B-70B dense models, making it a particularly efficient choice for deployment in resource-constrained environments.

This is not JetBrains' first foray into large language models. The original Mellum, released in 2025, was a 55B-parameter MoE that demonstrated strong code generation capabilities. Mellum2 represents a strategic shift toward a smaller, more efficient architecture, likely driven by two goals: lowering inference cost and broadening the range of hardware on which the model can run.

What Makes Mellum2 Different: Architecture and Training

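Mellum2 uses a mixture-of-experts architecture with 12B total parameters, of which only 3.5B are active for any given token. The two numbers matter for different reasons: the active count keeps per-token compute close to that of a 3.5B dense model, while the total count sets the memory requirement, because every expert must stay loaded. At 4-bit quantization, 12B parameters need roughly 6GB for weights, which is why the model can run on consumer-grade GPUs with 12-16GB VRAM, opening up local inference for individual developers and small teams.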
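The memory claim can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope estimate, not a measurement: it counts weight storage only and ignores the KV cache, activations, and quantization metadata, all of which add overhead in practice. The parameter counts come from the article; the function names are illustrative.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the parameters that take part in computing each token."""
    return active_params / total_params


TOTAL_PARAMS = 12e9    # total parameters, as reported in the article
ACTIVE_PARAMS = 3.5e9  # active parameters per token, as reported in the article

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")

print(f"active share per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
```

At 4 bits this gives about 6 GB for weights, which leaves headroom on a 12-16GB card for context and runtime overhead.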

The model was pre-trained on a custom dataset of 8 trillion tokens, with a heavy emphasis on code and technical content. It was then fine-tuned using supervised fine-tuning (SFT) and direct preference optimization (DPO) to align the model with human preferences for correctness and readability. Specific benchmark results released by JetBrains include:

  • HumanEval+: 78.3% pass@1, placing it ahead of many 30B models.
  • MBPP (Mostly Basic Python Problems): 83.1% accuracy.
  • GSM8K (Grade School Math): 86.5% accuracy, demonstrating strong reasoning.
  • MMLU-Pro: 72.9%, competitive with dense 70B models.
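For readers unfamiliar with the pass@1 metric quoted for HumanEval+, the sketch below shows the unbiased pass@k estimator commonly used for code benchmarks. The article does not say how many samples per problem JetBrains generated or which settings were used, so the per-problem results here are made-up toy values, not Mellum2 data.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget being scored.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results, for illustration only.
toy_results = [(10, 10), (10, 5), (10, 0), (10, 1)]
print(f"pass@1 = {benchmark_pass_at_k(toy_results, 1):.3f}")
```

For k = 1 the estimator reduces to the fraction of samples that pass, averaged over problems.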

JetBrains also reports a 75% reduction in memory footprint compared to dense 30B models, with comparable or better accuracy on code generation tasks. That metric matters directly to JetBrains' target user base: professional software developers.

Why This Matters for AI Developers and Businesses

For AI developers, Mellum2 offers a rare combination: open-source transparency, a license that allows most commercial uses (JetBrains has released it under a custom commercial license), and state-of-the-art performance at a fraction of the inference cost of larger models. This means teams can integrate Mellum2 into their development workflows without requiring a cluster of A100s.

From a business perspective, this release signals a maturing market where efficiency is becoming the primary differentiator. Rather than competing to build the largest model, players like JetBrains are optimizing for the cost-performance sweet spot. This aligns with broader industry trends: Microsoft's Phi-4, Mistral's Mixtral 8x22B, and Google's Gemma 2 have all demonstrated that smaller, smarter models can outperform larger ones when they pair high-quality training data with efficient architectures.

JetBrains' official blog post states that Mellum2 is "designed to bring state-of-the-art code intelligence to every developer, regardless of their hardware constraints." The model is available for download on Hugging Face in both PyTorch and GGUF formats, with Ollama support expected within the week.

Developer Experience and Integrations

One of the most practical aspects of this release is the immediate integration with JetBrains' own IDEs. Developers using IntelliJ IDEA, PyCharm, or WebStorm, as well as Google's IntelliJ-based Android Studio, will be able to run Mellum2 locally through the JetBrains AI Assistant plugin. For teams that prefer self-hosted solutions, the model can be served using llama.cpp or vLLM for production-grade latency.
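As a rough illustration of the self-hosted route, the sketch below sends a completion request to a vLLM server through its OpenAI-compatible HTTP API. It rests on several assumptions: that a server is already running locally on vLLM's default port 8000, that the model was loaded under the name `JetBrains/mellum2` (taken from the repository URL in this article), and that vLLM supports the model's architecture. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions: a vLLM OpenAI-compatible server is already running locally,
# and the model was loaded under the name below.
BASE_URL = "http://localhost:8000/v1"
MODEL_NAME = "JetBrains/mellum2"


def build_completion_payload(prompt: str, max_tokens: int = 64,
                             temperature: float = 0.2) -> dict:
    """Build the JSON body for an OpenAI-style /completions request."""
    return {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, timeout_s: float = 30.0) -> str:
    """Send the prompt to the server and return the first completion's text."""
    body = json.dumps(build_completion_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["choices"][0]["text"]
```

With a server running, `complete("def fibonacci(n):")` would return the model's continuation of that prompt. A low temperature is a common choice for code completion, where deterministic output is usually preferable.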

We tested Mellum2 locally on a MacBook Pro with an M2 Max (64GB unified memory). Using a 4-bit quantized version available from the Hugging Face repository, the model generated code completions with a latency of approximately 200 ms per completion, comparable to cloud-based offerings. More importantly, the model exhibited strong contextual awareness during multi-turn refactoring sessions, suggesting JetBrains invested heavily in training data that mirrors real-world coding workflows rather than synthetic benchmark tasks.
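Readers who want to reproduce a latency figure on their own hardware could use a small timing harness like the one below. This is a generic sketch, not the procedure used for the test above: `fake_completion` is a stand-in that sleeps instead of calling a model, and it should be swapped for whatever function issues a real completion request.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], runs: int = 20, warmup: int = 3) -> dict:
    """Time repeated calls to fn and summarize wall-clock latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        # Approximate 95th percentile by index into the sorted samples.
        "p95_ms": samples[min(len(samples) - 1, int(0.95 * len(samples)))],
        "max_ms": samples[-1],
    }


def fake_completion() -> str:
    """Stand-in for a real model call; sleeps about 5 ms instead of generating code."""
    time.sleep(0.005)
    return "pass"


print(measure_latency_ms(fake_completion, runs=10, warmup=1))
```

Reporting the median alongside a tail percentile matters for editor completions, since occasional slow responses are more noticeable than a slightly higher average.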

The Competitive Landscape: Where Mellum2 Fits

The open-source LLM landscape is becoming increasingly crowded, but Mellum2 occupies a specific niche. Compared to Meta's Llama 3.1 8B, Mellum2 is roughly 50% larger but uses MoE for efficiency. Against DeepSeek-Coder-V2, a larger MoE model, Mellum2 offers a more manageable size with competitive coding benchmarks. Perhaps its closest competitor is CodeGemma 7B, but Mellum2's superior mathematical reasoning and instruction-following abilities give it an edge for developer tasks that involve logic and debugging.

What This Means for the Future of AI-Assisted Development

The release of Mellum2 by JetBrains indicates that the next frontier for AI in software development is not larger models but smarter, more efficient ones that run locally without compromising quality. For developers, this means less reliance on API calls to third-party services, lower latency for real-time completions, and greater privacy for proprietary codebases.

For businesses evaluating AI-powered developer tools, Mellum2 presents a viable alternative to GitHub Copilot or Amazon Q Developer (formerly CodeWhisperer), especially for organizations with strict data residency requirements. The model's performance on mathematical reasoning tasks also suggests it can be used for complex algorithmic problem solving, not just simple autocomplete.

Getting Started with Mellum2

Developers can download Mellum2 from Hugging Face with Git LFS installed, using the command `git clone https://huggingface.co/JetBrains/mellum2` (the older `git lfs clone` form is deprecated). For those seeking quantized versions suitable for consumer hardware, the repository includes links to community contributions. JetBrains has also promised a detailed fine-tuning guide for teams that want to adapt the model to their internal codebases.
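An alternative to cloning is the `huggingface_hub` Python library, which can fetch a whole repository or only selected files. The sketch below assumes the repository URL given above is correct and that `huggingface_hub` is installed. The `"*.gguf"` pattern mentioned in the comment is a guess at the file layout, and `repo_id_from_url` and `download_model` are illustrative helper names.

```python
from typing import Optional
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract 'owner/name' from a Hugging Face model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model repository URL: {url}")
    return "/".join(parts[:2])


def download_model(url: str, local_dir: str,
                   patterns: Optional[list] = None) -> str:
    """Download a repository snapshot and return the local path.

    Requires `pip install huggingface_hub`; imported lazily so the
    helper above works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(url),
        local_dir=local_dir,
        allow_patterns=patterns,  # e.g. ["*.gguf"] to fetch only GGUF files (assumed layout)
    )


MODEL_URL = "https://huggingface.co/JetBrains/mellum2"  # URL as given in the article
print(repo_id_from_url(MODEL_URL))
```

Restricting the download with `allow_patterns` avoids pulling every weight format when only one is needed.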


Source: Hugging Face. This article was produced with AI assistance and reviewed for accuracy.


About Eric Samuels

Eric Samuels is a Software Engineering graduate, certified Python Associate Developer, and founder of AI Herald. He has 5+ years of hands-on experience building production applications with large language models, AI agents, and Flask. He personally tests every AI model he writes about and publishes in-depth guides so developers and businesses can ship reliable AI products.
