Jun 09, 2026

PathoSage: New Agent Workflow Resolves Conflicting Evidence in AI Pathology Diagnoses


Pathology AI's New Adjudication Layer

A research team has introduced PathoSage, an experience-aware agentic workflow that tackles one of computational pathology's most stubborn problems: how to resolve conflicting evidence when different AI tools disagree on a tissue sample. According to the paper, published on arXiv (2606.07549), the system moves beyond the current paradigm of merging all tool outputs into a shared context, an approach the authors argue leads to 'context contamination' and unreliable conclusions.

What PathoSage Does Differently

The core innovation in PathoSage is its multi-source evidence adjudication layer. Where existing multimodal large language models (MLLMs) and agentic systems treat all retrieved information as equally valid, PathoSage introduces an experience-aware module that weighs each piece of evidence by its source reliability, the tool's accuracy history, and its consistency with known pathological patterns. This is particularly critical for patch-level reasoning: analyzing microscopic tissue regions where subtle morphological features can make the difference between a benign and a malignant diagnosis.
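As a rough illustration of what weighing evidence on those three factors could look like, here is a minimal Python sketch. The article does not describe the paper's actual scoring function, so the `Evidence` fields, the [0, 1] scale, and the geometric-mean combination are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One tool output plus three hypothetical quality factors in [0, 1]."""
    tool: str
    finding: str
    source_reliability: float
    historical_accuracy: float
    pattern_consistency: float


def evidence_weight(ev: Evidence) -> float:
    """Combine the three factors with a geometric mean (an assumed choice).

    A geometric mean lets a single weak factor pull the weight down sharply,
    and a factor of zero removes the evidence from consideration entirely.
    """
    factors = (ev.source_reliability, ev.historical_accuracy, ev.pattern_consistency)
    product = 1.0
    for factor in factors:
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"factor out of range [0, 1]: {factor}")
        product *= factor
    return product ** (1.0 / len(factors))
```

A weighted arithmetic mean would be an equally plausible choice; the geometric mean is used here only because it penalizes evidence that fails badly on any one factor.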

Why This Matters for Developers

For teams building medical AI systems, the PathoSage paper exposes a fundamental weakness in how current MLLM-based agents handle contradictory information. According to the researchers, end-to-end pathology MLLMs frequently hallucinate morphological features, and even tool-augmented agent workflows suffer from what the paper calls 'evidence contamination', where an incorrect output from one tool pollutes the reasoning about outputs from other tools.

PathoSage introduces a modular architecture where each evidence source is evaluated independently before a final adjudication step. This means a stain artifact detection model's output doesn't directly influence a cell morphology classifier's results in the same context window. Instead, a separate adjudication module, trained on pathologist feedback, makes the final determination.
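The isolation idea can be sketched in a few lines of Python. This is not the paper's implementation: the `Tool` and `Assessor` signatures, the `Assessed` record, and the two stub tools are hypothetical names introduced only to show that each tool sees the patch alone and each output is scored without reference to the others.

```python
from typing import Callable, NamedTuple


class Assessed(NamedTuple):
    tool: str
    finding: str
    score: float


# Hypothetical signatures: a tool maps raw patch bytes to a text finding,
# and an assessor scores one (tool name, finding) pair in isolation.
Tool = Callable[[bytes], str]
Assessor = Callable[[str, str], float]


def evaluate_independently(
    patch: bytes, tools: dict[str, Tool], assess: Assessor
) -> list[Assessed]:
    """Run and score every tool separately; no tool sees another's output."""
    results = []
    for name, tool in tools.items():
        finding = tool(patch)            # input is the patch only
        score = assess(name, finding)    # scored without the other findings
        results.append(Assessed(name, finding, score))
    return results


# Stub tools standing in for real models (illustrative only).
def stain_artifact_stub(patch: bytes) -> str:
    return "no artifact"


def morphology_stub(patch: bytes) -> str:
    return "atypical nuclei"
```

Only the resulting list of `Assessed` records would be handed to an adjudication step, so a faulty finding from one tool cannot leak into the prompt or input of another.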

Implications for Clinical Deployment

For healthcare enterprises evaluating AI pathology systems, PathoSage points toward a new quality standard. The paper suggests that simple model stacking or sequential tool usage, which are common in commercial pathology AI, may produce results that look correct in aggregate but fail on edge cases where evidence conflicts. The approach explicitly addresses discrepant findings, such as when a segmentation model flags a region as suspicious but a feature extraction model finds no definitive malignant markers.

The system's experience-aware component is trained on pathologist-provided adjudication examples, meaning the more it's used in clinical settings with expert feedback, the better it becomes at weighing evidence. This creates a practical feedback loop for continuous improvement without requiring full retraining of underlying models.
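One simple way to picture such a feedback loop is a per-tool running accuracy estimate that is updated whenever a pathologist confirms or overrules a tool's finding. The sketch below is a stand-in technique (a Beta-Bernoulli style counter), not the trained component the paper describes; the class name and the uniform prior are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ToolTrackRecord:
    """Running accuracy estimate for one tool, updated from expert feedback."""
    agreed: float = 1.0      # prior pseudo-count (assumed uniform prior)
    disagreed: float = 1.0   # prior pseudo-count (assumed uniform prior)

    def record(self, pathologist_agreed: bool) -> None:
        """Fold in one adjudicated case; the underlying model is untouched."""
        if pathologist_agreed:
            self.agreed += 1.0
        else:
            self.disagreed += 1.0

    @property
    def estimated_accuracy(self) -> float:
        """Posterior mean of the tool's agreement rate with pathologists."""
        return self.agreed / (self.agreed + self.disagreed)
```

Because only the counters change, the estimate improves with each reviewed case while the segmentation or classification models themselves stay frozen.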

Technical Architecture Details

The PathoSage workflow operates in three phases:

  • Evidence Collection: Multiple specialized pathology tools (segmentation, feature extraction, retrieval-augmented generation) independently analyze tissue patches.
  • Evidence Evaluation: Each output is assigned a confidence score based on the tool's historical accuracy on similar cases and the consistency of its findings with known pathological patterns.
  • Adjudication: A dedicated module aggregates the scored evidence, identifies conflicts, and produces a final diagnosis with uncertainty quantification.

This contrasts with current MLLM-based agents that funnel all tool outputs into a single prompt, where logical contradictions can confuse the reasoning process. The researchers provide evidence that this contamination leads to higher false-positive rates in borderline cases.
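The adjudication phase of the three-phase workflow can be sketched as a confidence-weighted vote with a conflict flag and an uncertainty score. The article does not say how the paper aggregates evidence or quantifies uncertainty, so the `adjudicate` function, its input triples, and the normalized-entropy measure below are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def adjudicate(scored: list[tuple[str, str, float]]) -> dict:
    """Aggregate (tool, label, confidence) triples into one decision.

    Returns the label with the most confidence mass, whether the tools
    disagreed, and an uncertainty in [0, 1] (normalized Shannon entropy).
    """
    if not scored:
        raise ValueError("no evidence to adjudicate")

    totals: dict[str, float] = defaultdict(float)
    for _tool, label, confidence in scored:
        totals[label] += confidence

    mass = sum(totals.values())
    if mass <= 0.0:
        raise ValueError("total confidence must be positive")

    probs = {label: total / mass for label, total in totals.items()}
    diagnosis = max(probs, key=probs.get)

    if len(probs) == 1:
        uncertainty = 0.0  # all tools proposed the same label
    else:
        entropy = -sum(p * math.log(p) for p in probs.values() if p > 0.0)
        uncertainty = entropy / math.log(len(probs))

    return {
        "diagnosis": diagnosis,
        "conflict": len(probs) > 1,
        "uncertainty": uncertainty,
    }
```

For example, a segmentation tool reporting "suspicious" at 0.9 and a feature extractor reporting "benign" at 0.3 would yield "suspicious" with the conflict flag set, while two equally confident, opposing tools would yield the maximum uncertainty of 1.0.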

What It Means for the Path Ahead

PathoSage won't replace existing pathology AI workflows—it's designed as an integration layer on top of them. For developers, this signals a shift from building more accurate individual models to building smarter systems that can arbitrate between existing tools. For pathology departments, it offers a path to combine best-in-class tools from different vendors without sacrificing reliability.

The paper's release on arXiv positions PathoSage as an open research framework, which suggests the authors intend it to become a community standard for evidence adjudication. Given the accelerating pace of MLLM adoption in medical imaging, the ability to handle conflicting evidence transparently could become a regulatory requirement rather than a nice-to-have feature.

Source: Arxiv AI. This article was produced with AI assistance and reviewed for accuracy.


About James Whitfield

James Whitfield is a senior software engineer with 8 years of experience building developer tools, CLI applications, and IDE extensions. He has contributed to open source projects including VS Code extensions and GitHub Actions workflows. Currently covers AI developer tools, coding assistants, and platform engineering for AI Herald.
