Monday, December 22, 2025

Workflow Experiment


Using Multiple AI Platforms to Research Clinical Oncology Guidelines: A Workflow Experiment

The Challenge

When researching complex medical topics—particularly treatment protocols that oncologists rely on for patient care—the stakes for accuracy are high. General summaries won't cut it. You need primary sources, clinical trial data, guideline documents with version numbers, and the kind of granular detail that separates casual health information from actionable clinical intelligence.

I recently needed exactly this kind of deep-dive for follicular lymphoma (FL) maintenance therapy—the treatment protocols that follow initial chemotherapy to keep the cancer in remission. The question wasn't just "what drugs are used?" but rather: What do the major international guidelines actually recommend? What clinical trials established these standards? What are the specific dosing schedules, durations, and evidence levels behind each recommendation?

The Workflow

Step 1: Starting Point

The process began with a primary resource article outlining FL maintenance recommendations based on clinical research. Rather than manually parsing through the document, I opened the Claude Chrome extension and asked for a summary—a quick orientation to the landscape before diving deeper.

Step 2: Generating a Research Prompt

Here's where things got interesting. Instead of crafting my own research queries from scratch, I asked Claude to generate a comprehensive prompt specifically designed to return detailed, source-documented information about FL maintenance therapy. The goal was a prompt that would surface the kind of evidence-based recommendations clinical oncologists actually use when counseling patients.

Claude produced a structured research prompt targeting the following (a condensed reconstruction appears after the list):

  • International clinical practice guidelines (NCCN, ESMO, ASH, German S3)
  • Landmark clinical trials (GALLIUM, PRIMA, RESORT, GADOLIN)
  • Specific version numbers, publication dates, and direct links
  • Dosing schedules, durations, and evidence gradings
  • MRD (minimal residual disease) considerations
  • Toxicity profiles and patient selection criteria
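To make that concrete, here is a condensed reconstruction of what such a prompt can look like, assembled from the bullet points above. It is an illustration, not the verbatim prompt Claude generated.

```text
Research maintenance therapy for follicular lymphoma (FL) after first-line
immunochemotherapy. For every recommendation, cite the primary source.

1. Guidelines: summarize the current NCCN, ESMO, ASH, and German S3
   recommendations on anti-CD20 maintenance, giving the version number,
   publication date, and a direct link for each document.
2. Trials: for GALLIUM, PRIMA, RESORT, and GADOLIN, report the regimen,
   dosing schedule, maintenance duration, and primary endpoint results.
3. Evidence: state the evidence level or grade each guideline assigns.
4. MRD: note any minimal residual disease considerations.
5. Safety: summarize toxicity profiles and patient selection criteria.
If a document, version number, or figure cannot be verified, say so
explicitly rather than approximating.
```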

Step 3: The Multi-Platform Approach

Rather than relying on a single AI's interpretation, I presented the same research prompt to four different platforms:

  • Claude (Anthropic)
  • Gemini (Google)
  • Perplexity (with real-time web search)
  • ChatGPT (OpenAI)

The reasoning was straightforward: each platform has different training data, different search capabilities, and different tendencies in how they synthesize medical information. Cross-referencing multiple outputs would reveal both consensus findings and platform-specific gaps.
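For anyone who would rather script this step than paste the same prompt into four browser tabs, here is a minimal sketch. It assumes the official anthropic and openai Python SDKs with API keys set in the environment; Perplexity is reached through its documented OpenAI-compatible endpoint, and Gemini would follow the same pattern with Google's SDK (omitted for brevity). The model names, file names, and helper functions are illustrative assumptions, not a record of what was actually run.

```python
# Minimal fan-out sketch: send one research prompt to several platforms
# and save each answer for later cross-referencing. Model names and file
# names are illustrative.
import os
from anthropic import Anthropic
from openai import OpenAI

PROMPT = open("fl_maintenance_prompt.txt").read()  # engineered prompt from step 2

def ask_claude(prompt: str) -> str:
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def ask_chatgpt(prompt: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_perplexity(prompt: str) -> str:
    # Perplexity exposes an OpenAI-compatible endpoint, so the same client works.
    client = OpenAI(api_key=os.environ["PERPLEXITY_API_KEY"],
                    base_url="https://api.perplexity.ai")
    resp = client.chat.completions.create(
        model="sonar-pro",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Fan the same prompt out and write each answer to its own file.
for name, ask in {"claude": ask_claude, "chatgpt": ask_chatgpt,
                  "perplexity": ask_perplexity}.items():
    with open(f"fl_{name}.md", "w") as f:
        f.write(ask(PROMPT))
```

Saving each answer to its own file keeps the outputs diffable, which pays off when cross-referencing them later.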

What Emerged

The results were remarkably comprehensive—and instructively varied in their approaches.

Consensus findings across all platforms:

  • Two-year anti-CD20 maintenance (rituximab or obinutuzumab) is the established standard following successful immunochemotherapy
  • The PRIMA trial established rituximab maintenance; in long-term follow-up, median PFS was 10.5 years with maintenance versus 4.1 years with observation
  • The GALLIUM trial demonstrated obinutuzumab's PFS superiority over rituximab-based induction (7-year PFS 63.4% vs 55.7%)
  • Despite significant PFS benefits, no overall survival advantage has been demonstrated
  • The "bendamustine penalty"—higher infection rates when maintenance follows bendamustine induction—is now recognized across guidelines

Platform-specific strengths:

Perplexity excelled at providing direct URLs and real-time verification of current guideline versions. Its numbered citation system made source-tracking straightforward.

Claude produced the most structured clinical decision framework, including step-by-step algorithms for patient selection and specific guidance on when maintenance may not be appropriate.

Gemini provided strong narrative context on the biological rationale for maintenance and the emerging role of bispecific antibodies that may eventually change the paradigm.

ChatGPT delivered comprehensive trial data tables and cost-effectiveness analyses, including specific QALY calculations and budget impact assessments.

The Practical Takeaway

This workflow demonstrated something important about using AI for serious medical research: no single platform tells the complete story, but the combination produces something closer to comprehensive.

The prompt engineering step proved crucial. A generic question like "tell me about follicular lymphoma treatment" returns generic information. A structured prompt requesting specific guideline documents, trial names, version numbers, and evidence levels forces the AI to surface—or acknowledge it cannot find—the precise data needed.

For anyone researching complex medical topics:

  1. Start with orientation — Use AI to summarize your initial source material and identify what you don't know
  2. Engineer your prompt — Ask an AI to help you construct a research prompt targeting exactly the depth and specificity you need
  3. Cross-reference platforms — Different AI systems have different strengths; use them complementarily (see the coverage-check sketch after this list)
  4. Verify primary sources — The AI outputs point you toward the documents; always verify critical information against the original sources
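As a worked example of step 3, the hypothetical script below scans the saved outputs for the guideline bodies and trial names the prompt targeted and prints a simple coverage matrix: a quick way to spot platform-specific gaps before reading the full texts. File names follow the fan-out sketch above, and the keyword lists come straight from the research prompt.

```python
# Hypothetical coverage check: which platform's output mentions which of
# the guideline bodies and landmark trials the prompt asked about?
import re

PLATFORMS = ["claude", "gemini", "perplexity", "chatgpt"]
KEYWORDS = ["NCCN", "ESMO", "ASH", "S3",              # guideline bodies
            "GALLIUM", "PRIMA", "RESORT", "GADOLIN"]  # landmark trials

coverage = {}
for name in PLATFORMS:
    try:
        text = open(f"fl_{name}.md").read()
    except FileNotFoundError:
        continue  # skip platforms whose output was not saved
    coverage[name] = {kw: bool(re.search(rf"\b{kw}\b", text)) for kw in KEYWORDS}

# Print a simple coverage matrix: rows are keywords, columns are platforms.
print("keyword".ljust(10) + "".join(p.ljust(12) for p in coverage))
for kw in KEYWORDS:
    cells = ("yes" if coverage[n][kw] else "-" for n in coverage)
    print(kw.ljust(10) + "".join(c.ljust(12) for c in cells))
```

A keyword missing from one platform's column is not necessarily an error; it flags where to read closely and where to go back to the primary sources (step 4).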

The four research outputs now provide a working reference for FL maintenance therapy that covers international guidelines, landmark trials, dosing protocols, toxicity considerations, and emerging treatment paradigms—all with traceable citations to primary literature.


This experiment in multi-platform AI research was conducted in December 2024. Medical treatment recommendations evolve; always consult current guidelines and qualified healthcare providers for patient care decisions.
