SpineFairBench: A Counterfactual Benchmark for Auditing Demographic Sensitivity in Spinal Radiology VLM Reports

Ahmed Taha, Abdelrahman Taeha, Muzzammil Ahmadzada

Preprint · 2026 · Under review

Abstract

Radiology vision–language models may change clinically meaningful report content when the same pathology is presented under different patient demographics, but observational subgroup audits cannot isolate this effect. We present SpineFairBench, a paired counterfactual benchmark for auditing demographic sensitivity in spinal radiology report generation. SpineFairBench varies apparent age and sex in source/counterfactual spinal radiographs under a target-pathology-preservation criterion. It assesses a frozen nine-model VLM panel under a locked report-generation prompt with two pre-registered primary endpoints: recommendation change rate and diagnostic-label consistency. Retained outputs demonstrate measurable recommendation drift in all nine models. In most retained models, management recommendations are less stable than diagnostic-label overlap under the same demographic edit. A pre-registered findings-first mitigation analysis with a binding interpretation rule produced a negative result on the eligible subset, supporting a predominantly perceptual rather than interpretive locus for the two models on which it could be evaluated. In blinded validation by three board-certified radiologists, clinical plausibility and target-pathology preservation were supported for 443/450 pairs (98.44%) under a 2-of-3 majority rule. Reviewers selected "Cannot tell" for 96.8% of edit-detectability responses.
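
The abstract names two primary endpoints and a 2-of-3 reviewer majority rule but does not give their formulas. The following is a minimal Python sketch of how such quantities might be computed over source/counterfactual report pairs. The function names, the exact-match comparison of recommendations, and the use of Jaccard overlap for label consistency are illustrative assumptions, not the benchmark's published definitions.

```python
from typing import Sequence, Set


def recommendation_change_rate(
    source_recs: Sequence[str], counterfactual_recs: Sequence[str]
) -> float:
    """Fraction of pairs whose recommendation differs after the edit.

    Assumption: recommendations are already normalised to comparable
    category strings, so exact string inequality means "changed".
    """
    if len(source_recs) != len(counterfactual_recs):
        raise ValueError("paired inputs must have equal length")
    if not source_recs:
        raise ValueError("need at least one pair")
    changed = sum(a != b for a, b in zip(source_recs, counterfactual_recs))
    return changed / len(source_recs)


def label_overlap(source_labels: Set[str], counterfactual_labels: Set[str]) -> float:
    """Jaccard overlap of diagnostic labels for one pair (assumed metric).

    Two empty label sets are treated as fully consistent.
    """
    union = source_labels | counterfactual_labels
    if not union:
        return 1.0
    return len(source_labels & counterfactual_labels) / len(union)


def majority_supported(votes: Sequence[bool], threshold: int = 2) -> bool:
    """True when at least `threshold` reviewers voted yes (2-of-3 by default)."""
    return sum(votes) >= threshold


# Toy data, invented for illustration only.
rate = recommendation_change_rate(["mri", "none"], ["mri", "mri"])
overlap = label_overlap({"spondylolisthesis", "stenosis"}, {"stenosis"})
supported = majority_supported([True, True, False])

# Arithmetic check of the validation figure reported in the abstract.
validation_pct = round(100 * 443 / 450, 2)
```

Under these assumptions, a model can show a high recommendation change rate while label overlap stays near 1, which is the pattern the abstract describes for most retained models.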

BibTeX

@misc{taha2026spinefairbench,
  title        = {SpineFairBench: A Counterfactual Benchmark for Auditing
                  Demographic Sensitivity in Spinal Radiology VLM Reports},
  author       = {Taha, Ahmed and Taeha, Abdelrahman and Ahmadzada, Muzzammil},
  year         = {2026},
  note         = {Preprint},
  url          = {https://ahmedtaha.io/publications/spinefairbench/}
}