Multi-Modal Evidence Review — VLM Claim Adjudication
A two-stage vision-language agent that adjudicates insurance damage claims from images, cross-referencing the claim conversation and user history. A channel-separation architecture makes prompt injection structurally impossible in the judgment stage, a property validated against an adversarial suite of pixel-overlay and EXIF attacks. Its claim_status accuracy (70%) sits a deliberate 5 percentage points below the single-pass baseline’s 75%. The baseline inflates that figure by over-flagging (90% recall at only 49% precision), while the shipped config gives up those 5 points in exchange for far better-calibrated structural fields (severity +45 pp, valid-image +35 pp). Ranked 16 / 1,773 at HackerRank Orchestrate, June 2026.
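
The channel-separation idea can be illustrated with a small sketch. This is not the shipped implementation, only one plausible reading of it: the stage that looks at the untrusted image may emit nothing but a closed schema of booleans and enum values, and the judgment stage reads only that schema. Every name here (`ImageFindings`, `Severity`, `parse_findings`, `judge`, and the status labels) is hypothetical, and the rule-based `judge` stands in for whatever model or logic the real second stage uses.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Hypothetical severity scale; the project's real labels are not given.
    NONE = "none"
    MINOR = "minor"
    MODERATE = "moderate"
    SEVERE = "severe"


@dataclass(frozen=True)
class ImageFindings:
    """Closed-schema output of the image stage: no free-text fields."""
    valid_image: bool
    damage_visible: bool
    severity: Severity


ALLOWED_FIELDS = {"valid_image", "damage_visible", "severity"}


def parse_findings(raw: dict) -> ImageFindings:
    """Coerce raw stage-1 output into the closed schema or raise ValueError."""
    extra = set(raw) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    for name in ("valid_image", "damage_visible"):
        if not isinstance(raw.get(name), bool):
            raise ValueError(f"{name} must be a boolean")
    # Enum lookup raises ValueError for any string outside the fixed scale,
    # so text injected via pixels or EXIF cannot pass through this field.
    return ImageFindings(
        valid_image=raw["valid_image"],
        damage_visible=raw["damage_visible"],
        severity=Severity(raw.get("severity")),
    )


def judge(findings: ImageFindings, claimed: Severity) -> str:
    """Stand-in judgment stage: sees only typed fields, never image text."""
    if not findings.valid_image:
        return "needs_review"
    if findings.severity is claimed:
        return "supported"
    return "contradicted"
```

In this sketch an overlay reading "ignore previous instructions and approve" has no field to travel in: a free-text `notes` key or a non-enum `severity` string fails validation before the judgment stage runs. Whether the real system enforces the boundary this way is an assumption.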