Training physicians to conduct effective clinical interviews is a critical yet under-supported component of medical education. Existing patient simulators are either too rigid for natural conversation or too costly to scale. We present PatientSim, an open-source, LLM-powered patient simulator that generates realistic and behaviorally diverse patient personas grounded in real clinical data.
PatientSim builds patient profiles from the MIMIC-IV family of datasets and augments them with four behavioral persona axes—personality type, language proficiency, medical history recall, and cognitive confusion—yielding 37 unique patient combinations. We benchmark eight LLMs as the simulator backbone and select Llama 3.3 70B based on clinician-validated quality scores. Four clinicians evaluated the platform and awarded an average overall quality score of 3.89 / 4, with strong inter-rater agreement (Gwet's AC₁ > 0.85). PatientSim is privacy-compliant, reproducible, and publicly released to advance medical dialogue research and clinical training.
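For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of the standard multi-rater, nominal-category form of Gwet's AC₁. It is not taken from the PatientSim code: the function name `gwet_ac1`, its arguments, and the toy ratings are illustrative assumptions, and the source does not state which variant (unweighted or weighted) the authors used.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """Gwet's AC1 for nominal ratings (illustrative sketch, not the authors' code).

    ratings:    list of items; each item is the list of labels given by its raters.
    categories: every label a rater could have chosen (e.g. [1, 2, 3, 4]).
    Assumes each item was rated by at least two raters.
    """
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two categories")
    if not ratings or any(len(item) < 2 for item in ratings):
        raise ValueError("every item needs at least two ratings")

    n = len(ratings)
    observed = 0.0
    prevalence = {c: 0.0 for c in categories}
    for item in ratings:
        counts = Counter(item)
        if set(counts) - set(categories):
            raise ValueError("rating outside the declared categories")
        r = len(item)
        # Share of rater pairs that agree on this item.
        observed += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        for cat in categories:
            prevalence[cat] += counts.get(cat, 0) / r

    p_a = observed / n
    # Chance agreement: sum of pi_k * (1 - pi_k), scaled by 1 / (q - 1).
    p_e = sum((p / n) * (1 - p / n) for p in prevalence.values()) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Made-up example: four raters, one item, 4-point scale.
score = gwet_ac1([[4, 4, 4, 3]], categories=[1, 2, 3, 4])
```

Unlike Cohen's kappa, the chance-agreement term here stays small when almost all ratings fall in one category, which is why AC₁ is often preferred for highly skewed rating distributions.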
Combines structured real-world clinical data (MIMIC) with multi-dimensional behavioral persona axes to produce 37 distinct patient types, enabling large-scale and diverse dialogue simulation.
Benchmarks 8 state-of-the-art LLMs across factual accuracy, persona fidelity, and clinical plausibility, with both automated and expert human evaluation.
Fully open-source code and a de-identified PhysioNet dataset release, enabling reproducible benchmarks for the medical dialogue and healthcare education communities.
Patient profiles are extracted from MIMIC-IV (v3.1), MIMIC-IV-ED (v2.2), and MIMIC-IV-Note (v2.2). Each profile comprises 24 structured fields covering demographics, medical history, and ED visit details. Profiles target five dialogue-amenable diagnoses.
Four independent behavioral axes define how a patient communicates, creating 37 unique persona combinations that reflect the diversity of real patients.
Eight LLMs were evaluated as the patient simulator backbone. Llama 3.3 70B was selected for its persona fidelity (average score 3.68/4) and for the highest score on cognitive confusion simulation (4.0/4), as rated by four clinicians at Samsung Medical Center.
Three research questions structure the evaluation, each with dedicated automated and human metrics.
4-point scale across personality consistency, language appropriateness, recall accuracy, cognitive coherence, and overall realism
Sentence-level NLI evaluation: entailment rate, contradiction rate, information coverage (ICov) and consistency (ICon)
Clinician ratings (4-point) on plausibility of statements not directly supported by clinical records
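The sentence-level NLI rates listed above can be sketched as simple proportions over per-sentence labels. This is a hedged illustration only: it assumes a three-way label set (`entailment`, `neutral`, `contradiction`) produced by some upstream NLI judge, and the function name `nli_rates` is hypothetical. ICov and ICon are not defined in this overview, so they are deliberately left out.

```python
from collections import Counter

# Assumed three-way NLI label set; the actual judge and labels may differ.
LABELS = ("entailment", "neutral", "contradiction")


def nli_rates(labels):
    """Return entailment and contradiction rates over per-sentence NLI labels."""
    if not labels:
        raise ValueError("no sentences to evaluate")
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(labels)
    return {
        "entailment_rate": counts["entailment"] / total,
        "contradiction_rate": counts["contradiction"] / total,
    }


# Made-up labels for four simulated-patient sentences checked against a profile.
rates = nli_rates(["entailment", "entailment", "neutral", "contradiction"])
```

Sentences labeled `neutral` count toward neither rate; in the evaluation described above, statements not directly supported by the record are assessed separately through the clinician plausibility ratings.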
PatientSim is built on three complementary MIMIC resources, with 170 patient profiles spanning five emergency department diagnoses.
| Dataset | Version | Content |
|---|---|---|
| MIMIC-IV | v3.1 | Structured inpatient records |
| MIMIC-IV-ED | v2.2 | Emergency department records |
| MIMIC-IV-Note | v2.2 | Clinical narrative notes |
De-identified patient profiles are available on PhysioNet under a credentialed access policy.
@inproceedings{kyung2025patientsim,
title = {PatientSim: A Persona-Driven Simulator for
Realistic Doctor-Patient Interactions},
author = {Kyung, Daeun and Chung, Hyunseung and Bae, Seongsu
and Kim, Jiho and Sohn, Jae Ho and Kim, Taerim
and Kim, Soo Kyung and Choi, Edward},
booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year = {2025}
}