Intro
Predicting Early Hospital Readmissions in Diabetic Patients
Every time a patient ends up back in the hospital within 30 days of being discharged, it’s not just stressful — it’s expensive, and often preventable. For people living with diabetes, things like unstable blood sugar, medication changes, or lack of proper follow-up care can make readmission more likely.
In this project, I worked with real hospital data from 130 U.S. facilities, covering over 10 years of patient visits.
The goal is to find patterns that help explain why some diabetic patients are readmitted so quickly — and what signals we might use to catch those risks early.
Project Overview
My Task
The Hospital Management Team has tasked me with reducing 30-day readmission rates for diabetic patients as part of a broader goal to cut hospital costs by 8% this year.
The team asked me to identify key risk factors and intervention opportunities.
Why Hospitals Care About 30-Day Readmissions
- High readmission rates hurt hospitals — financially and reputationally.
- Readmissions also affect public rankings, accreditation, and patient trust.
- Hospitals want to treat more patients, not the same patients twice; getting care right the first time pays off.
My Approach
I applied a full ETL and data analysis pipeline to uncover patterns in diabetic patient visits:
- Cleaned and transformed data across 50+ features
- Identified high-risk patient segments
- Proposed data-driven interventions to target preventable readmissions
Brainstorm Data Questions
Question 1
What patient characteristics are most strongly associated with 30-day readmission among diabetic patients?
Question 2
Are there specific combinations of treatments, discharge plans, or follow-up care gaps that correlate with higher readmission rates?
Question 3
Are some admission types or discharge plans consistently better at preventing readmission for similar patients?
Let's get started!
ETL Workflow
Extract patient data from an open-source healthcare dataset.
- diabetic_data.csv: Contains patient-level hospital encounters, including demographics, diagnoses, treatments, lab procedures, and readmission outcomes.
- IDS_mapping.csv: Provides mappings for coded variables such as admission type, discharge disposition, and admission source.
- Hospital_General_Information.csv: Contains facility-level information, including hospital ratings and service offerings (not directly linked to the patient data, but analyzed for context).
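A minimal sketch of how the extract step could look, assuming pandas and that the three files sit together in one local directory. The helper name `load_sources` and the table keys are hypothetical; only the file names come from the list above. If IDS_mapping.csv stacks several lookup tables in a single file, it would need to be split before use, which this sketch does not attempt.

```python
from pathlib import Path

import pandas as pd

# File names are from the write-up; the keys are illustrative table names.
SOURCE_FILES = {
    "diabetic_data": "diabetic_data.csv",
    "ids_mapping": "IDS_mapping.csv",
    "hospital_info": "Hospital_General_Information.csv",
}


def load_sources(data_dir):
    """Read each source CSV in data_dir into a DataFrame keyed by table name."""
    frames = {}
    for name, filename in SOURCE_FILES.items():
        # dtype=str keeps coded IDs and diagnosis codes as text, so values
        # such as "250.83" or "?" are not coerced or mangled on read.
        frames[name] = pd.read_csv(Path(data_dir) / filename, dtype=str)
    return frames
```

Reading everything as text first is a deliberate choice here: numeric conversion can then happen after the missing-value markers have been cleaned up.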
ETL Workflow
Transform the raw data by cleaning and shaping it into a structured format suitable for further exploration.
- Replaced inconsistent missing markers (?, "Not Available", "Not Mapped", "Unknown/Invalid") with NaN for clarity and consistency.
- Dropped the weight column due to excessive missingness (>95%).
- Retained columns with sufficient completeness (>90%) for analysis.
- Mapped coded variables like admission_type_id, discharge_disposition_id, and admission_source_id using IDS_mapping.csv to make categories interpretable.
- Bucketed ICD-9 codes from diag_1 into high-level diagnosis groups (e.g., Diabetes, Heart Failure, COPD) to identify common clinical themes in early readmissions.
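The transform steps above can be sketched roughly as follows, assuming pandas. The function names are hypothetical, and the ICD-9 prefixes are my own illustrative choice (250 for diabetes, 428 for heart failure, 491/492/496 for COPD), not necessarily the exact groupings used in the project.

```python
import numpy as np
import pandas as pd

# Missing-value markers listed in the write-up.
MISSING_MARKERS = ["?", "Not Available", "Not Mapped", "Unknown/Invalid"]


def clean(df, min_completeness=0.90):
    """Turn missing markers into NaN, then keep sufficiently complete columns."""
    df = df.replace(MISSING_MARKERS, np.nan)
    completeness = df.notna().mean()  # share of non-missing values per column
    return df.loc[:, completeness > min_completeness]


def map_codes(df, column, lookup):
    """Add '<column>_desc' using an id -> description dict (unmapped -> NaN)."""
    out = df.copy()
    out[f"{column}_desc"] = out[column].map(lookup)
    return out


def icd9_group(code):
    """Bucket an ICD-9 code into a coarse group (illustrative prefixes only)."""
    if pd.isna(code):
        return "Missing"
    prefix = str(code)[:3]
    if prefix == "250":
        return "Diabetes"
    if prefix == "428":
        return "Heart Failure"
    if prefix in {"491", "492", "496"}:
        return "COPD"
    return "Other"
```

With a 90% completeness threshold, a column like weight (over 95% missing) falls out automatically, so the explicit drop and the retention rule collapse into one step in this sketch. The grouping would be applied with something like `df["diag_1"].map(icd9_group)`.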
ETL Workflow
Load the prepared dataset into my analysis environment
The cleaned datasets were loaded into a local SQLite database (healthcare_project.db) using SQLAlchemy. This enables fast querying using SQL for analysis and joins.
Tables created:
- diabetic_data (main patient dataset)
- ids_mapping (lookup table)
- hospital_info (hospital context)
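A minimal sketch of the load step. The project used SQLAlchemy; to keep this example dependency-light it swaps in a standard-library `sqlite3` connection, which pandas also accepts for `to_sql` and `read_sql_query`. The helper name `load_tables` and the `ids_mapping` column names (`admission_type_id`, `description`) are assumptions for illustration.

```python
import sqlite3

import pandas as pd


def load_tables(conn, tables):
    """Write each DataFrame to SQLite, replacing a table if it already exists."""
    for name, frame in tables.items():
        frame.to_sql(name, conn, if_exists="replace", index=False)


# Example join: encounter counts per admission type description.
# Column names in ids_mapping are assumed, not taken from the project.
ADMISSION_TYPE_COUNTS = """
SELECT m.description AS admission_type, COUNT(*) AS encounters
FROM diabetic_data AS d
JOIN ids_mapping AS m ON m.admission_type_id = d.admission_type_id
GROUP BY m.description
ORDER BY encounters DESC
"""
```

In use, `conn = sqlite3.connect("healthcare_project.db")` followed by `load_tables(conn, {...})` would create the tables, and `pd.read_sql_query(ADMISSION_TYPE_COUNTS, conn)` would run the join.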
Now I am ready to tie the results back to the three questions above.
Data Question 1: What patient characteristics are most strongly associated with 30-day readmission among diabetic patients?
Patients with prior inpatient admissions, multiple diagnoses, emergency visits, and unstable insulin patterns are at the highest risk of early readmission.
So what? This means hospitals can move from reactive care to proactive risk prevention by flagging patients based on a small set of historical and clinical features already available at discharge.
Data Question 2: Are there specific combinations of treatments, discharge plans, or follow-up care gaps that correlate with higher readmission rates?
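One way to compare characteristics like these is to compute the 30-day readmission rate for each value of a feature. The sketch below assumes pandas, a `readmitted` column where the value `"<30"` marks an early readmission, and a feature column such as `number_inpatient`. These column names and the helper `early_readmit_rate` are assumptions for illustration, not confirmed details of the project.

```python
import pandas as pd


def early_readmit_rate(df, by, min_count=1):
    """30-day readmission rate and group size for each value of `by`."""
    # Assumes readmitted == "<30" marks a readmission within 30 days.
    flagged = df.assign(early=df["readmitted"].eq("<30"))
    summary = (
        flagged.groupby(by)["early"]
        .agg(rate="mean", n="size")
        .sort_values("rate", ascending=False)
    )
    # Drop groups too small to give a stable rate.
    return summary[summary["n"] >= min_count]
```

Calling `early_readmit_rate(df, "number_inpatient", min_count=50)` would rank prior-inpatient counts by early readmission rate while ignoring very small groups. Passing a list of columns as `by` gives rates for combinations.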
The combination of insulin dosage changes (either Down or Up) and certain discharge dispositions (especially home health care and short-term hospitals) is strongly associated with significantly higher readmission rates.
So what? In this data, discharging diabetic patients to home care after an insulin adjustment is associated with a higher risk of early readmission.
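A combination like insulin change by discharge disposition can be examined as a two-way grid of readmission rates. This is a sketch assuming pandas and the column names `insulin`, `discharge_disposition_id`, and `readmitted` (with `"<30"` marking early readmission). The helper name and the `min_count` cutoff are hypothetical; the write-up does not state how many encounters sit behind each rate.

```python
import pandas as pd


def readmit_rate_grid(df, rows, cols, min_count=30):
    """Early readmission rate per rows x cols cell; sparse cells become NaN."""
    early = df["readmitted"].eq("<30").astype(float)
    rate = pd.crosstab(df[rows], df[cols], values=early, aggfunc="mean")
    n = pd.crosstab(df[rows], df[cols])  # encounters per cell
    # Hide rates that rest on fewer than min_count encounters.
    return rate.where(n >= min_count)
```

Masking sparse cells matters here because a very high rate in a cell with only a handful of encounters can be noise rather than a real pattern.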
⚠️ Patients discharged with unknown or unmapped discharge status have a very high risk of readmission, even without insulin treatment. This is not just a data cleanup issue — it's a risk flag in itself.
So what? These patients may: (A) Not be properly referred to follow-up (B) Have unclear post-discharge plans (C) Fall into system gaps (especially uninsured or marginalized patients)
Patients who experienced medication changes and were admitted through emergency, urgent, or elective care had consistently higher readmission rates (~11.7–12.1%) than those without med changes in the same categories.
So what? (A) These changes likely reflect underlying instability in disease management (B) Patients may not know how to take their new medications properly, may feel worse because of side effects, or may simply forget to take them.
Data Question 3: Are some admission types or discharge plans consistently better at preventing readmission for similar patients?
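Since the data lives in SQLite, a comparison like this can also be written directly in SQL. The sketch below uses the standard-library `sqlite3` module and assumes a `diabetic_data` table with columns `admission_type_id`, `change` (medication change flag), and `readmitted` (with `'<30'` marking early readmission). These column names and the helper name are assumptions for illustration.

```python
import sqlite3

# In SQLite a comparison evaluates to 1 or 0, so AVG(...) is the share of
# encounters in each group that were readmitted within 30 days.
RATE_BY_CHANGE_SQL = """
SELECT admission_type_id,
       "change" AS med_change,
       COUNT(*) AS encounters,
       ROUND(100.0 * AVG(readmitted = '<30'), 1) AS early_readmit_pct
FROM diabetic_data
GROUP BY admission_type_id, "change"
ORDER BY admission_type_id, "change"
"""


def rates_by_change(conn):
    """Return (admission_type_id, med_change, encounters, pct) rows."""
    return conn.execute(RATE_BY_CHANGE_SQL).fetchall()
```

Keeping the encounter count next to each percentage makes it easier to judge whether a gap between the change and no-change groups rests on enough data.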
Disposition ID 22 (Home Health Care): Patients discharged here, especially with insulin status = Down, had the highest readmission rate, almost 40%.
Disposition ID 28 (Unknown): Patients discharged with “unknown” status had unusually high readmission rates, especially for insulin = No.
Disposition IDs 1, 2, 3, 5 (Home, Transfer, Short-term): More balanced, but readmission is still higher when insulin is changed (Down or Up).
So what? Patients sent home with a recent insulin reduction had the highest chance of readmission. But even when no insulin was used, readmission risk still spiked if the discharge plan was missing or vague (like Disposition 28). That tells us the problem isn’t just the patient’s health; it’s what we do at discharge.