Healthcare Data Projects & Case Studies
Explore my work in healthcare data quality, HIPAA compliance, provider data deduplication, and healthcare analytics.
HIPAA De-Identification Pipeline
Situation
Healthcare organizations need to analyze patient data for operational insights, but HIPAA restricts the use of identifiable patient information (PHI/PII) outside of production systems.
Task
Build a compliant, reusable de-identification solution that allows healthcare data to be safely used for analytics while meeting HIPAA Safe Harbor standards.
Action
- Designed end-to-end de-identification pipeline using PostgreSQL and Python (Faker library)
- Implemented automated PHI/PII detection across all 18 HIPAA Safe Harbor identifier categories
- Built role-based access controls to manage who can view original vs. de-identified data
- Created compliance scoring system to validate Safe Harbor requirements
- Developed user-friendly Streamlit interface for non-technical users
- Documented full methodology and compliance validation process
Result
- Produced production-ready toolkit that ensures HIPAA Safe Harbor compliance
- Reduced de-identification time from hours of manual work to minutes of automated processing
- Created reusable solution applicable across any healthcare organization
- Published open-source project demonstrating healthcare data governance expertise
Note: The demo app may take a few seconds to load if inactive. For production use with real patient data, please download the toolkit from GitHub and run it locally in your secure environment.
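A minimal sketch of the detection step in this project, assuming free-text input and regular-expression matching. It covers only four of the 18 Safe Harbor identifier categories, and the names `PHI_PATTERNS` and `redact` are hypothetical; this is not the toolkit's actual code, and it does not use PostgreSQL or Faker.

```python
import re

# Hypothetical patterns for a few Safe Harbor identifier categories.
# A real pipeline would need all 18 categories and far more robust rules.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def redact(text):
    """Replace each match with a [CATEGORY] token.

    Returns (redacted_text, counts), where counts maps a category name
    to the number of matches replaced in that category.
    """
    counts = {}
    for label, pattern in PHI_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


# Made-up sample note; none of these values belong to a real person.
note = "Patient seen 03/14/2024, SSN 123-45-6789, call 215-555-0142 or jane.doe@example.com."
clean, found = redact(note)
print(clean)
print(found)
```

Returning the per-category counts alongside the redacted text is one way a compliance score could be fed: a non-empty count on data that is supposed to be de-identified already would flag a record for review.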
Slump Dog Sluggers — Philadelphia Phillies Performance Analysis
Situation
Baseball analytics often assume age-related decline follows predictable patterns, but real-world player performance can be more nuanced. I wanted to test whether age actually correlates with performance decline across a full season.
Task
Build an analytical framework to track individual player performance over time and identify whether age-related patterns exist in the data.
Action
- Collected and cleaned player performance data for the 2024–2025 Phillies seasons
- Designed multi-timeframe rolling average analysis (7-game, 14-game, 30-game windows)
- Built interactive Power BI dashboard to visualize performance trends by player and age group
- Applied statistical analysis to test age-decline hypotheses
- Currently rebuilding in Tableau to add interactive parameter controls for dynamic timeframe selection
Result
- Identified that performance patterns were more complex than simple age-based decline
- Discovered that slumps and hot streaks correlated more strongly with external factors than with age
- Created analytical framework applicable to any sports performance analysis
- Demonstrated ability to challenge assumptions with data-driven insights
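The multi-timeframe rolling averages used in this analysis can be sketched as follows. This is an illustration only, assuming one numeric value per game (for example, hits); the function names `rolling_average` and `multi_window` are hypothetical, and the dashboard itself was built in Power BI rather than Python.

```python
from collections import deque


def rolling_average(values, window):
    """Trailing mean over the last `window` games.

    Positions where fewer than `window` games are available yield None,
    so a short sample never masquerades as a full-window average.
    """
    out = []
    buf = deque(maxlen=window)
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # value about to be evicted from the window
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


def multi_window(values, windows=(7, 14, 30)):
    """Compute one rolling-average series per window size."""
    return {w: rolling_average(values, w) for w in windows}


# Made-up per-game hit counts for a single player.
hits_per_game = [1, 0, 2, 3, 0, 1, 2, 1, 0, 2]
trends = multi_window(hits_per_game, windows=(3, 5))
print(trends[3])
print(trends[5])
```

Comparing a short window against a long one is what separates a slump from a longer trend: the 7-game series reacts quickly, while the 30-game series moves only when the change persists.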
Provider Data Deduplication Case Study
Situation
Duplicate physician records from inconsistent data entry across multiple systems were causing patients to appear in wrong provider queues, disrupting care coordination and creating operational bottlenecks.
Task
Identify the root cause of duplicate records, quantify the scope of the problem, and develop both immediate workarounds and long-term solutions.
Action
- Mapped physician data tables across three inbound data sources (CSV, Excel, pipe-delimited files)
- Analyzed patterns in naming conventions, misspellings, and location-change duplicates
- Built SQL-based deduplication logic using temp tables as immediate workaround
- Documented root causes and presented findings to cross-functional team (clinical, IT, operations)
- Led remediation project to implement permanent fixes
Result
- Restored accurate patient-provider matching for 1,200+ affected records
- Eliminated queue assignment failures that were delaying patient care
- Identified and documented the issue three months before it escalated to crisis level
- Solution was production-ready when executive leadership escalated the problem
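A temp-table deduplication workaround of the kind described in this case study might look like the sketch below. It runs against an in-memory SQLite database through Python's `sqlite3` module, and everything in it is an assumption for illustration: the `providers` table, its columns, the sample rows, and above all the use of a shared `npi` value as the match key. The actual logic had to contend with name variants, misspellings, and location changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Made-up provider rows: three variants of one physician plus one distinct record.
conn.executescript("""
CREATE TABLE providers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    npi TEXT,
    location TEXT
);
INSERT INTO providers (name, npi, location) VALUES
    ('Dr. Jane Smith', '1234567890', 'Main St'),
    ('SMITH, JANE',    '1234567890', 'Oak Ave'),
    ('Jane  Smith MD', '1234567890', 'Main St'),
    ('Dr. Raj Patel',  '9876543210', 'Main St');
""")

conn.executescript("""
-- Temp table 1: keep the lowest id per identifier as the surviving record.
CREATE TEMP TABLE survivors AS
SELECT npi, MIN(id) AS keep_id
FROM providers
GROUP BY npi;

-- Temp table 2: map every duplicate id to its survivor.
CREATE TEMP TABLE dup_map AS
SELECT p.id AS dup_id, s.keep_id
FROM providers AS p
JOIN survivors AS s ON s.npi = p.npi
WHERE p.id <> s.keep_id;
""")

# The source table is left untouched; downstream queries join through dup_map.
dup_map = conn.execute(
    "SELECT dup_id, keep_id FROM dup_map ORDER BY dup_id"
).fetchall()
provider_count = conn.execute("SELECT COUNT(*) FROM providers").fetchone()[0]
print(dup_map)
```

Keeping the mapping in temp tables rather than deleting rows suits an interim fix: queue assignments can be redirected to the surviving record right away, while the source systems stay unchanged until a permanent remediation is agreed.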
Additional Work
Workout Wednesday Challenges
Building technical skills through weekly Tableau visualization challenges that require creative problem-solving, dashboard design, and advanced techniques.
View my solutions on Tableau Public.
All project code and documentation available on GitHub. For consulting inquiries or custom analytics solutions, visit my Services page.