Building a Modern ETL / Analytics Pipeline for OpenEMR — Feedback Wanted

Hi OpenEMR Community,

My name is Ashar Ali, and I’m a Data Engineer exploring ways to build a modern ETL/analytics pipeline for OpenEMR.

From my research, OpenEMR captures rich clinical data, but I have not found a fully featured, production-grade pipeline that can:

  1. Extract data from OpenEMR databases safely

  2. Transform/clean/normalize the data for analytics

  3. Load it into a warehouse or analytics-ready schema

  4. Support monitoring, logging, and scheduling for repeatable runs
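To make the four stages above concrete, here is a rough sketch of the shape I have in mind. It is only an illustration under assumptions: an in-memory SQLite database stands in for OpenEMR's MySQL/MariaDB backend, the source table and column names (patient_data with pid, DOB, sex) should be checked against your own schema, and dim_patient plus the function names are hypothetical. Scheduling is left to an external tool such as cron.

```python
import logging
import sqlite3
from datetime import date

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("openemr_etl")


def extract(src):
    """Stage 1: read-only SELECT, so nothing is written to the clinical DB."""
    # Assumed table/columns; verify against the real OpenEMR schema.
    return src.execute("SELECT pid, DOB, sex FROM patient_data").fetchall()


def transform(rows, today):
    """Stage 2: derive age, normalize sex codes, skip rows with a bad DOB."""
    cleaned = []
    for pid, dob, sex in rows:
        try:
            born = date.fromisoformat(dob)
        except (TypeError, ValueError):
            log.warning("skipping pid=%s: unparseable DOB %r", pid, dob)
            continue
        # Subtract one year if this year's birthday has not happened yet.
        before_birthday = (today.month, today.day) < (born.month, born.day)
        age = today.year - born.year - before_birthday
        cleaned.append((pid, age, (sex or "").strip().lower() or "unknown"))
    return cleaned


def load(dst, rows):
    """Stage 3: idempotent upsert into a hypothetical analytics table."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS dim_patient "
        "(pid INTEGER PRIMARY KEY, age INTEGER, sex TEXT)"
    )
    dst.executemany("INSERT OR REPLACE INTO dim_patient VALUES (?, ?, ?)", rows)
    dst.commit()


def run(src, dst, today):
    """Stage 4: one repeatable run that logs row counts for monitoring."""
    rows = extract(src)
    cleaned = transform(rows, today)
    load(dst, cleaned)
    log.info("extracted=%d loaded=%d", len(rows), len(cleaned))
    return len(rows), len(cleaned)


if __name__ == "__main__":
    # Synthetic demo data only; no real patient records.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE patient_data (pid INTEGER, DOB TEXT, sex TEXT)")
    src.executemany(
        "INSERT INTO patient_data VALUES (?, ?, ?)",
        [(1, "1980-05-17", "Female"), (2, "not-a-date", "Male")],
    )
    dst = sqlite3.connect(":memory:")
    run(src, dst, date.today())
```

Keeping each stage a separate function is what I mean by modular: the extract step could later point at a read replica, and the load step at a real warehouse, without touching the rest.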

I want to develop a pipeline that is modular, secure, and usable by hospitals, including those running local (on-premises) deployments. Before starting, I’d love to get feedback from the community:

  • Do the gaps listed above match the main challenges hospitals face with analytics and reporting?

  • Are there specific analytics patterns, KPIs, or reports that would be most valuable?

  • Would hospitals be open to testing such a pipeline with demo or synthetic data first?

Any insights, suggestions, or guidance from implementers, developers, or hospital IT staff would be highly appreciated. I want to make sure the solution addresses real-world needs.

Thank you for your time and guidance!
