A modern ETL/analytics pipeline for OpenEMR

Hello Ashar Ali - those are all wonderful goals and projects. I can tell you that most of those tools would be very useful to a dev team in a large healthcare organization or to a vendor setting up large institutions. However, you might want to familiarize yourself with the target audience served by OpenEMR: small outpatient healthcare practices.

I’ve taken the liberty of re-posting your DM to the OpenEMR forum on the off chance that some vendor or dev would like to respond.
Good luck with your project!

  • Harley

    Original post:

Hi Sir,

My name is Ashar Ali, and I’m a Data Engineer exploring ways to build a modern ETL/analytics pipeline for OpenEMR.

From my research, I understand that while OpenEMR captures rich clinical data, there isn’t currently a fully-featured, production-grade pipeline that can:

  • Extract data from OpenEMR databases safely

  • Transform/clean/normalize the data for analytics

  • Load it into a warehouse or analytics-ready schema

  • Support monitoring, logging, and scheduling for repeatable runs
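
The four stages listed above can be sketched roughly as follows. This is only a minimal illustration, assuming an in-memory SQLite database standing in for the OpenEMR database and a second one standing in for the warehouse. The table names `demo_patients` and `dim_patient`, their columns, and all function names are hypothetical and are not taken from OpenEMR's actual schema.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_sketch")


def extract(source):
    """Read rows with a plain SELECT; the source database is never written to."""
    return source.execute("SELECT id, sex, dob FROM demo_patients").fetchall()


def transform(rows):
    """Normalize the sex column and drop rows that lack a date of birth."""
    cleaned = []
    for row_id, sex, dob in rows:
        if not dob:
            continue
        cleaned.append((row_id, (sex or "unknown").strip().lower(), dob))
    return cleaned


def load(warehouse, rows):
    """Write cleaned rows into a hypothetical analytics table; safe to re-run."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dim_patient "
        "(id INTEGER PRIMARY KEY, sex TEXT, dob TEXT)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO dim_patient VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


def run(source, warehouse):
    """Run one extract/transform/load pass and log row counts at each stage."""
    rows = extract(source)
    log.info("extracted %d rows", len(rows))
    cleaned = transform(rows)
    log.info("kept %d rows after cleaning", len(cleaned))
    load(warehouse, cleaned)
    return len(cleaned)


def make_demo_source():
    """Build a tiny synthetic source table (made-up data, not real records)."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE demo_patients (id INTEGER, sex TEXT, dob TEXT)")
    src.executemany(
        "INSERT INTO demo_patients VALUES (?, ?, ?)",
        [(1, " Female", "1980-02-01"), (2, None, "1975-07-30"), (3, "Male", None)],
    )
    return src


if __name__ == "__main__":
    run(make_demo_source(), sqlite3.connect(":memory:"))
```

Because the load step uses `INSERT OR REPLACE` keyed on `id`, running the script twice leaves the target table unchanged, which is what makes it reasonable to hand to an external scheduler such as cron for repeatable runs.
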
I want to develop a pipeline that is modular, secure, and usable by hospitals, even with local deployments. Before starting, I’d love to get feedback from the community:

  • Are these the main challenges hospitals face regarding analytics and reporting?

  • Are there specific analytics patterns, KPIs, or reports that would be most valuable?

  • Would hospitals be open to testing such a pipeline with demo or synthetic data first?

Any insights, suggestions, or guidance from implementers, developers, or hospital IT staff would be highly appreciated. I want to make sure the solution addresses real-world needs.

Thank you for your time and guidance!

Hi Harley @htuck @stephenwaite @adunsulag @brady.miller,

Thank you for the thoughtful feedback and for sharing my post on the forum — I really appreciate it.

Your point about OpenEMR primarily serving small outpatient practices is very helpful, and it definitely changes how I’m thinking about this project. Instead of building a heavy, enterprise-style ETL pipeline, I’m now considering a more lightweight and practical approach tailored for smaller clinics.

I’d love to get input from the community on this adjusted direction:

  • What are the most important reports or KPIs small practices actually rely on day-to-day?

  • Are there any existing gaps in OpenEMR reporting that users frequently face?

  • Would a lightweight, easy-to-install analytics add-on be valuable in your workflows?

Thanks again for the guidance — I’m looking forward to learning from everyone here and building something genuinely useful for the OpenEMR community.

Hi @Ashar_Ali

Alright, it looks like you got yourself a forum account, so you can now post directly. That’s good.

Re-reading the posts I see a couple more things I could mention.

I’m a Customer Service kinda guy, not a dev, but I have an IT degree and have helped our devs with several EMR migrations and ETLs. The scope of OpenEMR’s typical ETL project being what it is, I found myself developing short special-purpose scripts (bash and Perl) to accomplish specific tasks for a particular migration. I don’t know (as in, I do not have the experience to say) how helpful a comprehensive multi-purpose suite of tools would be. I suppose if you’re doing this all the time, and the tools were flexible enough to handle all the variations one sees in customer environments, it would be very cool to have.

Also, I’m not sure if you’re aware of the FOSS nature of the OpenEMR project. If your intention is to donate that tool suite to the OpenEMR community, you’ll be heroes. If you hope to monetize it, I can’t advise you, as I’m not familiar with the different schemes for deriving income from FOSS.

Good luck with your project, and I hope you get a lot of useful feedback!

  • Harley

Thank you for the thoughtful feedback, Harley. I really appreciate you sharing your practical experience with ETL work in OpenEMR.

I understand your point about most migrations being highly specific and often handled through small, purpose-built scripts. My goal with this project is to explore whether a more flexible and reusable approach can still add value across different environments, without overcomplicating the typical workflow.

Regarding the open-source nature of OpenEMR, I want to clarify that my intention is to contribute to the community. I’m currently developing a solution and will share it here once it reaches a usable stage. I’d be happy if it proves useful enough to be considered for integration, and otherwise I can maintain it as an external tool for those who may benefit from it.

I’m mainly here to learn, contribute, and collaborate with the community.

Thanks again for your insights.
