Dimensional data warehouse service (EDW)

I was thinking of reporting over the weekend and started wondering if anybody is interested in a dimensional data warehouse service. Also called Enterprise Data Warehouse (EDW).

Currently it’s a half baked idea but I think once it’s flushed out it could be pretty cool for larger offices.

First I’ll explain what a dimensional data warehouse is. It’s different that your standard operational databases in that data in the database can repeat and not a bad thing. Certain tables “topics” become dimensions. An easy one would be a patient dimension. In a normal database that patient information could be organized in 5 different tables. I’m not looking at openEmr schema right now, this is an example:

  • Patient Info
  • Contact Info
  • Insurance Information

These COULD be combined into one table called DimPatient. Obviously you could have data repeat but the goal here is to make queries faster.

So anyway, you end up with a ton of Dim Tables. Next you have your “Fact” tables. These are the actual reports (or base reports). A fact table could be Provider Charges by Day. It could connect the DimProvider, DimService and then show the sum of the services for that day and other fee related informations. This could then be grouped to month and year… Just an example of how the reporting works with this design.

I was thinking it could also be used to create reports for MIPS reporting as well.

Anyway, as for the service, trying to keep the processing off of the production servers I’d write something that pushed the data to my staging tables and then to the data warehouse. Knowing what rows in the tables we care about have been inserted or updated is the roughest part.

I’m thinking the service would have a copy of the production database, and nightly a job would run and do a data compare of the 2 databases. The changed data would be inserted into a staging database and the EDW would then be updated and facts rebuilt. Then the production database is fully synced to the stage production db, using that same generated script.

Finally reports would then be viewable in openEMR either in the new module or in the reporting area. Finally, the service would provide a REST API to read those reports. Every clinic could have different reports so the api’s would have to return the available reports back to the user along with a way to know to consume them.

This would be a pretty significant project but if anybody is interested in this let me know. I’ll probably continue thinking about this and the difficulties but if anyone is interested and wants to spit ball about what needs to happen and how to make it work; reach out to me.

There would be a monthly cost for this service, EDW databases do have a lot of moving parts and maintenance involved.

I think this is a great idea, I have worked an EDW implementation before and we saw a lot of added value for customers. We implemented on Snowflake, so we ended up using change tracking and streams to identify the inserted/updated table rows. I’d be interested in discussing more about this, I’m pretty new to OpenEMR but anything around analytics is a draw for me.

Send me a PM and we can talk about what may or may not work.