yehster wrote on Friday, September 14, 2012:
AMC (Automated Measure Calculation) reporting is a critical component of achieving meaningful use, not just when an eligible provider (EP) is ready to attest, but also on a routine basis to determine the gaps between current usage and MU goals. Ideally one would be able to run the AMC reports quickly and easily without having to wait until “off-hours” for fear of impacting users. This is an achievable goal, but not without some significant development work.
I suspect that implementing this feature would be beneficial to many, but given its large scope it seems unlikely that any single sponsor would be willing to pay for such a project in its entirety. Therefore I’d like to see if the OpenEMR community would be willing to “crowdsource” my efforts to optimize the reporting components of OpenEMR’s CDR engine.
The current implementation has worse-than-linear performance with respect to data size: the time it takes to complete a given report scales with both the number of patients and the size of each patient’s record. It would be an interesting academic exercise to formally characterize the scalability of the current approach, but initial profiling makes it pretty clear that it is less than ideal: thousands of separate queries per patient, per reporting metric. Many of these queries are redundant, scanning the same tables repeatedly, when it is probably possible to retrieve the same data from MySQL for all patients at once instead of one patient at a time.
A 100X performance gain will be the initial target. Once that target is met, or if it turns out the desired improvement isn’t achievable through incremental changes, I’ll look into an approach that uses aggregate queries. Such an approach would need only two queries per AMC/CQM rule: one for the numerator and one for the denominator. The disadvantage of this technique is that defining new measures (for future meaningful use rules and additional clinical quality criteria) will require more technical knowledge of SQL and the database schema than the current approach does. However, I’m confident that results could be generated in seconds rather than minutes (or hours) even with large datasets, since the bulk of the work would be done by MySQL rather than in PHP: MySQL is built to process large quantities of data at a time, whereas PHP is geared toward generating HTML.
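To make the aggregate-query idea concrete, here is a minimal sketch of what one measure might look like, loosely modeled on an e-prescribing style AMC measure, with the whole patient population handled in two queries instead of thousands per patient. The table and column names (form_encounter, prescriptions, patient_id, erx_flag) and the measure logic are illustrative assumptions for this sketch, not the actual OpenEMR schema or rule definitions:

    <?php
    // Sketch only: two aggregate queries per measure, using plain PDO.
    // Schema names and the numerator/denominator logic are placeholders.
    $pdo = new PDO('mysql:host=localhost;dbname=openemr', 'reportuser', 'secret');

    $begin = '2012-01-01';
    $end   = '2012-12-31';

    // Denominator: every patient with at least one encounter in the reporting
    // period, counted in a single pass instead of one query per patient.
    $stmt = $pdo->prepare(
        "SELECT COUNT(DISTINCT fe.pid)
           FROM form_encounter AS fe
          WHERE fe.date BETWEEN ? AND ?"
    );
    $stmt->execute([$begin, $end]);
    $denominator = (int) $stmt->fetchColumn();

    // Numerator: the subset of those patients who also have at least one
    // electronically transmitted prescription in the same period.
    $stmt = $pdo->prepare(
        "SELECT COUNT(DISTINCT fe.pid)
           FROM form_encounter AS fe
           JOIN prescriptions AS p ON p.patient_id = fe.pid
          WHERE fe.date BETWEEN ? AND ?
            AND p.date_added BETWEEN ? AND ?
            AND p.erx_flag = 1"
    );
    $stmt->execute([$begin, $end, $begin, $end]);
    $numerator = (int) $stmt->fetchColumn();

    printf("e-Rx measure: %d / %d\n", $numerator, $denominator);

The point of the sketch is simply that MySQL does the set arithmetic over all patients at once; PHP only formats the two resulting counts.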
In addition to the algorithmic improvements, my intention is to also develop a set of tools that allow simple “offline” processing for heavily utilized systems. These tools would include a script that dumps the relevant data from the production MySQL server and loads it onto an analysis server for further processing. Hopefully I can improve things enough that it’s possible to just run the reports live, but if not, getting daily reports would still be possible by doing the analysis on a machine that is distinct from the machine folks use to get work done. People should be generating daily MySQL dumps as part of their backup processes anyway.
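As a rough illustration of the kind of helper script I have in mind (assuming mysqldump and mysql are on the PATH), something along these lines could refresh an analysis copy from production; the host names, credentials, and table list here are placeholders, and the real set of tables the reporting engine needs would be worked out as part of the project:

    <?php
    // Sketch of an "offline analysis" refresh script (placeholders throughout).
    $tables   = 'patient_data form_encounter prescriptions';
    $dumpFile = '/tmp/openemr_reporting.sql';

    // 1. Dump only the reporting-relevant tables from the production server.
    exec("mysqldump --single-transaction -h prod-db -u report -pSECRET openemr $tables > "
         . escapeshellarg($dumpFile), $out, $rc);
    if ($rc !== 0) {
        exit("mysqldump failed\n");
    }

    // 2. Load the dump into a separate analysis database, away from production.
    exec("mysql -h analysis-db -u report -pSECRET openemr_analysis < "
         . escapeshellarg($dumpFile), $out, $rc);
    if ($rc !== 0) {
        exit("restore failed\n");
    }

    echo "Analysis copy refreshed; run AMC/CQM reports against openemr_analysis.\n";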
I will track my progress continuously on GitHub, in the interest of transparency to supporters but also because it’s part of my normal development process anyway.
Anyway, if people are supportive of this idea, I am going to set up a Kickstarter project and see what kind of response there is. I’m still considering what would be appropriate funding tiers for interested parties. What I may propose is that folks who contribute at higher levels will be able to get direct technical assistance from me even prior to the completion of the project. That way, people who are trying to meet deadlines for 2012 meaningful use and are bottlenecked on AMC issues can get help before the end of the year.
P.S. The batch processing I implemented before is only for Patient Reminders. It won’t do anything to improve AMC/CQM, and that approach really isn’t appropriate for reporting. For reminders, each batch is independent of the others; if something goes wrong in a given batch, it’s not a big deal to just process those patients again. If something goes wrong while calculating an AMC or CQM rule, the overall results will be incorrect unless there are error handling/retry mechanisms, and I’m of the opinion that resources would be better spent making the CDR engine better in general than trying to account for the added complexities of a batch reporting process. Even when broken into batches, the total number of queries executed remains the same, although total time may decrease (and the ability to display continuous progress to the user would certainly be reassuring). However, it would still be hard to test large datasets and be truly confident about the overall results.
kevin.y@integralemr.com