Rel-411 CQM and AMC report fail (memory)

tmccormi wrote on Monday, September 10, 2012:

I thought that this had been changed so it would not try and load the whole world?  This database has 45,000 patients.

 PHP Fatal error:  Allowed memory size of 67108864 bytes exhausted (tried to allocate 71 bytes) in /opt/www/vhosts/demo411.mi-squared.com/emr_demo411_41_prod/openemr/library/adodb/adodb.inc.php on line 969, referer: https://demo411.mi-squared.com/openemr/interface/reports/cqm.php?type=cqm
[Mon Sep 10 14:47:36 2012] [error] [client 24.20.23.116] PHP Fatal error:  Allowed memory size of 67108864 bytes exhausted (tried to allocate 64 bytes) in /opt/www/vhosts/demo411.mi-squared.com/emr_demo411_41_prod/openemr/library/adodb/adodb.inc.php on line 969, referer: https://demo411.mi-squared.com/openemr/interface/reports/cqm.php?type=amc

bradymiller wrote on Tuesday, September 11, 2012:

Hi Tony,
Don’t think this issue has ever been raised. Sounds like we need to isolate where the problem is a bit (as I recall, the CDR engine will query all patients (if a per-provider report is not selected) and then convert this into an array; since the bug is arising in adodb it sounds like the query result is too big, but this seems odd since there are likely other report queries that may return all patients). Just guessing here, so the problem needs to be isolated.
-brady
OpenEMR

yehster wrote on Tuesday, September 11, 2012:

The issue has most certainly been raised in the past:
https://sourceforge.net/projects/openemr/forums/forum/202506/topic/5297491
However, it hasn’t been addressed.
As I said in the previous thread, the CDR engine is fundamentally flawed in its approach, since it generates multiple queries while processing each individual CQM/AMC rule for each patient.
If you look at Tony’s error message in more detail, you’ll see that adodb is only asking for an additional 71 bytes and 64 bytes after having used 64MB. It’s not any one specific query that’s causing the problem. The “out of memory” is the result of cumulative memory usage over the course of likely millions of queries.

bradymiller wrote on Tuesday, September 11, 2012:

Hi yehster,

Running out of memory was not reported in the thread you posted; again, I don’t recall ever hearing a report that memory was exhausted. As an aside, the PHP memory_limit setting should be at least 128M, so perhaps setting it to this will at least avoid Tony’s error.
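For reference, a minimal php.ini fragment for raising the limit (the file’s location varies by distro, and Apache/PHP-FPM needs a restart for the change to take effect):

```ini
; Raise the per-request memory ceiling for PHP scripts.
; The 67108864 bytes in the error above is the old 64M limit.
memory_limit = 128M
```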

Here’s a pertinent wiki page:
http://www.open-emr.org/wiki/index.php/CDR_Performance

Since nobody is stepping up to completely refactor the CDR engine, suggest seeing if my proposed fixes will help things. Note that the aggregation of diagnosis query checks into one query may open the door to what you want (in realistic resource increments) anyways without refactoring the entire CDR engine.

-brady
OpenEMR

tmccormi wrote on Tuesday, September 11, 2012:

I will bump the memory and retry it, for sure. Memory errors always fail on asking for the next ‘chunk’, so how much it asked for when it failed is not so relevant.

I think we should pull in yehster’s work.   It will help.

-Tony

tmccormi wrote on Friday, September 14, 2012:

Note on this, we confirmed that our memory_limit setting in php.ini was already at 128M…
-Tony

yehster wrote on Friday, September 14, 2012:

AMC is a critical component of achieving meaningful use, not just when an eligible provider (EP) is ready to attest, but also on a routine basis to determine gaps between current usage and MU goals. Ideally one would be able to run the AMC reports quickly and easily without needing to wait until “off-hours” for fear of impacting users. This is an achievable goal, but not without some significant development work.

I suspect that implementing this feature would be beneficial to many, but given its large scope it seems unlikely that any single sponsor would be willing to pay for such a project in its entirety.  Therefore I’d like to see if the OpenEMR community would be willing to “crowdsource” my efforts to optimize the reporting components of OpenEMR’s CDR engine. 

The current implementation has worse than linear performance with respect to data size. The amount of time it takes to complete a given report scales with both the number of patients and the size of the records for each patient. It would be an interesting academic exercise to formally characterize the scalability of the current approach, but it’s pretty clear based on some initial profiling that the current approach is less than ideal: thousands of separate queries per patient per reporting metric. Many of these queries are redundant, as they scan the same tables repeatedly when it’s probably possible to retrieve the same data from MySQL for all patients at once instead of one patient at a time. A 100X performance gain will be the initial target.

Once that target is met, or if it turns out the desired improvement isn’t possible through incremental improvements, I’ll look into an approach that uses aggregate queries. Such an approach would generate the numerator and denominator in two queries per AMC/CQM rule: one each for the numerator and denominator. The disadvantage of this technique is that defining new measures (for future meaningful use rules and additional clinical quality criteria) will require more technical knowledge of SQL and the database schema than the current approach. However, I’m confident that results could be generated in seconds rather than minutes (or hours) even with large datasets, since the bulk of the work will be done by MySQL rather than in PHP. MySQL is meant to process large quantities of data at a time, whereas PHP is optimized for processing HTML.
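A rough sketch of the two-queries-per-rule idea, using SQLite and an invented mini-schema (the real OpenEMR tables and AMC/CQM rule definitions are far richer; this only illustrates the shape of the technique):

```python
import sqlite3

# Hypothetical, simplified schema for one AMC-style measure:
# "of patients with any prescription, how many have an e-prescription?"
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_data (pid INTEGER PRIMARY KEY);
CREATE TABLE prescriptions (pid INTEGER, erx INTEGER);
""")
conn.executemany("INSERT INTO patient_data VALUES (?)", [(1,), (2,), (3,), (4,)])
# Patients 1 and 2 have e-prescriptions, patient 3 a paper one, patient 4 none.
conn.executemany("INSERT INTO prescriptions VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 0)])

# Denominator: one aggregate query over all patients at once.
denominator = conn.execute(
    "SELECT COUNT(DISTINCT pid) FROM prescriptions").fetchone()[0]

# Numerator: a second aggregate query, again over all patients at once.
numerator = conn.execute(
    "SELECT COUNT(DISTINCT pid) FROM prescriptions WHERE erx = 1").fetchone()[0]

print(numerator, denominator)  # 2 3
```

The database does the per-patient work in two scans total, instead of PHP issuing thousands of small queries per patient.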

In addition to the algorithmic improvements, my intention is to also develop a set of tools that allow simple “offline” processing for heavily utilized systems. These tools would include a script which dumps the relevant data from the production MySQL server and loads it into an analysis server for further processing. Hopefully I can improve things enough that it’s possible to just run the reports live, but if not, getting daily reports would still be possible by doing the analysis on a machine that’s distinct from the machine folks use to get work done. People should be generating daily MySQL dumps as part of their backup processes anyway.

I will track my progress continuously in Github in the interest of transparency to supporters, but also because it’s part of my normal development process anyway.

Anyway, if people are supportive of this idea, I am going to set up a Kickstarter project and see what kind of response there is. I’m still considering what would be appropriate funding tiers for interested parties. I think what I may propose is that folks who contribute at higher levels will be able to get direct technical assistance from me even prior to the completion of the project. This way people who are trying to meet deadlines for 2012 meaningful use and are bottlenecked on AMC issues can get help before the end of the year.

P.S. The batch processing I implemented before is only for Patient Reminders. It won’t do anything to improve AMC/CQM, and that approach really isn’t appropriate for reporting. For reminders, each batch is independent of the others: if something goes wrong in a given batch, it’s not a big deal to just process the patients in the failed batch again. If something happens while trying to calculate an AMC or CQM rule, the overall results will be incorrect unless there are error handling/retry mechanisms, and I am of the opinion that resources would be better spent making the CDR engine better in general than trying to account for the added complexities of a batch reporting process. When broken into batches, the total number of queries executed will still remain the same, even though peak memory may decrease (and the ability to display continuous progress to the user would certainly be reassuring). However, it’ll be hard to test large datasets and truly be confident about the overall results.

kevin.y@integralemr.com

tmccormi wrote on Friday, September 14, 2012:

Medical Information Integration will contribute for sure.
-Tony

bradymiller wrote on Saturday, September 15, 2012:

Hi Tony,

Regarding your memory issue:
Have you tried to increase it?
About how long does it take until you see the memory error?

-brady
OpenEMR

bradymiller wrote on Saturday, September 15, 2012:

Hi,

Regarding the AMC/CQM reports performance overhaul, excited to see the interest in this and look forward to seeing your code. Although it is probably not gonna get Tony through his memory issue any time soon…

Still think one can get a lot of bang for the buck with the following short-term optimizations (in order of least amount of work):
1. Bypass the audit engine in the queries (there is already a function that will do this in sql.inc; just need to use it in the CDR/AMC/CQM engine frameworks).
2. Batching (I simply don’t buy the argument above. Just make sure the total patients tallied is nearly equal to the total patients predicted to be tallied; pretty simple check).
3. Start to make a middle layer. For example, the CDR/AMC/CQM engine/framework currently does an individual query for each diagnosis number. This could be vastly optimized by instead creating one query in real time, via a middle layer of code, with all of these diagnosis checks in it. And this mechanism could then be carried over to other parts of the CDR code (for example, combining all patients in one query, etc.).
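The “middle layer” idea in item 3 can be sketched like this, using SQLite and invented table/column names (not OpenEMR’s actual schema): N per-diagnosis round trips collapse into one run-time-built IN (…) query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lists (pid INTEGER, diagnosis TEXT)")
conn.executemany("INSERT INTO lists VALUES (?, ?)",
                 [(1, "ICD9:250.00"), (1, "ICD9:401.1"), (2, "ICD9:401.1")])

codes = ["ICD9:250.00", "ICD9:272.4", "ICD9:401.1"]

# Naive approach: one round trip per diagnosis code (roughly what the
# CDR engine does today).
per_code = {c: conn.execute(
    "SELECT COUNT(*) FROM lists WHERE pid = ? AND diagnosis = ?",
    (1, c)).fetchone()[0] for c in codes}

# Middle layer: the same answer from one query built at run time.
placeholders = ",".join("?" * len(codes))
rows = conn.execute(
    f"SELECT diagnosis, COUNT(*) FROM lists "
    f"WHERE pid = ? AND diagnosis IN ({placeholders}) GROUP BY diagnosis",
    [1, *codes]).fetchall()
combined = {c: 0 for c in codes}   # codes absent from the result count as 0
combined.update(dict(rows))

assert combined == per_code        # identical results, one query instead of N
```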

Important to note that it’s not just the AMC/CQM rules that are important to run quickly, but also the standard CDR rules report, which runs from the core CDR engine (customizable rules at Administration->Rules), and the patient reminders, which also run from the core CDR engine. Suggest avoiding a focus only on the AMC/CQM stuff, or we will still be left with slow patient reminders and a slow standard reporting engine. (I would argue that although the AMC/CQM rules are important for getting money for MU, the standard rules are where the real clinical benefits will lie in clinics; these are where physicians will place their customized rules.)

-brady
OpenEMR

bradymiller wrote on Wednesday, September 19, 2012:

Hi,

Here’s code that fixes the memory error and also improves performance of the CDR engine reports and patient reminders:
http://github.com/bradymiller/openemr/commits/out-of-memoery-cdr-fix_3
(first commit)

Utilized non-ajax batching, which both drastically decreased the memory usage and the time of the reports. Here are approximate times of the reports/reminders, tested on a very simple 100,000 patient database within an appliance (i.e., equivalent to a rather slow CPU, so expect real servers to be much faster):
CQM report: 1000 patients per 1 minute
Standard rules report: 1000 patients per 2 minutes
AMC report : 1000 patients per 3 minutes
Clinical reminders: 1000 patients per 6 minutes
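The batching approach can be sketched roughly as follows (SQLite stand-in; the table name, batch size, and keyset pagination are my own illustration, not the actual commit): fetch patient ids in fixed-size windows so peak memory stays bounded, then check the tally against the predicted total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient_data (pid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO patient_data VALUES (?)",
                 [(i,) for i in range(1, 101)])

BATCH = 25
tallied = 0
last_pid = 0
while True:
    # Keyset pagination: cheaper than OFFSET on large tables, and only
    # BATCH rows are ever held in memory at once.
    rows = conn.execute(
        "SELECT pid FROM patient_data WHERE pid > ? ORDER BY pid LIMIT ?",
        (last_pid, BATCH)).fetchall()
    if not rows:
        break
    for (pid,) in rows:
        tallied += 1          # stand-in for per-patient rule evaluation
    last_pid = rows[-1][0]

# Sanity check: the total tallied should match the predicted patient count.
predicted = conn.execute("SELECT COUNT(*) FROM patient_data").fetchone()[0]
assert tallied == predicted
```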

It sounds like Tony ran his 40,000 real patient database CQM report in under 30 minutes with this algorithm, which is very cool.

Since this includes rather extensive changes, will plan/await some more testing (and any feedback) before committing this. If you try this code, let us know how it goes.

-brady
OpenEMR

bradymiller wrote on Wednesday, September 19, 2012:

Hi,

Here’s the pretty much finalized and tested code to fix the memory bug and improve the performance of the CDR engine. In addition to the above, the SQL queries now bypass the audit engine, which actually cut the processing times listed above in half for all the reports (for those die-hards who still want to maintain auditing of this, there is an option to audit these queries in globals). Also incorporated the session_write_close() function in the reporting (and widgets), which stops the reports from freezing the browser while they are running (other users can easily log in and use OpenEMR while the reports are running); thanks to yehster for pointing out this PHP function.
http://github.com/bradymiller/openemr/commits/out-of-memoery-cdr-fix_4

Will likely commit this soon and place in the next 4.1.1 patch after it gets testing in the master branch for a bit.

-brady
OpenEMR

rgenandt wrote on Wednesday, October 24, 2012:

I worked on a system that got this patch, but it messed up the IDs that were returned when visits and forms were being created. The id from the log record was coming back and being inserted into forms as the form_id. I still need to gather more details on the settings, but just wanted to mention it so others can avoid this issue.

bradymiller wrote on Wednesday, October 24, 2012:

Hi,

Sounds very similar to a bug we had secondary to the logging/audit engine, which was thought to be fixed. What version of OpenEMR are you using? What operating system? Which patch number are you using? And if you used a 4.1.1 patch, are you sure the original version was 4.1.1 (and not 4.1.0)?

Something to try is turning off the audit engine (Administration->Globals->Logging-><toggle off Enable Audit Logging>), which will show whether it is indeed a bug with the audit engine.

-brady
OpenEMR