Question about report load on SQL

ryry wrote on Tuesday, May 29, 2012:

We have an OpenEMR installation that’s been running since June of last year. Here are some statistics on our database size:

openemr_postcalendar_events: 10,819 / 5.1MB
form_encounter: 8,881 / 1.8MB
forms: 43,482 / 5.6MB
patient_data: 4,620 / 1.4MB
users: 37 / 8.8KB

The report in question is “interface/reports/appt_encounter_report.php” or Reports > Visits > Appt-Enc

It takes up 99% of our SQL server CPUs when we run it. We’ve also installed a vanilla OpenEMR installation on its own CentOS machine outside of our server cluster to be sure and the result is the same:

http://webdev.secured-domain.net/chart1.png - Load averages
http://webdev.secured-domain.net/chart2.png - CPU utilization
http://webdev.secured-domain.net/chart3.png - mysqld process

This high load makes all other operations on that same OpenEMR installation slow to a crawl while the report is running.

The report does eventually finish in about 40 minutes.
                                                              
How do other users of OpenEMR deal with this? Are users told to run the report after hours to avoid disabling the OpenEMR during the day? Are users told they can only run these reports with a sufficiently small amount of data?

Brady has mentioned that the query can be optimized, but how should it be optimized? Why wouldn’t it be optimized to begin with?

It seems to us that reports like these don’t scale and we’ll have to tell our clients they can no longer use them or have a separate, dedicated piece of hardware purely for generating reports.

bradymiller wrote on Thursday, May 31, 2012:

Hi ryry,

One more quick question. How long is the report output?

Also, here’s the query for others to see:

  // MySQL doesn't grok full outer joins so we do it the hard way.
  //
  $query = "( " .
   "SELECT " .
   "e.pc_eventDate, e.pc_startTime, " .
   "fe.encounter, fe.date AS encdate, " .
   "f.authorized, " .
   "p.fname, p.lname, p.pid, p.pubpid, " .
   "CONCAT( u.lname, ', ', u.fname ) AS docname " .
   "FROM openemr_postcalendar_events AS e " .
   "LEFT OUTER JOIN form_encounter AS fe " .
   "ON LEFT(fe.date, 10) = e.pc_eventDate AND fe.pid = e.pc_pid " .
   "LEFT OUTER JOIN forms AS f ON f.encounter = fe.encounter AND f.formdir = 'newpatient' " .
   "LEFT OUTER JOIN patient_data AS p ON p.pid = e.pc_pid " .
   // "LEFT OUTER JOIN users AS u ON BINARY u.username = BINARY f.user WHERE ";
   "LEFT OUTER JOIN users AS u ON u.id = fe.provider_id WHERE ";
  if ($form_to_date) {
   $query .= "e.pc_eventDate >= '$form_from_date' AND e.pc_eventDate <= '$form_to_date' ";
  } else {
   $query .= "e.pc_eventDate = '$form_from_date' ";
  }
  if ($form_facility !== '') {
   $query .= "AND e.pc_facility = '$form_facility' ";
  }
  // $query .= "AND ( e.pc_catid = 5 OR e.pc_catid = 9 OR e.pc_catid = 10 ) " .
  $query .= "AND e.pc_pid != '' AND e.pc_apptstatus != '?' " .
   ") UNION ( " .
   "SELECT " .
   "e.pc_eventDate, e.pc_startTime, " .
   "fe.encounter, fe.date AS encdate, " .
   "f.authorized, " .
   "p.fname, p.lname, p.pid, p.pubpid, " .
   "CONCAT( u.lname, ', ', u.fname ) AS docname " .
   "FROM form_encounter AS fe " .
   "LEFT OUTER JOIN openemr_postcalendar_events AS e " .
   "ON LEFT(fe.date, 10) = e.pc_eventDate AND fe.pid = e.pc_pid AND " .
   // "( e.pc_catid = 5 OR e.pc_catid = 9 OR e.pc_catid = 10 ) " .
   "e.pc_pid != '' AND e.pc_apptstatus != '?' " .
   "LEFT OUTER JOIN forms AS f ON f.encounter = fe.encounter AND f.formdir = 'newpatient' " .
   "LEFT OUTER JOIN patient_data AS p ON p.pid = fe.pid " .
   // "LEFT OUTER JOIN users AS u ON BINARY u.username = BINARY f.user WHERE ";
   "LEFT OUTER JOIN users AS u ON u.id = fe.provider_id WHERE ";
  if ($form_to_date) {
   // $query .= "LEFT(fe.date, 10) >= '$form_from_date' AND LEFT(fe.date, 10) <= '$form_to_date' ";
   $query .= "fe.date >= '$form_from_date 00:00:00' AND fe.date <= '$form_to_date 23:59:59' ";
  } else {
   // $query .= "LEFT(fe.date, 10) = '$form_from_date' ";
   $query .= "fe.date >= '$form_from_date 00:00:00' AND fe.date <= '$form_from_date 23:59:59' ";
  }
  if ($form_facility !== '') {
   $query .= "AND fe.facility_id = '$form_facility' ";
  }
  $query .= ") ORDER BY docname, pc_eventDate, pc_startTime";
  $res = sqlStatement($query);

-brady
OpenEMR

bradymiller wrote on Thursday, May 31, 2012:

BTW,
github “blames” both cfapress and sunsetsystems for the above sql query :slight_smile:

sunsetsystems wrote on Thursday, May 31, 2012:

I suggest doing 3 things:

1. Change the two instances of “ON LEFT(fe.date, 10) = e.pc_eventDate” to “ON fe.date = e.pc_eventDate”.

2. Add an index on “pid” to the “forms” table if it is not already there.

3. Add an index on “pc_eventDate” to the “openemr_postcalendar_events” table.

Rod
www.sunsetsystems.com

ryry wrote on Monday, June 04, 2012:

Hey Rod, Brady:

1. These indexes were already present on those two fields: pid to forms / pc_eventDate to openemr_postcalendar_events

2. Removing those left joins (resulting query at the bottom), for a period of one year:

Without left joins (new query):

18247 rows in set (39 min 5.89 sec)

With left joins (original query):

9542 rows in set (39 min 16.17 sec)

+0.44% speed increase, but the number of rows is different. I doubt there will be any optimization other than removing large chunks of the query.

3. But my overall question is how do you deal with something like this that takes 40 minutes? Do we just have too much data to be able to use the report? There isn’t much talk about it, so is everyone else finding some way of dealing with it?

sunsetsystems wrote on Monday, June 04, 2012:

Did you try my suggestion #1?  I didn’t mean to remove any joins, just change them as indicated.

Rod
www.sunsetsystems.com

mcaloon wrote on Monday, June 04, 2012:

Hello,
    I think there is an audit log that has the generated SQL query. If you can find it could you paste the actual query that ran?

Mac

bradymiller wrote on Tuesday, June 05, 2012:

Hi ryry,

Note there is no pc_eventDate index in openemr_postcalendar_events on the standard OpenEMR installation. There is a composite key that includes pc_eventDate, however that is not the same thing.

-brady
OpenEMR

sunsetsystems wrote on Tuesday, June 05, 2012:

Also with my suggested changes I would not expect any difference in the number of rows selected.

Rod
www.sunsetsystems.com

ryry wrote on Tuesday, June 05, 2012:

Brady:

Ah. Yes, it was.

New Indexes:

       Table: openemr_postcalendar_events
  Non_unique: 1
    Key_name: ped
Seq_in_index: 1
 Column_name: pc_eventDate
   Collation: A
 Cardinality: 384
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:




       Table: forms
  Non_unique: 1
    Key_name: pidIndex
Seq_in_index: 1
 Column_name: pid
   Collation: A
 Cardinality: 3933
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment:

Rod:

Sorry, my fault. Didn’t remove those joins properly. The new query is:

( SELECT e.pc_eventDate, e.pc_startTime, fe.encounter, fe.date AS encdate, f.authorized, p.fname, p.lname, p.pid, p.pubpid, CONCAT( u.lname, ', ', u.fname ) AS docname FROM openemr_postcalendar_events AS e LEFT OUTER JOIN form_encounter AS fe ON fe.date = e.pc_eventDate AND fe.pid = e.pc_pid LEFT OUTER JOIN forms AS f ON f.encounter = fe.encounter AND f.formdir = 'newpatient' LEFT OUTER JOIN patient_data AS p ON p.pid = e.pc_pid LEFT OUTER JOIN users AS u ON u.id = fe.provider_id WHERE e.pc_eventDate >= '2011-05-24' AND e.pc_eventDate <= '2012-05-24' AND e.pc_pid != '' AND e.pc_apptstatus != '?' ) UNION ( SELECT e.pc_eventDate, e.pc_startTime, fe.encounter, fe.date AS encdate, f.authorized, p.fname, p.lname, p.pid, p.pubpid, CONCAT( u.lname, ', ', u.fname ) AS docname FROM form_encounter AS fe LEFT OUTER JOIN openemr_postcalendar_events AS e ON fe.date = e.pc_eventDate AND fe.pid = e.pc_pid AND e.pc_pid != '' AND e.pc_apptstatus != '?' LEFT OUTER JOIN forms AS f ON f.encounter = fe.encounter AND f.formdir = 'newpatient' LEFT OUTER JOIN patient_data AS p ON p.pid = fe.pid LEFT OUTER JOIN users AS u ON u.id = fe.provider_id WHERE fe.date >= '2011-05-24 00:00:00' AND fe.date <= '2012-05-24 23:59:59' ) ORDER BY docname, pc_eventDate, pc_startTime;

Or, formatted:

(SELECT  e.pc_eventDate,
         e.pc_startTime,
         fe.encounter,
         fe.date AS encdate,
         f.authorized,
         p.fname,
         p.lname,
         p.pid,
         p.pubpid,
         CONCAT(u.lname, ', ', u.fname) AS docname
 FROM    openemr_postcalendar_events AS e
         LEFT OUTER JOIN
         form_encounter AS fe
         ON fe.date = e.pc_eventDate
            AND fe.pid = e.pc_pid
         LEFT OUTER JOIN
         forms AS f
         ON f.encounter = fe.encounter
            AND f.formdir = 'newpatient'
         LEFT OUTER JOIN
         patient_data AS p
         ON p.pid = e.pc_pid
         LEFT OUTER JOIN
         users AS u
         ON u.id = fe.provider_id
 WHERE   e.pc_eventDate >= '2011-05-24'
         AND e.pc_eventDate <= '2012-05-24'
         AND e.pc_pid != ''
         AND e.pc_apptstatus != '?')
UNION
(SELECT  e.pc_eventDate,
         e.pc_startTime,
         fe.encounter,
         fe.date AS encdate,
         f.authorized,
         p.fname,
         p.lname,
         p.pid,
         p.pubpid,
         CONCAT(u.lname, ', ', u.fname) AS docname
 FROM    form_encounter AS fe
         LEFT OUTER JOIN
         openemr_postcalendar_events AS e
         ON fe.date = e.pc_eventDate
            AND fe.pid = e.pc_pid
            AND e.pc_pid != ''
            AND e.pc_apptstatus != '?'
         LEFT OUTER JOIN
         forms AS f
         ON f.encounter = fe.encounter
            AND f.formdir = 'newpatient'
         LEFT OUTER JOIN
         patient_data AS p
         ON p.pid = fe.pid
         LEFT OUTER JOIN
         users AS u
         ON u.id = fe.provider_id
 WHERE   fe.date >= '2011-05-24 00:00:00'
         AND fe.date <= '2012-05-24 23:59:59')
ORDER BY docname, pc_eventDate, pc_startTime;  

Result: 18247 rows in set (33 min 40.77 sec)

bradymiller wrote on Tuesday, June 05, 2012:

Hi,

See what you get when running the query through the EXPLAIN command:
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

Also, to confirm, did you need to add the pc_eventDate key (or is it already in OpenEMR)?

-brady
OpenEMR

sunsetsystems wrote on Tuesday, June 05, 2012:

LEFT OUTER JOIN forms AS f ON f.encounter = fe.encounter AND f.formdir = ‘newpatient’

This is not good - the forms table has no index on encounter.  You could either create one, or change BOTH instances of this join to:

LEFT OUTER JOIN forms AS f ON f.pid = fe.pid AND f.encounter = fe.encounter AND f.formdir = 'newpatient' 

Rod
www.sunsetsystems.com

ryry wrote on Thursday, June 07, 2012:

Hello,

I needed to add an index to pc_eventDate (openemr_postcalendar_event) and encounter (forms).

New result: 18247 rows in set (2.93 sec)

I am astonished at the difference. Are other reports in OpenEMR similarly not optimized?

sunsetsystems wrote on Thursday, June 07, 2012:

Thanks for reporting the test results.  I’ll work up a corresponding fix.

Rod
www.sunsetsystems.com

sunsetsystems wrote on Thursday, June 07, 2012:

Done and pushed to SF.

Rod
www.sunsetsystems.com

sunsetsystems wrote on Friday, June 08, 2012:

Also applied a second fix to remove two now-obsolete indexes, and to make such removals easier in the future.  Thanks to Kevin and Brady for that suggestion.

Rod
www.sunsetsystems.com

bradymiller wrote on Friday, June 08, 2012:

thanks Rod :slight_smile:
-brady