Throttling resources for long-running queries

aaronchantrill · October 14, 2019, 4:23pm

I have one user who wants to run a report that has some issues over a fairly large dataset. This results in the CPU usage spiking, and causing timeout errors for other OpenEMR users until the report finishes running, which can take up to 10 minutes. In the meantime, I can kill the query if someone comes and tells me, but basically the whole clinic is down for that 10 minutes. Since the average appointment is only 20 minutes long, that is a very long time for the providers.

I am addressing the issue in this case by showing the user how to limit the size of the queries and also fixing some of the index issues, but I would also like to also address the issue at a resource allocation level for the next time this happens. I would prefer for the user running the report to have to wait longer for the report, or get an error message saying that the report is taking too long to run than have the entire clinic shut down if a query is taking too long.

I’ve been searching online for suggestions for mysql, but haven’t found anything so far, other than renicing processes (which doesn’t really help in this case). Nothing at the query level.

I’d like to know if anyone has found a solution to prevent a bad query from consuming all the resources for the whole organization.

Thanks,
Aaron

aaronchantrill · October 14, 2019, 6:59pm

I have seen a couple of discussions of using threaded PHP to start a query and then kill that query if it is still processing after some time limit, which seems like it could be helpful. Does OpenEMR have that sort of capacity?

aaronchantrill · October 25, 2019, 1:45pm

Okay, for anyone else who is interested, this is how I have dealt with it. The idea comes from this post:

You simply set up a script that searches for any queries that have been running for over some time limit, then generates a script to kill them. If the resulting file is zero size, delete it, otherwise direct it into mysql as an input and leave it in the /tmp directory for analysis. It seems pretty extreme, but it works. Here is my script (set an actual username and password if needed, I am giving it 300 seconds to complete, which is probably way more than necessary for most use, but does alow monster reports to be generated):

/usr/local/bin/checkmysql.sh

#!/bin/bash

# Timestamp
DATE=$(date +"%Y%m%d%H%M")

# Credentials
USERNAME=username
PASSWORD=password

# Login
MYSQL="mysql -u$USERNAME -p$PASSWORD"

$MYSQL -N -e "select now(),id,time,info from information_schema.processlist where command='Query' and time>300 and info is not null" | cut -f 1,2,3,4 | awk '{print "# " $0 "\nKILL " $3 ";"}' > /tmp/kill.$DATE

if [ -s /tmp/kill.$DATE ]; then
        $MYSQL < /tmp/kill.$DATE
else
        rm /tmp/kill.$DATE
fi

In the original script, this was limited by database. You could also limit it by user, only killing queries by the openemr database user, but allowing people logged in directly to run queries as long as they need. It also sent an email to root whenever it killed a query. In my case, I just have to check the /tmp directory for kill.* files from time to time. You can also append the file to a log.

For the last couple of weeks, the whole server has locked up after about twenty minutes of users complaining of slow performance. After the last time, I rebuilt the whole database from mysqlbackup files to fix any errors in the indexes, and implemented slow-query-log. Unfortunately, it appears that slow-query-log does not write to the log file until the query completes, so if the issue consumes enough resources to cause the server to lock up it never gets recorded.

I’ll update this if and when I get any additional information.
Thanks,
Aaron