Feature Request: Replica Reader

jesdynf · June 8, 2023, 5:42pm

So one of the ways people solve issues with slow databases is through read-only replicas – you can have a database running as a read-only replica of the master. (And possibly serving double-duty as a hot spare if the master crashes.) There’s replication lag, so you have to be mindful of it, but a pristine database that’s not under any load at all and is only 10-50 milliseconds behind the writer can make a lot of searching and reporting tasks easier.

Ideally it would take this form:

A second set of database fields (fine if it can’t be set automatically but it would be nice if it could) for the second datasource, configured as a second parallel connection pool.
SELECT queries inside the codebase could be flagged to use the replica if available, and would use the master if the replica wasn’t configured. Thought would have to be given to exactly which queries could use this indirection – basically it’s not appropriate for anything that has to immediately reflect a change on-click because sometimes pages can actually beat the replication lag, but it’s fine for a patient search or some sort of report or export. (Not sure if it’s a good match for a calendar or not, depends on how things show up on it.)

It won’t help everybody – anybody using only a single database node, and let’s not lie, that’s “most people”, wouldn’t be able to leverage it, but it’ll help multi-facility performance at the high-end. Amazon Aurora has some charming trickery where every instance has a writer and reader endpoint and it Just Magically Works even if there’s only one node in the cluster or if you change the scaling.

brady.miller · June 9, 2023, 5:05am

This is pretty interesting. For the Kubernetes stuff (openemr-devops/kubernetes at master · openemr/openemr-devops · GitHub), I incorporated a mariadb replica (can actually scale it how to how every many replicas you want) since thought it would be neat for reporting/backups without taking away resources from the main mariadb database. Never thought about using it as a reader (in that case, may think about 2 replicas by default in the kubernetes setup, 1 for reader and other for backup/misc stuff).

Probably would involve a new wrapper function sqlStatementReadOnlyReplica() or something like that which would read the data using the replicareader creds in sqlconf.php . Note the audit engine will also store the select queries (when this is turned on in globals which is required for MU3) so interestingly the log of the query would need to go to the main database while the main read query would go to the reader replica (this wouldn’t be to hard to workup).

Good proof of concept would be using this sqlStatementReadOnlyReplica() in the Administration Logs page where it collects the logs. This can be a real drain sicne the table can be huge, so would be neat to out source the read to the replica reader.