Database Schema Documentation

growlingflea · February 26, 2019, 7:39pm

Hi everyone. I’d like to recruit some assistance in putting together some database schema information. Although this is helping a current customer of mine, the documentation about the database is limited and I think the community could benefit by having some solid resources.

Is there a repo where I can upload the spreadsheet I am working on so others can modify, correct, or contribute?

Thanks!!

Dan

brady.miller · February 27, 2019, 6:48am

Hi @growlingflea ,
Would a google doc spreadsheet work well for this?
-brady

growlingflea · February 27, 2019, 5:15pm

Sure. I published one. I’ll make it public.

growlingflea · February 27, 2019, 5:21pm

Here is a start. This is from a custom database from 500, but I think I removed all the custom tables and columns. If you see something that doesn’t fit, feel free to remove it.

Some of the tables that were empty were ignored. This originally was created for a data merge so some of the tables that had no changes were ignored. If you see a missing table feel free to edit, add, delete, or whatever.

https://docs.google.com/spreadsheets/d/1V-5UhtVCwqckaFqwv1mXwu3ezqY2qPYZ3hI-DF3sag8/

Since the original idea was to fix 500, that is the first page. I made the second page for 501.

brady.miller · February 28, 2019, 7:30am

hi @growlingflea ,

That is a great starting point. I would also rec making a page for development version (when this sheet is done, then will add new tables to the development version, so when release a version, just a matter of copying that to a rel sheet).

Also placed a link to your googledoc here:
https://www.open-emr.org/wiki/index.php/Database_Structure#Tables

-brady

dan_openemr · March 27, 2019, 6:31am

@growlingflea

So are you just taking every table that’s created in one of the .sql files and putting it in spreadsheet format? Is that all? There are a few new devs who could help out with this process…would be good onboarding experience for them.

Sry for the response so many weeks out after this thread was active…

adunsulag · March 27, 2019, 5:14pm

@growlingflea and @dan_openemr I’ve actually started going through and creating documentation as a newcomer. Part of the work has been creating an Entity Relationship Diagram (ERD) of the database using MySQL workbench. I’m happy to assist in the documentation efforts of the tables.

growlingflea · March 27, 2019, 5:54pm

Thanks everyone for taking part in this project. I think this will be very beneficial for the community.

adunsulag · March 27, 2019, 8:09pm

@brady.miller @growlingflea I’ve gone ahead and created a Master version sheet. I’ve merged in the work that Daniel has done into the master sheet. All of the latest tables are there but they are all missing descriptions as I don’t know what should be put there.

I’ve also gone ahead and started segmenting the tables into functional layers. My thoughts are to have an overall visual ERD that segments the tables into their specific modules / areas. Then we’d create a detailed ERD for each functional layer just so we don’t get lost in all of the information.

One question I have is where are all of the foreign keys? The ERD from OpenEMR 3.0.2 has foreign key constraints everywhere. However, my database instance and the database.sql file in the sql folder doesn’t have any foreign key constraints that I can tell. Are these handled at the application layer now?

dan_openemr · March 27, 2019, 8:36pm

@adunsulag There aren’t a lot of foreign keys I know. It just assumes…it is valid. I could be wrong though.

Also something else, we have fields that aren’t consistent. So First Name might be 33 characters in one table and 255 characters in another table.

stephenwaite · March 27, 2019, 10:02pm

hi @adunsulag, take the procedure_order table for instance, isn’t anywhere you see a references in the COMMENT in database.sql a foreign key?

brady.miller · March 28, 2019, 8:53am

hi ,

Never been foreign keys in OpenEMR; guessing that the user placed the foreign keys in the old ERD to make building the ERD easier.

I think this was mostly because foreign keys were not well supported back in the day in mysql. And nobody has added any (all the constraints and table relations are done via the code).

adunsulag · March 28, 2019, 9:33am

@stephenwaite So a foreign key is a database integrity construct that prevents your database system from getting out of wack. Here is a sample table from another application I develop. You can see the foreign keys on assignmentitem_id, last_updated_by, company_id, assignment_id, and created_by.

In this case I can’t delete my Company record (referenced by company_id) until all of my ReportFiles have properly been deleted. Otherwise I’d have orphaned records. Same with the other table types. The foreign key will also make sure I can’t insert erroneous data. I can’t insert a ReportFile record for a company that doesn’t exist. Another nice benefit is that I can set my foreign key to cascade which is also a nice database feature so if my Company is deleted it automatically deletes all of my ReportFile records.

Foreign Keys aren’t necessary unless you have parallel applications or someone accessing the database directly for administrative purposes. If you do have those you want the foreign keys to make sure someone or another application doesn’t inadvertently screw with your data. I’ve seen on other forum posts that OpenEMR wants to support parallel applications working directly against the database. @brady.miller I’d recommend adding foreign keys to the OpenEMR codebase if we want to support this use case.

The downside to foreign keys is that a foreign key will slow down database inserts, deletes, and updates. This isn’t usually significant unless you need to batch process thousands, or hundreds of thousands of records in a short period of time. There are database patterns to get around that though (denormalizing the data etc).

The only concern this late in the game is that there is possibly bad data in the database. Adding the foreign key constraint will reveal that data immediately. It will also start rejecting code in the codebase where the data integrity is being violated which is a good thing. Lastly it will force column consistency that Dan Ehlrich (sidenote 2 user tag limit for new accounts??) talks about as the foreign key column and column definition must match the column type exactly.

The other benefit of the foreign key is that ORM systems like Doctrine will autogenerate code that lets you fetch and work with all of your related entity types when you have foreign keys setup properly. It also makes the generation of ERD documentation much easier as the documentation engine can discover immediately the relationships between tables.

So there’s a bunch of pros and some cons of adding Foreign keys to the OpenEMR codebase and it’d take some time. I’m interested in taking on the project if the community wants to go that route. The downside is there will be a lot of data cleanup if there are violations of the foreign keys for anyone who wants to upgrade their system to the latest OpenEMR version.

Lastly, I’ll just say I’m speaking from my experience. I’m not a DBA but I’ve worked with database systems for a long time.

brady.miller · March 29, 2019, 6:01am

Hi @adunsulag ,
I wouldn’t be against this, but would all the effort towards this be worth it especially considering it may break upgrading. At this point, no plan for parallel applications (everything of substance is pretty much needs to go through OpenEMR’s application layer to make sense of the database).

adunsulag · March 29, 2019, 12:02pm

So, I think the effort is still worth it but the payoff may not be as good.
I’ve seen some development environments where it’s actually worthwhile to have a script that adds all the foreign keys while you are developing and then drops them when you move the code to a production environment. You do this for performance reasons, but it also is a good case if you worry that there are problems with data integrity in your codebase. This makes a lot of sense in a legacy application as it immediately highlights problems in the codebase where data integrity is not actually being maintained. It also gives you the time to write cleanup scripts to fix orphans, mismatched roles, or other problems lurking in the data that you just don’t know about because of an application layer bug.

That being said, its a whole lot more work with not as much gain as you have to make sure you are carefully avoiding the foreign keys slipping into production and destroying the upgrade path for your customers. There may be other higher priority efforts that are more critical such as developing a more comprehensive testing framework that would tease out these data errors while also testing business logic.

brady.miller · March 30, 2019, 8:21am

Hi @adunsulag ,

Sounds like this would add complexity (ie. more work for community and potential for release mistakes in addition to upgrade pains ).

I really like the idea of developing a comprehensive testing framework. We now use travis CI to simply check for php parsing errors and psr2 styling. It would be awesome if we could get a phpunit framework up and working that runs from travis. About 8 years ago or so, a developer did get rudimentary phpunit testing going, but it has been broken ever since: openemr/tests/unit at master · openemr/openemr · GitHub

stephenwaite · July 23, 2024, 3:36pm

A post was merged into an existing topic: Restore of Backup