Project - Standardized Patient Data

brady.miller · February 21, 2020, 7:27am

Goal is to develop a mechanism to create and import large datasets of standardized patient data. This is a high impact project that would then markedly improve instructional use of OpenEMR and markedly improve OpenEMR’s use in the data analytics field.

Mentors:
@robert.down
@brady.miller

prondubuisi · February 22, 2020, 7:35am

Hello @brady.miller @robert.down, if I understand this project correctly, we are looking to build something that seeds the OpenEMR database tables so that OpenEMR demos are more meaningful to potential users.

If this is the case, this is how we could approach it:

Create a mechanism to programmatically seed the OpenEMR database tables (if we can seed the tables, we can also use phpMyAdmin to export their content for reuse)
Make the mechanism easy to use (GUI or CLI)
Make it possible to localize seeding content, so seeding can be more meaningful to a local audience

This is how this could be achieved technically:

Generate random data sets with the PHP Faker library
Seed specified tables with the generated data; for this we could most likely use the Illuminate Database package
For ease of use, we could create a CLI tool for the above using the Symfony Console package
For localization I am not very sure, but maybe the project's users can specify local datasets for their locales, and the Faker package can use those instead of its built-in random data sets
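
As a rough illustration of the generate-then-seed idea, here is a minimal sketch in Python using only the standard library. It is a stand-in for Faker, not the proposed PHP tooling: the `NAME_POOLS` table, the record keys (`fname`, `lname`, `DOB`, `sex`) and both function names are assumptions made up for this example.

```python
import random
from datetime import date, timedelta

# Hypothetical per-locale name pools. In the proposal these would come from
# Faker's locale providers or from datasets supplied by local users.
NAME_POOLS = {
    "en_US": {"first": ["James", "Mary", "Robert"], "last": ["Smith", "Johnson"]},
    "en_NG": {"first": ["Chinedu", "Amaka", "Tunde"], "last": ["Okafor", "Adeyemi"]},
}


def make_patient(rng, locale="en_US"):
    """Return one synthetic patient record as a plain dict."""
    pool = NAME_POOLS[locale]
    # Birth date somewhere in the 90 years before a fixed reference date.
    dob = date(2020, 1, 1) - timedelta(days=rng.randint(0, 90 * 365))
    return {
        "fname": rng.choice(pool["first"]),
        "lname": rng.choice(pool["last"]),
        "DOB": dob.isoformat(),
        "sex": rng.choice(["Male", "Female"]),
    }


def make_patients(count, locale="en_US", seed=0):
    """Generate `count` records; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    return [make_patient(rng, locale) for _ in range(count)]
```

Calling `make_patients(100, locale="en_NG")` would give a list of dicts ready to hand to whatever seeding step is chosen. Passing the same seed twice gives the same records, which may help keep demo data stable between runs.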

It is worth noting that the PHP framework Laravel uses something similar to the above structure for seeding databases. Please tell me what you think.

I also think this would be nice to have as a separate OpenEMR project in a separate repository.

I can also create a demo if needed. Thanks

brady.miller · February 22, 2020, 9:20am

hi @prondubuisi ,

Looks like a solid plan, and I like the idea of incorporating it into the demos in addition to the modular approach. What are your thoughts on incorporating something like this tool for creating the patient data?
GitHub - synthetichealth/synthea: Synthetic Patient Population Simulator

thanks,
-brady

prondubuisi · February 23, 2020, 7:46am

Hello @brady.miller, I checked it out. It looks good, especially the ability to create medications, allergies, medical encounters, etc. I will play around with it and see. How do you think we could use it with the setup I have proposed, considering it is written in Java?

brady.miller · February 23, 2020, 8:00am

Hi,

I’d probably start off by dockerizing it. It looks like there is a docker, but it hasn’t been updated for 3 years:
https://hub.docker.com/r/synthetichealth/synthea/

Since their docker is outdated, probably best to roll our own using their most recent production version:
Basic Setup and Running · synthetichealth/synthea Wiki · GitHub

Another option is to see if somebody else has already done it recently. Maybe here:
GitHub - smart-on-fhir/synthea: Static build of Synthea with http interface

-brady

Shreya_Goyal · February 24, 2020, 6:30am

Hi, I am Shreya Goyal, currently pursuing a master’s in Health Informatics at IUPUI. I have a strong background in biomedical data analysis, machine learning and database management systems. The project looks interesting to me, and I believe that I have the required skills to contribute to the project and to the organization.

prondubuisi · February 25, 2020, 3:47am

I will check the links out. I am looking to pick up some Docker skills. My replies might be a little late though (say, by the weekend), as I am currently taking school exams. Thanks.

stephenjude · February 27, 2020, 12:35am

Hello @prondubuisi your plan looks pretty solid. Let me play around and see what I can find. I am also checking out the links from @brady.miller

prondubuisi · February 27, 2020, 1:37am

Thanks Chief @stephenjude. Very happy to see you on GSoC Streets!

stephenjude · February 27, 2020, 7:41am

@prondubuisi Thanks man

prondubuisi · March 4, 2020, 4:12am

Hello @brady.miller, the Synthea static build looks good, especially the http interface, but I think a better approach would be bundling the entire application (Synthea static build + our proposed solution) as one Docker build (our own). I think this would make for a simpler project setup for people trying to use our project. What do you think?

Also, I have played around with Synthea and I think the JSON it returns is sufficient for our database seeding needs.

For the standardized patient data generation, which OpenEMR database tables do you think will need seeding to have at least a working minimum viable demo project?

I would appreciate a link to the tables and the schema for generating them. I want to see if there is a one-to-one correspondence between our Synthea-generated data and the values in those tables.
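
To make the correspondence question concrete, here is a small Python sketch of pulling demographics out of Synthea's JSON. It assumes the JSON is a FHIR Bundle (Synthea's default export) containing a `Patient` resource. The output keys (`fname`, `lname`, `DOB`, `sex`) are my assumption about what the OpenEMR side might want and would need checking against the real schema.

```python
import json


def patient_from_bundle(bundle):
    """Pull basic demographics out of a FHIR Bundle dict, or return None."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            # A Patient can carry several names; take the first one if present.
            names = resource.get("name") or [{}]
            name = names[0]
            return {
                "fname": " ".join(name.get("given", [])),
                "lname": name.get("family", ""),
                "DOB": resource.get("birthDate", ""),
                "sex": resource.get("gender", "").capitalize(),
            }
    return None


# Hand-written sample in the FHIR Bundle shape, not real Synthea output.
sample = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {
      "resourceType": "Patient",
      "name": [{"family": "Doe", "given": ["Jane"]}],
      "gender": "female",
      "birthDate": "1980-05-17"
    }}
  ]
}
""")

demographics = patient_from_bundle(sample)
```

Demographics map fairly directly this way. Encounters, medications and allergies arrive as separate resources in the same bundle and would each need their own mapping.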

Thanks.

brady.miller · March 4, 2020, 7:29am

hi @prondubuisi ,

I wouldn’t try to import directly to the database, or else you will likely go mad trying to sort out all the relationships after you get beyond basic demographics. Would instead recommend importing the data through the OpenEMR API. Then you can take advantage of the code that already exists to create patients, medications, allergies, encounters, etc. This may require building out the API to support things, but it is much easier to leverage processes that are already there to bring data into the database than trying to reinvent that very, very complicated wheel.

If possible, would recommend trying to keep the modularity of the main OpenEMR docker since that makes it much easier to support and maintain (i.e. rather than building out a separate openemr/synthea combination docker). It looks like that Synthea static build docker dumps the data into a shared folder, so should be able to share that between dockers. Regarding dockers, the sky is really the limit. For example, could have the synthea docker, the openemr docker, and then a “utility docker” that runs synthea (that curl call) and imports the data into openemr via the API.
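
A minimal Python sketch of what the import half of such a "utility docker" might do: walk the shared folder and prepare one API request per file. The `/api/patient` path, the bearer-token header and both function names are assumptions for illustration; the real route and authentication would need to be taken from the OpenEMR API documentation. The sketch only builds the requests and does not send them.

```python
import json
import urllib.request
from pathlib import Path


def build_patient_request(base_url, token, patient):
    """Build (but do not send) a POST request that creates one patient."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/patient",  # assumed route
        data=json.dumps(patient).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
        method="POST",
    )


def requests_for_folder(shared_dir, base_url, token):
    """Yield one request per JSON file found in the shared folder."""
    for path in sorted(Path(shared_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            patient = json.load(handle)
        yield build_patient_request(base_url, token, patient)
```

Each yielded request could then be sent with `urllib.request.urlopen(request)`. Because `requests_for_folder` is a generator, only one file is held in memory at a time.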

Note these are just some initial thoughts and feel free to go the way that you think is best.

thanks,
-brady

prondubuisi · March 4, 2020, 8:03am

Thanks @brady.miller for the pointers. Reinventing the wheel is not something fun to do. This is getting more interesting and more complicated. My next port of call will be exploring the APIs. Also, I can confirm that the Synthea static build docker dumps data into a shared folder.

My biggest concern would be ease of use. Would users need to set up the OpenEMR docker, the Synthea docker, and maybe a third docker to use this utility?

Also who are the target users for this utility?
Developers? Medical Staff?

Thanks.

brady.miller · March 4, 2020, 8:19am

hi @prondubuisi ,

Target users will likely be researchers, developers, students, and people running demos. Don’t see medical staff or standard users using this. That being said, agree that the easier to use the better. Another option would be to follow the way we integrated the following easipro feature:
easipro by bradymiller · Pull Request #2911 · openemr/openemr · GitHub

In that case, you could have a setting in globals that points to the synthea server (i.e. it could be a docker on the local network, or even a synthea server over the internet), then a script/GUI in OpenEMR where you could run the process with some settings (number of patients, locale, etc.). The data could be dumped in a temp folder (or memory, but guessing a million patients may get out of hand). And then could use OpenEMR to import them (can’t really utilize the API since this is internal, but can actually run the underlying Service calls that are used in the API, i.e. the API Service functions). And then in this case, it wouldn’t be tough to run these processes via the command line in cases where doing an OpenEMR autoinstall (it could even be a docker setting).
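
The flow described above (settings, dump to a temp folder, then import file by file) could be sketched in Python roughly as follows. Everything named here is hypothetical: `SyntheaImportSettings` and its fields do not exist in OpenEMR globals, the URL and paths are placeholders, and `import_one` stands in for whatever Service call does the real import.

```python
from dataclasses import dataclass
from itertools import islice
from pathlib import Path


@dataclass
class SyntheaImportSettings:
    """Hypothetical settings; none of these names exist in OpenEMR globals."""
    synthea_url: str = "http://localhost:8080"  # placeholder address
    patient_count: int = 100
    locale: str = "en_US"
    temp_dir: str = "/tmp/synthea"  # placeholder path
    batch_size: int = 50


def batches(items, size):
    """Yield lists of at most `size` items without loading everything at once."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def import_all(settings, import_one):
    """Call `import_one(path)` for each dumped file, batch by batch."""
    imported = 0
    files = Path(settings.temp_dir).glob("*.json")
    for chunk in batches(files, settings.batch_size):
        for path in chunk:
            import_one(path)
            imported += 1
    return imported
```

Working from files in a temp folder, in batches, keeps memory use flat even for very large patient counts. The same `import_all` function could be called from a GUI action or from a command-line script.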

Gonna be lots of potential ways to do this, and agree good to consider ease of use.

thanks,
-brady

prondubuisi · March 4, 2020, 8:48am

Thanks for the clarifications @brady.miller. I will continue digging.

prondubuisi · March 8, 2020, 1:47pm

Hello @brady.miller, I am done with my exams and I am looking for issues to get started with Docker and API services for OpenEMR. Are there any issues you could please point me to? Super thanks.

robert.down · March 8, 2020, 5:35pm

I know Rachel was working with Synthea - did we ever find a solution to batch import CDAs? Synthea also outputs FHIR, maybe our FHIR API is strong enough to leverage that?

brady.miller · March 9, 2020, 12:39am

hi @prondubuisi ,

A really good issue to work on would be an API call that does:
POST /api/patient/:pid/encounter

This would set up a new encounter with the posted information (such as the encounter date). It would return the encounter eid, which could then be used to add forms etc. to the encounter.
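
From a client's point of view, the proposed call might look like the Python sketch below. The route comes from the line above; the payload keys, the bearer-token header and the `{"eid": ...}` response shape are assumptions, since the endpoint does not exist yet.

```python
import json
import urllib.request


def encounter_url(base_url, pid):
    """Fill the :pid placeholder of the proposed route."""
    return "{}/api/patient/{}/encounter".format(base_url.rstrip("/"), pid)


def build_encounter_request(base_url, token, pid, encounter):
    """Build (but do not send) the POST request for a new encounter."""
    return urllib.request.Request(
        url=encounter_url(base_url, pid),
        data=json.dumps(encounter).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
        method="POST",
    )


def encounter_id_from_response(body):
    """Read the new encounter id, assuming a JSON body like {"eid": 42}."""
    return json.loads(body)["eid"]


# Hypothetical payload; the real field names would be set by the API.
request = build_encounter_request(
    "https://emr.example/apis", "token", 7, {"date": "2020-03-09"}
)
```

The returned eid would then be used in follow-up calls that attach forms to the encounter.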

@robert.down , FHIR API is still far off (note it’s a GSoC project) and the CDAs are a bit of an all-or-nothing approach. Seems like starting small with API calls makes sense, since future FHIR imports and CDA imports will all be able to take advantage of the underlying Service logic that is built to support this.

-brady

prondubuisi · March 9, 2020, 2:50am

Cool @brady.miller, I can create an issue for this on GitHub, right? Then I can proceed to send in a patch.

brady.miller · March 9, 2020, 2:52am

hi @prondubuisi , Definitely create an issue for it on GitHub and then I’ll assign you to it. Then you can submit the PR on GitHub.