The goal is to develop a mechanism to create and import large datasets of standardized patient data. This is a high-impact project that would markedly improve both the instructional use of OpenEMR and its use in the data analytics field.
Hello @brady.miller @robert.down, if I understand this project correctly, we are looking to build something that seeds the OpenEMR database tables so that OpenEMR demos are more meaningful to potential users.
If this is the case, this is how we could approach it:
Create a mechanism to programmatically seed OpenEMR database tables (if we can seed the tables, we can also use phpMyAdmin to export their content for reuse)
Make the mechanism easy to use (GUI or CLI)
Make it possible to localize seeding content, so seeding can be more meaningful to a local audience
This is how it could be achieved technically:
Generate random data sets with the PHP Faker library
Seed specified tables with the generated data; for this we could most likely use the Illuminate Database package
For ease of use, we could create a CLI tool for the above using the Symfony Console package
For localization I am not very sure, but maybe the project's users can specify local datasets in their locales, and the Faker package can use those instead of the random data sets it normally uses
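A rough sketch of the generate-then-seed steps above, written in Python with only the standard library rather than the PHP packages named in the post. Everything here is an assumption for illustration: the `NAME_POOLS` locale data, the `generate_patients` and `seed_patients` helpers, and the simplified `demo_patient` table, which is not the real OpenEMR schema.

```python
import random
import sqlite3

# Hypothetical per-locale name pools; a real tool would load these from
# user-supplied locale files, as suggested in the post.
NAME_POOLS = {
    "en_US": {"first": ["Alice", "James", "Maria"], "last": ["Smith", "Johnson", "Lee"]},
    "en_NG": {"first": ["Chidi", "Ngozi", "Tunde"], "last": ["Okafor", "Adeyemi", "Eze"]},
}


def generate_patients(count, locale="en_US", seed=None):
    """Return `count` fake patient rows drawn from the locale's name pool."""
    rng = random.Random(seed)  # a fixed seed makes the data set reproducible
    pool = NAME_POOLS[locale]
    rows = []
    for _ in range(count):
        dob = f"{rng.randint(1940, 2015):04d}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
        rows.append((
            rng.choice(pool["first"]),
            rng.choice(pool["last"]),
            dob,
            rng.choice(["Male", "Female"]),
        ))
    return rows


def seed_patients(conn, rows):
    """Create a simplified demo table (not the OpenEMR schema), insert rows, return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_patient "
        "(id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, dob TEXT, sex TEXT)"
    )
    conn.executemany(
        "INSERT INTO demo_patient (fname, lname, dob, sex) VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM demo_patient").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(seed_patients(conn, generate_patients(5, locale="en_NG", seed=42)))
```

Keeping generation separate from seeding means the same rows could later be sent somewhere other than a database table without touching the generator.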
It is worth noting that the PHP framework Laravel uses something similar to the above structure for seeding databases. Please tell me what you think.
I also think this would be nice to have as a separate OpenEMR project in its own repository.
I can also create a demo if needed. Thanks.
hi @prondubuisi ,
Looks like a solid plan, and I like the idea of incorporating it into the demos in addition to the modular approach. What are your thoughts on incorporating something like this tool for creating the patient data?
GitHub - synthetichealth/synthea: Synthetic Patient Population Simulator
Hello @brady.miller, I checked it out. It looks good, especially the ability to create medications, allergies, medical encounters, etc. I will play around with it and see. How do you think we could use it with the setup I have proposed, considering it is written in Java?
I’d probably start off by dockerizing it. It looks like there is a docker, but it hasn’t been updated for 3 years:
Since their docker is outdated, probably best to roll our own using their most recent production version:
Basic Setup and Running · synthetichealth/synthea Wiki · GitHub
Another option is to see if somebody else has already done it recently. Maybe here:
GitHub - smart-on-fhir/synthea: Static build of Synthea with http interface
Hi, I am Shreya Goyal, currently pursuing a master's in Health Informatics at IUPUI. I have a strong background in biomedical data analysis, machine learning, and database management systems. The project looks interesting to me, and I believe I have the skills required to contribute to the project and to the organization.
I will check the links out. I am looking to pick up some Docker skills. My replies might be a little late though (say, by the weekend), as I am currently taking school exams. Thanks.
Thanks Chief @stephenjude. Very happy to see you on GSoC Streets!
@prondubuisi Thanks man
Hello @brady.miller, the Synthea static build looks good, especially the HTTP interface, but I think a better approach would be bundling the entire application (Synthea static build + our proposed solution) as one Docker build of our own. I think this would make project setup simpler for people trying to use our project. What do you think?
Also, I have played around with Synthea, and I think the JSON it returns is sufficient for our database seeding needs.
For the standardized patient data generation, which OpenEMR database tables do you think will need seeding to have at least a minimum viable demo project?
I would appreciate a link to the tables and the schema for generating them. I want to see if there is a one-to-one correspondence between our Synthea-generated data and the values in those tables.
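One way to start checking that correspondence is to flatten the generated JSON and compare the result against the table columns. The sketch below assumes the JSON is a FHIR Bundle (Synthea's FHIR export) and uses a tiny hand-written sample rather than real Synthea output. The target keys (`fname`, `lname`, `DOB`, `sex`) are assumed column names to be verified against the actual schema, and both helper functions are hypothetical.

```python
import json

# Hand-written sample in the shape of a FHIR Bundle; not real Synthea output.
SAMPLE_BUNDLE = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient",
                  "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
                  "gender": "female", "birthDate": "1980-02-14"}},
    {"resource": {"resourceType": "AllergyIntolerance",
                  "code": {"text": "Allergy to peanuts"}}}
  ]
}
""")


def extract_demographics(bundle):
    """Flatten the first Patient resource into a dict keyed by assumed column names."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            name = (resource.get("name") or [{}])[0]
            given = name.get("given") or [""]
            return {
                "fname": given[0],
                "lname": name.get("family", ""),
                "DOB": resource.get("birthDate", ""),
                "sex": resource.get("gender", "").capitalize(),
            }
    return None  # bundle contained no Patient resource


def count_resource_types(bundle):
    """Count resources by type, to see which kinds of records need a destination."""
    counts = {}
    for entry in bundle.get("entry", []):
        rtype = entry.get("resource", {}).get("resourceType", "Unknown")
        counts[rtype] = counts.get(rtype, 0) + 1
    return counts


if __name__ == "__main__":
    print(extract_demographics(SAMPLE_BUNDLE))
    print(count_resource_types(SAMPLE_BUNDLE))
```

Running `count_resource_types` over a few generated files gives a quick list of the resource types that would each need a matching table or import path.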
hi @prondubuisi ,
I wouldn’t try to import directly to the database, or else you will likely go mad trying to sort out all the relationships after you get beyond basic demographics. Would instead recommend importing the data through the OpenEMR API. Then can take advantage of the code that already exists to create patients, medications, allergies, encounters, etc. This may require building out the API to support things, but it is much easier to leverage processes that are already there to bring data into the database than trying to reinvent that very, very complicated wheel.
If possible, would recommend trying to keep the modularity of the main OpenEMR docker since that then makes it much easier to support and maintain (i.e. rather than building out a separate openemr/synthea combination docker). It looks like that Synthea static build docker dumps the data into a shared folder, so should be able to share that between dockers. Regarding dockers, the sky is really the limit. For example, could have the synthea docker, the openemr docker, and then a “utility docker” that runs synthea (that curl call) and imports the data into openemr via the api.
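A minimal sketch of what the import half of such a “utility docker” might run, assuming the generated files land as `*.json` in a shared folder. The function names, the API URL, and the bearer-token authentication are assumptions, not the actual OpenEMR API contract. A real importer would also map each record to the fields the API expects instead of posting the file contents unchanged.

```python
import json
import urllib.request
from pathlib import Path


def http_post(url, payload, token):
    """POST a JSON payload with a bearer token and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def import_shared_folder(folder, api_url, token, post=http_post):
    """Send every *.json file in `folder` to `api_url`; return (sent, failed) counts."""
    sent = failed = 0
    for path in sorted(Path(folder).glob("*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
            post(api_url, payload, token)
            sent += 1
        except (OSError, ValueError) as exc:
            # OSError covers file and network errors; ValueError covers bad JSON.
            print(f"skipping {path.name}: {exc}")
            failed += 1
    return sent, failed
```

Passing the `post` function in as a parameter lets the folder-walking logic be exercised without a running OpenEMR instance.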
Note these are just some initial thoughts and feel free to go the way that you think is best.
Thanks @brady.miller for the pointers. Reinventing the wheel is not something fun to do. This is getting more interesting and more complicated. My next port of call will be exploring the APIs. Also, I can confirm that the Synthea static build docker dumps data into a shared folder.
My biggest concern would be ease of use. Would users need to set up the OpenEMR docker, the Synthea docker, and maybe a third docker to use this utility?
Also, who are the target users for this utility?
Developers? Medical staff?
hi @prondubuisi ,
Target users will likely be researchers, developers, students, and people running demos. Don’t see medical staff or standard users using this. That being said, agree that the easier to use the better. Another option would be to follow the way we integrated the following easipro feature:
easipro by bradymiller · Pull Request #2911 · openemr/openemr · GitHub
In that case, you could have a setting in globals that directs to the synthea server (i.e. could be a docker on the local network but could even be a synthea server over the internet), then a script/GUI in OpenEMR where you could run the process with some settings (number of patients, locale, etc.). The data could be dumped in a temp folder (or memory, but guessing a million patients may get out of hand). And then could use OpenEMR to import them (can’t really use the API since this is internal, but can actually run the underlying Service calls that the API uses, i.e. the API Service functions). And then in this case, it wouldn’t be tough to run these processes via the command line in cases where doing an OpenEMR autoinstall (it could even be a docker setting).
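To make the command-line variant concrete, here is a small sketch of the run settings and a batched import loop that avoids holding a very large patient set in memory. The option names, the default server URL, and the `import_batch` callback are all hypothetical. In OpenEMR itself the server location would come from globals and the callback would invoke the Service functions.

```python
import argparse
from itertools import islice


def parse_args(argv=None):
    """Parse hypothetical run settings (server URL, patient count, locale, batch size)."""
    parser = argparse.ArgumentParser(description="Hypothetical synthetic-patient import runner")
    parser.add_argument("--synthea-url", default="http://localhost:8080",
                        help="assumed Synthea server location (would come from globals)")
    parser.add_argument("--patients", type=int, default=100, help="number of patients")
    parser.add_argument("--locale", default="en_US", help="locale for generated data")
    parser.add_argument("--batch-size", type=int, default=500, help="records per import batch")
    return parser.parse_args(argv)


def batched(items, size):
    """Yield lists of at most `size` items so the full set never sits in memory."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_import(records, import_batch, batch_size):
    """Feed records to `import_batch` one batch at a time; return the number processed."""
    total = 0
    for batch in batched(records, batch_size):
        import_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    args = parse_args()
    fake_records = ({"n": i} for i in range(args.patients))  # stand-in for temp-folder files
    print(run_import(fake_records, lambda batch: None, args.batch_size))
```

Because `records` can be any iterable, it could be a generator that reads files from the temp folder one at a time.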
Gonna be lots of potential ways to do this, and agree good to consider ease of use.
Thanks for the clarifications @brady.miller. I will continue digging.
Hello @brady.miller, I am done with my exams, and I am looking for issues to get started with Docker and the API services for OpenEMR. Are there any issues you could point me to? Super thanks.
I know Rachel was working with Synthea - did we ever find a solution to batch import CDAs? Synthea also outputs FHIR, maybe our FHIR API is strong enough to leverage that?
hi @prondubuisi ,
A really good issue to work on would be an API call that does:
And would use this to set up a new encounter with posted information (such as encounter date). It would return the encounter eid, which could then be used to add forms etc. to the encounter.
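The contract described here (post encounter details, get back an encounter id, then attach forms to it) can be sketched as follows. This is an illustration in Python against an in-memory SQLite database. The `demo_*` tables and all function names are hypothetical and do not reflect OpenEMR's real schema, routes, or Service classes.

```python
import sqlite3
from datetime import date


def make_demo_db():
    """Build a tiny in-memory schema; table and column names are illustrative only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE demo_patient (id INTEGER PRIMARY KEY, lname TEXT);
        CREATE TABLE demo_encounter (eid INTEGER PRIMARY KEY, pid INTEGER, date TEXT);
        CREATE TABLE demo_form (id INTEGER PRIMARY KEY, eid INTEGER, name TEXT);
        INSERT INTO demo_patient (lname) VALUES ('Doe');
        """
    )
    return conn


def create_encounter(conn, pid, encounter_date):
    """Validate the posted data, insert an encounter, and return its new eid."""
    date.fromisoformat(encounter_date)  # raises ValueError on a malformed date
    if conn.execute("SELECT 1 FROM demo_patient WHERE id = ?", (pid,)).fetchone() is None:
        raise LookupError(f"no patient with id {pid}")
    cursor = conn.execute(
        "INSERT INTO demo_encounter (pid, date) VALUES (?, ?)", (pid, encounter_date)
    )
    conn.commit()
    return cursor.lastrowid


def add_form(conn, eid, form_name):
    """Attach a named form to an existing encounter and return the form id."""
    if conn.execute("SELECT 1 FROM demo_encounter WHERE eid = ?", (eid,)).fetchone() is None:
        raise LookupError(f"no encounter with eid {eid}")
    cursor = conn.execute(
        "INSERT INTO demo_form (eid, name) VALUES (?, ?)", (eid, form_name)
    )
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    conn = make_demo_db()
    eid = create_encounter(conn, pid=1, encounter_date="2019-03-01")
    print(eid, add_form(conn, eid, "vitals"))
```

Returning the eid from the create call is what lets a client chain the follow-up requests without querying for the encounter again.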
@robert.down , FHIR API is still far off (note it’s a GSoC project) and the CDAs are a bit of an all-or-nothing approach. Seems like starting small with API calls makes sense, since future FHIR imports and CDA imports will all be able to take advantage of the underlying Service logic that is built to support this.
Cool @brady.miller, I can create an issue for this on GitHub, right? Then I can proceed to send in a patch.
hi @prondubuisi , Definitely create an issue for it on GitHub and then I’ll assign you to it. Then you can submit the PR on GitHub.