Intelligent Chart Summarization

MatthewVita · June 3, 2017, 9:46pm

UPDATE:

Much progress has been made on this project.

Goal

The Intelligent Chart Summarization project is an effort led by Dr. André Millet. Dr. Millet has put forth an idea of using an algorithm to summarize the most vital parts of a patient’s timeline. This project will extract out relevant medical codes for each unstructured encounter and work with @JBW and @toolbox on the analytics side.

Resources

Picture of web ui: https://raw.githubusercontent.com/GoTeamEpsilon/cTAKES-Friendly-Web-UI/master/sample-visit-note.PNG
Repos: https://github.com/GoTeamEpsilon/cTAKES-Intelligent-Chart-Summarization-Solution
Getting started video: https://www.youtube.com/watch?v=0V584l8J8_Y
Task management: Intelligent Chart Summarization project · GitHub
C_ClinicalDocumentProcessing.class.php example: https://gist.github.com/MatthewVita/5de7971e1adfb8724bdb989aa23317de
Parser example: https://gist.github.com/MatthewVita/06a8c99339c3cb04d88c6646208ee37b
Demo 1: https://i.imgur.com/liMVBwM.gifv
Demo 2: GitHub - TheToolbox/ctakes-mockup
Related issue: Create Timeline View of Patient · Issue #808 · openemr/openemr · GitHub

Notes

Need to get ICD working as well
There will be a GLOBALS feature toggle for this as it requires additional setup (docker). We must document everything in the wiki!

Workflow

Provider creates encounter
Enters free form notes
Clicks save
Text is sent via HTTP POST to an endpoint on the Docker solution (the head of the pipeline)
Text is processed
At the tail of the pipeline (i.e., after Python does its parsing), an HTTP POST is made to an OpenEMR controller endpoint called /ctakes/{pid}/content, which stores the cTAKES JSON in the database
cTAKES SNOMED/RXNORM codes are available for viewing in the encounter form notes
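
The tail-of-pipeline step above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual parser: the /ctakes/{pid}/content path comes from the workflow, while the base URL, the JSON body shape, the header, and the function name are all assumptions made for the example.

```python
import json
from urllib import request

# Hypothetical base URL of the OpenEMR instance (an assumption, not from the thread).
OPENEMR_BASE_URL = "http://localhost/openemr"


def build_ctakes_callback(pid: int, ctakes_result: dict) -> request.Request:
    """Build the POST that hands parsed cTAKES output back to OpenEMR.

    Only the /ctakes/{pid}/content path is taken from the workflow;
    the JSON body and Content-Type header are assumptions.
    """
    url = f"{OPENEMR_BASE_URL}/ctakes/{pid}/content"
    body = json.dumps(ctakes_result).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative payload; the field names and code values are placeholders.
example_result = {"snomed": ["<snomed-code>"], "rxnorm": ["<rxnorm-code>"]}
req = build_ctakes_callback(42, example_result)

# request.urlopen(req) would actually send it; it is left out here so the
# sketch runs without a live OpenEMR instance.
```

Keeping the request construction separate from the sending makes it easy to check the URL and body before wiring it to a real controller.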

Team

@MatthewVita and TeamEpsilon
@andremillet
@toolbox
@brady.miller (who I am volunteering because he expressed interest :))

Chatroom

#ctakes

andremillet · June 3, 2017, 10:01pm

While we wait, I am taking notes on what improvements we could make.
Today, as I was attending an epilepsy class, it was discussed how
important it is to classify an epileptic convulsion in order to manage it.
I even drew a ‘timeline’ on paper and started to pinpoint symptoms,
signs, and medications over a period of time to help me make the most accurate
diagnosis possible.
I really want you guys to understand that we are not talking about a
calendar, but about gathering knowledge that is beyond time, and that it WILL
make a difference in a patient’s life someday.

MatthewVita · June 3, 2017, 10:58pm

Great.

Also, here are the emails I’m sending out: Chart Summarization Project Academic Recruiting · Issue #824 · openemr/openemr · GitHub

andremillet · June 3, 2017, 11:19pm

excellent!

andremillet · June 3, 2017, 11:40pm

could you send a copy to DrVictorfiorini at gmail dot com?
he is from a university nearby

MatthewVita · June 4, 2017, 12:50am

Done.

andremillet · June 7, 2017, 1:44am

As a feature, the timeline should work as a ‘personal assistant’. For example, one of its functions should be to remind the physician of patients on controlled medication who are titrating their dose.
Also, it should group ‘profiles’ of patients by pre-established ICD-10 codes, automatically or customized (hypertensive patients who are also diabetic, or dyslipidemic patients with a given total cholesterol value).
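
As a rough illustration of the profile grouping idea, here is a minimal sketch in Python. It assumes each patient already has a set of ICD-10 codes available, and it simplifies "hypertensive" to the single category I10 and "diabetic" to E11 (type 2 diabetes); the profile name, function names, and sample patients are all made up for the example. Lab-value criteria such as a cholesterol threshold are not covered.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical profile definitions: each profile lists ICD-10 category
# prefixes that must ALL be present in a patient's codes.
# I10 = essential hypertension, E11 = type 2 diabetes mellitus.
PROFILES: Dict[str, List[str]] = {
    "hypertensive_and_diabetic": ["I10", "E11"],
}


def has_prefix(codes: Iterable[str], prefix: str) -> bool:
    """True if any code falls under the given ICD-10 category prefix."""
    return any(code.startswith(prefix) for code in codes)


def patients_in_profile(
    patient_codes: Dict[int, Set[str]], required_prefixes: List[str]
) -> List[int]:
    """Return sorted patient ids whose codes cover every required prefix."""
    return sorted(
        pid
        for pid, codes in patient_codes.items()
        if all(has_prefix(codes, prefix) for prefix in required_prefixes)
    )


# Made-up sample data: patient id -> ICD-10 codes on record.
patients = {
    1: {"I10", "E11.9"},
    2: {"I10"},
    3: {"E11.65", "I10", "E78.5"},
}

matches = patients_in_profile(patients, PROFILES["hypertensive_and_diabetic"])
print(matches)  # [1, 3]
```

A customized profile would just be another entry in the dictionary, so the same lookup serves both the automatic and the user-defined groups.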

MatthewVita · June 7, 2017, 2:55am

Hi @andremillet,

The idea of cTAKES is that it uses machine learning models under the hood that are trained on clinically annotated gold-standard data. Here are the components that make up cTAKES: https://ctakes.apache.org/components.html. Hopefully, this means it will be able to figure out ICD-10 codes and important connections between a patient’s problems and medicines.

As far as grouping patients together, perhaps we need an “OpenEMR Dashboard” to really understand a patient region. This can be available to admins and providers. There is a project out there called “OpenEMR Insights”.

In other news, I haven’t heard back from anyone in terms of contributing. This project may just be me and you, it seems. I’m fine with this, but I was hoping that such an exciting project using cutting edge tech would attract some attention!

Thanks,
Matthew

MatthewVita · June 11, 2017, 1:32am

Hi @andremillet,

I have had my RocketChat turned off for a week or so (no particular reason… just forgot to have it on!) so I missed your messages. Sorry about that!

I want to kick this project off by seeing what cTAKES can do. Can you take some encounter data that you have for a handful of patients and remove any personally-identifying information? Ideally, the data should include medical issues, drugs, codes, and important free text notes about their situation.

Once this is done, we can use this awesome tool that has cTAKES running under the hood to see the results: http://54.68.117.30:8080/index.jsp

I did some research and this is the easiest way to test out cTAKES, with the second being pulling down a Docker image (fortunately you have Linux so that will be easy as we move forward with the project).

I do realize that the US gov’t has released a rather large dataset of realistic fake patient data, but I was thinking that it would be great to start with 3 or so entries from you because you have already listed a scenario in your initial email about the patient that smokes and I’m sure you have worked with patients with other unique problems. In this way, you’ll know what you, the provider, expect to see from a summary perspective.

Is this okay?

Thanks,
Matthew

MatthewVita · June 11, 2017, 7:55am

@andremillet,

I’ve done a good amount of digging into cTAKES. One thing I’m learning is that you pretty much need an account with https://uts.nlm.nih.gov//license.html to take advantage of vast gov’t sponsored datasets. Make an account (it’s easy).

I ran the following through cTAKES: https://www.med.unc.edu/medselect/resources/sample-notes/sample-initial-visit-note-1

This was the result: http://i.imgur.com/qWZ2YE7.jpg

Very neat.

I have found that cTAKES running in clinical pipeline mode is what we are looking for. It’s not the fastest approach, but it runs through the most robust models. I am looking into two GitHub projects, dirkweissenborn/ctakes-server (a simple REST server around the cTAKES clinical pipeline) and tmills/ctakes-docker, to set up a Docker container as a “black box” for experimenting. Eventually, a Docker container can run alongside OpenEMR and communicate via REST.
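
To make the “black box” idea concrete, here is a minimal Python sketch of a client that posts a note to a cTAKES container and reads back JSON. The URL, the plain-text request body, and the JSON response are assumptions for illustration only; the actual routes and formats of ctakes-server or ctakes-docker may differ. The transport is passed in as a function so the sketch can run without a container.

```python
import json
from typing import Callable
from urllib import request

# Hypothetical address of the cTAKES container (an assumption).
CTAKES_URL = "http://localhost:9999/analyze"


def analyze_note(text: str, send: Callable[[request.Request], bytes]) -> dict:
    """POST clinical text to the container and decode its JSON reply.

    `send` performs the actual HTTP call, which keeps the container a
    black box and lets a stand-in be used while experimenting.
    """
    req = request.Request(
        CTAKES_URL,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    return json.loads(send(req))


def send_over_http(req: request.Request) -> bytes:
    """Real transport: would be used once a container is running."""
    with request.urlopen(req, timeout=30) as resp:
        return resp.read()


def fake_send(req: request.Request) -> bytes:
    """Stand-in for the container: reports how many bytes it received."""
    return json.dumps({"received_bytes": len(req.data)}).encode("utf-8")


note = "Patient reports chest pain."
result = analyze_note(note, fake_send)
```

Swapping `fake_send` for `send_over_http` is the only change needed once a real endpoint is available.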

Thanks,
Matthew

EDIT: @robert.down and @brady.miller, check out the med.unc and imgur results. This technology is pretty impressive, even without the ICD/SNOMED and NIH datasets/models. Figured I’d ping you guys because you were on the original email + this is worth sharing!

andremillet · June 12, 2017, 1:04am

Very interesting. Would it work with multiple languages as well?

I got lost on the Docker subject. cTAKES may indeed be our start. I was thinking about what to do with the information, and now we have a way to acquire it! And it is open source, so we can improve it!

brady.miller · June 12, 2017, 8:05am

Very cool stuff! Gonna start looking through the datasets.
-brady

MatthewVita · June 13, 2017, 12:40am

cTAKES is mainly English at the moment

MatthewVita · June 16, 2017, 12:20am

I am working with the maintainer of tmills/ctakes-docker (on GitHub) to get it working. There is an issue with the MIST container due to licensing concerns.

He is very responsive and helpful so I will hopefully report back with good news. For context, the analysis engine isn’t loading:

MatthewVita · June 18, 2017, 7:24pm

@andremillet just an update. I’m almost there with fixing the issue with the docker ctakes. The author of the library is very nice and has been helping me get around the issue. Because MIST isn’t open source licensed, we have been trying to use a generic HIPAA model instead. It has been a difficult task, however!

-m

MatthewVita · June 18, 2017, 7:31pm

Just to give you more context: once this Docker image is working per our requirements, I will guide you through installing Docker, building the image, and running the container. Should be very easy to do!

With the container running, I’ll have you paste in some realistic clinical text and we will examine the NLP output. We can then talk about how to present this information to the user in the OpenEMR patient screen and go from there.

Fortunately, the solution works via a GUI program that lets one enter text into a clinical text box and see the results. This will speed up our investigative work. From a programming perspective, the Docker container allows programmatic access by passing in clinical text files, which are then sent through its pipeline. One of the nice features of the Docker container is that the author queues up processing via ActiveMQ to handle production traffic, among other things!

I’m very excited to get this working because this Docker container will have the full clinical pipeline running with the best in class models and algorithms!

-m

MatthewVita · June 20, 2017, 11:25pm

Still working out the docker issues with the maintainer. We’re getting closer!

-m

andremillet · June 21, 2017, 1:03pm

> I want to kick this project off by seeing what cTAKES can do. Can you take some encounter data that you have for a handful of patients and remove any personally-identifying information? Ideally, the data should include medical issues, drugs, codes, and important free text notes about their situation.

Working on it already. I am ‘negotiating’ with the chief of the infirmary where I work so that I can use OpenEMR and thus test our module.

> I did some research and this is the easiest way to test out cTAKES, with the second being pulling down a Docker image (fortunately you have Linux so that will be easy as we move forward with the project).

OK, I’ll need to use my own computer then. I told him it was 100% browser-based, lol.

> I do realize that the US gov’t has released a rather large dataset of realistic fake patient data, but I was thinking that it would be great to start with 3 or so entries from you because you have already listed a scenario in your initial email about the patient that smokes and I’m sure you have worked with patients with other unique problems. In this way, you’ll know what you, the provider, expect to see from a summary perspective.

I do not think it will be an issue. We have plenty of data every day. As soon as we start collecting it, we can build our own database.

andremillet · June 21, 2017, 1:18pm

Check the email I sent you later, comparing cTAKES with Todoist.
Can we have a web-based frontend to add data?

andremillet · June 21, 2017, 2:35pm

I tried the link you gave; this was the output.