Importing documents

drpwayne wrote on Sunday, December 19, 2010:

Hi,
Can someone clarify to me how to import documents into OpenEMR?
I have several hundred thousand documents - some pdf and some jpg - that are in MySQL. I understand that OpenEMR does not keep documents in the database but in individual files, so I will need to export the documents to individual files. The documents table in OpenEMR is:
id, integer autoincrement
type (enum, “blob”,“web_url”,“file_url”)
size, date, url, mimetype, pages, owner, revision, foreign_id, docdate, list_id

Is “owner” the patient account (“pubid”) or the patient id (id?) or the patient pid or none of the above?
What is the foreign_id?
I assume that “size” is optional.
Let’s say I have a pathology report, pdf format,  for the patient whose pubid is 30000, and the report is dated 2010-12-01.
I can create a file of any name, say taking the autoincrement id and appending “.pdf” to it, as
\openemr\30000\pathology\101432.pdf
So I put the scan date of the report in “date”, the date of the pathology report in “docdate”, “pdf” in the mimetype,  but how is the file associated with patient 30000? Is “30000” put in the foreign_id? Or is it “owner”? Also, do I put a “7” into list_id to show that it is a pathology report?
TIA for any guidance.
- Peter
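
The export step described above (pulling each stored document out of a table and writing it to its own file) could be sketched roughly as follows. This is only an illustration: the `legacy_docs` table, its columns, and the extension mapping are hypothetical, an in-memory SQLite database stands in for the MySQL source, and the `<patient>/<id>.<ext>` layout simply mirrors the example path in the post.

```python
import os
import sqlite3
import tempfile

# Hypothetical mapping from MIME type to file extension.
EXTENSIONS = {"application/pdf": ".pdf", "image/jpeg": ".jpg"}


def export_blobs(conn, out_root):
    """Write every row of the (hypothetical) legacy_docs table to its own file.

    Returns a list of (doc_id, patient_id, path) tuples for the files written.
    """
    written = []
    rows = conn.execute(
        "SELECT id, patient_id, mimetype, body FROM legacy_docs ORDER BY id"
    )
    for doc_id, patient_id, mimetype, body in rows:
        # One directory per patient, one file per document id.
        patient_dir = os.path.join(out_root, str(patient_id))
        os.makedirs(patient_dir, exist_ok=True)
        extension = EXTENSIONS.get(mimetype, ".bin")
        path = os.path.join(patient_dir, f"{doc_id}{extension}")
        with open(path, "wb") as fh:
            fh.write(body)
        written.append((doc_id, patient_id, path))
    return written


def demo():
    """Build a one-row stand-in database and export it to a temp directory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE legacy_docs ("
        "id INTEGER PRIMARY KEY, patient_id INTEGER, mimetype TEXT, body BLOB)"
    )
    conn.execute(
        "INSERT INTO legacy_docs VALUES (101432, 30000, 'application/pdf', ?)",
        (b"%PDF-1.4 demo",),
    )
    return export_blobs(conn, tempfile.mkdtemp())
```

With several hundred thousand rows, the same loop would presumably be run in batches against the real database rather than in one pass.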

visolveemr wrote on Monday, December 20, 2010:

Hi,

OpenEMR supports uploading documents for each patient under different categories. We can also add/edit categories under “Administration -> Practice -> Documents”.

If you are using the OpenEMR 4.0 development tip, documents are uploaded under patient demographics. If you are using OpenEMR 3.2, the “Documents” option is under “Patient/Client->Medical Record”.

Following are the documents table field descriptions:
1. foreign_id : refers to id in patient_data table,
2. size : is calculated in bytes if the document is available
3. date : refers to the uploaded date, and “docdate” refers to the date of the report (which can be added through the GUI).
4. list_id : refers to the id of the related issue from the lists table.

For more details on the usage of each field in the documents table, refer to the file “library/classes/Document.class.php”.
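
As a rough illustration of the field descriptions above, the metadata for one exported file could be assembled like this. It is a sketch under assumptions, not what Document.class.php does: the helper name is made up, `file_url` is assumed to be the right `type` for a file on disk, and the `file://` prefix assumes a POSIX-style absolute path.

```python
import mimetypes
import os
import tempfile
from datetime import date, datetime


def build_document_row(path, patient_id, docdate, uploaded=None):
    """Collect values for the documents fields described above (hypothetical helper)."""
    uploaded = uploaded or datetime.now()
    mimetype, _ = mimetypes.guess_type(path)
    return {
        "type": "file_url",                           # assumption: file stored on disk
        "size": os.path.getsize(path),                # size in bytes
        "date": uploaded.strftime("%Y-%m-%d %H:%M:%S"),  # upload/scan date
        "url": "file://" + os.path.abspath(path),     # assumes a POSIX absolute path
        "mimetype": mimetype or "application/octet-stream",
        "foreign_id": patient_id,                     # patient the document belongs to
        "docdate": docdate.isoformat(),               # date of the report itself
    }


def demo():
    """Create a small stand-in PDF file and build its row."""
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as fh:
        fh.write(b"%PDF-1.4 demo")  # 13 bytes of placeholder content
    return build_document_row(
        fh.name, 30000, date(2010, 12, 1), uploaded=datetime(2010, 12, 19, 9, 0, 0)
    )
```

Note that the MIME type here is the full `application/pdf` form rather than just `pdf`.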

Hope this helps.

Thanks
ViCarePlus Team,
www.vicareplus.com
services@vicareplus.com

penguin8r wrote on Wednesday, December 22, 2010:

There is a bulk file import utility included with the OpenEMR distribution, /openemr/contrib/util/emr_scan_load.plx,
that can do some of what you’re describing.
However, it requires you to change the data type for the PID field in the database in order for it to work properly.
I’ve used it successfully, but I’m constantly on the lookout for unexpected complications due to changing the pid field type.
Someday when time permits I will try to go through the Perl code there & find a way to make it work without the change.

drpwayne wrote on Thursday, December 23, 2010:

Thanks to both of you for your replies.  I have a rudimentary knowledge of php and I can read the documents.classes.php file but I don’t understand how to store documents in a table from that. I think that the “documents” table doesn’t actually store documents, just addresses of documents, so the “url” could be an auto_increment or other unique field in a MySQL table of documents. What I don’t see is how in the documents.classes.php definition there is any way of retrieving the document if the url is, in fact, a key field in another table containing blobs, nor how to set new documents into the table of blobs.
I can’t find the file import utility that penguin8r mentions. I don’t think it’s in the 3.2 xampp distribution.
Thanks again.

tmccormi wrote on Thursday, December 23, 2010:

I have all the code that Dr Sam Bowen wrote to store documents in the database.  Send me an email contact and I will forward it to you to use in whatever way you find useful.
-Tony
tony @ mi-squared.com

tmccormi wrote on Thursday, December 23, 2010:

I just pushed to a branch on my github account for easy access:

https://github.com/tmccormi/openemr/commit/3442d1eae869798943511755119b52c547c7c88d

-Tony

drpwayne wrote on Thursday, December 23, 2010:

Thank you, that’s a great holiday gift. I will study Dr Bowen’s code over the next few days. At first glance, it looks more understandable than the documents class.
- Peter

tmccormi wrote on Friday, December 24, 2010:

no question about that :slight_smile:
-Tony

drpwayne wrote on Monday, December 27, 2010:

I looked at Dr. Bowen’s code, and though it works, he stores his documents (images) outside of the usual OpenEMR document tree (at least, it looks that way to me, I’m still trying to figure out where things are stored). I’d like to use the document tree/document list that currently exists. It looks like I need to modify code in C_documents.class, which stores and retrieves from files. If I change that class to store and retrieve from a table with SQL calls, that should do it. Easier said than done.
Incidentally, in following another thread on this same topic, I agree with Dr. Bowen that it’s preferable to store documents in a table, not in individual files. One issue that was mentioned as a drawback to tables was backup. I store documents in MySQL MyISAM tables, and they support master/slave replication.  In several years of running I’ve never had to make a full system backup. There is a continuously running slave system at the office. In addition, I bring a notebook PC to the office every day or two and connect it to the network. Then I bring the notebook home where there’s another slave that copies from the notebook. So there are multiple backups, and it only takes a few minutes and requires no effort other than bringing a notebook computer back and forth. We have close to a 100 GB database, but modern notebook PCs come with more than enough storage.
The problem I used to have with storing documents as files was when a document was misfiled or double-filed and needed to be deleted. By keeping documents in the database, deleted files are then deleted on the slaves. When documents were kept in individual files, then documents deleted on the server had to be manually deleted on the backup computers.
I know, I know, it’s a controversial issue :slight_smile:
- Peter

drpwayne wrote on Tuesday, December 28, 2010:

OK, I’m willing to give in. It looks like a lot of code depends on documents being stored as files.  I can see that the only tables that need updating when a file is stored are the documents table and the categories_to_documents table.
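
A bulk import along these lines would then have to write to both tables for every file. The sketch below shows the idea only: SQLite stands in for MySQL, the `documents` table is cut down to a few columns, the `category_id` and `document_id` column names of `categories_to_documents` are an assumption, and category 7 is just the hypothetical pathology category from earlier in the thread.

```python
import sqlite3


def register_document(conn, row, category_id):
    """Insert one documents row and link it to a category in a single transaction."""
    with conn:  # commits both inserts together, or rolls both back on error
        cur = conn.execute(
            "INSERT INTO documents "
            "(type, size, date, url, mimetype, foreign_id, docdate) "
            "VALUES (:type, :size, :date, :url, :mimetype, :foreign_id, :docdate)",
            row,
        )
        document_id = cur.lastrowid
        conn.execute(
            "INSERT INTO categories_to_documents (category_id, document_id) "
            "VALUES (?, ?)",
            (category_id, document_id),
        )
    return document_id


def demo():
    """Create simplified stand-in tables and register one document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, "
        "size INTEGER, date TEXT, url TEXT, mimetype TEXT, foreign_id INTEGER, "
        "docdate TEXT)"
    )
    conn.execute(
        "CREATE TABLE categories_to_documents (category_id INTEGER, document_id INTEGER)"
    )
    row = {
        "type": "file_url",
        "size": 13,
        "date": "2010-12-19 09:00:00",
        "url": "file:///openemr/30000/pathology/101432.pdf",
        "mimetype": "application/pdf",
        "foreign_id": 30000,
        "docdate": "2010-12-01",
    }
    document_id = register_document(conn, row, category_id=7)
    links = conn.execute(
        "SELECT category_id, document_id FROM categories_to_documents"
    ).fetchall()
    return document_id, links
```

Doing both inserts in one transaction avoids ending up with a document that has no category if the second insert fails.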
Just a comment - in uploading a document, I notice that today’s date is the date inserted. After the document is uploaded, then the date can be changed. This seems to me to be time-consuming. Most documents that are scanned in will be dated on a previous date - the date the document is scanned is less important than the date of the document, if one is looking at someone’s medical records. It helps to know when the document was scanned but the main identifiers are patient id, type of document, and date of document. The date scanned is not a major identifier.
In addition to date scanned, though, the system should keep track of who did the uploading. If you’ve ever worked in an office and found things mis-scanned, everyone always says “it wasn’t me, it was Jeanie.” And Jeanie says it was Emily.  And Emily was on vacation that day, so it must have been Rose. Knowing who did the scanning is important.
- Peter

tmccormi wrote on Tuesday, December 28, 2010:

The Administrative Logs will track the information you need as far as ‘who’ is concerned. Here is an example from the log of a recently uploaded document; the 3rd field is the userid.

12/27/2010 17:06:46 	other-replace 	admin 		Default 	0 	1 	REPLACE INTO documents SET `id` = '48', `type` = '1', `size` = '1601811', `date` = '2010-12-27 17:06:46', `url` = 'file:///opt/emr_devtip/openemr/sites/default/documents/17/MI2_Postcard-LoRes.pdf', `mimetype` = 'application/pdf', `foreign_id` = '17', `docdate` = '2010-12-27'
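
If log entries like the one above were exported as tab-separated text, the user and the document fields could be pulled out with something like the following. The tab-separated layout and the helper name are assumptions, and the sample line is a shortened copy of the entry above.

```python
import re

# Shortened copy of the log entry above; tabs between columns are assumed.
SAMPLE = (
    "12/27/2010 17:06:46 \tother-replace \tadmin \t\tDefault \t0 \t1 \t"
    "REPLACE INTO documents SET `id` = '48', `size` = '1601811', "
    "`foreign_id` = '17', `docdate` = '2010-12-27'"
)


def parse_log_line(line):
    """Split one log line into its columns and the `field` = 'value' pairs."""
    columns = [column.strip() for column in line.split("\t")]
    # The SQL statement is the last column; collect its backticked assignments.
    fields = dict(re.findall(r"`(\w+)` = '([^']*)'", columns[-1]))
    return {
        "when": columns[0],
        "event": columns[1],
        "user": columns[2],  # 3rd field: who performed the upload
        "fields": fields,
    }
```

Calling `parse_log_line(SAMPLE)["user"]` gives the uploading user, which is the “who” being asked about.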

drpwayne wrote on Tuesday, December 28, 2010:

Thank you, Tony, but my Administration->Logs only shows me logins and logouts and views. I can see why Dr Bowen rolled his own outside of the normal documents tree; even though it’s not elegant, it lets him add some extra description to each document (in his case, “ordering_practitioner”, which apparently he wants to track). 
- Peter

tmccormi wrote on Tuesday, December 28, 2010:

Version 4.0 has significantly improved logging …no doubt document management can be improved over time.  
-Tony

jcahn2 wrote on Tuesday, December 28, 2010:

Peter’s absolutely right.  The only significant date on a document is the day it was generated.  I can’t think of a time when the scanned-in date was important to me.
Jack

drpwayne wrote on Thursday, December 30, 2010:

Question regarding some of the session variables being passed around in some of the code:
$_SESSION - I assume that’s the patient’s ID. Or is that the pubpid? Or the patient_data pid?  It’s not clear to me.
$_SESSION - ??
$_SESSION[“authUser”] - I assume that’s the user’s login code (“admin”, “NancyG”, or whatever)?
Thanks for clarification, anyone who can give it.