Document Restore

Hello

We had some problems with the documents directory, and many of the documents were lost (don’t ask).

We do have a backup of these documents elsewhere. The kicker is that the backup of the documents is not encrypted. The clinic will store these documents on their local file storage, with the original name. So I can’t just upload the document (FTP) and have it show up. It seems like I have to first encrypt the document and then the document name before uploading. How do I translate something like “patient-file-202312.pdf” to its encrypted value of “9a77573a-8ea5-423e-8c13-a93a3f1df2a6” and then upload it? I have full access to the database and filesystem. Just trying to recover as much as I can from this fiasco.

Thanks.

hi @midder, with a copy of the backup of the documents and a copy of the documents table from the database, you could write a custom script that matches each file to its uniquely named document by comparing the hash that openemr computes and saves in the table.

Thank you @stephenwaite. However my “backup” of the documents is a backup with the names in plain text. How do I “rename” and then encrypt the document appropriately so that I can load it up to the filesystem on the server and the app will pick it up?

the custom script will have to hash it and then compare it to the documents table and grab everything it needs to then replace it

Yes I get that. Sorry I’m not asking the right thing here. Maybe I’m still not.

My question is how is it done? Is there a key somewhere? A specific algorithm? What’s included in the hash?

it’s in the Document class, createDocument() method

I think the worry is encrypting the document. The keys for that are the keys stored in the database.
You’d have to look but I suspect the file name hash is only a reference to the document for storage security.

The file name is a uuid (text form) which is stored in the database entry (in binary form). A good way to map would be to convert the filename uuid to binary and then find the database entry (which also contains the text title of the file).

To be more helpful on my post, the uuid filename is stored in the drive_uuid column in the documents table and the human readable filename is stored in the name column.

one more tip. the path where the file is stored is in the url column.
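
The text-to-binary conversion mentioned above can be sketched like this in Python. This is only an illustration: it assumes the binary form in drive_uuid is the plain 16 bytes of the UUID in the usual order, which you should confirm against a row you already know before relying on it. The function names are made up for the example.

```python
import uuid


def uuid_text_to_binary(filename: str) -> bytes:
    """Convert a text UUID file name into 16 raw bytes.

    Assumption: the database stores the UUID as its plain 16-byte form.
    """
    return uuid.UUID(filename).bytes


def uuid_binary_to_text(raw: bytes) -> str:
    """Convert 16 raw bytes back into the hyphenated text UUID."""
    return str(uuid.UUID(bytes=raw))


# The example file name from the original question.
raw = uuid_text_to_binary("9a77573a-8ea5-423e-8c13-a93a3f1df2a6")
text = uuid_binary_to_text(raw)
```

Going the other way (binary to text) is what you need when you find a row by hash and want the file name to use on disk.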

So recreating your files could look something like this (assuming you have not changed servers or the path to the documents directory; if you have, it gets more complicated but is still possible using the path_depth column):

  1. encrypt the file (via CryptoGen->encryptStandard($data, null, 'database'); )
  2. find the url by mapping the uuid filename to drive_uuid in the documents table
  3. plop the encrypted file at the url, with the file name set to the text form of drive_uuid

EDIT:
Actually, since you are starting from the human readable filename (and not the uuid), it will be more complicated but still possible. In that case, use the hash of the file to map it to its row in the documents table (per @stephenwaite's instructions above).

ANOTHER EDIT:
One more tip on the hash. For documents created in OpenEMR 6.0.0 or later, hash('sha3-512', $content) is used; before that, sha1($content) was used. So when mapping to the hash column in the documents table, first try the sha3-512 hash, and if nothing matches, try the sha1 hash (and if that matches nothing either, that would be bad).
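
For anyone scripting this outside PHP, here is a rough Python sketch of computing both candidate hashes for a backup file. It assumes the hash column holds lowercase hex strings, which is what PHP's hash() and sha1() return by default; the function names are hypothetical.

```python
import hashlib


def candidate_hashes(content: bytes) -> tuple:
    """Return (sha3-512, sha1) hex digests of the unencrypted file content.

    Order matters: try the sha3-512 value first, then fall back to sha1.
    """
    sha3 = hashlib.sha3_512(content).hexdigest()
    sha1 = hashlib.sha1(content).hexdigest()
    return sha3, sha1


def candidate_hashes_for_file(path: str) -> tuple:
    """Read a plain-text backup file from disk and hash its raw bytes."""
    with open(path, "rb") as handle:
        return candidate_hashes(handle.read())


sha3_hex, sha1_hex = candidate_hashes(b"abc")
```

The hashes are taken over the original, unencrypted bytes, so the backup copies can be hashed as they are.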

Thanks for the tips here.

Looks like I’m going to get hung up here:

$iv = RandomGenUtils::produceRandomBytes(openssl_cipher_iv_length('aes-256-cbc'));

Since it is random, I will never be able to encrypt the same document the same way twice, correct? If I send in a file called test.pdf, each time I send it in it will be encrypted in a different way.

So is there no way to take an unencrypted file that was previously uploaded (and encrypted) to OEMR, encrypt it again, and have the system be able to decrypt and read it?

hi @midder ,
No need to worry. It should work just fine (note you are just sending in each document once). The encrypted file will be different, but the original file, uuid file name, and file hash will not change (since the hash is taken on the original file prior to encryption).

(One issue I would watch out for is duplicate files. When you run the sqlStatement query to collect the documents matching a hash, there will likely be only one. If there is more than one, I would note that and handle those more carefully later. In that case they are likely identical files stored in separate places, but it would be good to do a sanity check on them.)

EDIT: the above issue is interesting, since it makes me wonder what would happen if a duplicate file had been uploaded both before and after 6.0.0. So when collecting the document entry based on the hash, I would actually check both the sha3-512 and sha1 hashes in the same WHERE clause with an OR. If you get more than one document, set that query result aside and deal with it later with a bit more oversight and manual intervention.
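
A minimal sketch of that lookup, written in Python against an in-memory SQLite table so it can be run anywhere. The real documents table lives in OpenEMR's database and has more columns; only name, url, drive_uuid and hash come from this thread. The demo rows, paths, and function names are made up for illustration.

```python
import hashlib
import sqlite3


def find_documents(conn, content: bytes) -> list:
    """Look up rows whose hash matches either the sha3-512 or the sha1 digest."""
    sha3 = hashlib.sha3_512(content).hexdigest()
    sha1 = hashlib.sha1(content).hexdigest()
    cursor = conn.execute(
        "SELECT name, url, drive_uuid FROM documents WHERE hash = ? OR hash = ?",
        (sha3, sha1),
    )
    return cursor.fetchall()


def classify(rows: list) -> str:
    """Exactly one match is safe to restore; anything else is set aside."""
    if not rows:
        return "unmatched"
    if len(rows) == 1:
        return "matched"
    return "needs_review"


# Stand-in table with made-up rows (not real OpenEMR data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (name TEXT, url TEXT, drive_uuid BLOB, hash TEXT)")
demo_rows = [
    ("new.pdf", "/demo/1", b"\x01" * 16, hashlib.sha3_512(b"new file").hexdigest()),
    ("old.pdf", "/demo/2", b"\x02" * 16, hashlib.sha1(b"old file").hexdigest()),
    # Same content stored twice: once before 6.0.0 (sha1), once after (sha3-512).
    ("dup-a.pdf", "/demo/3", b"\x03" * 16, hashlib.sha1(b"dup").hexdigest()),
    ("dup-b.pdf", "/demo/4", b"\x04" * 16, hashlib.sha3_512(b"dup").hexdigest()),
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", demo_rows)

results = {
    label: classify(find_documents(conn, data))
    for label, data in [("new", b"new file"), ("old", b"old file"),
                        ("dup", b"dup"), ("missing", b"not in table")]
}
```

Keeping the "needs_review" and "unmatched" files in separate lists means the bulk of the restore can run unattended while the odd cases get checked by hand.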
