AWS S3 Option for file storage

juggernautsei · August 15, 2022, 12:31pm

We are in the process of developing a module that will connect the AWS S3 bucket system as a storage option. The screenshot below is what we have right now.

Nilesh_Hake · August 16, 2022, 3:31pm

@juggernautsei,

I have already implemented this feature in OpenEMR for one of my clients. The flow is as follows:

Upload the image to a private S3 bucket on AWS.
Then display that image in OpenEMR using AWS presigned URLs.
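
For readers unfamiliar with presigning, the second step can be sketched as follows. This is not the poster's code, only a minimal illustration of AWS Signature Version 4 query-string presigning using the Python standard library. The function name `presign_get`, the bucket, the object key, and the credentials are all hypothetical placeholders. In a real OpenEMR module you would more likely let the AWS SDK for PHP build the URL.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(bucket, key, region, access_key, secret_key, expires=300, now=None):
    """Return a time-limited GET URL for a private S3 object (SigV4 sketch)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"  # virtual-hosted-style endpoint
    scope = f"{datestamp}/{region}/s3/aws4_request"
    canonical_uri = "/" + quote(key, safe="/-_.~")

    # Query parameters must be URI-encoded and sorted by name.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )

    # Only the host header is signed; the payload is declared unsigned.
    canonical_request = "\n".join(
        ["GET", canonical_uri, canonical_query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )

    # Derive the signing key: date -> region -> service -> "aws4_request".
    k = _sign(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        k = _sign(k, part)
    signature = hmac.new(k, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"


# Hypothetical values; the URL stops working after `expires` seconds.
url = presign_get("example-bucket", "signatures/dr-1.png", "us-east-1",
                  "AKIDEXAMPLE", "not-a-real-secret", expires=300)
```

The bucket stays private: the browser only ever receives a short-lived signed URL, which can be placed directly in an image tag.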

growlingflea · August 16, 2022, 6:19pm

Are you writing this in bash or php? I have a cron job that takes a backup of the database and sites directory and syncs it to AWS S3 storage twice a day.

juggernautsei · August 16, 2022, 6:40pm

I am building a module. Yes, I am writing it in PHP.
@Nilesh_Hake I know that private work does not lend itself to sharing. My question is: did you tie it into the patient documents, so that the files are saved in the S3 bucket via the normal document workflow instead of on the local instance?
Our goal for this module is to add to the ways to save files.

During the build process, I have the code write a test file to the S3 bucket. That is as far as I have gotten.

growlingflea · August 16, 2022, 6:54pm

Sounds good. I’m wondering, are you doing more than just syncing the data to the S3 bucket? The solution I use is about six lines of code: three lines in the crontab file, plus the script that takes a MySQL dump. You will need the AWS CLI installed on the server. The cron job runs backupsql.sh, which is just a mysqldump that saves the database to the /opt/backups folder. Then I sync the sites directory and the backups folder to S3.

45 12,21 * * * sh /opt/backups/backupsql.sh
34 23 * * * aws s3 sync /var/www/openemr/sites/ s3://sqldump-bucket/sites
30 12,22 * * * aws s3 sync /opt/backups/ s3://sqldump-bucket/mysqldumps

The backup script is below:

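#!/bin/sh
# Work inside /opt/backups so the dump and the log land in the synced folder
# (cron does not start in that directory).
cd /opt/backups || exit 1
date > backup-date.log
now=$(date +"%Y_%m_%d_%H_%M_%S")
mysqldump --user=openemr --password=openemrPassword openemr > "${now}_daily-all-dbs.sql"
# gzip replaces the .sql file with .sql.gz, so no separate rm is needed.
gzip "${now}_daily-all-dbs.sql"
date >> backup-date.log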

juggernautsei · August 16, 2022, 7:11pm

Yes, we are going to be writing and reading the document files from the S3 bucket. It is not just for backup.

growlingflea · August 16, 2022, 7:14pm

The way I understand S3, it doesn’t cost anything (or very little) to write to it, but it’s very expensive to read or pull data out. If you are going to be both writing and reading data, you might want to choose a different service.

juggernautsei · August 16, 2022, 7:39pm

@jesdynf could you chime in on the cost of the S3 bucket? I don’t know, but since you are well versed in AWS, maybe you know the cost aspect of using S3 and why you would want to use S3 instead of the local file system.

According to the AWS pricing calculator, this is the cost:

Inbound:
Internet: 5 GB x 0 USD per GB = 0.00 USD
Outbound:
Internet: 5 GB x 0.09 USD per GB = 0.45 USD
Data Transfer cost (monthly): 0.45 USD

jesdynf · August 16, 2022, 8:16pm

That’s flat wrong, yes. Perhaps he’s referring to Glacier?

The Amazon S3 pricing page ("Amazon S3 Simple Storage Service Pricing - Amazon Web Services") covers the ground. S3 is designed to affordably transfer terabytes of static documents. You pay for the storage (and that’s honestly the largest cost over time, 2.3 cents per GB/mo), you pay for the data transfer (and you’ve got the math correct there), and you pay tiny fractions of pennies for the document transfer requests ($0.005 per thousand PUTs, $0.0004 per thousand GETs).

And think about it – how often do these patient documents get actually uploaded or downloaded? These aren’t framework .js files or something looked at on pageload, there’s only so many times in the lifespan of any given document it’ll be looked at.

juggernautsei · August 17, 2022, 11:27am

So, to put real numbers behind what @jesdynf said.
We have 78 GB of data right now that took over 10 years to accumulate, so some of those documents from the first few patients will never be looked at again. Storing that 78 GB in an S3 bucket will cost $1.79/mo (78 GB x $0.023 per GB).
The redundancy offered by the S3 bucket is worth the monthly cost. For pennies, the files are written across three different data centers. That is data availability and security wrapped into one. No more wondering if the cron job failed.
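
To make the arithmetic in this part of the thread easy to re-run with your own numbers, here is a small sketch of the cost model. It assumes the rates quoted above (storage, outbound transfer, PUT and GET requests), which may not match current AWS pricing or your region. The function name `monthly_cost` and its parameters are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates as quoted in this thread (USD). Assumption: check the current AWS
# price list for your region before relying on them.
STORAGE_PER_GB_MONTH = Decimal("0.023")
EGRESS_PER_GB = Decimal("0.09")
PUT_PER_1000 = Decimal("0.005")
GET_PER_1000 = Decimal("0.0004")


def monthly_cost(stored_gb, egress_gb=0, puts=0, gets=0):
    """Estimate one month of S3 charges, rounded to whole cents."""
    storage = Decimal(str(stored_gb)) * STORAGE_PER_GB_MONTH
    egress = Decimal(str(egress_gb)) * EGRESS_PER_GB
    requests = (Decimal(puts) * PUT_PER_1000 + Decimal(gets) * GET_PER_1000) / 1000
    parts = {
        "storage": storage,
        "egress": egress,
        "requests": requests,
        "total": storage + egress + requests,
    }
    cent = Decimal("0.01")
    return {name: value.quantize(cent, rounding=ROUND_HALF_UP) for name, value in parts.items()}


# 78 GB stored and 5 GB downloaded in a month, as discussed in the thread.
costs = monthly_cost(stored_gb=78, egress_gb=5)
print(costs["storage"], costs["egress"])  # 1.79 0.45
```

Decimal is used instead of float so that the cents come out exactly as a calculator would show them.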

Nilesh_Hake · August 17, 2022, 11:44am

Right now I have implemented this functionality for uploading the doctor's signature into a private S3 bucket and then displaying that image in OpenEMR using AWS presigned URLs. Later we are going to work on the patient documents as well.

TranMedical · August 29, 2022, 9:55pm

I was also working on this, but for Google Cloud. Currently it works, though I’m not sure it is optimal for speed.

I used the docker image setup. What I did:

I fuse mounted my bucket.
I had mounted my sites dir to a docker persistent volume, so I let that set up first.
Then I copied over the documents dir within the sites dir into my fuse mounted dir.
Then I symlinked the documents path to the fuse mounted dir

Not sure if it's helpful, but it seems to work.
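
The copy-and-symlink steps above can be sketched like this. It assumes the bucket is already FUSE-mounted at some directory. The mounting itself is not shown, and the function name `relocate_documents` and both paths are hypothetical. The sketch also keeps the original directory under a ".local" name rather than deleting it, which is an extra safety step not described in the post.

```python
import shutil
from pathlib import Path


def relocate_documents(documents: Path, mounted: Path) -> None:
    """Copy `documents` into `mounted`, then replace `documents` with a symlink."""
    if documents.is_symlink():
        return  # already relocated on an earlier run

    # Copy the existing files into the (already mounted) bucket directory.
    shutil.copytree(documents, mounted, dirs_exist_ok=True)

    # Keep the local copy under a new name until the mount has been verified.
    backup = documents.with_name(documents.name + ".local")
    documents.rename(backup)

    # The application keeps using the old path, which now points at the mount.
    documents.symlink_to(mounted, target_is_directory=True)
```

A call might look like `relocate_documents(Path("/var/www/openemr/sites/default/documents"), Path("/mnt/bucket/documents"))`, with both paths adjusted to your own layout.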

jesdynf · August 30, 2022, 2:18pm

A single instance routing to S3 might work okayish like that. In this configuration S3 is just a hot backup, no different from an rsync. You’re storing, but not serving, from S3. You wouldn’t be able to use a cluster of workers to serve from it – you could try, but that’s when the whole thing would fall to pieces.

Marco_Meza · April 11, 2024, 8:24pm

Hi,
I wonder if you have completed this feature?
I would like to use S3 as a repository for DICOM results.

Thanks!

juggernautsei · April 11, 2024, 9:30pm

Yes, it works. I will PM you about the module.

juggernautsei · April 11, 2024, 10:26pm

I took @jesdynf's advice and the documents are served from the S3 bucket. It can completely replace the local storage.

casper · January 23, 2025, 10:43am

This is an interesting feature. I hope it can be added as an official module in OpenEMR.