drbowen wrote on Thursday, July 31, 2008:
There is an old running debate between two factions in web management.
Method 1) Store images in a directory. The SQL database just holds a pointer that points to a particular file on the hard drive.
Method 2) Store everything in the database. Store images as BLOBS. Feed the images from the database as needed.
1. Hard drive
Pros:
Modestly simpler to program
very fast database response
(pulling the images may be subject to how fast the hard drive is and how far out on the disk the images are being stored.)
Cons:
If the server is attacked successfully or semi-successfully, it may allow access to the images which are confidential information exposing the owner to legal liability.
To get full access the attacker still has to compromise the MySQL database. Unfortunately, since MySQL is really popular on the web, there are lots of attackers who have a lot of practice at this.
The typical attackers may not know what they have if they successfully compromise the server and may just want to vandalize your system.
Professional attackers will be looking for stuff they can sell. Such as very carefully validated names, addresses, telephone numbers, personal ID numbers (such as SSN in the USA) for purposes of sale to third parties for identity theft. The favorite attacks are SQL injection where malformed SQL statements can cause your server to dump all of the demographic data to the screen of the attacker. A successful SQL injection would be piped to a file on the attackers machine where it can be sold to a less than savory third party.
You have to back up the SQL database and the documents directory (where all your images are kept). This is by its nature a two step process. (rsync is awesome at this). One caveat, if you rsync the MySQL database you need to stop the MySQL process before beginning and then restart when you finish.
All of the images are stored as individual files in one really big directory. Depending on the operating system of the host, the sheer number of files may overwhelm the hosts ability to keep track of the files. Not only are there host operating system limitations, but file system limitations. You won’t find out until you run out of room then everything just stops without warning. This is not as much a problem now as it was 10 years ago when Linux struggled with lots of files in a directory.
It is important to maintain relative paths to these files to allow upgrades to newer / faster hardware as needed. File pemissions most also be set correctly to help improve security.
2. BLOBS in the database Pros:
Single point of entry for security. If you secure the database correctly, maintain good password policies, this makes administrative life easier.
Allows the use of MySQL replication to provide real time backup of the database.
Cons:
Slower response by the database. I personally, have not had an issue with this. I set up a metadata table and the metadata table has a pointer to the image table. This gives very fast response. You can probably measure faster reponse with database performance tools. I don’t think normal users can tell the difference.
While the number of files is not an issue, the maximum size of files is. The image table gets huge. The default OpenEMR table is MyISAM. MyISAM tables have a 4 gigabyte hard limitation. One quickly has to convert the table type to InnoDB to be able to store the data. InnoDB has a lot of advantages including better data integrity, extremely flexible when it comes to adding more space. One has to watch the maximum file size in the underlying operating system, file system and whether you are running 32 bit or a 64 bit system. Again this situation is much better tahn it was ten years ago. The new table limit size is measured a unit of number so huge that I have trouble remembering how many zeros it has. (32 TiB)
Note in the above: 1024 Bytes = 1 KiB; 1024 KiB = 1 MiB; 1024 MiB = 1 GiB; 1024 GiB = 1 TiB; 1024 TiB = 1 PiB; 1024 PiB = 1 EiB
ext2/3 with 8 KiB blocksize (Systems with 8 KiB pages like Alpha only) 32768 GiB (= 32 TiB)
ReiserFS 3.6 16384 GiB (= 16 TiB)
XFS 8 EiB
JFS with 4KiB blocksize 4 PiB
All of these file systems now exceed the maximum disk space that most of us have available to us on our current systems. For instance my curent production server has 4 x 1 TiB in a RAID configuration giving me 3 TiB of storage space. My current InnoDB table holding the images is around 54 GiB.
"The Lowdown on MySQL Table Types
Most people use MyISAM if they need speed and InnoDB for data integrity. You can use more than one or any combination of these table types in your database. Remember to asses the needs of your application before building it. Even though MyISAM is faster than InnoDB in the MySQL world, InnoDB is fast compared to any database engine. With InnoDB you get transactions, speed and integrity three features not usually used in the same sentence. Most of my customers want as much speed as they can get, but at the end of the day, good data integrity lets them sleep at night. "
http://www.developer.com/db/article.php/2235521
MyISAM is reportedly going to be deprecated in future versions of MySQL.
InnoDB does not have tool for a "hot copy" unless you purchase a third party solution.
Most of the "cons" listed above end up having similar problems in the BLOB method.
In the end, which way you go, hard drive with database pointer vs. BLOB in the database ends up being largely personal preference.
This was discussed in the OpenEMR forums in 2004. The vote at the time was 2 votes for method 1 (Rod and Andres), 2 votes for method 2 (Tekkno Genius and myself).
We ended up with the current method because Andres Paglayan is a much faster (and definitely a more elegant coder) than I am.
I, being an independent soul, finished my code and went with method 2.
Sam Bowen, MD