I’m perplexed here regarding character sets and key limits.
It appears that the php-gacl database will not work if forced to a UTF-8 character set, because some of its key/indexes limits break 1000.
For example, if i create table by command:
create database openemr default character set utf8;
Then run the install script (choose i’ve already made the database), then i get several of these errors:
1071: Specified key was too long; max key length is 1000 bytes
Am I missing something, or would it be best to ensure the php-gacl database is utf8 compatible (probably simple to do). If i do reduce sizes of indexes/keys though will that make it difficult for migrating data (ie if a user wants to migrate their external php-gacl data into the embedded gacl database).
I’m looking at the GACL database structure and don’t understand how the database key is exceeding 1000 characters. I’ve read the forum posts describing the trouble but I can’t figure out how using three fields totalling 471 characters goes past the 1000 character limit.
Have you done research into why this is happening?
utf8 is a multi-byte character set. In Mysql, 1-3 bytes are stored for each character. In the index, I see that they choose to store 3 bytes regardless of the character’s actual length, so that limits you to 333 chars.
Been taking a crash course in MySQL especially in regards to UTF-8. The default out of box MySQL settings are the ‘latin1’ character set and ‘latin1-swedish-ci’ collation, so this problem will only happen with users/developers who go out of their way to change the mysql defaults to UTF-8 (generally a bad idea if want to stay compatible with legacy applications).
There doesn’t seem to be any plans by MySQL to change their default character set of ‘latin1’ (MySQL 6 user manual still reports this is the default). Although it does appear like there are some in the debian threads that would like to change default character set to UTF-8; but again this wouldn’t be good for any legacy apps.
Although most of OpenEMR’s database is in UTF-8 format (there are actually some tables that are still ‘latin1’), it does fine because it actually explicitly creates these tables in UTF-8 (thus over-riding the ‘latin1’ default of mysql).
Converting php-gacl to UTF-8 doesn’t appear as simple as I initially thought. If the database creation script was a normal mysql script then it’s simple to limit the size of indexes and keys without decreasing the size of the actual data fields. However, php-gacl’s database installation script is in a XML ADODB schema, which doesn’t seem to allow this.
So, considering the time/testing required and the small audience whom are effected, I’m thinking the conversion of php-gacl to UTF-8 could be postponed until after 3.0 release (unless somebody knows of a very simple way to do it). It could be done along with a ADODB upgrade in future version of openemr.