Parameterized translations

rachoac wrote on Thursday, January 07, 2010:

Hi folks,

I’m finding the need to provide translations which require parameterization.

For example, if I need the following text translated:

“Displaying the latest 3 records.”

… doing it this way would be bad (in pseudo code):

echo xl("Displaying the latest");
echo " ";
echo $record_count;
echo " ";
echo xl("records");

… because obviously this example ignores the grammatical rules of most languages other than English. In Java land we can declare something like this:

getTranslation( "english", "translation.key", collection )
where collection is key=value pairs, where key = position in translation key, and value = the value we want substituted.

So we could create the following translation.key(s) in english and czech, that looks like this:

english:
translation.key = Displaying the latest {0} records.

czech:
translation.key = Displej {0} nejnovější záznamy.

… and call our translate method like this:

getTranslation( "english", "translation.key", collection )
getTranslation( "czech", "translation.key", collection )

where collection is {0} = 3.

This is of course a very rough description of a solution, but you get the point.

If this problem is solved some other way in the code, please let me know. What do you guys think?

sunsetsystems wrote on Thursday, January 07, 2010:

My first thought is to wonder how other software projects do it.  Would you be interested in researching that?

Rod
www.sunsetsystems.com

blankev wrote on Thursday, January 07, 2010:

Dear Rachoac,

sounds like something worthwhile to explore. I still have some Dutch translations that won’t fit in the available space. Also, an English word can have two different meanings in Dutch, so the same word needs “this” translation in one spot and “that” translation in a different context.

Now that we have more or less working translations, some translators seem to be happy with 25% and others are satisfied with 80% translation. The Dutch translation is 100% translated because some unneeded translations are included in the Brady-Google-translation-sheet, though they were not needed to make a usable Dutch version of OpenEMR. But your observation about different grammatical order is correct. It is sometimes awkward to have a translation that covers the context but is in the wrong grammatical order; this is very clear when a translated sentence is divided into several parts.

Where can I get some more info about this way of translation? And for Developers doing translation coding: is this worth the time and hardship, since most USERS of OpenEMR know after a while what button to press even with the wrong translations? Another option would be to restrict the language Developers may use: you can only use words and wordings from the Brady translation sheet. Or have the restriction of complete undivided sentences, with the amount placed at the start or end of the sentence.

But a better look and a more complete translation is a positive action for future translation Versions.

Pimm

ideaman911 wrote on Thursday, January 07, 2010:

Pimm, Rachoac et al;

Might a better approach be to try to always have parameter values as the end of a statement, such as “Latest Records Displayed: 3”, which might better accommodate multi-lingual without having to re-engineer the code?  Just an idea?

Joe Holzer    Idea Man    315-622-9241     im@holzerent.com
http://www.holzerent.com

rachoac wrote on Friday, January 08, 2010:

Rod,

Yes, I’ll do a little poking around to see what some good practices for .php are. The solution I described is pretty standard in J2EE applications; I’m sure there has to be a (L/W)AMP analog someplace.

Thanks,

Aron

rachoac wrote on Friday, January 08, 2010:

Hi Pimm,

One way to avoid the ‘hardship’ you mentioned while still having complete translations is to not change the language definition datastore (enabling existing invocations of xl(…) to still work), but still add a new xl function, overloaded with an array of substitutable values.

This new function could have the following signature:

xl( , ,  )

After looking up the value of  based on the , the new method could go ahead and substitute parameters based on the array of values.  Given a definition that looks like this:

“Hello, {0}, my name is {1}” … and an array of values {‘pimm’, ‘aron’}, the new xl(…) function would substitute {0} with ‘pimm’ and {1} with ‘aron’, and produce the translated string “Hello, Pimm, my name is Aron”.
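A minimal sketch of the substitution step described above, in PHP. The name xl_params() is hypothetical and not part of OpenEMR; the lookup of the definition is left out, so the function takes an already translated string and only fills in the {n} placeholders.

```php
<?php
// Hypothetical helper, not part of OpenEMR. In a real version the
// definition would first be looked up through the existing xl() function.
function xl_params(string $definition, array $values): string
{
    // Replace each {n} with $values[n]. Unknown indexes are left untouched
    // so a bad definition stays visible instead of being silently emptied.
    return preg_replace_callback(
        '/\{(\d+)\}/',
        function (array $m) use ($values): string {
            $index = (int) $m[1];
            return array_key_exists($index, $values) ? (string) $values[$index] : $m[0];
        },
        $definition
    );
}

echo xl_params('Hello, {0}, my name is {1}', ['Pimm', 'Aron']), "\n";
// The Czech definition from earlier in the thread, with the count placed first.
echo xl_params('Displej {0} nejnovější záznamy.', [3]), "\n";
```

Because the placeholders are numbered, a translator can move them around freely within the sentence without any change to the calling code.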

Does this make sense? I look to Brady and Rod to validate this concept. Seems like a simple way to get true 100% translations while not breaking things.

Thanks,

Aron

bradymiller wrote on Friday, January 08, 2010:

hey,

A real problem would be translators not breaking it. It’s hard enough keeping them from placing tab/enter characters in the fields of the google docs spreadsheet (the scripts check this, but I need to manually fix each one in the google docs spreadsheet, which becomes a real time sink). I’m guessing most translators would ruin the {0} etc. formatting of their translations no matter what we do if we keep using the google doc spreadsheet method. Then we’d be worse off than before (the variables pimm and aron would simply go away in translations). I like Joe’s idea for now; avoid using these types of constants altogether.

Could consider changing the google docs system, but my overriding goal when migrating these translations to google docs was to beat out entropy (to useless chaos, which is what happened pre 3.0); the current system will never lose translations and will grow as OpenEMR grows (scripts will add the newly added constants). Although the system is rather simple and ugly sometimes, the meanings are there. I’m suggesting the current goal should be maintaining the status quo during all the meaningful use and gui changes (this won’t be easy in itself, every new developer has the potential to wreak havoc while going through the internationalization learning curve); then when that’s done we can start thinking about getting fancy.

-brady

whimmel wrote on Friday, January 08, 2010:

Another way to implement this without having to write any new code is to use sprintf() or printf().  It’s a standard C function so it executes very fast. If you use multiple parameters, you just have to make sure the translations use them in the same order.

 printf( xl("Display the next %d items."), $count);
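As a side note to the ordering concern, PHP's sprintf() and printf() also accept numbered specifiers such as %1$d and %2$s, which let a translation reorder the parameters. The sketch below illustrates this; xl_stub() and its 'demo' entry are made-up stand-ins for xl() and the language tables, not real OpenEMR data.

```php
<?php
// Stand-in for OpenEMR's xl(); it only looks up a hard-coded table.
function xl_stub(string $constant, string $lang): string
{
    $table = [
        'Showing %1$d of %2$d records for %3$s.' => [
            // Made-up "translation" that moves the name to the front.
            'demo' => 'For %3$s: %1$d records shown (of %2$d).',
        ],
    ];
    // Fall back to the English constant when no definition exists.
    return $table[$constant][$lang] ?? $constant;
}

// Single quotes keep PHP from treating $d and $s as variables.
$constant = 'Showing %1$d of %2$d records for %3$s.';
echo sprintf(xl_stub($constant, 'english'), 3, 10, 'Pimm'), "\n";
echo sprintf(xl_stub($constant, 'demo'), 3, 10, 'Pimm'), "\n";
```

The arguments are passed in the same order in both calls; only the definition decides where each one appears.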

tmccormi wrote on Friday, January 08, 2010:

The code is fine, but that doesn’t fix Brady’s issue.  The developers creating the translation tables would have to know printf formatting rules.

Tony

bradymiller wrote on Friday, January 08, 2010:

Tony and all,

It’s even more fundamental than the developers. The real weakness of the translation project is the strength of the project; the translators.

Currently, the way it mostly works is somebody emails me to get access to the google docs spreadsheet, and I respond as quickly as possible with the instruction wiki link and I give them access right away. Then we will usually get a hundred or so translations and the translator usually doesn’t come back (novelty wears off, favorite tv show distracts them away, moved on to another project etc.). Using open source principles I never remove a translator’s privileges, because they can come back and I always want to keep the door open. Needless to say we just got 100 translations that will never go away. Occasionally we get a consistent translator, ie. Pimm and some greek translators.

The point I’m trying to make is that at this point, the translators are mostly “temp” volunteers, so really don’t have the time to train them for anything complicated (ie. keeping formatting of parametric constants). If we established a more rigid training/tutorial before giving spreadsheet access we’d simply lose the “temp” volunteers, which currently provide a significant amount of translations. If the project grows to the point where there are more stable translators etc., then this becomes more of an option.

-brady

blankev wrote on Friday, January 08, 2010:

After reading most remarks concerning Translation Problems, the question arises: “ARE WE WILLING TO ATTACK” the real differences in grammar and make OpenEMR an almost perfectly translatable program, or are we willing to accept some incompatibilities that could give rise to scrutiny and funny remarks from linguists? Or is OpenEMR functioning with its translations the way most translated programs do?

Is there a need, or is it cosmetics we are seeking?

If there is a need, like the Dutch translation of the English word  " to: " , then we could try to solve these. But it means that OpenEMR programs get different words or sentences. Another option might be to start working toward future solutions like the suggestions made and find the obstacles.

The related question will be how many different problems can we expect? How often do we want to have a different grammatical order in the translations?

I don’t think this should be another headache for Brady and the Google translation documents. But implementing it, converting it, as a silent implementation like the “Patient Photograph” won’t do any harm, but could solve some of the problems and avoid new headaches for the Brady Translationsheet.

U USE “IT”, U C “IT” & IF U R NOT INTERESTED U DON’T C “IT”

Or am I too simple minded?

Pimm

rachoac wrote on Saturday, January 09, 2010:

Regarding this comment from Pimm:

U USE “IT”, U C “IT” & IF U R NOT INTERESTED U DON’T C “IT”

I think that’s where I would ideally want to go with any changes to the translation system. If we make changes, we need to make it flexible enough so that translators who are used to doing it the old way can continue to do so. The system would still work for legacy translation keys.

Brady is correct in that volunteer translators may not give us parameterized translations in the right format. However, if I read it correctly, it looks like the translations end up going to Brady first, where I presume he can vet them for proper formatting. I understand that this puts more burden on Brady or other developers vetting the translations though. We could theoretically build a validator to ensure that definitions don’t violate some key conventions (such as always having a closing ‘}’ if you’ve got an opening ‘{’).
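A rough sketch of such a validator, assuming the {n} placeholder style used earlier in the thread. The function name is hypothetical and this is not an existing OpenEMR script.

```php
<?php
// Hypothetical check, not an existing OpenEMR script.
// Returns true when every brace in the definition belongs to a
// well-formed {n} placeholder, i.e. no stray '{' or '}' is left over.
function placeholders_well_formed(string $definition): bool
{
    // Remove all well-formed placeholders, then look for leftover braces.
    $stripped = preg_replace('/\{\d+\}/', '', $definition);
    return strpbrk($stripped, '{}') === false;
}

var_dump(placeholders_well_formed('Displaying the latest {0} records.')); // well formed
var_dump(placeholders_well_formed('Displaying the latest {0 records.'));  // missing '}'
```

A check like this only catches malformed braces; it does not notice a placeholder that was deleted entirely, which would need a comparison against the English constant.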

I personally feel like it’s kind of restrictive to be forced to use canned phrases, which may seem somewhat forced in other languages. For example, I speak Indonesian, and took a look at the Indonesian translation. It seems at times … a little weird … I can understand it and use it, but it seems a little ‘barbarian-y’ at times.

Regarding how often we want different grammatical order to the translations … little bits of instructional text, legal notices, and such that would round off a good user interface would be practically impossible to achieve without a fully internationalizable interface. We can of course live without these things, but it’s all about polish and giving the user as good a UI experience as possible, especially if they are using the tool 8-9 hours a day in a mission critical setting!

Hope that helps,

Aron

bradymiller wrote on Saturday, January 09, 2010:

Aron,

Indonesian was created mostly via automated google translations. The contributors did this stating it would be better to start that way, but then haven’t come back. This is why it does not get included in the official OpenEMR releases. So, I wouldn’t judge the merit of the current system on that language. As an aside, if you’d like to translate Indonesian, I’d be happy to give you editing access to the spreadsheet :)

My goal here is not to shut down ideas, just to let you know what internationalization entails. Check out the internationalization developer page, check out the google docs spreadsheet, and check out the scripts in /openemr/contrib/util/ along with the README file there. The whole system is transparent. The goal of this whole thing was to just get the translations stabilized and working. I would be happy to see OpenEMR become more successful, and grow out of the current mechanism. But, at this point it’s just me who updates the tables and adds constants via the scripts. Sure, you can do all the validating you want, but with the current google doc mechanism, you still need to manually fix the inconsistencies in the google docs spreadsheet; with tabs/enter keys that is straightforward but still sometimes a time sink. With parametric constants, then what do I change it back to?? Multiply this by 3500 constants and 20 or so languages… In this situation, I would likely give up at some point, entropy would win, and the database would degrade to chaos yet again.

Instead of starting from the frontend, I suggest starting from the backend. Think about a way to get away from google docs via some sort of automated translator input site (freemed does something like this using some python functions) for translations that gives directions and does validation in real time for each entry; then everything could be automated, and then could have your parametric functions in the front end. Again, the overriding goal needs to be avoiding degradation of the translations.

-brady

bradymiller wrote on Saturday, January 09, 2010:

too funny,
Right after I hit the submit button on previous message, I realized there is likely a way to automate the validations and keep the translations on google docs. But it will require significant effort that I can’t undertake now. Here would be my plan: (To understand what I’m talking about check out openemr/contrib/util/language_translations/ and read the README file there)
- Modify openemr/contrib/util/language_translations/buildLanguageDatabase.pl to also read in the current_spreadsheet.tsv file, which contains the google doc spreadsheet as it was at the most recent language tables update. Then while validating, if there is an error (such as a tab or enter key where there shouldn’t be one), simply copy the old entry there (this way you lose the new entry, but it wasn’t correct anyways, and go back to the previous correct entry). Once this was solidly working for tab/enter keys (again, it needs to be stable to avoid losing definitions needlessly), it could then validate parameterized functions (easy, since any parameter in the English constant needs to be in the definition too). Then this would possibly open the door to total automation via the google doc API http://code.google.com/apis/documents/overview.html . Aron, if you end up doing this, then you’d be fully qualified to do the language table updates also in the future, oh wait they’d be automated… :)
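The fallback rule in the plan above can be sketched as follows. The real change would live in buildLanguageDatabase.pl, which is Perl; PHP is used here only for illustration, and the function names and the {n} placeholder format are assumptions.

```php
<?php
// Sketch only; function names and the {n} placeholder style are assumptions.

// Collect the {n} placeholders in a string, de-duplicated and sorted.
function placeholders(string $text): array
{
    preg_match_all('/\{\d+\}/', $text, $m);
    $found = array_unique($m[0]);
    sort($found);
    return $found;
}

// A new definition is accepted only if it has no tab/enter characters
// and carries the same placeholders as the English constant.
function is_valid_definition(string $constant, string $definition): bool
{
    if (preg_match('/[\t\r\n]/', $definition)) {
        return false;
    }
    return placeholders($constant) === placeholders($definition);
}

// Keep the new entry when valid, otherwise go back to the previous one.
function choose_definition(string $constant, string $new, string $previous): string
{
    return is_valid_definition($constant, $new) ? $new : $previous;
}

$constant = 'Displaying the latest {0} records.';
$previous = 'Displej {0} nejnovější záznamy.';
// The new entry lost its placeholder, so the previous entry is kept.
echo choose_definition($constant, 'Displej nejnovější záznamy.', $previous), "\n";
```

The trade-off is the one described in the plan: a rejected edit is discarded, so the check needs to be reliable before it is allowed to overwrite anything.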

-brady

blankev wrote on Saturday, January 09, 2010:

New translations arise day by day.

Working in V 3.3, in the right upper corner there is the word “manual”. I tried to convince myself that this word had been translated in the past, because it comes from the lower left corner in earlier versions.

I was incorrect in assuming this; it is a completely new word for translators.

In this case, my suggestion would be to accept the word “manual” and get rid of the words “Online manual”. But that needs some programmer’s work and consistency for future releases, and how does anybody know it is an online manual? Well, after trial and error everyone knows the facts in just a few clicks.

There are more examples where the exact wordings have been changed just a little bit. But if the same wording as in past version is used it gives less rise for extra translation of constants.

Pimm

tmccormi wrote on Monday, January 11, 2010:

Certainly a tool in the translators and the developers tool kit could be one that does a quick fuzzy search of the tables so when you are adding a new bit of text you can at least see that the example word/phrase “Patient Name” exists as “Pt. Name” already … I have no idea if that’s true, just seemed like a good example.
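One way such a fuzzy lookup might be sketched is with PHP's built-in similar_text(). The helper name, the 50% threshold, and the sample constants below are made up for illustration; a real tool would read the existing constants from the language tables.

```php
<?php
// Hypothetical helper: rank existing constants by similarity to a new phrase.
// similar_text() and strtolower() are standard PHP functions.
function similar_constants(string $phrase, array $constants, float $minPercent = 50.0): array
{
    $matches = [];
    foreach ($constants as $constant) {
        // The third argument receives the similarity as a percentage.
        similar_text(strtolower($phrase), strtolower($constant), $percent);
        if ($percent >= $minPercent) {
            $matches[$constant] = $percent;
        }
    }
    arsort($matches); // best match first
    return array_keys($matches);
}

// Made-up sample list standing in for the real constants.
$constants = ['Pt. Name', 'Save', 'Online manual'];
print_r(similar_constants('Patient Name', $constants));
```

The threshold would need tuning against the real list; set too low it floods the developer with unrelated constants, set too high it misses abbreviations.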

-Tony

bradymiller wrote on Monday, January 11, 2010:

Tony,

Also could just simply search (via web page or grep) the most recent version of this file, which contains the constants:
http://openemr.cvs.sourceforge.net/viewvc/openemr/openemr/contrib/util/language_translations/currentConstants.txt

-brady

blankev wrote on Saturday, June 19, 2010:

Brady,

you asked for a reaction “here” on the latest improvements for translations. I can see something happening for security and constants and drug database and a lot of other things. As a User I can only “see”, so please explain in a bit more detail what you programmers have changed and what should be tested or translated(?) so I can do something useful. Let’s say I am confused by all the steps forward and I wait for the official V4.0 to test without knowing what you Dev-Guys/Dev-Girls are implementing.

For instance I would like to see for dates:

Begin……… End ……  (I am used to getting the answer that this is not important for now and these are still worded as From …… To ……… I explained this in vain for Translation in the past)

But I really got lost for Translations for some time, for the simple reason that translation above 80% is not needed for regular use. I go to the Google spreadsheet and change words that have to be changed and change them back again and back again, but forget to make a note. For the Dutch Translation there are some more words with double meaning and these can’t be translated into one word.

I suppose this will be the same for Chinese, Greek and Russian……………………………

BTW how are we doing percentage wise with the different translations of OpenEMR? Do we really need 100% translation or should we concentrate on available translation, because there is no need to translate for 100%? For aesthetic reasons there is the 100% translation rule, but for USERS of all types, Administrators, Doctors and Front-desk personnel, there is a translation need and a learning curve for English that does not need translation.

Words like: Save, Database, Manual, Back, User etc…… most probably do not need translations since these words are the same for any computer and these are understandable in every language that teaches their kids with the use of computers.

My questions are:
Brady (1) what should I look for ?
(2) What to do (?) are we in need of 100% translations?

If this last 100% is the goal, let’s activate translation efforts in every language.

I am looking (lurking) every day into Development/Help/Users, and conclude from the amount of postings that a lot of good things are happening to OpenEMR, but as a User I am confused about the goals and their outcome. To me it seems like a lot of things are happening, but on my Users side of the screen OpenEMR is still working as before, only with many new and nice tweaks.

Re-consider the change to V4.0. It seems like “Globals.PHP”, “Security changes” and “Drugs-DB” are very important improvements and these confuse the newcomers a lot. (I do understand your hesitance and can live with the patches of V3.2.) From the many Forum-questions on the same topic and for answering, I do conclude that it would be nice if we had the same version…… V4.0 with latest improvements. So comments can be directed to the implementation of the latest version.

Pimm

bradymiller wrote on Saturday, June 19, 2010:

hey,

I was waiting for a message from you; was beginning to worry.  Goal on the translation side is to just keep them stable, and keep them updated with the 4.0 development (I’m a bit behind there, definitely due for an update soon; gonna shoot for an update on july 3rd weekend). Got a lot of translations for a new language, Danish, so that will be cool.

Two main goals now are Meaningful Use and Security. In the US, meaningful use is required to keep this project feasible. And security is required to allow OpenEMR in larger practices and for companies to offer the service over the internet; not to mention official reports of our security vulnerabilities are starting to arise.  Not accomplishing either of these goals would lose a large potential userbase, so my and most other developers’ efforts have been targeting those goals. The next official release will be decided by the community, but I’m guessing it is months away, once the development push starts to cool off; this is why we’ve been issuing both bug fixes and new features in the 3.2 patches.

The best thing you can do for now is recruit more translators for other languages and keep testing on the cvs demo.

-brady

saikensf wrote on Saturday, June 19, 2010:

I’ve used a translator system similar to yours but with a few twists you might find interesting.  I don’t think it would be feasible to implement the entire thing, but cherry picking features might be of value.  Items 3 and 4 in particular.

Definition - key: the value passed to xl to get the translated string.  xl( key, flag, prefix, postfix );

# 1 - The default string for each key was stored in the translator table in its own column.
- - - Table Structure:  Key, Lang_code, Current Value, Default Value, order -> PK ( Key, order )
- - - To keep the strings varchar, and thus searchable with LIKE, overflow would fall into a new insert with a sequentially higher order number.

# 2 - The keys themselves were named by their page name and a description, like login_welcome.
- - - Pro:  This avoided OpenEMR’s ambiguity problem by creating a 1 to 1 mapping from DB to Screen.
- - - Con:  The translation table was a lot bigger.  Added a little more work to page creation due to DB inserts.
- - - Arguable: Looking at the page code you didn’t see the actual text.  Some coders loved this as removing a distraction.  Some hated it as they felt it denied them an important piece of the big picture.

# 3 - The admin users had an interface where they could change the current value of any string or revert to the default value.
- - - They could search either by page or by text.  So show me everything on login or everything containing “Welcome”. 
- - - Their input was sanitized to strip it of any control characters except basic HTML formatting tags like bold.
- - - They could immediately see their edit live, instant gratification and instant QA.

# 4 - a limited set of special keywords was defined that were determined programmatically.  %Location%, %Vender%, %User%, %Today%, %Var%, etc and anytime they were found in a string they were replaced with the appropriate value from session or a blank in translation.  %Var% being an optional argument passed to some translator calls that resolved to empty string if nothing was passed.
- - - The 1st, 2nd, 3rd, nth issue would be resolved by re-writing the string to not need it, like “Record Count: %var%”.

A user control interface might help with translator training and retention.  There is greater satisfaction in the work if you can see it change a site before your very eyes ( I did that! ).  And variable resolution would be very easy to implement.  Just call a few sub string replaces on the translated string before echoing or returning it.
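The substring replacement mentioned above might look roughly like this in PHP. The function name and the session array layout are assumptions for illustration, not the actual code from the system described.

```php
<?php
// Sketch of the keyword substitution; the function name and the session
// array layout are assumptions, not code from the system described.
function resolve_keywords(string $translated, array $session, string $var = ''): string
{
    $replacements = [
        '%User%'     => $session['user'] ?? '',
        '%Location%' => $session['location'] ?? '',
        '%Today%'    => date('Y-m-d'),
        '%Var%'      => $var, // optional argument, empty string if not passed
    ];
    // strtr() with an array replaces all keys in one pass, so values that
    // were just substituted are not scanned again for keywords.
    return strtr($translated, $replacements);
}

echo resolve_keywords('Record Count: %Var%', [], '3'), "\n";
echo resolve_keywords('Welcome, %User%', ['user' => 'Pimm']), "\n";
```

The keywords are matched case-sensitively here, so a definition would have to spell them exactly as listed.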

-Simone