Gerd Altmann

Preface

In the upcoming 5.16.xx release, kbmMW Enterprise Edition will receive another new feature: internationalization, also called i18n.

The idea behind i18n is to be able to translate an application in such a way that it is fully usable in other languages, and preferably also to let end users easily switch to their preferred language.

i18n

At first it seems simple: replace all texts with constant codes, look those up in a table and output the result of that lookup. And yes, most solutions work that way, including the famous GetText (https://en.wikipedia.org/wiki/Gettext), which many mimic as a de facto standard.

Basically GetText makes a fairly simple lookup to translate one text to a different one.

In later incarnations it has also gained some support for translating differently depending on a count. In many languages, a translation that incorporates a count changes depending on whether the count is zero, one or many, while in other cases the count does not matter.

Example:

  • I have 1 brother
  • I have 2 brothers
  • I have 0 brothers

Another example:

  • I have one cookie
  • I have a few cookies
  • I have many cookies
  • I have no cookies

In English, it is obvious what the wording should be, but if you translated the latter example to Polish, you would have 4 different texts, while Japanese generally does not change wording based on plurals.

However, for kbmMW I have chosen to take it a step further, and have been inspired by a more modern solution that is very popular in JavaScript-based development, i18next (https://www.i18next.com/).

What distinguishes i18next from GetText? Well, one important thing is that i18next supports context sensitive translation. It is sort of an extension to the plural based translation.

Example:

  • My sister she is nice
  • My brother he is nice

In the above example, the context is either sister or brother. In English, the sentence requires she when referring to my sister and he when referring to my brother. In other languages there may be no such differentiation.

Another example:

  • One of my secretaries was remarking only this morning how well and young I am looking

In this example, secretaries is a context word which controls what the translation should be, because it is interpreted as a different gender in different languages. If another title were used instead of secretaries, the translation would be different.

  • French: Un de mes secrétaires [male]
  • Italian: Uno dei miei segretari [male]
  • Spanish: Una de mis secretarias [female]
  • Portuguese: Uma das minhas secretárias [female]
  • German: Einer meiner Sekretäre [male]

As you can see, translation is not always just as simple as replacing one text with another, because context may get into the equation.

i18n the kbmMW way

Neither i18next nor kbmMW’s i18n is the perfect tool for translation, but both get closer to the perfect solution than solutions that do not consider context sensitive translation.

I looked at i18next’s configuration files, which define how translations are done, and figured that they were quite ugly and not really suited for a framework like Delphi. Further, i18next only supports one level of context, while I decided to add support for multiple levels of context within the same sentence.

Further, i18next handles the concept of pluralization as a separate topic, while I found that pluralization is simply a variation of the context. So by handling multiple levels of context, kbmMW’s i18n automatically supports both traditional context (for example gender) and pluralization in the same sentence.

Why not just use Delphi’s built-in translation solution? Well, the built-in Delphi translation solution is generally based on constant resource strings, which are compiled to DLLs and used as a simple lookup translation. So you will need a tool to generate those DLLs, and you will not have context sensitive translation. Further, strings in your application will need to take advantage of resource strings and thus define a constant integer value for each string you want to translate.

It is somewhat cumbersome and IMO not really a good way.

kbmMW’s i18n supports loading the internationalization from various storages, which currently include JSON and YAML based file formats. I personally find that the YAML format is the easiest to digest for the human eye, and it also supports comments, which JSON does not.

Other formats can be added, which could cross support existing translation description formats.

To make access to i18n easy, a singleton instance is readily available when you include the unit kbmMWI18N in your application. The singleton is named i18n and will be used for all translation related functionality. It is possible to make your own instances of TkbmMWI18N if you so wish, but it should rarely be needed.

kbmMW’s i18n supports two ways to translate, auto translation of select properties on components and forms/frames, and translation of static and dynamic texts, used within the code. In addition it supports setting other properties, like size and position values, in case components require some slight rearrangement to fit a translation.

Translation in code

It can be done in a couple of ways, either simple translation of a string without any consideration of context, or translation using a format string, where the arguments can be used as context.

ShowMessage(i18n.Translate('This is some text to translate'));

It will attempt to translate the static text to the language that is currently selected. We will see shortly how to define languages, how to load them in and how to select them.

ShowMessage(i18n.Format('This is %s %d of %d',['brother',1,3]));

This works much like a regular Delphi Format and supports all its format specifiers, but it also supports additional formatting of date and time. The arguments will be understood as regular arguments, and optionally (depending on the language specification) also as context controlling values.

You can abbreviate Translate and Format by simply using an underscore. The following examples are doing exactly the same as the above examples:

ShowMessage(i18n._('This is some text to translate'));

and

ShowMessage(i18n._('This is %s %d of %d',['brother',1,3]));

Translation of forms and components

It is simple to translate any component or form.

You just register that component or form with the i18n instance. kbmMW will then automatically attempt to translate the properties of the form/components and subcomponents whenever the current language changes.

The following is typical usage: you register the current form (Self) with the translation framework, for example in the form’s AfterConstruction method, or whenever the form has been fully constructed.

 i18n.RegisterComponent(self);

If you have multiple forms that you instantiate and release on the fly, it is good practice to call:

 i18n.UnregisterComponent(self);

before the form/component is released. kbmMW will, however, usually detect destruction of a TComponent instance and deregister it automatically from translation.
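A minimal sketch of how registration and deregistration could be tied to a form's lifetime. Only i18n.RegisterComponent, i18n.UnregisterComponent and the unit name kbmMWI18N come from the text above. The unit name, the form class, the choice of VCL and the use of BeforeDestruction for deregistration are illustrative assumptions.

```delphi
unit MainFormSketch; // hypothetical unit name

interface

uses
  System.Classes, Vcl.Forms,
  kbmMWI18N; // provides the i18n singleton mentioned in the text

type
  TForm1 = class(TForm)
  public
    procedure AfterConstruction; override;
    procedure BeforeDestruction; override;
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.AfterConstruction;
begin
  inherited;
  // The form and its subcomponents exist now, so register them for translation.
  i18n.RegisterComponent(Self);
end;

procedure TForm1.BeforeDestruction;
begin
  // Explicit deregistration before the form is released.
  i18n.UnregisterComponent(Self);
  inherited;
end;

end.
```

Deregistering explicitly costs one line and does not rely on the automatic detection, which the text says usually happens.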

Loading language files

As mentioned before, the language file can be in YAML or JSON format (or any format that has been registered with kbmMW’s i18n framework).

A language file can contain a single language translation, or multiple language translations.

Any language is identified by a name. The name can be anything, but I recommend that it follows the typical ISO 639-1 language code standard (https://www.w3schools.com/tags/ref_language_codes.asp) extended with a country code (https://www.w3schools.com/tags/ref_country_codes.asp), e.g. da-DK, en-US etc.

The following code will load the language(s) defined in the YAML file translation.yaml:

 i18n.Load('','yaml','file:..\\..\\translation.yaml');

The first argument to Load is the name of the language to load. If an empty string is given, all languages found in the file are loaded. The next argument is the format to use, and the final argument is the settings for that format. In this case, it refers to a file placed two directories above the executable’s current run directory.

Changing the current language

Loading the language(s) does not by itself alter the application’s translation. Nothing changes until you actively choose to change the current language.

i18n.CurrentLanguage:='da-DK';

If a language with the name da-DK has been loaded, everything will automatically be translated according to that language’s translation rules.

Obviously you can query what the currently selected language name is right now, by checking the CurrentLanguage property.

At any time, you can get an array of loaded language names:

var
   a:TArray<string>;
begin
   a:=i18n.LanguageNames;
   ...
end;
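As a sketch of how these pieces could be combined to let end users switch language, the following fills a combo box with the loaded language names and applies the selected one. Only i18n.LanguageNames and i18n.CurrentLanguage come from the text. The unit name, the helper names FillLanguageCombo and ApplySelectedLanguage, and the use of a VCL TComboBox are assumptions made for illustration.

```delphi
unit LanguagePickerSketch; // hypothetical unit name

interface

uses
  Vcl.StdCtrls;

procedure FillLanguageCombo(ACombo: TComboBox);
procedure ApplySelectedLanguage(ACombo: TComboBox);

implementation

uses
  kbmMWI18N; // provides the i18n singleton

// Hypothetical helper: lists all loaded language names and preselects
// the one that is currently active (if any).
procedure FillLanguageCombo(ACombo: TComboBox);
var
  LName: string;
begin
  ACombo.Items.BeginUpdate;
  try
    ACombo.Items.Clear;
    for LName in i18n.LanguageNames do
      ACombo.Items.Add(LName);
    ACombo.ItemIndex := ACombo.Items.IndexOf(i18n.CurrentLanguage);
  finally
    ACombo.Items.EndUpdate;
  end;
end;

// Hypothetical helper: call from the combo box OnChange event.
procedure ApplySelectedLanguage(ACombo: TComboBox);
begin
  if ACombo.ItemIndex >= 0 then
    i18n.CurrentLanguage := ACombo.Items[ACombo.ItemIndex];
end;

end.
```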

And you can get language captions and descriptions and, if defined, which graphic files should be used to show their flags:

var
   a:TArray<string>; 
begin
   a:=i18n.Languages.GetCaptions;
   a:=i18n.Languages.GetDescriptions;
   a:=i18n.Languages.GetFlags(true); // Get the file names for the small sized flags.
...
end; 

The language file (YAML variant)

OK, now that I have shown the relevant simple methods and properties, it is time to get into how you describe a language translation file.

To make it simple, I will show an example and explain from the example:

languages: 
  da-DK: 
    caption       : Dansk
    description   : "For folk der bedst forstår Dansk"
    flag          : 

      # some small flag
      small: ".\\DK_64x64.png"

      # a larger flag
      large: ".\\DK_512x512.png"

    formatSettings: 
      currencyString           : dkr
      currencyFormat           : 3
      currencyDecimals         : 2
      shortDateFormat          : "%D-%M-%Y"
      longDateFormat           : "%D. %M2 %Y"
      shortTimeFormat          : "%H:%N"
      longTimeFormat           : "%H:%N:%S"

      # Short Month Names
      shortMonthNames          : [ Jan, Feb, Mar, Apr, Maj, Jun, Jul, Aug, Sep, Okt, Nov, Dec ]
      longMonthNames           : [ Januar, Februar, Marts, April, Maj, Juni, Juli, August, September, 
                                   Oktober, November, December ]
      shortDayNames            : [ Søn, Man, Tir, Ons, Tor, Fre, Lør ]
      longDayNames             : [ Søndag, Mandag, Tirsdag, Onsdag, Torsdag, Fredag, Lørdag ]
      thousandSeparator        : "."
      decimalSeparator         : ","
      twoDigitYearCenturyWindow: 50
      negCurrFormat            : 8
      negativeCurrencyFormat   : 1
      dateSeparator            : /
      timeSeparator            : ":"
      listSeparator            : ","
      timeAMString             : AM
      timePMString             : PM

    properties    : 
      Form1.btnLoadLanguage.Caption   : Hallo
      Form1.Caption                   : Dansk
      Form1.btnLoadLanguage.Height    : 68

    phrases       : 
      Hallo                               : Hallo
      OK                                  : OK
      "Dette er en dato %{SHORTDATE}"     : "Dette er en dato %{SHORTDATE}"
      "Dette er et tidspunkt %{LONGTIME}" : "Dette er et tidspunkt %{LONGTIME}"
      "Dette er en dag %{SHORTDAYNAME}"   : "Dette er en dag %{SHORTDAYNAME}"
      "Dette er en måned %{LONGMONTHNAME}": "Dette er en måned %{LONGMONTHNAME}"
      "Dette er en numerisk værdi %f"     : "Dette er en numerisk værdi %f"
      "Dette er en valuta værdi %c"       : "Dette er en valuta værdi %c" 
      "Jeg har %d søster"                 : 

        # No CONTEXT definition, so all arguments will be considered context
        "1": "Jeg har 1 søster"
        "*": "Jeg har %d søstre"

      "Dette er %s %d ud af %d %s"        : 
        CONTEXT   : [ 1, 3, 2 ]

        # Propose which placeholders arguments should be considered context defining.
        # Starting with 1. The order of the argument indexes are significant.
        # If CONTEXT not defined, all placeholders arguments will be used in default order.
        # This example provides same result as if CONTEXT was not defined.
        # Arguments are numbered from 1. Syntax %{n:format} allows reordering arguments on translation.
        søster/1/1: "This is %{2:%d}. sister of %{3:%d} sisters" 
        bror/1/1  : "This is %{2:%d}. brother of %{3:%d} brothers"
        søster/0  : "There are no sisters"
        bror/0    : "There are no brothers"
        søster    : "This is %{2:%d}. sister of %{3:%d} sisters"
        bror      : "This is %{2:%d}. brother of %{3:%d} brothers"
        "*"       : "This is %{2:%d}. %{1:%s} of %{3:%d} %{4:%s}"

      Form1                               : "Dansk"
      "Current language:da-DK"            : "Nuværende sprog:da-DK"
      "Load language"                     : "Indlæs sprog"
      Learning                            : Lær
      "Save language"                     : "Gem sprog"
      Translate                           : Oversæt
      "Simple translate"                  : "Simpel oversættelse"
      "Format translate"                  : "Formateret oversættelse"
      "Memo1\r\n"                         : "Dansk data i Memo1\r\n"
      "Current language:%s"               : "Nuværende sprog:%s"
      "Learn phrases"                     : "Lær sætninger"
      "Learn properties"                  : "Lær properties"

    propertyNames : [ Text, Caption, Hint, Width, Height ]

  en-GB: 
    caption       : English
    description   : "For people who best understand English"
    flag          : 
      small: ".\\UK_64x64.png"
      large: ".\\UK_512x512.png"

    formatSettings: 
      currencyString           : "$"
      currencyFormat           : 2
      currencyDecimals         : 2
      shortDateFormat          : "%M/%D/%Y"
      longDateFormat           : "%M2 %D. %Y"
      shortTimeFormat          : "%H:%N"
      longTimeFormat           : "%H:%N:%S"
      shortMonthNames          : [ Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec ]
      longMonthNames           : [ January, February, March, April, May, June, July, August, 
                                   September, October, November, December ]
      shortDayNames            : [ Sun, Mon, Tue, Wed, Thu, Fri, Sat ]
      longDayNames             : [ Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday ]
      thousandSeparator        : "\0"
      decimalSeparator         : "."
      twoDigitYearCenturyWindow: 50
      negCurrFormat            : 8
      negativeCurrencyFormat   : 1
      dateSeparator            : /
      timeSeparator            : ":"
      listSeparator            : ","
      timeAMString             : AM
      timePMString             : PM

    properties    : 
      Form1.btnLoadLanguage.Caption   : Hello
      Form1.Caption                   : English
      Form1.btnLoadLanguage.Height    : 38

    phrases       : 

      # Simple translation
      Hallo                               : Hello
      OK                                  : OK
      søster                              : sister

      #  
      # Translation with numerical values and no context variations.
      "Dette er en numerisk værdi %f"     : "This is a numerical value %f"
      "Dette er en valuta værdi %c"       : "This is a currency value %c"

      #  
      # Translation with functional arguments and no context variations.
      "Dette er en dato %{SHORTDATE}"     : "This is a date %{SHORTDATE}"
      "Dette er et tidspunkt %{LONGTIME}" : "This is a time %{LONGTIME}"
      "Dette er en dag %{SHORTDAYNAME}"   : "This is a day %{SHORTDAYNAME}"
      "Dette er en måned %{LONGMONTHNAME}": "This is a month %{LONGMONTHNAME}"

      #  
      # Translation with count context variations. Only one value is provided
      "Jeg har %d søster"                 : 

        # No CONTEXT definition, so all arguments (1) will be considered context
        "1": "I have one sister"
        "*": "I have %d sisters"                            # Fallback translation.

      #  
      # Context specific translation.
      "Jeg har 1 %s"                      : 
        søster: "I have one sister"
        bror  : "I have one brother"
        "*"   : "I have one unknown affiliate"

      #  
      # Context and count specific translation. Context is given as arguments.
      "Dette er %s %d ud af %d %s"        : 
        CONTEXT   : [ 1, 3, 2 ]

        # Propose which placeholders arguments should be considered context defining.
        # Starting with 1. The order of the argument indexes are significant.
        # If CONTEXT not defined, all placeholders arguments will be used in default order.
        # Only first 3 arguments of the 4 provided are considered context and in the specific order 1,2,3.
        # Optional context arguments must be last.
        # Arguments are numbered from 1. Syntax %{n:format} allows reordering arguments on translation.
        søster/1/1: "This is %{2:%d}. sister of %{3:%d} sisters" 
        bror/1/1  : "This is %{2:%d}. brother of %{3:%d} brothers"
        søster/0  : "There are no sisters"
        bror/0    : "There are no brothers"
        søster    : "This is %{2:%d}. sister of %{3:%d} sisters"
        bror      : "This is %{2:%d}. brother of %{3:%d} brothers"
        "*"       : "This is %{2:%d}. %{1:%s} of %{3:%d} %{4:%s}"

      Form1                               : English
      "Current language:da-DK"            : "Current language:da-DK"
      "Load language"                     : "Load language"
      Learning                            : Learning
      "Save language"                     : "Save language"
      Translate                           : Translate
      "Simple translate"                  : "Simple translate"
      "Format translate"                  : "Format translate"
      "Memo1\r\n"                         : "Memo1\r\n"
      "Current language:%s"               : "Current language:%s"
      "Learn phrases"                     : "Learn phrases"
      "Learn properties"                  : "Learn properties"

    propertyNames : [ Text, Caption, Hint, Width, Height ]

YAML is indentation sensitive. Hence everything that belongs together must start at the same position on a line. Further, YAML property/object names are case sensitive.

In the above example you will notice that a YAML object named languages has been defined. The object contains a number of properties, each of which is also an object.
The first one is called ‘da-DK’ and the second one is called ‘en-GB’.

These objects contain the actual language translation settings for that particular language. You can have one or more unique language objects in each file.

Each language object has an optional caption and an optional description, and an optional set of flag graphics file paths, one named small and one named large.

Then comes a formatSettings object, which in turn contains the settings for TFormatSettings. You can look up its settings in the Delphi documentation. However, there is one significant difference: shortDateFormat, longDateFormat, shortTimeFormat and longTimeFormat use TkbmMWDateTime format specifiers, which you can read more about here: https://components4developers.blog/2018/05/25/kbmmw-features-3-datetime/

The next optional object, named properties, lists properties of instantiated components in your application that you want to set to a specific value. These can be string or numerical properties, which allows you to resize or rearrange controls if that is needed to make them fit your translation.

The values specified here will be used without further translation.

The optional object called propertyNames controls which properties will be scanned for potential translation. In this example, only properties named Text, Caption, Hint, Width or Height will potentially be translated. If propertyNames is empty or not specified, all non-empty string properties will be eligible for translation.

Finally we have the general phrase translation object which should be used for the bulk of translation.

The phrases object will contain any number of unique phrases, as you have defined them in your application. String properties on components that have not already been translated in the properties section will be attempted translated via the phrases section. The same applies to all runtime string translations using the Format, Translate or _ methods.

A translation can be as simple as stating the original value and the translated value, but when using Format, the arguments may need to be rearranged to make a correct translation, or perhaps the format is context sensitive. Those features can all be provided in the phrases object.

Example of a simple translation:

 Hallo                               : Hello 

If the string Hallo is passed to Translate or _, or if any property that is not listed in the properties section contains the string Hallo, it will automatically be translated to Hello when the en-GB language is selected.

Example of a simple format translation:

 "Dette er en numerisk værdi %f"     : "This is a numerical value %f" 

In this case, we make a simple translation, but include a format specifier for a floating point value. This is used when calling Format or _ in code.

Example of a rearranged format translation:

 "Side %d ud af %d sider"     : "Total %{2:%d} pages. This is page %{1:%d}"  

In this example the order of arguments has changed in the translation. The index of the first argument in the original string is 1, the next is 2 etc. The full format specifier, in this case %d, can be copied over to the translated variant after the colon.
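For comparison, the plain Delphi RTL Format function can also reorder arguments through index specifiers, but there the index is zero based and written as %0:d, %1:d and so on. The small console sketch below uses only System.SysUtils and does not involve kbmMW at all. The function name PageSummary is hypothetical.

```delphi
program IndexedFormatSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Plain RTL Format: index specifiers are zero based, so %0:d refers to the
// first argument and %1:d to the second.
function PageSummary(APage, ATotal: Integer): string;
begin
  Result := Format('Total %1:d pages. This is page %0:d', [APage, ATotal]);
end;

begin
  Writeln(PageSummary(3, 10)); // Total 10 pages. This is page 3
end.
```

When moving existing indexed format strings into a language file, keep in mind that the %{n:format} syntax described above numbers arguments from 1.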

Example of a simple context specific format translation:

"Jeg har %d søster":          
     "1": "I have one sister"
     "*": "I have %d sisters"

This example makes a context specific translation depending on the argument given to Format or _. If the argument is 1, then the text will be translated to “I have one sister”. In all other situations, the translation will be “I have n sisters” where n is the actual number given as the argument. Hence “*” is the default translation for the phrase if no other context matches.

Example of multiple argument context specific format translation:

"Jeg har %d %s":
     CONTEXT    : [ 2, 1 ]
     "mand/1"   : "I have one man"
     "mand"     : "I have %{1:%d} men"
     "kvinde/1" : "I have one woman"
     "kvinde"   : "I have %{1:%d} women"
     "*"        : "I have %d %ss"

This translation is triggered by the Format or _ methods like this:

ShowMessage(i18n._('Jeg har %d %s',[1,'mand']));

In English, the Danish word ‘mand’ will be translated to either man or men, depending on the count, and similarly the Danish word ‘kvinde’ will be translated to either woman or women.

For the English language, we want the ‘mand’ or ‘kvinde’ word to be the primary context word. For that reason I have defined a CONTEXT property which controls the order in which the arguments are used when building the context specifier. It is perfectly legal to omit arguments if they have no relevance in the context specifier. E.g.:

"Jeg har %d %s":
    CONTEXT : [ 2 ]
    ...

Then only the string argument will be used for defining the context specifier.

In this case I want the ‘mand’/‘kvinde’ word to appear first, and the count after. That makes it possible to define wildcard-style context specifiers like “mand/1” and “mand”, where the first matches specifically the word ‘mand’ with the count 1, while “mand” matches ‘mand’ with any count. Matching is attempted with the most precise context first. If nothing matches, another iteration is attempted without the least significant context word, and so on, until either a match has been found or no context words remain, after which the “*” match is used.

If no “*” match has been defined, the original string will be used untranslated.

If no CONTEXT property is given, all arguments will be used for defining the context specifier in their original order.
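The lookup order described above can be sketched in a few lines of plain Delphi. This is only an illustration of the described fallback, not kbmMW’s actual implementation: the function PickVariant, the dictionary of variants and the already ordered context array are all hypothetical, and the sketch selects a variant without performing any argument formatting.

```delphi
program ContextFallbackSketch;

{$APPTYPE CONSOLE}

uses
  System.Generics.Collections;

// Tries the most precise context key first ('mand/1'), then drops the least
// significant context word ('mand'), then falls back to '*', and finally to
// the untranslated original string.
function PickVariant(const AVariants: TDictionary<string, string>;
  const AContext: array of string; const AOriginal: string): string;
var
  LCount, I: Integer;
  LKey: string;
begin
  for LCount := Length(AContext) downto 1 do
  begin
    LKey := AContext[0];
    for I := 1 to LCount - 1 do
      LKey := LKey + '/' + AContext[I];
    if AVariants.TryGetValue(LKey, Result) then
      Exit;
  end;
  if not AVariants.TryGetValue('*', Result) then
    Result := AOriginal;
end;

var
  Variants: TDictionary<string, string>;
begin
  Variants := TDictionary<string, string>.Create;
  try
    Variants.Add('mand/1', 'I have one man');
    Variants.Add('mand', 'I have %{1:%d} men');
    Variants.Add('*', 'I have %d %ss');
    // The context is already ordered as CONTEXT: [ 2, 1 ] would order it.
    Writeln(PickVariant(Variants, ['mand', '1'], 'Jeg har %d %s')); // I have one man
    Writeln(PickVariant(Variants, ['mand', '5'], 'Jeg har %d %s')); // I have %{1:%d} men
    Writeln(PickVariant(Variants, ['barn', '2'], 'Jeg har %d %s')); // I have %d %ss
  finally
    Variants.Free;
  end;
end.
```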

Example of translation with functional arguments:

"Dette er en dato %{SHORTDATE}"     : "This is a date %{SHORTDATE}" 

This example shows how to use Format or _ to output a date, which will be automatically formatted according to the chosen language.

Currently a number of functional arguments are supported:

  • SHORTDATE – Converts a floating point, TDateTime or a TkbmMWDateTime value to a short date.
  • LONGDATE – Converts a floating point, TDateTime or a TkbmMWDateTime value to a long date.
  • SHORTTIME – Converts a floating point, TDateTime or a TkbmMWDateTime value to a short time.
  • LONGTIME – Converts a floating point, TDateTime or a TkbmMWDateTime value to a long time.
  • ISO8601 – Converts a floating point, TDateTime or a TkbmMWDateTime value to an ISO8601 date/time.

The functional argument can be prefixed with an argument index number to pick the relevant argument for translation, if more than one is provided.

"Velkommen %s. Dette er en dato %{SHORTDATE}"     : "At date %{2:SHORTDATE}, we welcome %{1:%s}" 
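A sketch of what the calling side for the phrase above might look like. It assumes that a TDateTime value such as Now can be passed directly in the argument list, which matches the statement that the date functions accept floating point and TDateTime values. The unit name and the procedure ShowWelcome are hypothetical, and VCL’s ShowMessage is used as in the earlier examples.

```delphi
unit WelcomeSketch; // hypothetical unit name

interface

procedure ShowWelcome(const AUserName: string);

implementation

uses
  System.SysUtils, Vcl.Dialogs,
  kbmMWI18N; // provides the i18n singleton

procedure ShowWelcome(const AUserName: string);
begin
  // Argument 1 is the name, argument 2 the date. The translation shown above
  // picks them by index with %{1:%s} and %{2:SHORTDATE}.
  ShowMessage(i18n._('Velkommen %s. Dette er en dato %{SHORTDATE}',
    [AUserName, Now]));
end;

end.
```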

This is the first version of i18n for kbmMW, and I’m certain new features will be added as it matures and new requirements are identified. One of the next things to add is the integration between kbmMW SmartBind and kbmMW I18N.

This concludes the first blog post about kbmMW’s new I18N feature. 

Happy translating!
