
CiviLingua - Multilingual site support using internationalization (i18n) and localization (l10n)

Background

CiviCRM up to version 1.5 supports a single localization per site. A number of languages are supported: the CiviCRM interface elements are translated in .po files, and content can be stored using the utf8_unicode_ci collation.

OpenConcept Consulting (http://openconcept.ca) has a couple of clients interested in having multilingual Drupal/CiviCRM sites, where the interface and content of the site can be switched back and forth for each page between English and French with a single click. (The team is Mike Gifford - client relations, Joe Murray - project manager, Steve McCullough - developer, and Eliot Che - designer.) We're interested in contributing the code developed for this back to the community, and in developing a generalized solution that will support any number of languages. We don't have the experience or client funds to provide Mambo/Joomla support - perhaps Piotr and others can help on that front.

Overview of Phases

civicrm.settings.php has a number of settings that control i18n and l10n. In CiviCRM 1.4 these include support for configuring an installation to provide one of a variety of standardized localizations for the interface language (CIVICRM_LC_MESSAGES) and the presentation of certain data such as currency (CIVICRM_LC_MONETARY), and customizations for address element ordering (CIVICRM_ADDRESS_FORMAT) and dates (CIVICRM_DATEFORMAT_*).

A phased approach is proposed for extending CiviCRM so that it works with the Drupal i18n and locale (l10n) modules and can serve multilingual sites.

Phase 1 will allow a single CiviCRM installation to switch among supported interface languages through the UI or via an API. The intent is to provide initial CiviCRM support for displaying the interface and a single copy of data content (i.e. date, currency and address information) appropriately when running the Drupal i18n and l10n modules. This is expected to be fairly lightweight, but unlikely to be adequate for multilingual sites. It's not clear there would be any benefit in providing this as a stand-alone module, but community feedback on this would help.

Phase 2 will allow site content to be stored for multiple languages, with appropriate content displayed based on the active interface localization. For example, the same address may be written 3 St. Catherine Street West in English and 3, rue St. Catherine Ouest in French, the same Organization may have different English and French names, and honorifics should change, for example from Mr. to M. This phase would provide users with the ability to store and display information in more than one language.

Neither Phase 1 nor Phase 2 will address making CiviMail or CiviSMS multilingual (I imagine there are issues in coordinating the use of multi-language versions of some of the relevant entities). Nor will either phase support storing amounts in multiple currencies. At present I don't see problems providing initial DAO support for making textual fields in CiviContribute, CiviProject and CiviMember multilingual, though they are not our priority and could be dropped.

There are several approaches that may be desired for information on multilingual sites. When data is entered in one language, it can provide the default content for that entry in other languages, or the other languages may default to blank or non-existent. There may also be a desire for workflow to prevent entries from being displayed until they have been translated, and for identifying entries in need of translation. We will be working with our clients to define and support their current needs, and developing code that can be easily extended to work with alternative needs such as these.

Further phases may be specified at a later point. For example, some automated translation of standard addresses is possible and needed for certain purposes (e.g. dealing with Canadian federal electoral files).

Technical Specification

Phase 1: CiviCRM UI

Provide localization menu options on the CiviCRM menu if CIVICRM_LC_MESSAGES='i18n'. Initially this will be a list at the end of the menu of the localizations that are available, as determined by the presence of .po files for CiviCRM.

Provide a way of enabling languages/locales for multi-lingual sites separate from setting the currently active language/locale. The available languages initially will be a list of the localizations that are available, as determined by the presence of .po files for CiviCRM. An Administration panel option will present a page of all available languages with a checkbox to enable or disable them, and Submit and Cancel buttons.
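As a rough illustration of the two ideas above (discovering available localizations from .po files, and keeping the enabled set separate from the active locale), here is a minimal Python sketch. It is not CiviCRM code: the directory layout (one sub-directory per locale containing at least one .po file) and the function names `available_locales` and `apply_admin_form` are assumptions made for the example.

```python
from pathlib import Path


def available_locales(l10n_dir):
    """List locale codes that have at least one .po file.

    Assumes a hypothetical layout of <l10n_dir>/<locale>/.../*.po.
    """
    root = Path(l10n_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and any(entry.rglob("*.po"))
    )


def apply_admin_form(available, checked):
    """Return the enabled locales after the admin form is submitted.

    Only locales that are actually available can be enabled; anything
    else that was posted is ignored.
    """
    return [locale for locale in available if locale in checked]
```

Cancelling the form would simply leave the previously stored list of enabled locales unchanged.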

Phase 1: Technical implementation

  1. Config changes:
    1. civicrm.settings.php: set CIVICRM_LC_MESSAGES='i18n' to tell CiviCRM to use Drupal's i18n/l10n locale
    2. Create i18n/l10n config files to provide locale-specific values for the CIVICRM_ADDRESS_FORMAT, CIVICRM_DATEFORMAT_*, and CIVICRM_MONEYFORMAT constants. These files will be named along the lines of fr_CA.format.php, and will reside in the /l10n/ directory. They provide *_I18N versions of any or all of the format constants, and these localized versions will be used if the 'i18n' flag is passed and if an *_I18N version of the constant exists. (Existing definitions from civicrm.settings.php or the Config.php defaults are used if a locale or constant is unavailable.)
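The fallback rule for the *_I18N constants can be sketched as follows. This is a Python illustration rather than the PHP that would live in CiviCRM; the dictionaries stand in for constants defined in civicrm.settings.php and in a per-locale file such as fr_CA.format.php, and every value shown is a placeholder, not a real CiviCRM format string.

```python
# Placeholder stand-ins for constants from civicrm.settings.php / Config.php.
BASE_FORMATS = {
    "CIVICRM_MONEYFORMAT": "base-money-format",
    "CIVICRM_ADDRESS_FORMAT": "base-address-format",
}

# Placeholder stand-in for the contents of per-locale files
# such as fr_CA.format.php (hypothetical values).
LOCALE_FORMATS = {
    "fr_CA": {"CIVICRM_MONEYFORMAT_I18N": "fr_CA-money-format"},
}


def resolve_format(name, lc_messages, locale):
    """Pick the *_I18N override if the 'i18n' flag is set and one exists."""
    if lc_messages == "i18n":
        override = LOCALE_FORMATS.get(locale, {}).get(name + "_I18N")
        if override is not None:
            return override
    # Unknown locale, missing constant, or flag not set: keep the base value.
    return BASE_FORMATS[name]
```

With these placeholders, the money format for fr_CA comes from the locale file, while the address format falls back to the base definition because the locale file does not override it.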
  2. Changes to civicrm/CRM/Core/Config.php constructor function:
    1. Set the UI language:
      1. determine the languages supported by CiviCRM
      2. get the Drupal i18n / locale settings
      3. match any supported language codes
      4. default to 'en_US'
      5. set CIVICRM_LC_MESSAGES and CIVICRM_LC_MONETARY to the derived language code
    2. Fix internal urls by inserting the Drupal i18n code
    3. Load i18n format values from /l10n/ language code file
      (see http://openconcept.ca/drupals_i18n_and_civicrms_interface for details and code)
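The language-matching and URL steps listed for the Config.php constructor could look roughly like this. The sketch is in Python for brevity and is not the code from the linked page; `derive_locale` and `localize_path` are hypothetical names, and placing the Drupal language code as the first path segment is an assumption about how the i18n module prefixes URLs.

```python
def derive_locale(supported, drupal_lang, default="en_US"):
    """Match the Drupal language code against CiviCRM's supported locales."""
    if drupal_lang:
        wanted = drupal_lang.replace("-", "_").lower()
        # Prefer an exact match such as "fr_CA".
        for locale in supported:
            if locale.lower() == wanted:
                return locale
        # Otherwise accept the first locale sharing the language, e.g. "fr".
        language = wanted.split("_")[0]
        for locale in supported:
            if locale.split("_")[0].lower() == language:
                return locale
    # Nothing matched: fall back to the default.
    return default


def localize_path(path, drupal_lang):
    """Insert the Drupal language code as the first path segment (assumed)."""
    segments = [segment for segment in path.split("/") if segment]
    if drupal_lang and (not segments or segments[0] != drupal_lang):
        segments.insert(0, drupal_lang)
    return "/".join(segments)
```

The value returned by `derive_locale` is what would then be assigned to both CIVICRM_LC_MESSAGES and CIVICRM_LC_MONETARY.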
  3. Schema changes
    There are several approaches to identifying the locale of every locale-dependent CiviCRM object, and to identifying which records contain the same data in different languages. Most data access on a multilingual site will be for a particular language. Occasionally there will be a need to switch languages, or perhaps to display different languages simultaneously to assist in translation. The many-to-many relationship between records instantiating the same content in different languages can be implemented in several ways. An approach similar to the pattern used for civicrm_location and civicrm_entity_tag is to have two tables, civicrm_i18n and civicrm_i18n_l10n, as follows:
    1. civicrm_i18n table definition
      1. field id - unsigned integer, required, autoincrement, primary key
      2. primary key, index id, made up of id.
    2. civicrm_i18n_l10n table definition
      1. field id - unsigned integer, required, autoincrement, primary key
      2. civicrm_i18n_id - unsigned integer, required, Foreign key to i18n ID
      3. field entity_table - varchar (64), required, table of the object possessing the associated l10n entries (e.g. Contact, Group, or ACL Group)
      4. field entity_id - unsigned integer, required, Foreign key to the referenced item in the table specified by entity_table.
      5. locale - varchar (64), required, Locale of the item (eg en_CA).
      6. index index_civicrm_i18n_id, made up of civicrm_i18n_id. 
      7. index index_entity_table, made up of entity_table.
      8. index index_entity_id, made up of entity_id.
      9. index index_locale, made up of locale.
    3. Every time a new localizable entity such as a contact is created in one language/locale, a corresponding record will be created in the same table for every other enabled locale. One entry will be created in civicrm_i18n for the 'translation set', and a set of records will be created in civicrm_i18n_l10n, one for each new record in the contact table (localized contact), all sharing the same civicrm_i18n_id. Two examples clarify this.
    4. Adding contact entity to Canadian French and English site
      Two records would be added to civicrm_contact, one with a locale of en_CA (id 7), the other with a locale of fr_CA (id 8). A new record would be created in civicrm_i18n (id 22). Then two entries would be created in civicrm_i18n_l10n, one with 'entity_id' = 7 and the other with 'entity_id' = 8, both with 'civicrm_i18n_id' = 22 and 'entity_table' = 'civicrm_contact'.
    5. Adding contact entity to Canadian French, Canadian English, and Brazilian Portuguese site
      Three records would be added to civicrm_contact, with locales of en_CA, fr_CA, and pt_BR, and ids of 9, 10, and 11. One record would be added to civicrm_i18n (id 23). And three records would be added to civicrm_i18n_l10n, all with civicrm_i18n_id = 23 and 'entity_table' = 'civicrm_contact', and with the following values for 'entity_id': 9, 10, 11.
    6. Tables affected
      All CiviCRM tables will have a new DAO static boolean method, isLocalizable(), which returns whether the table contains localizable content. Localizable tables will have two new DAO static methods returning arrays of field names: getLocalizables() and getNonLocalizables(). The following tables will not be localizable:
      civicrm_acl
      civicrm_acl_group_join
      civicrm_geo_coord
      civicrm_state_province
      civicrm_validation
      civicrm_dupe_match
      civicrm_entity_file
      civicrm_mailing*
      civicrm_financial_trxn
      civicrm_premiums_product
      civicrm_entity_tag
      civicrm_mapping_field
      civicrm_phone
      civicrm_uf_match
      civicrm_subscription_history
      civicrm_membership_payment
      civicrm_membership_log
      civicrm_email
      civicrm_im
      civicrm_uf_join
      civicrm_group_contact 
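To make the civicrm_i18n and civicrm_i18n_l10n tables described under Schema changes more concrete, here is a small sketch using Python's built-in sqlite3 module. It is only an approximation: CiviCRM runs on MySQL, the civicrm_contact table is reduced to a single display_name column, and the helper names `create_schema`, `add_contact` and `translations` are invented for the example.

```python
import sqlite3


def create_schema(conn):
    """Create simplified versions of the tables described in the spec."""
    conn.executescript(
        """
        CREATE TABLE civicrm_contact (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            display_name TEXT NOT NULL
        );
        CREATE TABLE civicrm_i18n (
            id INTEGER PRIMARY KEY AUTOINCREMENT
        );
        CREATE TABLE civicrm_i18n_l10n (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            civicrm_i18n_id INTEGER NOT NULL REFERENCES civicrm_i18n(id),
            entity_table VARCHAR(64) NOT NULL,
            entity_id INTEGER NOT NULL,
            locale VARCHAR(64) NOT NULL
        );
        CREATE INDEX index_civicrm_i18n_id ON civicrm_i18n_l10n (civicrm_i18n_id);
        CREATE INDEX index_entity_table ON civicrm_i18n_l10n (entity_table);
        CREATE INDEX index_entity_id ON civicrm_i18n_l10n (entity_id);
        CREATE INDEX index_locale ON civicrm_i18n_l10n (locale);
        """
    )


def add_contact(conn, names_by_locale):
    """Insert one contact row per locale and link them in one translation set."""
    set_id = conn.execute("INSERT INTO civicrm_i18n DEFAULT VALUES").lastrowid
    for locale, name in names_by_locale.items():
        contact_id = conn.execute(
            "INSERT INTO civicrm_contact (display_name) VALUES (?)", (name,)
        ).lastrowid
        conn.execute(
            "INSERT INTO civicrm_i18n_l10n "
            "(civicrm_i18n_id, entity_table, entity_id, locale) "
            "VALUES (?, 'civicrm_contact', ?, ?)",
            (set_id, contact_id, locale),
        )
    return set_id


def translations(conn, entity_table, entity_id):
    """Map locale -> entity_id for every member of a record's translation set."""
    rows = conn.execute(
        """
        SELECT other.locale, other.entity_id
        FROM civicrm_i18n_l10n AS me
        JOIN civicrm_i18n_l10n AS other
          ON other.civicrm_i18n_id = me.civicrm_i18n_id
        WHERE me.entity_table = ? AND me.entity_id = ?
        """,
        (entity_table, entity_id),
    ).fetchall()
    return dict(rows)
```

Calling `add_contact` with three locales mirrors the second worked example: three contact rows, one civicrm_i18n row, and three civicrm_i18n_l10n rows sharing its id. `translations` shows how switching language for a given record reduces to a self-join on civicrm_i18n_id.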
  4. API
    1. Interfaces are needed to
      1. enable/disable support for i18n / l10n content by enabling the CiviLingua component
        1. when enabled,
          1. insertions create records for each enabled language/locale and associated many-to-many mappings between different locale versions of the same data
          2. deletions remove all locale versions of the data, and all many-to-many mappings linking them
          3. changing current locale returns the same content in the newly selected locale
        2. when disabled, there is no UI element for changing languages, and all data is stored in a single language
          1. turning off is equivalent to disabling all languages except the current language
          2. warnings will be issued before destroying all multi-lingual data
      2. enable / disable a locale
        1. enabling a locale causes all existing locale dependent nodes to have content instantiated in the new locale by copying the data from the currently active locale's records, and causes all relevant many-to-many locale mapping records to be created
        2. disabling a locale requires confirmation from the user, and results in all content for that locale being deleted from the database. At least one locale must be enabled.
      3. set the current locale
        1. this changes a filter for all access to locale dependent tables in the db when support for i18n/l10n is enabled 
        2. returns an error if support for i18n / l10n is not enabled
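The behaviour required of these interfaces can be modelled in memory as below. This is a Python sketch of the rules only, not a proposed API: the class name `CiviLinguaStore` and its methods are hypothetical, translation sets are plain dictionaries, and refusing to disable the currently active locale is an extra assumption the spec does not state.

```python
class CiviLinguaStore:
    """Toy model of the enable/disable/set-locale rules."""

    def __init__(self, default_locale):
        self.enabled = False              # is the CiviLingua component on?
        self.locales = [default_locale]   # enabled locales
        self.current = default_locale     # currently active locale
        self.records = {}                 # set id -> {locale: content}
        self._next_id = 1

    def insert(self, content):
        """Create one version per enabled locale (or one if disabled)."""
        set_id = self._next_id
        self._next_id += 1
        targets = self.locales if self.enabled else [self.current]
        self.records[set_id] = {locale: content for locale in targets}
        return set_id

    def get(self, set_id):
        """Return the content for the currently active locale."""
        return self.records[set_id][self.current]

    def delete(self, set_id):
        """Remove every locale version of the record."""
        del self.records[set_id]

    def enable_locale(self, locale):
        """Copy current-locale content into the newly enabled locale."""
        if locale in self.locales:
            return
        for versions in self.records.values():
            versions[locale] = versions[self.current]
        self.locales.append(locale)

    def disable_locale(self, locale, confirmed=False):
        """Delete all content for a locale, after confirmation."""
        if not confirmed:
            raise ValueError("disabling a locale requires confirmation")
        if len(self.locales) == 1:
            raise ValueError("at least one locale must be enabled")
        if locale == self.current:
            # Assumption: the active locale cannot be disabled.
            raise ValueError("cannot disable the current locale")
        self.locales.remove(locale)
        for versions in self.records.values():
            versions.pop(locale, None)

    def set_locale(self, locale):
        """Change the filter applied to locale dependent data."""
        if not self.enabled:
            raise RuntimeError("i18n / l10n support is not enabled")
        if locale not in self.locales:
            raise ValueError("locale is not enabled: " + locale)
        self.current = locale
```

Turning the component off would then amount to calling `disable_locale` for every locale except the current one, after the warning described above.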

Phase 2: CiviCRM UI

A user interface is provided that allows users to view and edit objects needing to be translated. When more than one locale is enabled, inserting localizable CiviCRM objects results in untranslated versions of the object being created for locales that are enabled but not the current locale. When untranslated objects are edited, the overridable default will be to flag the object as translated. Users will be able to indicate when updates to content in one locale should result in flagging related content in other locales as untranslated. Deletion of localized content will result in the deletion of all related content for other locales.
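The translation-flag rules in this paragraph can be sketched as a small state model. The Python below is illustrative only; the `LocalizedObject` class and the `create`, `edit` and `needs_translation` helpers are hypothetical, and seeding untranslated versions with a copy of the original content is an assumption (they could equally start blank).

```python
from dataclasses import dataclass


@dataclass
class LocalizedObject:
    text: dict        # locale -> content
    translated: dict  # locale -> True once translated


def create(content, current_locale, enabled_locales):
    """New objects are translated only in the locale they were entered in."""
    return LocalizedObject(
        text={locale: content for locale in enabled_locales},
        translated={locale: locale == current_locale for locale in enabled_locales},
    )


def edit(obj, locale, content, mark_translated=True, invalidate_others=False):
    """Update one locale's content and adjust the translation flags."""
    obj.text[locale] = content
    if mark_translated:
        # Overridable default: editing flags the object as translated.
        obj.translated[locale] = True
    if invalidate_others:
        # The user indicated the change must be re-translated elsewhere.
        for other in obj.translated:
            if other != locale:
                obj.translated[other] = False


def needs_translation(obj):
    """List the locales whose version is still flagged as untranslated."""
    return sorted(locale for locale, done in obj.translated.items() if not done)
```

A translation work queue would then be the set of objects for which `needs_translation` returns a non-empty list.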

Resources: 

OASIS ebusiness international name, address, crm data standards:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ciq

International Addresses

http://www.columbia.edu/kermit/postal.html

