Internationalization

Applications that use the same encoding for all inputs (source code and schemata) as the default encoding of the system on which the Code Analytics tools are executed can ignore this chapter. For all other applications, it is important to understand the internationalization requirements, because using the wrong locale will often cause parsing failures.

Input File Encoding

All input files should be encoded with the same character set. First determine the character encoding of the source code and make sure that every source code file uses it. Use that same encoding when the schema files (.df files) are exported and when the data is dumped or exported from the database (.d files). Any other files or inputs for the application must use that encoding as well. Anything that is not encoded properly should be converted.
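Checking and converting an input file can be sketched with the standard iconv tool. This is a minimal sketch, not part of FWD: the function names is_valid_encoding and convert_encoding are hypothetical, and the encoding names must be ones that the local iconv supports (iconv -l lists them).

```shell
#!/bin/sh
# Hypothetical helpers (not part of FWD) built on the standard iconv tool.

# is_valid_encoding <encoding> <file>
# Succeeds if the whole file can be decoded in the given encoding.
# The decoded text is discarded; only the exit status matters.
is_valid_encoding() {
    iconv -f "$1" -t UTF-16 "$2" > /dev/null 2>&1
}

# convert_encoding <from> <to> <input file> <output file>
# Writes a converted copy and leaves the original file untouched.
convert_encoding() {
    iconv -f "$1" -t "$2" "$3" > "$4"
}

# Example (file names are placeholders):
#   is_valid_encoding UTF-8 customer.p || convert_encoding IBM866 UTF-8 customer.p customer.utf8.p
```

The validity check is mainly useful for multi-byte encodings such as UTF-8. A single-byte code page such as IBM866 assigns a character to every byte value, so almost any file will decode without error even when it was written in a different code page; in that case the converted text still has to be inspected by eye.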

This encoding must be the same character set that will be used when running the FWD commands. If it is not the default character set, then the default locale must be overridden when the FWD commands are executed. See Setting the Locale for FWD for details.

Operating System Locale Support

A locale is usually a set of data files that the C runtime library and/or the operating system APIs read and use during processing. Subsystems such as Java depend on the locale definitions to handle internationalization (I18N) issues properly.

If the specific locale needed is not yet defined in the operating system's locale definitions, it must be added. There may be an installation program or other utilities to handle this. On Linux, character sets, locales and the tools to compile new locale definitions are included with the C library. The most important tool is named localedef.

1. To examine the currently installed locales and character sets, identify the paths used on the system. Run this:

localedef --help

Near the end of the output, there will be a display like this:

System's directory for character maps : /usr/share/i18n/charmaps
                       repertoire maps: /usr/share/i18n/repertoiremaps
                       locale path    : /usr/lib/locale:/usr/share/i18n

Look inside the first directory of the locale path (/usr/lib/locale/ in this case) to find the locales that are already installed. If the required locale is not present, it will need to be compiled.

2. To compile the proper locale definition for Linux, use a command similar to this:

sudo localedef --no-archive -f IBM866 -i ru_RU ru_RU.IBM866

--no-archive causes the compiled locale definition to be created under the name given as the last command line parameter (ru_RU.IBM866). The definition is written to a new directory of that name inside the main locale path (usually /usr/lib/locale/). In this case the compiled locale will be named ru_RU.IBM866 and it will be located in /usr/lib/locale/ru_RU.IBM866/.

The -f parameter specifies the character map name to be used. There must be a <character_map_name>.gz in the directory where character maps are stored (usually /usr/share/i18n/charmaps/). In this case the character map is IBM866 and there should be a file named /usr/share/i18n/charmaps/IBM866.gz.

The -i parameter specifies the language and country definitions to use. There must be a file named <lang>_<country> in the directory for the input definitions (usually /usr/share/i18n/locales/). In this case the language is ru and the country code is RU, so the parameter is ru_RU and there must be a file named /usr/share/i18n/locales/ru_RU.

3. Confirm the new locale is visible in the locale list using locale -a. It is important to check this in a regular user account (not root), because the permissions of the locale directory can hide a locale from normal users. The <lang>_<country>.<charset> locale should appear in the list. If it does not, ensure the new locale's file system permissions are correct. They should look like this (shown here for a pl_PL.IBM852 locale):

/usr/lib/locale:
drwxr-xr-x 3 root root pl_PL.IBM852

/usr/lib/locale/pl_PL.IBM852/:
-rw-r--r-- 1 root root LC_ADDRESS
-rw-r--r-- 1 root root LC_COLLATE
-rw-r--r-- 1 root root LC_CTYPE
-rw-r--r-- 1 root root LC_IDENTIFICATION
-rw-r--r-- 1 root root LC_MEASUREMENT
drwxr-xr-x 2 root root LC_MESSAGES
-rw-r--r-- 1 root root LC_MONETARY
-rw-r--r-- 1 root root LC_NAME
-rw-r--r-- 1 root root LC_NUMERIC
-rw-r--r-- 1 root root LC_PAPER
-rw-r--r-- 1 root root LC_TELEPHONE
-rw-r--r-- 1 root root LC_TIME

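This can be achieved with the following commands. The order is important: the second command also removes the execute (search) permission from the LC_MESSAGES directory, and the third command restores it.
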
sudo chmod 0755 /usr/lib/locale/pl_PL.IBM852/
sudo chmod 0644 /usr/lib/locale/pl_PL.IBM852/*
sudo chmod 0755 /usr/lib/locale/pl_PL.IBM852/LC_MESSAGES/
sudo chmod 0644 /usr/lib/locale/pl_PL.IBM852/LC_MESSAGES/*

Note that some Linux system updates may modify the permissions on these file system resources. If a locale disappears from the locale -a list or permission errors are reported after an update, reissue the chmod commands above to restore the proper settings.
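The checks in steps 1 to 3 can be sketched as small shell helpers. This is a minimal sketch under stated assumptions, not part of FWD or of the C library tools: the function names find_locale_dir, check_localedef_inputs and locale_in_list are hypothetical, and the default /usr/share/i18n root is only the "usual" location named above, so it may differ on a given system.

```shell
#!/bin/sh
# Hypothetical helpers; names and default paths are illustrative assumptions.

# find_locale_dir <colon-separated locale path> <locale name>
# Step 1: print the first directory that holds a compiled locale of that name.
# Returns 1 if no directory in the path contains it.
find_locale_dir() {
    search_path=$1
    name=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -d "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# check_localedef_inputs <charmap> <lang_country> [i18n root]
# Step 2: confirm the input files for the -f and -i parameters exist.
check_localedef_inputs() {
    charmap=$1
    input=$2
    root=${3:-/usr/share/i18n}
    status=0
    if [ ! -f "$root/charmaps/$charmap.gz" ]; then
        echo "missing character map: $root/charmaps/$charmap.gz" >&2
        status=1
    fi
    if [ ! -f "$root/locales/$input" ]; then
        echo "missing locale definition: $root/locales/$input" >&2
        status=1
    fi
    return $status
}

# locale_in_list <locale name>
# Step 3: read a locale list on standard input and succeed if the name
# is present. The comparison ignores case.
locale_in_list() {
    grep -q -i -x -F -- "$1"
}

# Example, run as a regular user:
#   find_locale_dir /usr/lib/locale:/usr/share/i18n ru_RU.IBM866
#   check_localedef_inputs IBM866 ru_RU
#   locale -a | locale_in_list ru_RU.IBM866 && echo "locale is visible"
```

locale_in_list reads its list from standard input rather than calling locale -a itself, so the same check can be run against output captured from another user account.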

Setting the Locale for FWD

On Linux, the locale of the current process is set using the LANG environment variable. On most Linux systems, LANG is set to en_US.UTF-8 by default (English language, United States, UTF-8 character set). This setting is picked up by all FWD tools (conversion or runtime), because the Java Virtual Machine (JVM), the infrastructure that allows Java programs to execute, honors the default locale given by LANG.

To force all input/output processing for the JVM to a specific locale, use the following syntax:

LANG=<locale> <command>

This overrides the LANG environment variable for the lifetime of that specific command. Alternatively, this can be set as the system default or as the default for the user's shell based on startup script entries (e.g. ~/.bashrc).

This must be used with all FWD conversion tools (e.g. ReportDriver, ConversionDriver and PatternEngine). Likewise, it must be used to start FWD servers, FWD clients and any FWD batch processes (e.g. ServerDriver or ClientDriver). This example runs the bogus Whatever program with a Russian locale:

LANG=ru_RU.IBM866 java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.Whatever
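For scripts that start several FWD tools, the per-command override can be wrapped in a small helper. This is a minimal sketch; the function name run_with_locale is hypothetical and not part of FWD. It relies only on the standard env utility, and it assumes that LC_ALL is not set in the environment, because LC_ALL takes precedence over LANG when both are present.

```shell
#!/bin/sh
# run_with_locale <locale> <command> [args...]
# Hypothetical helper: runs one command with LANG overridden.
# The LANG value of the calling shell is left unchanged.
run_with_locale() {
    locale_name=$1
    shift
    env LANG="$locale_name" "$@"
}

# Example (Whatever is the placeholder class name used above):
#   run_with_locale ru_RU.IBM866 java -classpath "$P2J_HOME/p2j/lib/p2j.jar" com.goldencode.p2j.Whatever
```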

Whenever possible it is best to run the Java process using UTF-8 as the default character set. If the only requirement for an override is for the conversion process to read source code files (external procedures, classes and include files) that have been encoded in a specific charset, then the preferred method to handle this is to set a global hint (e.g. put it in the abl/directory.hints file) using the source-charset hint. See Conversion Hints and look in the preprocessor section.


© 2004-2017 Golden Code Development Corporation. ALL RIGHTS RESERVED.
