Conversion Locale - FWD - Golden Code Redmine

Input File Encoding¶

Optimally, all input files would be encoded with the same character set. Find out the character encoding for the source code. Make sure that all source code files use that same encoding. When the schema files (.df files) are exported, make sure to use that same encoding in the export. When the data is dumped/exported from the database (.d files), make sure to use that same encoding. If there are any other files or inputs for the application, ensure they use that encoding. Anything that is not encoded properly would be best if it is converted to the common charset.

In FWD, set the source-charset global parameter in the p2j.cfg.xml. This will define the default for all conversion processing even if it is different from the default JVM encoding (which should remain as UTF-8).

In the case where some inputs have a different encoding, then the conversion process can be configured to read source code files (external procedures, classes and include files) that have been encoded in a specific charset, by setting the source-charset hint in either a directory-level or a file-specific hints file. See Conversion Hints and look in the preprocessor section.

Do not change the default locale of the JVM when the FWD commands are executed. See Setting the Locale for FWD for details. The default encoding of the JVM should remain UTF-8.

Operating System Locale Support¶

The locale is usually a set of data files that are read and used during processing of the C runtime library and/or the operating system APIs. Subsystems such as Java will depend upon the locale definitions to properly handle I18N issues.

If the specific locale needed is not yet defined in the operating system's locale definitions, it must be added. There may be an installation program or other utilities to handle this. On Linux, character sets, locales and the tools to compile new locale definitions are included with the C library. The most important tool is named localedef.

1. To examine the currently installed locales and character sets, identify the paths used on the system. Run this:

localedef --help

Near the end of the output, there will be a display like this:

System's directory for character maps : /usr/share/i18n/charmaps
                       repertoire maps: /usr/share/i18n/repertoiremaps
                       locale path    : /usr/lib/locale:/usr/share/i18n

Look inside the /usr/lib/locale/ (in this case) to find the locales that are already installed. If the required locale is not present, it will need to be compiled.

2. To compile the proper locale definition for Linux, use a command similar to this:

sudo localedef --no-archive -f IBM866 -i ru_RU ru_RU.IBM866

--no-archive causes the compiled locale definition to be created with a name the same as the last command line parameter (ru_RU.IBM866). The definition will be created in a new directory of the same name which is in the main locale path (usually /usr/lib/locale/). In this case the compiled locale will be named ru_RU.IBM866 and it will be located in /usr/lib/locale/ru_RU.IBM866/.

The -f parameter specifies the character map name to be used. There must be a <character_map_name>.gz in the directory where character maps are stored (usually /usr/share/i18n/charmaps/). In this case the character map is IBM866 and there should be a file named /usr/share/i18n/charmaps/IBM866.gz.

The -i parameter specifies the language and country definitions to use. There must be a <lang>_<country> file of the same name in the directory for the input definitions (usually /usr/share/i18n/locales/). In this case the language is ru, the country code is RU so the parameter is ru_RU and there must be a file named /usr/share/i18n/locales/ru_RU.

3. Confirm the new locale is visible in the locale list using locale -a. It is important to check this in a regular user account (not root), because the permissions of the locale directory can hide a locale from normal users. The <lang>_<country>.<charset> locale should appear in the list. If it does not, ensure the new locale's file system permissions are correct. They should look like this:

/usr/lib/locale:
drwxr-xr-x 3 root root pl_PL.IBM852

/usr/lib/locale/pl_PL.IBM852/:
-rw-r--r-- 1 root root LC_ADDRESS
-rw-r--r-- 1 root root LC_COLLATE
-rw-r--r-- 1 root root LC_CTYPE
-rw-r--r-- 1 root root LC_IDENTIFICATION
-rw-r--r-- 1 root root LC_MEASUREMENT
drwxr-xr-x 2 root root LC_MESSAGES
-rw-r--r-- 1 root root LC_MONETARY
-rw-r--r-- 1 root root LC_NAME
-rw-r--r-- 1 root root LC_NUMERIC
-rw-r--r-- 1 root root LC_PAPER
-rw-r--r-- 1 root root LC_TELEPHONE
-rw-r--r-- 1 root root LC_TIME

This can be achieved with the following commands (the order is important):

sudo chmod 0755 /usr/lib/locale/pl_PL.IBM852/
sudo chmod 0644 /usr/lib/locale/pl_PL.IBM852/*
sudo chmod 0755 /usr/lib/locale/pl_PL.IBM852/LC_MESSAGES/
sudo chmod 0644 /usr/lib/locale/pl_PL.IBM852/LC_MESSAGES/*

Note that some Linux system updates may modify permissions on these file system resources. If you find error messages to this effect, reissue the chmod commands above to restore permissions to their proper settings.

Setting the Locale for FWD¶

On Linux, the locale of the current process is set using the LANG environment variable. On most Linux systems, by default the LANG is set to en_US.UTF-8 (English language, US country with the UTF-8 character set). This will be picked up by all FWD tools (conversion or runtime) since the Java Virtual Machine (JVM) will naturally honor the default locale (LANG setting). The JVM is the infrastructure that allows Java programs to execute.

To force all input/output processing for the JVM to a specific locale, use the following syntax:

LANG=<locale> <command>

This overrides the LANG environment variable for the lifetime of that specific command. Alternatively, this can be set as the system default or as the default for the user's shell based on startup script entries (e.g. ~/.bashrc).

This must be used with all FWD conversion tools (e.g. ReportDriver, ConversionDriver and PatternEngine). Likewise, it must be used to start FWD servers, FWD clients and any FWD batch processes (e.g. ServerDriver or ClientDriver). This example runs the bogus Whatever program with a Russian locale:

LANG=ru_RU.ibm866 java -classpath $P2J_HOME/p2j/lib/p2j.jar com.goldencode.p2j.Whatever

At this time all Java processes for FWD expect to use UTF-8 as the default character set. Modifying the LANG for the JVM is to a non-UTF-8 value is NOT SUPPORTED. Setting the LANG to modify the language and country is supported, so long as it is still using UTF-8 as the charset (e.g. <lang_countrycode>.UTF-8).

Project

General

Profile

FWD

Wiki

Input File Encoding¶

Operating System Locale Support¶

Setting the Locale for FWD¶