Feature #3753
I18N additions
100%
Related issues
History
#1 Updated by Greg Shah over 5 years ago
- Related to Feature #3292: i18n improvements added
#2 Updated by Greg Shah over 5 years ago
Implement the following I18N features for Phase 1 (POC):
- FIX-CODEPAGE statement
- CODEPAGE-CONVERT (runtime)
- GET-CODEPAGE
- CURRENT-LANGUAGE conversion
- INPUT STREAM CONVERT runtime support
- SESSION:CPSTREAM (improve the runtime)
#3 Updated by Greg Shah over 5 years ago
Implement these for Phase 2 (main project):
- generalized CONVERT option support (e.g. not just INPUT but also used in OUTPUT TO)
- NO-MAP I/O option
- NO-CONVERT runtime support
- GET-CODEPAGES() function
- IS-CODEPAGE-FIXED() function
- CURRENT-LANGUAGE function and statement (it is partial today; complete the runtime support. Please see #3817, which is related to this.)
- #3817 string resource bundles and translation manager replacement
#4 Updated by Constantin Asofiei over 5 years ago
- Assignee set to Constantin Asofiei
- Status changed from New to WIP
#5 Updated by Constantin Asofiei over 5 years ago
The conversion issues for this task (phase 1) are solved in 3750a rev 11297.
#6 Updated by Constantin Asofiei over 5 years ago
A note to check: editor with LONGCHAR value, having a non-default codepage.
#7 Updated by Greg Shah over 5 years ago
The CURRENT-LANGUAGE implementation really should not be needed for the POC, since the default language's string constants can be used. As such, this will be deferred to phase 2.
#8 Updated by Greg Shah about 5 years ago
- Assignee changed from Constantin Asofiei to Eugenie Lyzenko
Constantin: Eugenie is going to take this task. I know you finished part of the work on phase 1 (conversion and some runtime), but not all of it was complete. Please do the following:
1. If you have any pending work, we need to get that into a branch somewhere.
2. We need an update to this task to detail what is done and what is left to do.
3. A specific list of items and questions for which we need testcases.
Eugenie: If there are parts of the work for which we don't need testcases, you can start work on those.
#9 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
Eugenie: If there are parts of the work for which we don't need testcases, you can start work on those.
OK.
#10 Updated by Constantin Asofiei about 5 years ago
All runtime needs to be implemented. Below is a list of what needs to be done.
For FIX-CODEPAGE and GET-CODEPAGE, both conversion and runtime support are required. Testing should be done for:
- is the codepage copied from one longchar value to another?
- is the codepage involved in comparison operators?
- FIX-CODEPAGE with empty, unknown, non-empty longchar vars
- what if the codepage is already set?
- clob fields - can they work with fix-codepage and get-codepage? Can the codepage be set in some other way?
- editor with large-object (which can display a LONGCHAR val, with or without a codepage set) - how is the text displayed?
- assigning a longchar to a char - is the codepage inherited from the rvalue?
- assignment between longchars - the same, is the code page included in the assign?
- is the codepage affecting the character bytes? For example:
// lc1a, lc1b - same codepage
// lc2 - other codepage
lc2 = "some text which may differ in the codepage".
lc1a = "some text which may differ in the codepage".
lc1b = lc2.
Are lc1a and lc1b equal, i.e. is the final text in lc1b unaffected by the initial codepage in lc2? The idea here is to determine if the longchar's codepage is used when assigning a text to it (thus the reference text is kept in memory converted to the target codepage).
- how are other statements which work with strings affected?
- we need to specify the list of known codepages (or default to Java's available charsets)
- CODEPAGE-CONVERT, INPUT STREAM CONVERT work with this
- what is the default codepage value - some explanation is in https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvint/determining-the-code-page.html
- combinations of source and target codepages
- the source codepage is not the real text's codepage
- source/target codepages are not in the convmap.cp list
#11 Updated by Constantin Asofiei about 5 years ago
Greg, see above for the runtime I18N - are these what you are looking for?
About the translation manager and translatable strings, we need tests to prove:
- Are all strings without :U translatable? See https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvref/-22--22character-string-literal.html
- Translation Database:
- How is the translation database saved? Is this a simple Progress DB? If so, what is the schema?
- What is the character encoding of the database?
- Can the source text and translated text be in different encodings?
- Is there any functionality in OpenEdge that read the translation database at runtime or is this only for building the compile-time r-code text segments?
- How can you switch between translations? Is this related to CURRENT-LANGUAGE?
- Are 4GL system error messages translated, too? (We have been assuming YES, i.e. that CURRENT-LANGUAGE will select different message sources in OpenEdge.)
- Are only standalone static strings translated? What if the string is in an expression, like "there is an error in program" + pname; will the "there is an error in program" string be translatable?
- How does the 4GL behave if some strings have a translation and others do not? Is this resolved at compile time, so a translation cannot be added in the future, or at runtime, so the .r will see any newly added translation?
#12 Updated by Greg Shah about 5 years ago
see above for the runtime I18N - are these what you are looking for?
Yes, this is what I was looking for.
#13 Updated by Constantin Asofiei about 5 years ago
- texts at the schema definition (labels, and so on) - are these translatable?
#14 Updated by Greg Shah about 5 years ago
Eugenie: Please make a list of the items in this task which do not need any testcases written. These are items for which you already have enough information to implement. I imagine that all the conversion and some of the runtime features can be worked now.
#15 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
Eugenie: Please make a list of the items in this task which do not need any testcases written. These are items for which you already have enough information to implement. I imagine that all the conversion and some of the runtime features can be worked now.
OK.
#16 Updated by Eugenie Lyzenko about 5 years ago
Constantin,
Can you provide a short list of what is already DONE in this task?
I'm creating the implementation plan and need to clearly separate the things that are ready from the TODO list.
#17 Updated by Constantin Asofiei about 5 years ago
Eugenie Lyzenko wrote:
Constantin,
Can you provide the short list of what is already DONE in this task?
Items in #3753-2 should have full conversion support already, with stubbed (or partial) runtime. I don't think the items in #3753-3 have conversion support, and they have no runtime.
I would start with conversion support for #3753-3, as the syntax is not that complex.
#18 Updated by Eugenie Lyzenko about 5 years ago
My plan for the first steps is:
1. Investigate in detail all external resources to be used for I18N runtime support. For now I see only the dlc/convmap.cp file. We need to properly define where these resources will be located and how we will use them.
2. Implement some simple features that use the external resources from point 1:
- FIX-CODEPAGE() statement
- IS-CODEPAGE-FIXED() function
- CODEPAGE-CONVERT() function
- GET-CODEPAGE() function
- GET-CODEPAGES() function
- CURRENT-LANGUAGE() function and statement
Let me know if this plan needs corrections according to the current project requirements.
#19 Updated by Eugenie Lyzenko about 5 years ago
I will also create the 3753a branch to upload the changes, if there are no objections.
#20 Updated by Eugenie Lyzenko about 5 years ago
Created task branch 3753a from trunk revision 11301.
#21 Updated by Greg Shah about 5 years ago
Let me know if this plan needs corrections according to current projects requirements.
I think the plan is OK. The tricky part is trying to avoid the areas where tests are being written. If you must explore some of these topics with your own testcases (due to time constraints), please note these here in advance so that people writing tests don't duplicate work.
#22 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
Let me know if this plan needs corrections according to current projects requirements.
I think the plan is OK. The tricky part is trying to avoid the areas where tests are being written. If you must explore some of these topics with your own testcases (due to time constraints), please note these here in advance so that people writing tests don't duplicate work.
OK. I don't think I can completely avoid writing testcases, if only to verify that the implemented functionality works. Something simple, like:
result = function|statement(args). message result.
This is needed so that I am not working completely blind during the implementation.
#23 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11302.
These are the first steps of the I18N implementation. Added conversion and runtime support for the GET-CODEPAGES function. Currently the runtime is based on Java internal variables; there is no convmap.cp support yet. This is in the research/design stage.
The testcases repo has been updated to rev 1832 with simple testcases for the GET-CODEPAGE(S) functions.
Several considerations.
A point to clarify about my understanding of the FIX-CODEPAGE statement and the IS-CODEPAGE-FIXED function (correct me if I'm wrong): my understanding is that the parameter passed to them in ABL is the name of a variable that is not yet initialized (not yet assigned), not the variable itself. For example, the following code is correct:
def var chVar as longchar.
FIX-CODEPAGE("chVar") = "IBM850".
but the following code is not correct:
def var chVar as longchar.
FIX-CODEPAGE(chVar) = "IBM850".
This would mean that internally Progress keeps a map of fixed variables in the format:
"name"<-->"codepage"
Then, when a codepage-dependent operation is called, Progress checks whether the operand is in the fixed variable name map and, if it is, uses the redefined codepage (instead of the default one, or the one defined for other transformations) as the source CP for this variable when the business logic needs to convert the variable to another codepage. Is this understanding correct?
On the other hand, Java strings have no codepage assigned per String object; a String is just a sequence of Unicode (UTF-16) characters. The codepages are specified only when a transformation is performed. So we would need to keep a registry map of all variable names in the current session that use a "fixed" codepage.
Is it OK to implement all the I18N specifics inside the p2j/util/EnvironmentOps class? Or would it be better to create another helper class completely dedicated to the I18N implementation? I guess the new I18N code could be a big part of the file.
Continuing to work. The next steps will be FIX-CODEPAGE(), IS-CODEPAGE-FIXED() and GET-CODEPAGE(), and adding the convmap.cp functionality to FWD.
#24 Updated by Greg Shah about 5 years ago
I don't understand. The following code works:
def var lc as longchar.
message "LC CP (before) = " + get-codepage(lc).
message get-codepages.
fix-codepage(lc) = "1252".
message "LC CP (after) = " + get-codepage(lc).
/* this generates:
?
1256,709,708,721,711,786,714,710,720,BIG-5,GB2312,CP936,CP950,IBM852,1250,ISO8859-2,1253,IBM851,ISO8859-8,IBM862,IBM850,IBM858,ISO8859-1,ISO8859-15,SHIFT-JIS,EUCJIS,KSC5601,CP949,CP1361,1252,1257,MAZOVIA,ROMAN-8,KOI8-R,1251,IBM866,ISO8859-5,620-2533,1254,IBM857,UNDEFINED,IBM861,IBM437,UTF-8,UCS2,UTF-32,UTF-16,UTF-16BE,UTF-16LE,UTF-32BE,UTF-32LE,ISO6937,CP950-HKSCS,GB18030
LC CP (after) = 1252
*/
Using a string literal or char expression as the parameter to FIX-CODEPAGE() does not work.
#25 Updated by Greg Shah about 5 years ago
Currently runtime is based on Java internal variables, no convmap.cp feature yet.
We should start with a standard set of known codepages and a way to map them to Java charsets.
Customers will need a way to customize this. The additional codepage to charset mappings should be implemented in the directory. But the standard set should be built in to the runtime, no directory entries needed.
Then when the operation that is code page dependent is calling the Progress looks if the actors is in fixed variable names map and if it is in - uses redefined code page instead of the default one(or defined for other transformations) for this variable as the source CP if business logic need to convert the var to another code page. Is it correct understanding?
I don't think so. I think this just sets a value inside the longchar var itself.
You can only set this value before the var has real data. We will need to test to see how it affects assignment, copy-lob, overlay, substring and other statements that can assign data.
On the other hand the Java strings internally has no code page assigned per String object, meaning the String object is the set of bordered bytes. The code pages for transforming are defined during transformation. So we will need to keep the registry map for all variable names in the current session that uses "fixed" codepage.
This should not be needed. I think the codepage just changes how the data is transformed at assignment.
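To make the "codepage is applied at assignment" reading concrete, here is a minimal Java sketch of a longchar that carries its own fixed codepage. It assumes (as stated above, still unproven by tests) that FIX-CODEPAGE only sets a property of the variable, that it is only legal while the variable holds no data, and that assignment transforms the text through that codepage. The class and method names are hypothetical, not the FWD longchar API, and the exact empty/unknown rules are among the open test items listed earlier.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical model of a LONGCHAR variable with an optional fixed codepage.
// Assumption (not yet verified against OpenEdge): the codepage is a property of
// the variable itself and is applied to the text whenever a value is assigned.
class LongcharModel {
    private String text = "";             // empty value for a freshly defined variable
    private Charset fixedCodepage = null; // null means no codepage has been fixed

    // Models FIX-CODEPAGE(lc) = cp; allowed only while the variable holds no data.
    void fixCodepage(Charset cp) {
        if (text != null && !text.isEmpty()) {
            throw new IllegalStateException("codepage can only be fixed before the variable holds data");
        }
        fixedCodepage = cp;
    }

    // Models IS-CODEPAGE-FIXED(lc).
    boolean isCodepageFixed() {
        return fixedCodepage != null;
    }

    // Models GET-CODEPAGE(lc); null stands in for the unknown value (?).
    Charset getCodepage() {
        return fixedCodepage;
    }

    // Models assignment: the value is round-tripped through the fixed codepage,
    // so characters it cannot represent are replaced by the charset's
    // replacement bytes (for ISO-8859-1 this is '?').
    void assign(String value) {
        if (value == null || fixedCodepage == null) {
            text = value;
            return;
        }
        text = new String(value.getBytes(fixedCodepage), fixedCodepage);
    }

    String value() {
        return text;
    }

    public static void main(String[] args) {
        LongcharModel lc = new LongcharModel();
        lc.fixCodepage(StandardCharsets.ISO_8859_1);
        lc.assign("price: 10\u20ac");   // the euro sign is not in ISO-8859-1
        System.out.println(lc.value()); // price: 10?
    }
}
```

With this model no per-session registry of variable names is needed; the state lives in the variable object itself.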
Is it OK to implement all I18N specific inside p2j/util/EnvironmentOps class? Or it will be better to create another helper class completely dedicated to I18N implementation? I guess the new I18N could be big part of the file.
Better to create a new helper class. But the functions/statements that operate only on longchar should be in the longchar class.
#26 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
I don't understand. The following code works:
[...]
Using a string literal or char expression as the parameter to FIX-CODEPAGE() does not work.
OK. I need to add conversion support for the FIX-CODEPAGE() statement because it is missing.
#27 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
...
Better to create a new helper class. But the functions/statements that operate only on longchar should be in the longchar class.
I'm going to introduce a new helper class, p2j/utils/I18nOps.java, to separate all server-side I18N specifics from the rest of the environment processing (except the longchar variable related calls, which will be handled in longchar). Please let me know if the class name or location is wrong.
#28 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11303.
This adds support for the FIX-CODEPAGE statement (conversion and runtime), reworks the support for GET-CODEPAGES, and adds support for IS-CODEPAGE-FIXED and GET-CODEPAGE. If a feature is longchar dependent, the implementation is inside the longchar class.
The testcases were updated to revision 1833, adding a simple test for FIX-CODEPAGE/IS-CODEPAGE-FIXED.
The reworked approach for the GET-CODEPAGES call is based on the idea of a 4GL-to-Java mapping for the currently supported charset names. For now, the default convmap codepage set is what we have in the original 4GL system. From this set we select only the character sets that are supported by the base Java package. I have scanned the Java charsets; some codepages were found but some were not. Additional work is needed to find out what to do with these 4GL encodings. The problematic codepages:
709 708 721 711 786 714 710 720 CP936 CP950 IBM858 EUCJIS KSC5601 CP949 - is it x-windows-949? CP1361 MAZOVIA ROMAN-8 UNDEFINED UCS2 ISO6937
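A minimal sketch of the mapping idea, assuming a built-in table of 4GL names to Java charset names plus customer overrides (which would come from the directory; here they are just a Map passed to the constructor). Only a handful of entries are shown, not the full convmap.cp list, and the class name is hypothetical. Entries whose Java charset is not provided by the running JVM are filtered out, which is how names without a usable Java equivalent would drop out of the GET-CODEPAGES result.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of a 4GL codepage name -> Java charset name registry.
// The built-in entries are an illustrative subset; overrides stand in for
// customer-specific mappings that would be read from the directory.
class CodepageRegistry {
    private static final Map<String, String> BUILT_IN = new LinkedHashMap<>();

    static {
        BUILT_IN.put("ISO8859-1", "ISO-8859-1");
        BUILT_IN.put("ISO8859-15", "ISO-8859-15");
        BUILT_IN.put("UTF-8", "UTF-8");
        BUILT_IN.put("UTF-16BE", "UTF-16BE");
        BUILT_IN.put("1252", "windows-1252");
        BUILT_IN.put("IBM850", "IBM850");
        BUILT_IN.put("IBM866", "IBM866");
        BUILT_IN.put("KOI8-R", "KOI8-R");
        BUILT_IN.put("BIG-5", "Big5");
        BUILT_IN.put("SHIFT-JIS", "Shift_JIS");
    }

    private final Map<String, String> mapping = new LinkedHashMap<>(BUILT_IN);

    CodepageRegistry(Map<String, String> overrides) {
        // customer entries win over built-in ones; 4GL names are matched case-insensitively
        overrides.forEach((cp, javaName) -> mapping.put(cp.toUpperCase(Locale.ROOT), javaName));
    }

    // Java charset for a 4GL codepage name, or null if unknown or unsupported by this JVM.
    Charset lookup(String cp4gl) {
        String javaName = mapping.get(cp4gl.toUpperCase(Locale.ROOT));
        if (javaName == null || !Charset.isSupported(javaName)) {
            return null;
        }
        return Charset.forName(javaName);
    }

    // Comma-separated list in the style of GET-CODEPAGES, limited to usable entries.
    String getCodepages() {
        return mapping.keySet().stream()
                      .filter(cp -> lookup(cp) != null)
                      .collect(Collectors.joining(","));
    }
}
```

Keeping the built-in table in code means no directory entries are needed for the standard set, while the override map gives customers a way to add or redirect names.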
Continuing to work. I also need to understand what the special UNDEFINED codepage value is. Is it the default for the current OS?
The next step will be adding full support for CURRENT-LANGUAGE() and CODEPAGE-CONVERT().
#29 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been rebased with trunk 11302; the new revision is 11304.
#30 Updated by Eugenie Lyzenko about 5 years ago
The testcases bzr repo was updated to rev 1834 with a simple test for the CURRENT-LANGUAGE statement/function to verify the support.
The result: we already have full conversion and runtime support for this, based on the current directory configuration. The current language value is stored permanently in directory.xml from one session to another. If this approach is OK, we have nothing to do for this statement/function.
Starting to work on the CODEPAGE-CONVERT() function.
#31 Updated by Greg Shah about 5 years ago
The result - we already have full conversion and runtime support for this based on current directory configuration. The current language value is stored permanently in directory.xml from one session to another. If this approach is OK we have nothing to do for this statement/function.
Please read #3817 carefully. Setting the CURRENT-LANGUAGE will replace string literals in the code with a different version stored at compile time. The trick to seeing this is that the CURRENT-LANGUAGE must be changed and then the next programs loaded will be affected. Of course, they must have had the replacement strings set up using the Translation Manager. I really doubt we support any of this. Just being able to query and set the value is not enough.
#32 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11305.
Added a stub to handle the generic case of the CODEPAGE-CONVERT() function in the new helper class I18nOps. The several call variants in TextOps will ultimately call a single method in I18nOps to generalize the processing. The current conversion approach is not changed. Continuing to work.
#33 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
The result - we already have full conversion and runtime support for this based on current directory configuration. The current language value is stored permanently in directory.xml from one session to another. If this approach is OK we have nothing to do for this statement/function.
Please read #3817 carefully. Setting the CURRENT-LANGUAGE will replace string literals in the code with a different version stored at compile time. The trick to seeing this is that the CURRENT-LANGUAGE must be changed and then the next programs loaded will be affected. Of course, they must have had the replacement strings set up using the Translation Manager. I really doubt we support any of this. Just being able to query and set the value is not enough.
OK. Reading.
So all the translatable strings (labels, static text, ...) that are loaded after a CURRENT-LANGUAGE change should be replaced with the new language-specific version, right? Or all character-based text (even if CURRENT-LANGUAGE had the old value at the time of loading)?
I mean, do we need to have the text set for every language (or translate it dynamically) so that it auto-refreshes in reaction to a CURRENT-LANGUAGE change?
#34 Updated by Greg Shah about 5 years ago
Don't implement the CURRENT-LANGUAGE runtime right now. I think this needs to wait until we have tests that show the specific behavior. The way it was described, we must track the setting of this value for each program that is loaded, and only that language is used for that program even if CURRENT-LANGUAGE is changed while it is running. This needs to be proven, but it means a more complicated implementation since it is set at the time the program loads.
So far all the translatable strings(labels, static text, ...) that are loading after CURRENT-LANGUAGE change should be replaced with new language specific version, right?
No, I think it is all translatable strings in programs that are loaded after the CURRENT-LANGUAGE change.
Or totally all character based text(event if CURRENT-LANGUAGE had the old value in a time of loading)?
No, I think it is only the string literals that are not marked :U (untranslatable).
I mean do we need to have text set for every language(or translate it dynamically) for auto-refresh as reaction for CURRENT-LANGUAGE change?
No. The customer has a database that has these translations. At conversion, we would read these and create Java resource bundles that would be used by specific programs depending on the CURRENT-LANGUAGE setting when the containing program was loaded.
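The load-time behavior described above can be sketched as follows. This is a hypothetical model, not the planned FWD design: the in-memory translation table stands in for the resource bundles that would be generated at conversion time from the Translation Manager database, and each program instance captures the CURRENT-LANGUAGE value when it is loaded. The class name, the method names and the "?" initial value are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a loaded program captures CURRENT-LANGUAGE at load time
// and resolves translatable literals against that language only.
class ProgramStrings {
    // language -> (original literal -> translated literal); stands in for generated bundles
    private static final Map<String, Map<String, String>> TRANSLATIONS = new HashMap<>();
    private static volatile String currentLanguage = "?"; // assumed initial value

    private final String loadLanguage; // frozen when the program is loaded

    private ProgramStrings(String loadLanguage) {
        this.loadLanguage = loadLanguage;
    }

    static void setCurrentLanguage(String lang) {
        currentLanguage = lang;
    }

    static void addTranslation(String lang, String literal, String translated) {
        TRANSLATIONS.computeIfAbsent(lang, k -> new HashMap<>()).put(literal, translated);
    }

    // Called when a program is loaded; later CURRENT-LANGUAGE changes do not affect it.
    static ProgramStrings load() {
        return new ProgramStrings(currentLanguage);
    }

    // Translation for the load-time language, or the original literal if none exists.
    // Literals marked :U would never be routed through this method.
    String text(String literal) {
        return TRANSLATIONS.getOrDefault(loadLanguage, Map.of()).getOrDefault(literal, literal);
    }
}
```

If the tests confirm the per-program behavior, the only runtime state needed per program is the captured language; the translations themselves stay read-only.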
#35 Updated by Greg Shah about 5 years ago
- Related to Feature #3817: create resource bundles from string literals and implement optional support for setting values from the translation manager database added
#36 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been rebased with trunk 11303; the new revision is 11306.
#37 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11307.
This adds a base-level implementation of the CODEPAGE-CONVERT() function, converting from the source CP to the target one. The idea is to use the integrated Java converters with a Java byte[] as the intermediate buffer.
The testcases repo has been updated to rev 1835 with a simple testcase for the CODEPAGE-CONVERT() method.
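A minimal sketch of the byte[] intermediate approach, using only the standard JDK converters: decode the input bytes with the source charset into a Java String, then encode that String with the target charset. It works on raw bytes rather than on FWD character values and uses Java charset names instead of 4GL codepage names; both are simplifications, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Codepage conversion with a byte[] intermediate: bytes in the source charset are
// decoded to Java's internal UTF-16 String, then encoded with the target charset.
// Characters missing from the target become the target's replacement (usually '?').
class CodepageConvertSketch {
    static byte[] convert(byte[] source, Charset sourceCp, Charset targetCp) {
        String decoded = new String(source, sourceCp);
        return decoded.getBytes(targetCp);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e4".getBytes(StandardCharsets.UTF_8); // a-umlaut: C3 A4
        byte[] latin1 = convert(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1); // E4
        byte[] back = convert(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(latin1) + " round trip ok: " + Arrays.equals(utf8, back));
    }
}
```

Converting back with the charsets swapped restores the original bytes whenever every character exists in both charsets. Characters missing from the target are replaced, so the round trip is lossy for them.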
I have to say the testcase is very simple and can only be used to check the conversion support and the proper runtime logic calls. It is a bad idea to use it to test the output produced by CODEPAGE-CONVERT(), for several reasons:
- The original input string content can vary depending on the OS default code page where the test is running (for both 4GL and FWD).
- In Linux the char inside the braces is a small a umlaut, while in Windows it appears as 2 chars: a capital A with a tilde and some other char. So the output is different, and it is not clear whether this is caused by the OS code page change or by an FWD bug.
So we certainly need an OS-neutral testcase which produces a stable result on every OS. Maybe we need to read the original strings from a file or represent the strings as integer arrays.
What I think, and need confirmed or refuted: if we have some char shape in one charset (say, a small a umlaut), then no matter which target CP it is converted to, the glyph remains the same (a small a umlaut), correct? If there is no such char in the target CP, some undefined char shape will be the result.
For any char, the transformations 1. (code page 1) -> 2. (code page 2) -> 3. (code page 1) must give back the original character (from before step 1), correct?
The default charset for 4GL/FWD is ISO8859-1, right? Does this mean the JVM's current charset must be reset to ISO8859-1 instead of the UTF-8 used in Linux? I mean not only returning this value from SESSION:CHARSET in FWD code, but also using it inside all Java string conversion calls. I guess we will have to go this way to duplicate the 4GL behavior.
#38 Updated by Greg Shah about 5 years ago
Progress handles its various text processing/sorting/conversion using 5 possible input tables:
- character attributes (whether something is an alphabetic char or not)
- case tables (how to translate between upper/lower case)
- collation (how to sort)
- code page conversion (how to translate a char in a source cp to the same char in the target cp)
- word break (how to delimit words)
We will implement 4GL programs that explicitly use 4GL features which depend upon these tables. By processing all possible inputs, we can observe the outputs and save the input-to-output mapping in our own file format.
I think this is easily done.
- character attributes (use LC and CAPS to convert the case of characters; only characters that change case are alphabetic)
- case conversion (use LC and CAPS to convert the case of strings)
- collation (use the EQ, NE, GT, LT, GE or LE operators to compare strings)
- code page conversion (use CODEPAGE-CONVERT() or ASC() or CHR() to convert using specific source and target codepages)
- word break tables will require the right set of input strings and then the use of CONTAINS with the right possible match targets to determine how word breaks work for a specific codepage (this one is trickier but should be possible)
Before you do that, please read the following references:
I18N documentation (in v12.0, it is named internationalize-abl.pdf)
dlc/prolang/README (text file with some details about their I18N implementation)
Once you have 4GL code that can calculate these values, we should run them on some common input/output codepage combinations. Then we need to check the Java version of these conversions to see if it is the same. If exactly the same, then we can use the standard Java implementation. If it is not the same, then we will have to override at least some of the implementation.
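For the code page conversion table, the Java side of that comparison could look like this sketch: run every byte value 0..255 through the JDK decoder and encoder with error reporting enabled, and record either the resulting single target byte or -1. It assumes single-byte source and target codepages (multi-byte results simply show up as -1), and the class name is hypothetical. The resulting int[] can then be diffed against the table captured by the 4GL program.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Builds a 256-entry conversion table between two single-byte charsets using the
// JDK converters. Entry b holds the target byte for source byte b, or -1 when the
// byte is undefined in the source or has no single-byte equivalent in the target.
class ConversionTable {
    static int[] build(Charset source, Charset target) {
        CharsetDecoder dec = source.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetEncoder enc = target.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        int[] table = new int[256];
        for (int b = 0; b < 256; b++) {
            try {
                // decode()/encode() reset the coder and run a complete operation
                CharBuffer chars = dec.decode(ByteBuffer.wrap(new byte[] {(byte) b}));
                ByteBuffer out = enc.encode(chars);
                table[b] = out.remaining() == 1 ? out.get() & 0xFF : -1;
            } catch (CharacterCodingException e) {
                table[b] = -1; // reported instead of silently replaced
            }
        }
        return table;
    }
}
```

Reporting errors instead of replacing them matters here: with replacement enabled, unmappable bytes would appear as '?' and could not be told apart from a genuine '?' mapping.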
#39 Updated by Greg Shah about 5 years ago
For any char the transformations 1.(code page 1) -> 2.(code page 2) -> 3.(code page 1) must get the original character(before step 1), correct?
Yes.
The default char set for 4GL/FWD is ISO8859-1, right?
At one time, I think ISO8859-1 was possibly the 4GL default. It may still be the default. When you create a database, I think you can select any value. And when you run, you can set the session default using the -cpinternal command line option. And at installation time, I think you might be able to choose the default as well.
For FWD, the default is determined by the operating system locale. On Linux, this tends to be UTF-8.
Does it mean the JVM current charset must be reset to ISO8859-1 instead of UTF-8 used in Linux? I mean not only getting SESSION:CHARSET for FWD code to return this value but also using this value inside all Java string conversions calls. I guess we will have to go this way to duplicate the 4GL behavior.
So far, for string processing in the converted code we have not found a requirement to internally store string data in anything other than Unicode. Generally speaking, it seems that all of the charset/codepage settings in the 4GL are related to controlling how text data is read from input or written to output. For example:
CPLOG (logfile output)
CPPRINT (output to printer)
CPRCODEIN (reading text from compiled code)
CPRCODEOUT (how text will be converted during compilation)
CPSTREAM (stream IO like files and processes)
CPTERM (character terminals)
None of those relate to how the data is internally stored or compared. Those are controlled using these:
CPCASE (read-only, set using the -cpcase command line parm or the default database value is used)
CPCOLL (read-only, set using the -cpcoll command line parm or the default database value is used)
CPINTERNAL (read-only, set using the -cpinternal command line parm or ? if not specified)
I do wonder what happens if these are:
- conflicting with each other
- conflicting with the database setting
- conflicting with the operating system locale
For now, we will assume that all users will have the same CPINTERNAL, CPCOLL and CPCASE values and that UTF-8 is OK for these values. If this changes, then we will need to implement a deeper approach in all of our string processing so that each character variable has knowledge of these values. I'd like to avoid that for now.
At the database, this is different because we have found the need to implement a custom locale for both H2 and PostgreSQL to sort properly. I think this is related to the input databases being in ISO8859-1. I don't know if we haven't had access to a suitable ISO8859-1 locale or if the Progress version of it was just customized so that we needed to do our own version.
Eric/Ovidiu: Perhaps you can comment on this?
#40 Updated by Eugenie Lyzenko about 5 years ago
After reading some background info and the Progress documentation, some points have become clearer from an implementation perspective.
1. The assumption that the character shape should be the same before and after conversion (example: a umlaut) is correct; there is no special processing in 4GL.
2. But the integer code point (character code) changes.
3. Point 1 is problematic to verify, because either way we see the compared texts in a single predefined code page, and in the general case the character shape will be different (example: a umlaut).
4. So far the only predictable way is to verify the consistency of the integer codes behind the character set. Here is an example:
DEFINE VARIABLE char850 AS CHARACTER NO-UNDO.
DEFINE VARIABLE charsetstring AS CHARACTER NO-UNDO.
define variable intCode as integer no-undo.
intCode = 132.
char850 = CHR(intCode, "UNDEFINED").
message "Current session charset: " SESSION:CHARSET.
message "Original char is: " intCode.
charsetstring = CODEPAGE-CONVERT(char850, "ISO8859-1", "ibm850").
message "IBM850 -> ISO8859-1 conversion is: " asc(charsetstring, "UNDEFINED").
In this example, the integer character code 132 will be converted to code 228 in the ibm850 -> ISO8859-1 conversion. This can easily be seen in 4GL on Windows.
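The same experiment can be reproduced with Java's own converters, which gives a reference value to compare against the FWD CHR/ASC runtime. This sketch handles single-byte codepages only and assumes the JVM provides the IBM850 charset (most JDK builds include it among the extended charsets). The chr/asc helpers are simplified illustrations, not the 4GL signatures, and the class name is hypothetical.

```java
import java.nio.charset.Charset;

// Reproduces the 4GL example in Java: code 132 is a byte in IBM850 (a-umlaut),
// decoded to a Unicode char and re-encoded in ISO-8859-1 to get the target code.
class ChrAscSketch {
    // Simplified CHR analogue: the character that byte value `code` represents in cp.
    static String chr(int code, Charset cp) {
        return new String(new byte[] {(byte) code}, cp);
    }

    // Simplified ASC analogue: the byte value of the first character in cp,
    // or -1 if cp cannot represent it.
    static int asc(String ch, Charset cp) {
        char c = ch.charAt(0);
        if (!cp.newEncoder().canEncode(c)) {
            return -1;
        }
        return String.valueOf(c).getBytes(cp)[0] & 0xFF;
    }

    public static void main(String[] args) {
        Charset ibm850 = Charset.forName("IBM850");
        Charset latin1 = Charset.forName("ISO-8859-1");
        String aUmlaut = chr(132, ibm850);        // 0x84 in IBM850 is a-umlaut
        System.out.println(asc(aUmlaut, latin1)); // 228, the same code the 4GL reports
    }
}
```

Because only integer codes are compared, the result does not depend on the OS console encoding, which addresses the display problem described earlier.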
The issue with this test is that we have no proper ASC/CHR runtime support for source/target code pages. So I'm planning to implement the missing ASC/CHR features related to code pages. This way we will have a good testing environment for further research.
Continuing the investigation.
I need to understand what a Progress character in a particular code page means. For example, char text with CP IBM850 means a single-byte set (not Unicode). The Java internal string representation is UTF-16. If output is required, it is internally converted to the OS-supported code page, for example UTF-8 on Linux or Windows-1252 on Windows.
I also need to research how Progress native Unicode support differs.
And I need to consider another possible internal representation of the 4GL character type. Maybe an integer array is better than a String, because with a String we do a character conversion every time we use it. Not sure; this needs investigation.
#41 Updated by Greg Shah about 5 years ago
And I need to consider another possible internal representation of the 4GL character type. May be integer array is better than String. Just because using String we do character conversion every time we use String. Not sure, need to investigate.
Implicit conversion is only needed in the following cases:
- text data is being read from or written to an external source/target (a file or the screen); AND
- the input or output codepage is configured to be different from the internal codepage
This is a very small number of cases. For this reason I think we definitely want to keep character data as strings. Otherwise all of the internal usage (which is most of the usage) will be very expensive because we will constantly be converting the int[] data into a String and then back to int[].
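The "convert only at the boundary" rule can be illustrated with a small sketch: character data stays a Java String, and a codepage (think of CPSTREAM) is applied only where bytes cross into or out of the program, via the standard OutputStreamWriter and InputStreamReader. The class name and passing the stream codepage as a parameter are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

// Character data stays a String internally; a stream codepage is applied only
// when text is written out as bytes or read back in from bytes.
class BoundaryConversion {
    // Output boundary: encode the internal String with the stream codepage.
    static byte[] write(String text, Charset streamCp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, streamCp)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer is closed (and flushed) at this point
    }

    // Input boundary: decode bytes with the stream codepage into a String.
    static String read(byte[] data, Charset streamCp) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), streamCp)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Everything between those two calls (comparisons, concatenation, substring) operates on plain Strings with no codepage involved.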
#43 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11308. The testcases repo has been updated to rev. 1836.
This adds some new testcases to check the current results of the CODEPAGE-CONVERT function. The change also adds support for the special UNDEFINED code page. The idea is that applying UNDEFINED to the source or target means no conversion. Normally both Java calls, String.getBytes(CodePage) and new String(inputByteArray, CodePage), perform the respective conversion inside the JVM. If the CP is UNDEFINED, the conversion done via String.getBytes() and new String(inputByteArray) must be skipped.
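A minimal sketch of that UNDEFINED rule, using the argument order of CODEPAGE-CONVERT (target before source). It assumes, as described above, that UNDEFINED on either side means a plain byte copy, and it resolves the other names directly with Charset.forName, skipping the 4GL-to-Java name mapping a real runtime would need. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// If either codepage is UNDEFINED the bytes pass through unchanged; otherwise
// decode with the source charset and encode with the target charset.
class UndefinedAwareConvert {
    static final String UNDEFINED = "UNDEFINED";

    static byte[] convert(byte[] data, String targetCp, String sourceCp) {
        if (UNDEFINED.equalsIgnoreCase(targetCp) || UNDEFINED.equalsIgnoreCase(sourceCp)) {
            return Arrays.copyOf(data, data.length); // no conversion at all
        }
        return new String(data, Charset.forName(sourceCp)).getBytes(Charset.forName(targetCp));
    }
}
```

Returning a copy rather than the input array keeps callers from accidentally sharing a buffer between the source and result values.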
The testcase uast\i18n\cp_convert.p demonstrates the consistency of the FWD CODEPAGE-CONVERT implementation: double conversion gives back the initial string value. This means it looks like we do not need to implement 4GL-specific processing for text transformation; everything can be done with regular Java tools.
The test uast\i18n\char_convert.p is a demo for getting the integer value of the current character in the provided code page. The test is UI independent, displaying only the char code, not the char itself. It shows that the current FWD implementation of CHR/ASC needs to be updated to add code page support. These functions are useful for OS-independent code page testing, so I'm planning to add the respective support next.
Another constraint concerns character constants that cannot be mapped to the current Java charset (UTF-8 on Linux). If we encounter a character that cannot be represented in the Java CP, it is replaced with the ? character. We have to take this into account when working with the source tree. Even if we can avoid this during conversion, we will not be able to compile the result, and we get errors like this:
... error: unmappable character for encoding utf-8
[ant:javac] character cp850string = TypeFactory.character("text with umlaut (�)");
[ant:javac] ...
So it is better to avoid hardcoded text constants with extended chars in the source tree.
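The ? substitution comes from String.getBytes(Charset), which replaces unmappable characters with the charset's replacement bytes instead of throwing an exception. The sketch below shows this. It also shows one possible way around the javac error, assuming the generated source could use Unicode escapes: the .java file then stays pure ASCII. The class name is hypothetical, and FWD conversion does not necessarily emit escapes today.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class.
final class UnmappableSketch {
    // The umlaut is written as a Unicode escape, so this source file stays
    // pure ASCII and compiles regardless of javac's -encoding setting.
    static final String UMLAUT_TEXT = "text with umlaut (\u00fc)";

    // Round-trips text through US-ASCII. getBytes(Charset) does not throw for
    // unmappable characters; it substitutes the charset's replacement byte '?'.
    static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```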
The preliminary conclusion:
1. The implementation of CODEPAGE-CONVERT is OK; at least for now I do not see any issues.
2. For further testing we need CHR/ASC to support code page options for source and target.
Working on point 2.
#44 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
And I need to consider another possible internal representation of the 4GL character type. Maybe an integer array is better than String, because with String we do a character conversion every time we use it. Not sure, need to investigate.
Implicit conversion is only needed in the following cases:
- text data is being read from or written to an external source/target (a file or the screen); AND
- the input or output codepage is configured to be different from the internal codepage
This is a very small number of cases. For this reason I think we definitely want to keep character data as strings. Otherwise all of the internal usage (which is most of the usage) will be very expensive because we will constantly be converting the int[] data into a String and then back to int[].
Agreed, we need to keep the implementation as efficient as possible, leaving String as the backend for character.
#45 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11309.
The update adds target/source code page handling for the CHR function. The implementation is still being debugged, but the base functionality is here. The next step will be to complete CHR and add ASC.
#46 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11310.
This completes the implementation of the CHR/ASC functions with support for target and source code pages. New testcases have been written to verify the approach (testcases repo updated to rev. 1837).
The current implementation of ASC supports DBCS and Unicode, so it can return values greater than 255. I need to update the gaps rule file to reflect the status and do more tests to verify the implementation. The planned next steps are to handle the upper/lower/collation transformation functions and to prepare a schedule for the next implementation steps.
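Below is a sketch of code page aware ASC/CHR. It assumes that ASC packs the encoded bytes of a character big-endian into one integer, so a two-byte character yields a value above 255, and that CHR unpacks only the significant bytes. That packing rule has not been confirmed against the 4GL. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical sketch, not the FWD character.asc()/chr() implementation.
final class AscChrSketch {
    // Encodes the first code point of text (must be non-empty) in cp and
    // packs the resulting bytes big-endian into an int.
    static int asc(String text, String cp) {
        int codePoint = text.codePointAt(0);
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(Charset.forName(cp));
        int value = 0;
        for (byte b : bytes) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    // Inverse of asc: writes only the significant bytes of value (no leading
    // zero bytes, so 65 is one byte and 0xC3BC is two) and decodes them with cp.
    static String chr(int value, String cp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean started = false;
        for (int shift = 24; shift >= 0; shift -= 8) {
            int b = (value >>> shift) & 0xFF;
            if (b != 0 || started || shift == 0) {
                out.write(b);
                started = true;
            }
        }
        return new String(out.toByteArray(), Charset.forName(cp));
    }
}
```

With this rule, ü gives 252 in ISO-8859-1 and 0xC3BC (50108) in UTF-8. A 4-byte UTF-8 sequence would overflow to a negative int. A long would avoid this if such values need support.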
#47 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11311.
The update changes the gap marking status for the I18N related features that are already implemented.
#48 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11312.
This small update fixes a duplicated element for KW_LC in expressions.rules for gap markup.
The testcases repo has also been updated to rev. 1838, with new testcases to verify the LC/CAPS functions for CHARACTER and LONGCHAR variables. Testing shows the implementation is OK without having to construct additional tables for UPPER case to LOWER case mapping and back.
Preparing a plan for further work.
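Because the LC/CAPS tests passed without custom tables, the runtime can presumably rely on Java's own case mapping. The sketch below uses hypothetical names. Locale.ROOT keeps the result independent of the JVM locale, so rules like the Turkish dotless i do not apply. Java's full case mapping can also change the length of a string (ß upper-cases to SS). Whether 4GL CAPS() does the same has not been checked.

```java
import java.util.Locale;

// Hypothetical sketch of LC()/CAPS() on top of Java case mapping.
final class CaseSketch {
    // CAPS(): locale-neutral upper-casing.
    static String caps(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // LC(): locale-neutral lower-casing.
    static String lc(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```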
#49 Updated by Eugenie Lyzenko about 5 years ago
So far, the plan for the next steps is to focus on stream related codepage processing:
- SESSION:CPSTREAM
- INPUT STREAM CONVERT runtime support
- (OUTPUT TO)/(INPUT THROUGH) CONVERT option
- NO-MAP I/O option
- NO-CONVERT runtime support
#50 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been rebased with trunk 11304, new revision is 11313.
#51 Updated by Eugenie Lyzenko about 5 years ago
Testing shows that the 4GL returns ISO8859-1 for both SESSION:CPSTREAM and SESSION:CPINTERNAL. In FWD we have UTF-8 as SESSION:CPSTREAM and ISO8859-1 as SESSION:CPINTERNAL. Actually, we have UTF-8 for both, but we substitute ISO8859-1 for SESSION:CPINTERNAL.
Should we return UTF-8 too, at least for a Linux server instance?
These options are very important for proper code page related IO handling, because the result differs depending on the source/target CP. So we need a proper strategy. What are our plans here?
What about defining -cpstream and -cpinternal overrides in the directory.xml file? This way we could fine-tune the code page options for a particular customer application.
#52 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11314.
Testcases repo has been updated to revision 1840.
The update adds runtime support for the INPUT ... FROM CONVERT statement, using the same approach as the CODEPAGE-CONVERT function. A new testcase demonstrates this. The SESSION:CPSTREAM and SESSION:CPINTERNAL attributes are also fully supported in FWD, with the caveats I mentioned previously. I think we need to give customers the opportunity to customize the -cpinternal and -cpstream options in directory.xml to get proper conversion while reading from or writing to an external file.
Continuing to work on the (OUTPUT TO)/(INPUT THROUGH) CONVERT option.
#53 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11315.
Testcases repo has been updated to revision 1841.
The update adds runtime support for the CONVERT/NO-CONVERT options in stream based IO. The handling approach has also been changed to convert by default. To skip conversion, the NO-CONVERT option must be specified explicitly. In addition, the stream constructor gets default values for the -cpstream and -cpinternal variables. These defaults are used if the source or target code page overrides are not defined.
A note on using -cpstream and -cpinternal as source and target: when the stream is reading, the source code page is -cpstream and the target code page is -cpinternal. When writing, source and target are swapped: -cpinternal becomes the source code page and -cpstream becomes the target code page. The implementation should support both named and unnamed streams, including the INPUT THROUGH version.
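The read/write swap described above fits in a small value type. The sketch below assumes that an explicit NO-CONVERT simply suppresses conversion. All names are hypothetical, and FWD's Stream class is organized differently.

```java
// Hypothetical sketch; FWD's Stream class is organized differently.
final class StreamCpSketch {
    // Immutable source/target pair for one stream direction.
    static final class CpPair {
        final String source;
        final String target;
        final boolean noConvert; // explicit NO-CONVERT option

        CpPair(String source, String target, boolean noConvert) {
            this.source = source;
            this.target = target;
            this.noConvert = noConvert;
        }

        // Conversion happens only when not suppressed and the code pages differ.
        boolean needsConversion() {
            return !noConvert && !source.equalsIgnoreCase(target);
        }
    }

    // Reading: external bytes are in CPSTREAM, internal text is CPINTERNAL.
    static CpPair forInput(String cpstream, String cpinternal, boolean noConvert) {
        return new CpPair(cpstream, cpinternal, noConvert);
    }

    // Writing: the pair is swapped, CPINTERNAL text goes out as CPSTREAM.
    static CpPair forOutput(String cpstream, String cpinternal, boolean noConvert) {
        return new CpPair(cpinternal, cpstream, noConvert);
    }
}
```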
Several simple testcases were added to the testcases repo to confirm implementation consistency. However, we need some complex tests from 4GL experts to debug and verify the implementation.
The next step is to check/implement the NO-MAP option, plus the string resource bundles and translation manager replacement described in #3817.
#54 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11316.
Testcases repo has been updated to revision 1842.
This adds conversion and runtime support for the NO-MAP stream IO option. Testcases were updated to check the MAP/NO-MAP options.
Greg,
Do we need to implement the MAP option here? It is related to the PROTERMCAP file entry used for char conversion. If we need the option to be implemented, I have a related question. This simple 4GL code (from i18n/stream_cp_map.p):
...
INPUT FROM "stream_cp_input.txt" MAP "hp/italian".
...
is converted to an incorrect call:
...
UnnamedStreams.assignIn(StreamFactory.openFileStream("stream_cp_input.txt", false, false), "hp/italian");
...
I'm not clear why the MAP option is treated as a parameter of the UnnamedStreams.assignIn call. No KW_MAP related rules were added. Why doesn't this happen with "targetCP" in the CONVERT TARGET "targetCP" stream option? Is there a simple answer I'm missing? It could save me the time I would otherwise spend finding the root cause.
#55 Updated by Greg Shah about 5 years ago
The testing shows the 4GL returns ISO8859-1 for both SESSION:CPSTREAM and SESSION:CPINTERNAL.
This will depend on the installation. I don't know if it is explicitly set during OpenEdge installation or if it is inferred from the locale.
In FWD we have UTF-8 as SESSION:CPSTREAM and ISO8859-1 as SESSION:CPINTERNAL. Actually we have UTF-8 for both but substitute with ISO8859-1 for SESSION:CPINTERNAL.
We did this long ago because we had encountered 4GL code that expected ISO8859-1 but there was no real reason that we couldn't use UTF-8 internally. So we "hacked" this.
The current approach is not correct and it may cause that application to have an issue, but we need to fix this now.
Should we return UTF-8 too at least for Linux server instance?
For now, we are not going to have different CPINTERNAL values based on the user's session. Instead we need to base this on the JVM default encoding. It should have nothing to do with the operating system. In the future, we will need to honor different CPINTERNAL by session. But this will require much more than just how we set and report that value, we would also need to handle all String operations in that encoding. That is not for now.
I do think we need to always return a codepage name that is recognizable. I'm not sure that the JVM default encoding will always have a name that matches what OpenEdge would return. For example, in Java what is the Windows 1252 encoding name? I think it is windows-1252. In the 4GL, it will return as 1252. Please create a map of the names that can translate between these.
What about to define -cpstream and -cpinternal overrides in directory.xml file? This way we could fine tune code page options for particular customer application.
I agree that we need the equivalent of -cpstream in the directory. Please go ahead and add a mechanism to set the value of all of the codepages (-cpstream, -cpinternal, -cpprint...) from the directory. It should be able to be defined/overridden at all the different levels (global default, server default, group, account...).
This value should be specified using the 4GL CP name, which we must map to the Java encoding name (see above).
Please add conversion and simple runtime support for the attribute getters for all of the codepage values (most are missing). This must include CHARSET (which I think is the same thing as CPINTERNAL, but we need to check) and CODEPAGE.
These values should have a compatible default if not overridden in the directory. But if specified in the directory (look this up using the directory access methods that do the hierarchical search), then the value specified should be returned.
We will only honor the actual value for CPSTREAM right now. The others won't have support, so the runtime support for those should be marked as stubs.
For now, I think the CPINTERNAL value returned must be based purely on the JVM default charset, which is set by the JVM based on the JVM's locale.
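The defaults described above can be sketched in a few lines. The sketch assumes the directory override has already been looked up elsewhere, with null meaning "not configured at any level". The names are hypothetical. A real implementation would also map the Java charset name to its 4GL name before reporting it.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not the FWD EnvironmentOps/I18nOps API.
final class SessionCpSketch {
    // CPINTERNAL: always the JVM default charset (no per-session override).
    static String cpinternal() {
        return Charset.defaultCharset().name();
    }

    // CPSTREAM: a directory override wins; null means nothing was configured,
    // so fall back to CPINTERNAL (the two are equal unless overridden).
    static String cpstream(String directoryOverride) {
        return directoryOverride != null ? directoryOverride : cpinternal();
    }
}
```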
This testcase proves that the CP* attributes are not settable (see testcases/uast/i18n/):
/* setting CP* values causes **CHARSET is not a settable attribute for PSEUDO-WIDGET. (4052) */
/* but the ERROR-STATUS-ERROR will be false */
/* it doesn't matter the value that is set, it could be "utf-8", "garbage" or even ? (unknown value) */
/* the result is always the same warning */
session:cpinternal = "ISO8859-1" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cpstream = "ISO8859-1" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cpterm = "ISO8859-1" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cplog = "ISO8859-1" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cpprint = "ISO8859-1" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cprcodein = "ISO8859-1" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cprcodeout = "ISO8859-1" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cpcase = "basic" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
session:cpcoll = "basic" no-error.
if error-status:error or not error-status:get-number(1) = 4052 then message "ERROR: expected error 4052 to be returned.".
message "Finished successfully.".
Also the attribute SESSION:CPSTREAM and SESSION:CPINTERNAL are also fully supported in FWD.
I don't think we can say that CPINTERNAL is fully supported. We don't modify the encoding used for internal string operations in a FWD user's session, so the real support is not there. This is still "partial" level support right now. Please add a comment to the gap rules for cpinternal: <!-- this value cannot be overridden on a per-session basis, the default jvm encoding is always reported and is always used for internal string operations -->.
Also the handling approach has been changed to have conversion by default.
This should only be the case if CPSTREAM is different from CPINTERNAL, right? So in the default case there should be NO conversion because these two values are always the same in the 4GL unless the command line override has been provided (e.g. -cpstream). Of course, specifying the source or target codepage in the 4GL code will override the CPSTREAM or CPINTERNAL in this calculation, so we need to handle that at runtime.
Please check the default (whether CPSTREAM = CPINTERNAL) once and save the result in the context-local area. Then implement the default conversion or no conversion based on this.
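"Check once and cache" can be sketched with a per-thread slot. Here ThreadLocal is only a stand-in for FWD's context-local storage, and the class name is hypothetical. Explicit source/target code pages in the 4GL statement would still be handled separately at runtime.

```java
import java.util.function.Supplier;

// Hypothetical sketch; ThreadLocal stands in for FWD's context-local storage.
final class ConvertDefaultSketch {
    // Cached "CPSTREAM differs from CPINTERNAL" flag, one value per thread (session).
    private static final ThreadLocal<Boolean> DEFAULT_CONVERT = new ThreadLocal<>();

    // Returns whether streams convert by default. The comparison runs only on
    // the first call for this thread; later calls return the cached result.
    static boolean defaultConvert(Supplier<String> cpstream, Supplier<String> cpinternal) {
        Boolean cached = DEFAULT_CONVERT.get();
        if (cached == null) {
            cached = !cpstream.get().equalsIgnoreCase(cpinternal.get());
            DEFAULT_CONVERT.set(cached);
        }
        return cached;
    }
}
```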
The note for usage -cpstream and -cpinternal as source and target. When the stream is in reading the source code page is -cpstream while target code page is -cpinternal. But in the case of writing the source and target should be swapped, -cpinternal become a source code page and -cpstream become a target code page respectively. The implementation should support both named and unnamed streams.
Yes, understood.
Do we need to implement MAP option here?
Maybe. At least the conversion should be supported. But I need to understand what the runtime behavior is for MAP.
It is related to PROTERMCAP file entry used to char conversion.
Can you provide more details? What actually happens here when this is specified? Where does the translation mapping data come from (is it hard coded into the protermcap)? How does it mix with the CP* attribute support?
And I'm not clear why MAP option is considering as parameter for UnnamedStreams.assignIn call.
This is because there is no processing for the KW_MAP option. If there was, then there would be a peerid and it would properly emit into the parent. This should be fixed easily.
#56 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been updated for review to revision 11317.
The update adds conversion and runtime support for the MAP option. The runtime does no real work yet; it only stores the option value.
Yes, the issue was the missing createPeerAst call, thanks for the help. I have also modified progress.g to change the KW_MAP option approach to literal | filename[null]. This way we can support both MAP proterm-entry and MAP "proterm-entry", the same way the 4GL does. The previous version supported only the MAP "proterm-entry" case. Please review this change.
#57 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
Do we need to implement MAP option here?
...
Maybe. At least the conversion should be supported. But I need to understand what the runtime behavior is for MAP.
It is related to PROTERMCAP file entry used to char conversion.
Can you provide more details? What actually happens here when this is specified? Where does the translation mapping data come from (is it hard coded into the protermcap)? How does it mix with the CP* attribute support?
The 4GL doc states (for the INPUT FROM statement):
The protermcap-entry value is an entry from the PROTERMCAP file. Use MAP to read from an input stream that uses a different character translation from the current stream. Typically, protermcap-entry is a slash-separated combination of a standard device entry and one or more language-specific add-on entries (MAP laserwriter/french or MAP hp2/spanish/italian, for example). The AVM uses the PROTERMCAP entries to build a translation table for the stream. Use NO-MAP to make the AVM bypass character translation altogether.
For now we are limited by the fact that the PROTERMCAP file is used on Linux/Unix systems only; I think it does not work on Windows. As with other 4GL parts, we need a testcase to investigate how it actually works. As we know from previous experience, the documentation alone is not a good enough source for the real picture. The problem is that we do not have a Linux based system with ABL installed. I suspect that the doc's "build a translation table for the stream" means overriding the source/target codepage translation for some or all characters.
#58 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
The testing shows the 4GL returns ISO8859-1 for both SESSION:CPSTREAM and SESSION:CPINTERNAL.
This will depend on the installation. I don't know if it is explicitly set during OpenEdge installation or if it is inferred from the locale.
In FWD we have UTF-8 as SESSION:CPSTREAM and ISO8859-1 as SESSION:CPINTERNAL. Actually we have UTF-8 for both but substitute with ISO8859-1 for SESSION:CPINTERNAL.
We did this long ago because we had encountered 4GL code that expected ISO8859-1 but there was no real reason that we couldn't use UTF-8 internally. So we "hacked" this.
The current approach is not correct and it may cause that application to have an issue, but we need to fix this now.
OK.
Should we return UTF-8 too at least for Linux server instance?
For now, we are not going to have different CPINTERNAL values based on the user's session. Instead we need to base this on the JVM default encoding. It should have nothing to do with the operating system. In the future, we will need to honor different CPINTERNAL by session. But this will require much more than just how we set and report that value, we would also need to handle all String operations in that encoding. That is not for now.
OK. Understood.
I do think we need to always return a codepage name that is recognizable. I'm not sure that the JVM default encoding will always have a name that matches what OpenEdge would return. For example, in Java what is the Windows 1252 encoding name? I think it is windows-1252. In the 4GL, it will return as 1252. Please create a map of the names that can translate between these.
Such a map is already implemented in the I18nOps helper class.
#59 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
Please add conversion and simple runtime support for the attribute getters for all of the codepage values (most are missing). This must include CHARSET (which I think is the same thing as CPINTERNAL, but we need to check) and CODEPAGE.
These values should have a compatible default if not overridden in the directory. But if specified in the directory (look this up using the directory access methods that do the hierarchical search), then the value specified should be returned.
We will only honor the actual value for CPSTREAM right now. The others won't have support, so the runtime support for those should be marked as stubs.
Task branch 3753a has been updated for review to revisions 11318 and 11319.
This changes the support level for the CPINTERNAL attribute to partial.
It also implements conversion support for all CP* code page related attributes, with the runtime support marked as stubs.
Currently only getters are implemented, because the attributes are read-only. But according to your testcase (settable_cp_attributes.p), the setters need to be implemented as well, correct? Their only purpose would be to generate the error during execution. I think that without conversion support for setters, any statement like session:cpterm = "ISO8859-1" will cause a conversion error. So please clarify this point.
Planning to rebase 3753a with the recent trunk in 5-10 min.
#60 Updated by Eugenie Lyzenko about 5 years ago
Task branch 3753a has been rebased with trunk 11305, new revision is 11320.
#61 Updated by Eugenie Lyzenko about 5 years ago
Greg Shah wrote:
...
Also the handling approach has been changed to have conversion by default.
This should only be the case if CPSTREAM is different from CPINTERNAL, right?
Yes.
So in the default case there should be NO conversion because these two values are always the same in the 4GL unless the command line override has been provided (e.g. -cpstream). Of course, specifying the source or target codepage in the 4GL code will override the CPSTREAM or CPINTERNAL in this calculation, so we need to handle that at runtime.
This is already implemented in the I18nOps conversion worker. If the source CP is the same as the target CP, no conversion happens. This check is performed for every code page capable call, so I agree it may not be a very optimal approach.
Please check the default (whether CPSTREAM = CPINTERNAL) once and save the result in the context-local area. Then implement the default conversion or no conversion based on this.
OK. I will rework this on a context-local basis.
#62 Updated by Eugenie Lyzenko about 5 years ago
The settable_cp_attributes.p statement
session:cpinternal = "ISO8859-1" no-error.
converts to:
...
silent(() -> SessionUtils.readOnlyError("cpinternal"));
if (_or(ErrorManager.isError(), () -> not(isEqual(ErrorManager.getErrorNumber(1), 4052))))
{
   message("ERROR: expected error 4052 to be returned.");
}
...
So there are no conversion/compilation issues. I think we do not need to implement setters for the SESSION code page related attributes.
#63 Updated by Greg Shah about 5 years ago
So there is no conversion/compilation issues. I think we do not need to implement setters for SESSION code pages related attributes.
Correct.
#64 Updated by Eugenie Lyzenko almost 5 years ago
Task branch 3753a has been rebased with trunk 11306, new revision is 11321.
#65 Updated by Eugenie Lyzenko almost 5 years ago
Task branch 3753a has been updated for review to revisions 11322 and 11323.
The update adds a session context area to store cached variables. It also introduces support for directory.xml overrides of all CP related internal variables. When the service is called for the first time, FWD requests the overrides. If a variable is still null, there are no directory definitions, and we use the current charset value obtained from the JVM.
The no-translation mode (when the target CP equals the source CP) is now handled automatically at the Stream class level when no explicit target/source is defined. Otherwise, the check would have to be done for every stream based operation. When both source and target are defined for the stream, we can decide once whether a real transformation is needed.
#66 Updated by Eugenie Lyzenko almost 5 years ago
Task branch 3753a has been updated for review to revision 11324.
The update fixes a bug in the Stream convert flag computation. It also adds code to properly handle the directory service on both the client and server sides. This unification is required because some operations use the server side local session area, while others need the client side to ask for the directory values. This uses DirectoryManager.getInstance().
Continuing with the next part of the task, the translation manager replacement.
#67 Updated by Greg Shah almost 5 years ago
Continue working with next part of the task - translation manager replacement.
Actually, we are going to defer this work (and #3817) until the summertime. This is not needed for our next customer milestones.
As long as there is no conversion/compilation issue with reading and writing SESSION:CURRENT-LANGUAGE, then I think the rest of the work on translation manager support can be paused.
#68 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
Continue working with next part of the task - translation manager replacement.
Actually, we are going to defer this work (and #3817) until the summertime. This is not needed for our next customer milestones.
OK.
As long as there is no conversion/compilation issue with reading and writing SESSION:CURRENT-LANGUAGE, then I think the rest of the work on translation manager support can be paused.
Conversion and compilation are OK for the CURRENT-LANGUAGE function/statement. I think we should also investigate how changing CURRENT-LANGUAGE affects all the other CP* attributes of SESSION, for example CPINTERNAL, CPSTREAM, CHARSET, etc. Do we need to implement the 4GL behavior here?
#69 Updated by Greg Shah almost 5 years ago
We are arranging for someone to write 4GL testcases to do a deep/comprehensive look at the I18N features.
What should be investigated additionally I think is how changing CURRENT-LANGUAGE affects all others CP* attributes for SESSION. For example CPINTERNAL, CPSTREAM, CHARSET etc.
I agree. If there is any relationship there, we need to understand what it is. Please make a list of all the questions that you have which are not covered by the items in notes 11, 38 and 39. Post that list in this task so that we can include those questions in the testcases work.
#70 Updated by Eugenie Lyzenko almost 5 years ago
4GL tests requirement
The dependency of the following options on the CURRENT-LANGUAGE setting needs to be tested in a 4GL environment (most of them are SESSION attributes):
CPINTERNAL
CPSTREAM
CPCASE
CPCOLL
CPLOG
CPPRINT
CPRCODEIN
CPRCODEOUT
CPTERM
CHARSET
CODEPAGE
A possible 4GL test scenario is:
1. Check the initial value of the attribute from the list.
2. Change CURRENT-LANGUAGE.
3. Re-check the attribute value to find out whether it has changed.
Ideally we need to know the behavior on both Linux and Windows 4GL systems.
#71 Updated by Eugenie Lyzenko almost 5 years ago
4GL tests requirement
Another point of interest for 4GL testing is the behavior of the CHR() function for characters outside the 255 range.
For now I'm leaving the TODO commented-out code in place until we have a clear picture of how CHR() handles integer code values > 255.
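Whatever the 4GL turns out to do for CHR() values above 255, the Java side can only represent every Unicode character if the value is passed as an int code point rather than a char. A char cannot hold anything above U+FFFF. This minimal sketch, with hypothetical names, shows the difference.

```java
// Hypothetical helpers, not the FWD character.asc()/chr() signatures.
final class CodePointSketch {
    // First Unicode code point of s as an int. Using charAt() instead would
    // return only the high surrogate for characters above U+FFFF.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    // Builds a String from any valid code point; supplementary characters
    // become a surrogate pair (two Java chars).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }
}
```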
#72 Updated by Eugenie Lyzenko almost 5 years ago
Task branch 3753a has been updated for review to revision 11325.
This improves the mapping of Java code page names to Progress ones. The idea is to have a bi-directional mapping to speed up returning 4GL compatible values in the getCP*() functions.
As far as I understand, the only work left for now in this task is to create the list of specifications for the 4GL testcases. This will give us a comprehensive base for further debugging. The tests themselves will be written by someone else, right?
Should I shift to another task? Should I run regression tests and merge 3753a into trunk?
#73 Updated by Greg Shah almost 5 years ago
As far as I understand the only work left for now in this task is to create the list of the specifications for 4GL testcases to have comprehensive base for further code debugging. The tests itself will be written by someone other, right?
Yes.
Make regression tests and merge the 3753a into trunk?
Yes. I will do a code review.
Do I need to shift to another task?
I'll let you know.
#74 Updated by Eugenie Lyzenko almost 5 years ago
4GL tests requirement
More test related areas to investigate:
- LC()/CAPS() functions for different code page combinations (-cpinternal/-cpcase), including single/double byte charset support.
- A set of tests to verify different -cpinternal/-cpstream combinations for stream based input and output.
- ASC()/CHR() functions for different code page combinations in source/target/missing defaults, including single/double byte charset support.
#75 Updated by Eugenie Lyzenko almost 5 years ago
4GL tests additional wishes
It would also be good to have 4GL testcases that leverage the following Progress CONVMAP.CP related transformations:
- the table that defines whether the current character is alpha
- the collation table
Also, for a Linux based system, it would be good to have a working test that uses PROTERMCAP file mapping.
#76 Updated by Greg Shah almost 5 years ago
Code Review 3753a Revision 11325
This is a really good update. Some feedback:
1. The change in DirectoryServer is not correct. The idea of ID_ABSOLUTE is that the full path and node is specified by the caller. What you have implemented is an approach that uses Utils.getDirectoryNodeWorker() to service the request of both ID_ABSOLUTE and ID_RELATIVE.
Utils.getDirectoryNodeWorker() can only be used to implement ID_RELATIVE because it checks multiple paths to see if the given node is there:
Utils.DirScope.ACCOUNT search:
1. /server/<serverID>/runtime/<account_or_group>/<id>.<project>
2. /server/<serverID>/runtime/<account_or_group>/<id>
3. /server/<serverID>/runtime/default/<id>.<project>
4. /server/<serverID>/runtime/default/<id>
5. /server/default/runtime/<account_or_group>/<id>.<project>
6. /server/default/runtime/<account_or_group>/<id>
7. /server/default/runtime/default/<id>.<project>
8. /server/default/runtime/default/<id>
Utils.DirScope.SERVER search:
1. /server/<serverID>/<id>.<project>
2. /server/<serverID>/<id>
3. /server/default/<id>.<project>
4. /server/default/<id>
If search scope is Utils.DirScope.BOTH, then we do Utils.DirScope.ACCOUNT and if we don't find something then we do Utils.DirScope.SERVER.
ID_ABSOLUTE would be checking ONLY the given path. For example, the caller might provide this: /server/default/runtime/default/some_node. There is an argument that we should probably check 2 paths here (the exact one given and then another with .<project> added so that project tokens can be honored). But the point I'm trying to make is that we don't have any Utils helpers to implement ID_ABSOLUTE. If we had these, we would have already resolved the TODOs in DirectoryService.
The problem here is that we have implemented ID_RELATIVE to mean Utils.DirScope.ACCOUNT. Perhaps we need to provide additional options like ID_RELATIVE_ACCOUNT, ID_RELATIVE_SERVER and ID_RELATIVE_BOTH. We can "alias" ID_RELATIVE to have the same meaning as ID_RELATIVE_ACCOUNT so that existing code will not break.
2. In progress.g, the change in io_options should reference STRING instead of literal. The reason: if you specify literal, then you can also match lots of non-string things like true or 01/01/1999 or -3.14.
In gaps/expressions.rules, I think the kw_cp_cvt, kw_get_codp, kw_is_cp_fx and kw_get_cp should probably be marked rt_lvl_basic (instead of rt_lvl_full) until we have run the 4GL testcases to confirm full compatibility.
3. The getters for SESSION:CP* should be all in one place. Today we have them in both EnvironmentOps and I18nOps. Let's put them all in I18nOps.
4. In I18nOps.getCodePages(), I wonder if the order will match the 4GL. This could be a compatibility issue.
5. In I18nOps.getCodePages(), the returned string will always end in a ,. Is that how the 4GL does it?
6. We need explicit processing of unknown value for Text and character parameters to TextOps.codepageConvert(), character.asc() and character.chr(). It is not safe to call getValue() on these. The value member can be out of sync with the unknown flag, leading to wrong results.
7. In I18nOps.codepageConvert(), this code:
sourceJavaCP = convmap2Java.get(((longchar) text).getCodePage().toStringMessage().toUpperCase());
should use an alternate version longchar._getCodePage() which returns the proper string directly instead of using the wrapper version. You'll have to create this new version.
8. In I18nOps.codepageConvert(), in this code:
// target CP is valid, check the source CP
if (text instanceof longchar)
{
   ...
}
I think this should have a check to see if the longchar has its codepage fixed. If it is not fixed, won't this return unknown value which will appear like a codepage named "?"? This seems wrong. I think if the longchar has no fixed codepage, then it should probably default to the SESSION:CPINTERNAL, right?
9. In I18nOps.chr(), the use of a byte[4] to convert seems wrong. It seems like this could treat values greater than 255 as 4 single byte characters depending on the source/target codepages specified.
10. I think it is incorrect to pass a char as input to I18nOps.asc() and it is incorrect to return a char from I18nOps.chr(). Can't we encounter unicode characters that would be more than 2 bytes in size? I think we should be passing this as int to handle all possible unicode characters (e.g. UTF-32).
11. In Stream.setConvertSource(String), why is convert set to true when targetCp is null? It seems like this is the opposite of what should happen, especially since sourceCp may also be set to null. There is a similar question for Stream.setConvertTarget() and the use of sourceCp == null.
12. Stream is used on both the server and the client. I think the direct usage of EnvironmentOps is a problem in this case. I also think we need to send the cpinternal and cpstream values to the client once for the whole session; otherwise it will be expensive to use an up-call to the server each time a stream instance is created.
13. The Stream.convert member is modified multiple times: during construction, possibly when the converted 4GL code calls Stream.setConvert(), and during setConvertSource|Target(). The order may differ from the 4GL processing, causing the flag to be set to the wrong value.
14. In gaps/lang_stmts.rules, I think kw_fix_cp should probably be marked rt_lvl_basic (instead of rt_lvl_full) until we have run the 4GL testcases to confirm full compatibility.
15. methods_attributes.rules needs a history entry.
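Notes 6, 9 and 10 can be illustrated together. The following is a minimal sketch, not FWD code: Char4GL is a hypothetical stand-in for a BDT wrapper whose value field may be stale while the unknown flag is set; asc() checks that flag before touching the value and returns an int code point (null stands in for the unknown value); chrViaBytes() is a hypothetical reconstruction of the byte[4] concern from note 9, compared with the code point String constructor. Whether the 4GL treats values above 255 as code points or as multi-byte encodings is exactly what the testcases still need to confirm.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class AscChrSketch
{
   /** Hypothetical stand-in for a 4GL character wrapper; not the FWD class. */
   static final class Char4GL
   {
      private final String value;     // may be stale while unknown is true
      private final boolean unknown;

      Char4GL(String value, boolean unknown)
      {
         this.value = value;
         this.unknown = unknown;
      }

      boolean isUnknown() { return unknown; }
      String getValue()   { return value; }
   }

   /** Notes 6 and 10: test the unknown flag first and return an int code point (null = unknown). */
   static Integer asc(Char4GL ch)
   {
      if (ch == null || ch.isUnknown())
      {
         return null;                 // never read getValue() on an unknown instance
      }
      String s = ch.getValue();
      // treating "" as unknown is an assumption; check it against the 4GL
      return s.isEmpty() ? null : s.codePointAt(0);
   }

   /** Note 9, hypothetical reconstruction: pack the int into 4 bytes, decode with a single-byte charset. */
   static String chrViaBytes(int value)
   {
      byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();   // big-endian
      return new String(bytes, StandardCharsets.ISO_8859_1);          // one char per byte
   }

   /** Code point alternative: one int in, one (possibly supplementary) character out. */
   static String chr(int codePoint)
   {
      return new String(new int[] { codePoint }, 0, 1);
   }

   public static void main(String[] args)
   {
      System.out.println(asc(new Char4GL("A", true)));               // null
      System.out.println(asc(new Char4GL("\uD83D\uDE00", false)));   // 128512
      System.out.println(chrViaBytes(0x20AC).length());              // 4
      System.out.println(chr(0x20AC).length());                      // 1
   }
}
```

With a char parameter, the same emoji would only yield its first UTF-16 unit (55357), which is the information loss note 10 is worried about.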
#77 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
5. In I18nOps.getCodePages(), the returned string will always end in a ",". Is that how the 4GL does it?
I think that is not the case. I thought about this possibility, and the code:
...
public static character getCodePages()
{
   // TODO: implement me with taking into account the convmap.cp compatibility
   StringBuilder sbRes = new StringBuilder();
   // getting available code pages
   Iterator csIter = convmap2Java.keySet().iterator();
   while (csIter.hasNext())
   {
      sbRes.append(csIter.next());
      if (csIter.hasNext())
      {
         sbRes.append(",");
      }
   }
...
has protection against it. After csIter.next(), csIter.hasNext() returns true when there are more items in the iterator. For the last item it is false, so after the last item is added, "," is not appended to the end of the string.
#78 Updated by Eugenie Lyzenko almost 5 years ago
- File code_pages_order.jpg added
The original 4GL ordering for GET-CODEPAGES():
#79 Updated by Greg Shah almost 5 years ago
I think it is not the case. I thought about this possibility and the code:
Sorry, I mis-read the code.
The original 4GL ordering for GET-CODEPAGES:
Please add this comment in the static {} initializer (where convmap2JavaDefault is initialized):
// These mappings are explicitly being added in the same exact order
// they appear in the 4GL GET-CODEPAGES() function. Do not change
// the order, otherwise the result of that function will be incorrect.
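A minimal sketch of how the ordering requirement and the trailing-comma question from note 5 fit together, assuming a map like convmap2JavaDefault keyed by 4GL codepage name: a LinkedHashMap preserves insertion order (a plain HashMap does not), and String.join never emits a trailing separator. resolveSource() also sketches the CPINTERNAL fallback suggested in review note 8. The class name and the three entries are hypothetical, not the real GET-CODEPAGES() list.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class CodepageMapSketch
{
   // LinkedHashMap iterates in insertion order; a plain HashMap would not
   private static final Map<String, String> CONVMAP2JAVA = new LinkedHashMap<>();

   static
   {
      // keep the same order as the 4GL GET-CODEPAGES() output;
      // these three entries are illustrative only
      CONVMAP2JAVA.put("UTF-8", "UTF-8");
      CONVMAP2JAVA.put("ISO8859-1", "ISO-8859-1");
      CONVMAP2JAVA.put("1252", "windows-1252");
   }

   /** Comma-separated names in insertion order, with no trailing comma. */
   static String getCodePages()
   {
      return String.join(",", CONVMAP2JAVA.keySet());
   }

   /**
    * Resolves the Java charset for a source codepage. A longchar without a fixed
    * codepage (null or blank here) falls back to CPINTERNAL instead of a codepage
    * named "?". Returns null when the name is not in the map.
    */
   static Charset resolveSource(String fixedCp, String cpInternal)
   {
      String cp = (fixedCp == null || fixedCp.trim().isEmpty()) ? cpInternal : fixedCp;
      String javaName = CONVMAP2JAVA.get(cp.toUpperCase(Locale.ROOT));
      return (javaName == null) ? null : Charset.forName(javaName);
   }
}
```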
#80 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
11. In Stream.setConvertSource(String), why is convert set to true when targetCp is null? It seems like this is the opposite of what should happen, especially since sourceCp may also be set to null. There is a similar question for Stream.setConvertTarget() and the use of sourceCp == null.
The idea behind this logic is: assume only sourceCP or targetCP is set to a valid value while the other (targetCP or sourceCP) remains null. What does that mean from a conversion perspective? I think it means source != target in general, which makes conversion active. Then, if the missing CP is resolved to the default in the I18nOps code and we finally find that source and target are the same, the actual conversion code will be skipped in I18nOps. Yes, this happens later than it could, but I guess an extra check is better than wrong handling.
#81 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
13. The Stream.convert member is modified multiple times: during construction, possibly when the converted 4GL code calls Stream.setConvert(), and during setConvertSource|Target(). The order may differ from the 4GL processing, causing the flag to be set to the wrong value.
Yes, this is the design update approach. In the 4GL the only explicit setting for this member is the NO-CONVERT option, while the default value is true. There are no 4GL calls to get the current option value. My idea is that the most recent related call (Stream.setConvert() or setConvertSource|Target()) defines the current effective value of the flag. All the calls happen at the stream definition step. I think it is safe, but maybe I'm missing something.
BTW, I've got one more idea for 4GL testcases. We need to know how the code page conversion behaves if we change CPINTERNAL or CPSTREAM several times between I/O operations. Will the conversion be affected? I've updated the related 4GL test entry.
#82 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
12. Stream is used on both the server and the client. I think the direct usage of EnvironmentOps is a problem in this case. I also think we need to send the cpinternal and cpstream values to the client once for the whole session; otherwise it will be expensive to use an up-call to the server each time a stream instance is created.
With my upcoming notes resolution update there will be no EnvironmentOps calls in the code page related code in Stream. The I18nOps class has a local context area for both server and client. The respective constants are initialized once per session, so if my understanding is correct there will be no extra calls from the client to the server for every I/O operation.
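Since CPINTERNAL and CPSTREAM are read-only and fixed at startup, caching them once per session is safe. A minimal sketch of such a cache, assuming a hypothetical fetch callback that stands in for the single client-to-server up-call (the actual I18nOps context area is not shown in this thread):

```java
import java.util.function.Supplier;

class SessionCodepages
{
   private final Supplier<String[]> fetch;   // hypothetical: one round trip returning {cpinternal, cpstream}
   private volatile String[] cached;         // filled on first use, then reused for the whole session

   SessionCodepages(Supplier<String[]> fetch)
   {
      this.fetch = fetch;
   }

   /** Double-checked lazy initialization: the up-call runs at most once. */
   private String[] values()
   {
      String[] cps = cached;
      if (cps == null)
      {
         synchronized (this)
         {
            cps = cached;
            if (cps == null)
            {
               cps = fetch.get().clone();
               cached = cps;
            }
         }
      }
      return cps;
   }

   String cpInternal() { return values()[0]; }
   String cpStream()   { return values()[1]; }
}
```

Every stream created afterwards reads the cached values, so stream construction costs no extra network traffic.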
#83 Updated by Eugenie Lyzenko almost 5 years ago
Task branch 3753a has been updated for review to revision 11326.
The notes are resolved except point 1 (not yet finished) and points 11, 12 and 13 (there are things to discuss there).
Also for note 9: changed to use the code point String constructor for characters greater than 255.
Continuing to work on note 1.
#84 Updated by Eugenie Lyzenko almost 5 years ago
Task branch 3753a has been updated for review to revision 11327.
Completed the code review notes resolution. To continue work on notes 11, 12 and 13, I need some feedback to decide what to do.
#85 Updated by Greg Shah almost 5 years ago
We need to know how the code page conversion behaves if we change CPINTERNAL or CPSTREAM several times between I/O operations. Will the conversion be affected? I've updated the related 4GL test entry.
I don't think this is possible. These attributes are read-only, as found in #3753-55. The only way I have seen to set these is with the command line options. If that is correct, then these are fixed at the start of the 4GL process. Do you know of a way to change these values during the 4GL session instead of just with command line options?
If not, please edit the test entry to remove this item.
#86 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
We need to know how the code page conversion behaves if we change CPINTERNAL or CPSTREAM several times between I/O operations. Will the conversion be affected? I've updated the related 4GL test entry.
I don't think this is possible. These attributes are read-only, as found in #3753-55. The only way I have seen to set these is with the command line options. If that is correct, then these are fixed at the start of the 4GL process. Do you know of a way to change these values during the 4GL session instead of just with command line options?
Yes, you are right, these are read-only settings that are set up once at application start. Sorry for the confusion, I lost track of that for a while.
If not, please edit the test entry to remove this item.
Removed.
#87 Updated by Greg Shah almost 5 years ago
Eugenie Lyzenko wrote:
Greg Shah wrote:
11. In Stream.setConvertSource(String), why is convert set to true when targetCp is null? It seems like this is the opposite of what should happen, especially since sourceCp may also be set to null. There is a similar question for Stream.setConvertTarget() and the use of sourceCp == null.
The idea behind this logic is: assume only sourceCP or targetCP is set to a valid value while the other (targetCP or sourceCP) remains null. What does that mean from a conversion perspective? I think it means source != target in general, which makes conversion active. Then, if the missing CP is resolved to the default in the I18nOps code and we finally find that source and target are the same, the actual conversion code will be skipped in I18nOps. Yes, this happens later than it could, but I guess an extra check is better than wrong handling.
The first call to either setConvertSource() or setConvertTarget() will always find the other value to be null. For example, if both setConvertSource() and setConvertTarget() are called for a stream, the first one called will set convert to true, and the result of the second call will depend on the specific values being used.
It is possible to specify NO-CONVERT in the stream definition. That will call setConvert(false). Then if other code specifies CONVERT SOURCE x TARGET y (or one of the other forms), this will be overridden. Is that what the 4GL does in this case?
Interestingly enough, when neither NO-CONVERT nor CONVERT ... is present, the convert flag should default to true and the source/target codepages are simply cpinternal and cpstream (on output; the other way around for input).
My concern is that we can lose state in all this processing. It seems to me that convert should default to true and only ever be flipped to false if NO-CONVERT is specified.
setConvertSource() and setConvertTarget() don't need to change that flag (unless you find evidence that the 4GL does it that way, for example that an earlier NO-CONVERT is overridden by a later CONVERT ...). We should check the other ordering too (CONVERT ... followed by NO-CONVERT).
Then at input or output time, we should be able to resolve the source/target codepages and whether conversion is needed as follows:
String sourceCodepage(boolean input)
{
   return (sourceCp == null) ? sourceCp : (input ? streamCp : internalCp);
}

String targetCodepage(boolean input)
{
   return (targetCp == null) ? targetCp : (input ? internalCp : streamCp);
}

boolean needsConvert(String sourceCp, String targetCp)
{
   return convert &&
          ((sourceCp == null && targetCp != null) ||
           (sourceCp != null && targetCp == null) ||
           !sourceCp.equalsIgnoreCase(targetCp));
}
Does this make sense? I think this will work so long as we ensure that sourceCp and targetCp are never set to "" or a string with " " (whitespace).
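A self-contained sketch of these three helpers, with one change: as posted, sourceCodepage() and targetCodepage() return the explicit codepage only when it is null, so the ternaries are flipped below. It also normalizes blank names to null, as the last paragraph requires. Class and field names are hypothetical; a later comment in this thread mentions fixing a bug in the original helpers, and this sketch does not claim to reproduce that fix. Because internalCp and streamCp are required to be non-null, the resolved codepages are never null, which removes the null cases from needsConvert().

```java
import java.util.Objects;

class StreamCpSketch
{
   private final String internalCp;   // SESSION:CPINTERNAL
   private final String streamCp;     // SESSION:CPSTREAM
   private String sourceCp;           // CONVERT SOURCE, null when not specified
   private String targetCp;           // CONVERT TARGET, null when not specified
   private boolean convert = true;    // only NO-CONVERT flips this to false

   StreamCpSketch(String internalCp, String streamCp)
   {
      this.internalCp = Objects.requireNonNull(internalCp);
      this.streamCp = Objects.requireNonNull(streamCp);
   }

   void setConvert(boolean convert) { this.convert = convert; }
   void setConvertSource(String cp) { sourceCp = normalize(cp); }   // does not touch convert
   void setConvertTarget(String cp) { targetCp = normalize(cp); }   // does not touch convert

   /** Treats null, "" and whitespace-only names as "not specified". */
   private static String normalize(String cp)
   {
      return (cp == null || cp.trim().isEmpty()) ? null : cp.trim();
   }

   /** Explicit source wins; otherwise input reads from cpstream and output from cpinternal. */
   String sourceCodepage(boolean input)
   {
      return (sourceCp != null) ? sourceCp : (input ? streamCp : internalCp);
   }

   /** Explicit target wins; otherwise input converts to cpinternal and output to cpstream. */
   String targetCodepage(boolean input)
   {
      return (targetCp != null) ? targetCp : (input ? internalCp : streamCp);
   }

   boolean needsConvert(boolean input)
   {
      return convert && !sourceCodepage(input).equalsIgnoreCase(targetCodepage(input));
   }
}
```

Because the decision is computed from the current state at I/O time, the order of setConvert(), setConvertSource() and setConvertTarget() calls can no longer leave a stale flag behind.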
#88 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
The first call to either setConvertSource() or setConvertTarget() will always find the other value to be null. For example, if both setConvertSource() and setConvertTarget() are called for a stream, the first one called will set convert to true, and the result of the second call will depend on the specific values being used.
It is possible to specify NO-CONVERT in the stream definition. That will call setConvert(false). Then if other code specifies CONVERT SOURCE x TARGET y (or one of the other forms), this will be overridden. Is that what the 4GL does in this case?
Interestingly enough, when neither NO-CONVERT nor CONVERT ... is present, the convert flag should default to true and the source/target codepages are simply cpinternal and cpstream (on output; the other way around for input).
My concern is that we can lose state in all this processing. It seems to me that convert should default to true and only ever be flipped to false if NO-CONVERT is specified.
setConvertSource() and setConvertTarget() don't need to change that flag (unless you find evidence that the 4GL does it that way, for example that an earlier NO-CONVERT is overridden by a later CONVERT ...). We should check the other ordering too (CONVERT ... followed by NO-CONVERT).
Then at input or output time, we should be able to resolve the source/target codepages and whether conversion is needed as follows:
[...]
Does this make sense?
Yes, I'm still thinking about this too and have a similar solution (actually I prepared the changes and was going to upload them):
1. Introduce another flag: convertInt.
2. The legacy convert flag will store only the explicit change made by the NO-CONVERT option (it will always match the current CONVERT read-only attribute):
...
public void setConvert(boolean convert)
{
   this.convert = convert;
   this.convertInt = convert;
}
...
public boolean getConvert()
{
   return convert;
}
3. The effective convert flag can be changed in different places (leaving convert untouched):
public void setConvertSource(String cp)
{
   ...
   if (sourceCp != null && targetCp != null)
   {
      convertInt = !targetCp.equalsIgnoreCase(sourceCp);
   }
}
...
public void setConvertTarget(String cp)
{
   ...
   if (sourceCp != null && targetCp != null)
   {
      convertInt = !sourceCp.equalsIgnoreCase(targetCp);
   }
}
...
private void initDefaultCodePages()
{
   ...
   convertInt = !streamCp.equalsIgnoreCase(internalCp);
}
initDefaultCodePages() is always called during Stream construction, while setConvert(Source|Target) are optional and we cannot be sure which one will be called first, so I've duplicated the check.
4. Then, when the I/O happens, the conversion condition becomes:
if (convertInt && convert)
{
   // make I/O operation with conversion
}
Is this an acceptable approach? If not, please let me know and I will rework it along the lines of your version.
I think this will work so long as we ensure that sourceCp and targetCp are never set to "" or a string with " " (whitespace).
We certainly need to add protection for this case.
#89 Updated by Greg Shah almost 5 years ago
I prefer to go with the workers as I documented them. The primary reason is that the logic of whether or not we are converting (and which codepages are used) is very clear; it can be seen from just the 3 helper methods. A secondary reason is that the code as you've recorded it does not handle the cases where only one of sourceCp or targetCp is set and the other is null. I am not opposed to caching the result of the first input or output, but I don't want to do it "as we go". There is no advantage to doing that, and it just spreads the calculation out over lots of places, making it harder to see the complete logic.
#90 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
I prefer to go with the workers as I documented them. The primary reason is that the logic of whether or not we are converting (and which codepages are used) is very clear; it can be seen from just the 3 helper methods. A secondary reason is that the code as you've recorded it does not handle the cases where only one of sourceCp or targetCp is set and the other is null. I am not opposed to caching the result of the first input or output, but I don't want to do it "as we go". There is no advantage to doing that, and it just spreads the calculation out over lots of places, making it harder to see the complete logic.
OK. Did you mean to implement helper methods in Stream class, correct?
#91 Updated by Greg Shah almost 5 years ago
Did you mean to implement helper methods in Stream class, correct?
Yes.
#92 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
Did you mean to implement helper methods in Stream class, correct?
Yes.
Done. Task branch 3753a has been updated for review to revision 11328.
Notes resolution for points 11 and 13.
#93 Updated by Greg Shah almost 5 years ago
Code Review Task Branch 3753a Revision 11328
The changes are a good step.
I have reworked the code to make Stream more efficient, to fix a bug in my original Stream code from #3753-87, to eliminate code duplication in Stream, and to move the actual conversion processing into the I18nOps worker code, which avoids the use of BDT wrappers when called from Stream.
Please see revision 11329. If you have no concerns, then retest your testcases with this version. Fix any issues. Then you can start regression testing.
#94 Updated by Greg Shah almost 5 years ago
Revision 11330 fixes a bug in my management of the caching flag.
#95 Updated by Eugenie Lyzenko almost 5 years ago
I'm OK with the changes. Minor fixing.
Task branch 3753a has been updated for review to revision 11331.
This is the fix for a switch operator handling issue that computed an incorrect search type and prevented BOTH mode search.
So far the local tests look OK to me. Starting the regression tests for conversion and runtime.
#96 Updated by Eugenie Lyzenko almost 5 years ago
Greg,
The conversion testing is in progress.
If the conversion is OK, can we include the changes for 4010 and 4066 in the 3753a branch to speed up getting those fixes into trunk?
#97 Updated by Greg Shah almost 5 years ago
If the conversion is OK, can we include the changes for 4010 and 4066 in the 3753a branch to speed up getting those fixes into trunk?
Yes, go ahead with this.
#98 Updated by Eugenie Lyzenko almost 5 years ago
Conversion passed; the sources are identical except for one new call to setNoMap(), which is expected given the new NO-MAP option support.
But there are many compilation errors like this:
...
[javac] /home/evl/testing/majic/src/aero/timco/majic/item/Item58R.java:589: error: incompatible types: int cannot be converted to int64
[javac] new PutField(() -> chr(10))
[javac] ^
...
Working on the resolution. It looks like the character.java code has a regression in handling the chr() method.
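The javac message is a plain Java overload-resolution failure. A minimal sketch with a hypothetical Int64 class standing in for the int64 wrapper: Java widens primitives (int to long) when picking an overload, but never converts a primitive into an arbitrary class, so a generated call like chr(10) only compiles if some overload accepts a primitive. The thread does not describe the actual fix made in character.java; this only shows the mechanism.

```java
class ChrOverloadSketch
{
   /** Hypothetical stand-in for a 64-bit BDT wrapper; not the FWD int64 class. */
   static final class Int64
   {
      final long value;

      Int64(long value)
      {
         this.value = value;
      }
   }

   /** With only this overload, chr(10) fails: an int literal is never converted to a class type. */
   static String chr(Int64 code)
   {
      return new String(Character.toChars(Math.toIntExact(code.value)));
   }

   /** A primitive overload (int widens to long) makes chr(10) compile again by delegating. */
   static String chr(long code)
   {
      return chr(new Int64(code));
   }
}
```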
#99 Updated by Eugenie Lyzenko almost 5 years ago
Task branch 3753a has been updated for review to revision 11332.
Fixed the regression in character.java and merged the fixes from 4010 and 4066. Starting the runtime regression tests.
#100 Updated by Eugenie Lyzenko almost 5 years ago
One main round of the runtime testing passed; I started another one to rule out falsely failing tests. The CTRL-C part is OK.
#101 Updated by Eugenie Lyzenko almost 5 years ago
Testing completed. No regressions have been found. The results: 3753a_11332_32748a0_20190510_evl.zip.
So far branch 3753a rev 11332 is ready to be merged to trunk. Please let me know if I can do this now.
#102 Updated by Greg Shah almost 5 years ago
You can merge to trunk.
#103 Updated by Eugenie Lyzenko almost 5 years ago
Greg Shah wrote:
You can merge to trunk.
OK. Starting the merge process.
#104 Updated by Eugenie Lyzenko almost 5 years ago
Branch 3753a was merged to trunk as revno 11307, then it was archived.
#105 Updated by Greg Shah almost 5 years ago
TODO: We need to check the FWD runtime for references like this: a_string.getBytes(Charset.forName("ISO-8859-1")) (this example is from BinaryData). I suspect these kinds of cases need to be switched to honoring one of the CP* attributes (e.g. CPINTERNAL).
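Why the hard-coded charset is risky: a minimal sketch contrasting the hard-coded ISO-8859-1 call (StandardCharsets.ISO_8859_1 is equivalent to Charset.forName("ISO-8859-1")) with one that takes the session codepage as a parameter. Mapping CPINTERNAL to a Java Charset is assumed to happen elsewhere, for example through a map like the convmap2Java one discussed above; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetBytesSketch
{
   /** Hard-coded charset, as in the BinaryData example. */
   static byte[] fixedBytes(String text)
   {
      return text.getBytes(StandardCharsets.ISO_8859_1);
   }

   /** Honors a session codepage; cpInternal is assumed to be already mapped to a Java Charset. */
   static byte[] sessionBytes(String text, Charset cpInternal)
   {
      return text.getBytes(cpInternal);
   }

   public static void main(String[] args)
   {
      String euro = "\u20AC";
      // ISO-8859-1 has no euro sign: getBytes() silently substitutes '?' (0x3F)
      System.out.println(fixedBytes(euro)[0]);                                 // 63
      System.out.println(sessionBytes(euro, StandardCharsets.UTF_8).length);   // 3
   }
}
```

The silent '?' substitution is the main hazard: nothing fails, the data is just wrong whenever CPINTERNAL is not ISO-8859-1.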
#106 Updated by Greg Shah almost 5 years ago
Are 4GL system error messages translated, too? (we have been assuming YES, that CURRENT-LANGUAGE will select different message sources in OpenEdge)
The answer to this is YES. We will need to localize the messages.
Please note that we will need to find out whether this varies by the current setting (at error time) of SESSION:CURRENT-LANGUAGE, by SESSION:CURRENT-LANGUAGE at the time the failing code was loaded, or whether it is global to the 4GL process (based on locale).
Fixing this will also be a good opportunity to create a better set of error helpers that can allow us to centralize the error processing.
#107 Updated by Greg Shah almost 5 years ago
TODO: Stanislav notes the following:
When a longchar/clob value is assigned to a clob field, it is implicitly converted into the codepage of the target field. I suppose we'll have to handle it in RecordBuffer.invoke in the future.
We will need to handle this in assignments in both directions (e.g. also TO longchar) if we don't already do it properly.
I think BUFFER-COPY will need this too.
#108 Updated by Eric Faulhaber almost 5 years ago
Greg Shah wrote:
TODO: Stanislav notes the following:
When a longchar/clob value is assigned to a clob field, it is implicitly converted into the codepage of the target field. I suppose we'll have to handle it in RecordBuffer.invoke in the future.
We will need to handle this in assignments in both directions (e.g. also TO longchar) if we don't already do it properly.
I think BUFFER-COPY will need this too.
FWD's implementation of BUFFER-COPY uses the RecordBuffer invocation handler for its individual fields, so if we implement it in the invocation handler, we are covered for BUFFER-COPY.
#109 Updated by Marian Edu over 4 years ago
Hi @Greg, that task was mentioned in our last list, so if you still need some test cases, can someone please make a list of what needs to be covered here?
Thanks
#110 Updated by Greg Shah over 4 years ago
Marian Edu wrote:
Hi @Greg, that task was mentioned in our last list so if you still need some test cases there can someone please make a list of what needs to be covered here?
Thanks
Yes, we still need tests. Please see the items in these notes:
#111 Updated by Greg Shah over 4 years ago
- Related to Feature #4378: properly handle clob/lonchar assignment, especially the implicit codepage conversion added
#112 Updated by Greg Shah over 4 years ago
Eugenie: I don't think that the NO-MAP I/O option was ever implemented in the runtime. It just looks like a stub.
#113 Updated by Greg Shah over 4 years ago
Eugenie: The NO-CONVERT and CONVERT I/O options are listed as full and stub. I think this is incorrect. I think NO-CONVERT should be runtime full and CONVERT should be runtime "basic" (first working implementation but needs full testing and compatibility). Is that correct?
#114 Updated by Eugenie Lyzenko over 4 years ago
Greg Shah wrote:
Eugenie: The NO-CONVERT and CONVERT I/O options are listed as full and stub. I think this is incorrect. I think NO-CONVERT should be runtime full and CONVERT should be runtime "basic" (first working implementation but needs full testing and compatibility). Is that correct?
Yes.
#115 Updated by Eugenie Lyzenko over 4 years ago
Greg Shah wrote:
Eugenie: I don't think that the NO-MAP I/O option was ever implemented in the runtime. It just looks like a stub.
Yes, I think MAP/NO-MAP should have conversion support and be stubbed in the runtime (because it was not clear what we need to do with this option at runtime).
#116 Updated by Mihai Popescu-Tiganea over 4 years ago
Eugenie Lyzenko wrote:
4GL tests requirement
The dependency of the following options on the CURRENT-LANGUAGE setting needs to be tested in the 4GL environment (most of them are SESSION attributes):
- CPINTERNAL - session_cpinternal.p
- CPSTREAM - session_cpstream.p
- CPCASE - session_cpcase.p
- CPCOLL - session_cpcoll.p
- CPLOG - session_cplog.p
- CPPRINT - session_cpprint.p
- CPRCODEIN - session_cprcodein.p
- CPRCODEOUT - session_cprcodeout.p
- CPTERM - session_cpterm.p
- CHARSET - session_charset.p
- CODEPAGE - rcode_info_codepage.p
The possible 4GL test scenario is:
1. Check the initial attribute value from the list.
2. Change the CURRENT-LANGUAGE.
3. Re-check the attribute value to find out if it has changed.
In a perfect world we need to know the behavior for both Linux and Windows 4GL systems.
The tests requested in this note have been created.
For each attribute, the corresponding test file is mentioned.
All files are in the same directory: /testcases/i18n.
The tests were made on a Windows 4GL system.
#117 Updated by Mihai Popescu-Tiganea over 4 years ago
Constantin Asofiei wrote in #3753-10
All runtime needs to be implemented. Below is a list of what it needs to be done.
For FIX-CODEPAGE and GET-CODEPAGE, both conversion and runtime support are required. Testing should be done for: dlc/convmap.cp support:
- is the codepage copied from one longchar value to another? - codepage_copy.p
- is the codepage involved in comparison operators? - codepage_operators.p
- FIX-CODEPAGE with empty, unknown, non-empty longchar vars - codepage_fix.p
- what if the codepage is already set? - codepage_fix.p
- clob fields - can they work with fix-codepage and get-codepage? Can the codepage be set in some other way? - codepage_clob.p
- editor with large-object (which can display a LONGCHAR val, with or without a codepage set) - how is the text displayed?
- assigning a longchar to a char - is the codepage inherited from the rvalue? - codepage_character.p
- assignment between longchars - the same, is the code page included in the assign? - codepage_longchar.p
- is the codepage affecting the character bytes? For example: - codepage_char_bytes.p
[...]
Are lc1a and lc1b equal, i.e. is the final text in lc1b unaffected by the initial codepage in lc2? The idea here is to determine whether the longchar's codepage is used when assigning a text to it (thus the reference text is kept in memory converted into the target codepage).
- how are other statements which work with strings affected? - codepage_asc.p, codepage_caps.p, codepage_lc.p, compare_abl.p
- we need to specify the list of known codepages (or default to the Java's available codepages) -
- CODEPAGE-CONVERT, INPUT STREAM CONVERT work with this
- what is the default codepage value - some explanation is in https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvint/determining-the-code-page.html
We have developed the tests suggested by Constantin in this note.
Tests are located in /testcases/i18n.
For:
CODEPAGE-CONVERT and INPUT STREAM CONVERT:
- combinations of source and target codepages
  - xml files with the accepted combinations are in /testcases/i18n/4gl
  - these files are generated by asc_support.p, chr_support.p, cpconvert_support.p and input_stream_support.p
  - when these generator programs are run with the FWD_VERSION environment variable set, the output directory will be /testcases/i18n/fwd. Comparing the xml files from the 4gl and fwd directories should help.
  - we added the ASC and CHR functions to the tests
- the source codepage is not the real text's codepage
  - test made for CODEPAGE-CONVERT - cpconvert_incorrect.p
  - for INPUT STREAM CONVERT, ASC and CHR, the source codepage mismatch error cannot be raised, because this information seems to exist only in longchar variables.
- source/target codepages are not in the convmap.cp list
  - test made for ASC - asc_no_convmap.p
  - test made for CHR - chr_no_convmap.p
  - test made for CODEPAGE-CONVERT - cpconvert_no_convmap.p
  - test made for INPUT STREAM CONVERT - input_stream_no_convmap.p
#118 Updated by Greg Shah over 4 years ago
- Assignee deleted (Eugenie Lyzenko)
#119 Updated by Mihai Popescu-Tiganea over 4 years ago
To run the tests, the following steps should be done:
Create a database named tran in the folder testcases/tran_man/db.
Load the definitions and content from testcases/tran_man/data.
After that, please disconnect from the db.
Update or create the fwd database with the definitions in testcases/db, using the file fwd.db.
Because the fwd database is used in the majority of tests, do not forget to load the users and domains from the same folder.
Constantin Asofiei wrote on #3753-11:
Greg, see above for the runtime I18N - are these what you are looking for?
About the translation manager and translatable strings; we need tests to prove:
- Are all strings without :U translatable? - see https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvref/-22--22character-string-literal.html
  - if a string from code marked with quotes, e.g. “Test string”, or apostrophes, e.g. ‘Test string’, is found in the translation table, then it is translated
  - strings without :U that are not in the translation table are not translated
  - strings with :U are not translated even if they are in the translation table
  - the translation table contains only translations of strings from code
  - file u_option.p tests this
- Translation Database
  - How is the translation database saved? Is this a simple Progress DB? If so, what is the schema?
    - the translation database is created by the translation manager
    - it is a simple Progress DB
    - after a dump operation the files (schema and content) are in the testcases/tran_man/data folder
    - the database is composed of 13 tables, but it seems that 4 are important for performing translation without tranman
    - the tables are: XL_instance, XL_Language, XL_string_info and XL_translation
    - the translation mechanism could be:
      - add info about the file, and the length of the string from this file that will be translated, in XL_instance
      - add the language in XL_Language
      - add the string that will be translated in XL_string_info
      - add the language and the translated string in XL_translation
    - detailed explanations:
      - XL_instance contains lines like 11 14 "tran_man\static_string.p" 1 2458685.5095 no 17 1 "MESSAGE" "" ? 1
        - 11 is the index from XL_string_info that identifies the string to be translated
        - 14 is this table's index
        - tran_man\static_string.p is the procedure that contains the string to be translated
        - 17 is the number of characters in the string
        - the remaining information does not seem to be important
      - XL_Language contains lines like "German" 0, which name the language of the translation
      - XL_string_info has lines like 11 "transparent glass" 2458683.38797 "" "transparent glass"
        - 11 is the index of this table; it is also referenced in XL_instance
        - "transparent glass" is the string to be translated
        - we do not know why this text appears twice in every line
      - XL_translation has lines like 11 14 "German" "farbloses glas" 2458685.51116
        - 11 14 identify the string to be translated from XL_instance
        - "German" is the language of the translation
        - "farbloses glas" is the translated version of the string
  - What is the character encoding of the database?
    - Code Page: UNDEFINED and Collation: TRANMAN
  - Can the source text and translated text be in different encodings?
    - the compiled file (.r) contains the translated strings and the source strings for all languages specified in the COMPILE statement
    - in this file there is only one codepage
  - Is there any functionality in OpenEdge that reads the translation database at runtime, or is this only for building the compile-time r-code text segments?
    - the translation database is only used at compile time for building r-code text segments
    - I do not know of any other functionality that reads the translation database at runtime
  - How can you switch between translations; is this related to CURRENT-LANGUAGE?
    - switching between translations is possible using CURRENT-LANGUAGE
    - to use CURRENT-LANGUAGE you should compile the code with the LANGUAGES option
    - the language set in CURRENT-LANGUAGE should match a language specified in the COMPILE statement
  - Are 4GL system error messages translated, too? (we have been assuming YES, that CURRENT-LANGUAGE will select different message sources in OpenEdge)
    - system error messages are translated using the PROMSGS setting, e.g. PROMSGS = prolang/promsg.rus
    - the extension indicates the language that will be used for error messages
    - we did not find any connection between CURRENT-LANGUAGE and PROMSGS
  - Are only standalone static strings translated? What if the string is in an expression, like "there is an error in program" + pname: will "there is an error in program" be translatable?
    - file static_string.p demonstrates that only static standalone strings are translated (those found at compile time that are identical to one defined in XL_string_info)
  - How does the 4GL behave if a string has a translation and others do not; is this something done at compile time, so a translation can't be done in future, or at runtime, and the .r will see any newly added translation?
    - if a string does not have a translation, then it is left untranslated
    - translation is made at compile time, not at runtime
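The findings above describe a simple keyed lookup. A minimal in-memory sketch of that join, assuming the key is the XL_string_info index, the XL_instance index and the language name, as in the sample XL_translation row, and falling back to the source text when no translation exists. In the 4GL this lookup happens at compile time; the class and method names are hypothetical and this is not how FWD stores translations.

```java
import java.util.HashMap;
import java.util.Map;

class TranslationLookupSketch
{
   // key: XL_string_info index + XL_instance index + XL_Language name
   private final Map<String, String> translations = new HashMap<>();

   private static String key(int stringId, int instanceId, String language)
   {
      return stringId + "/" + instanceId + "/" + language;
   }

   /** Mirrors one XL_translation row, e.g. 11 14 "German" "farbloses glas". */
   void addTranslation(int stringId, int instanceId, String language, String text)
   {
      translations.put(key(stringId, instanceId, language), text);
   }

   /** Returns the translation, or the source text unchanged when none exists. */
   String translate(int stringId, int instanceId, String language, String source)
   {
      return translations.getOrDefault(key(stringId, instanceId, language), source);
   }

   public static void main(String[] args)
   {
      TranslationLookupSketch xl = new TranslationLookupSketch();
      xl.addTranslation(11, 14, "German", "farbloses glas");
      System.out.println(xl.translate(11, 14, "German", "transparent glass"));   // farbloses glas
      System.out.println(xl.translate(11, 14, "French", "transparent glass"));   // transparent glass
   }
}
```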
Constantin Asofiei wrote on #3753-13:
Something else to add:
- texts at the schema definition (labels, and so on) - are these translatable?
  - texts from the schema definition are not translatable
  - in the fwd db we added the label 'Search' to the field customerAddress of the customer table
  - file db_meta.p shows this label, but it is not translated
#120 Updated by Mihai Popescu-Tiganea over 4 years ago
For the Chinese language we have to use the UTF-8 code page for all involved entities.
To run these tests, the following actions must be taken:
- create a UTF-8 database named tran.db in testcases/tran_man/db
  - starting from Data Dictionary -> Create Database -> A Copy of Some Other Database
  - use empty.db from DLC/prolang/utf
- load the definitions and content from testcases/tran_man/data
- start a session using the UTF-8 code page
The procedures that use the test files are located in testcases/tran_man/run.
The test files are compiled using the COMPILE statement with different languages.
The compiled code is saved in testcases/tran_man/obj/tran_man.
The compiled code is run using the languages mentioned in the COMPILE statement, and the translation is checked.
#121 Updated by Marian Edu almost 4 years ago
Some things we've found and tried to fix so far while implementing OO base classes...
- codepage conversion support does not use any 'convmap' table; not sure if there are any plans for custom codepage conversion at all or just to use what is available in Java
- 'iso8859-1' seems to be considered the 4GL default
- CHR and ASC are not double-byte enabled; codepage defaults and validation are incomplete
Hopefully we will get some of those fixes so our tests on the OO implementation pass... there are probably more, like keyboard, stream, and code codepage support.
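Regarding the CHR/ASC item: one way to make them double-byte aware on top of what Java already provides is to encode through java.nio.charset and combine the resulting bytes into one number. The sketch below is hypothetical, not FWD code: the class name, the small mapping from 4GL codepage names to Java charset names, combining bytes with the lead byte most significant, and returning -1 (ASC) or an empty string (CHR) when a character cannot be converted are all assumptions that should be checked against OpenEdge before relying on them. Big5 and GB2312 depend on the extended charsets shipped with a full JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Locale;
import java.util.Map;

public class DoubleByteAsc
{
   /** Hypothetical mapping from a few 4GL codepage names to Java charset names. */
   private static final Map<String, String> CODEPAGES = Map.of(
      "iso8859-1", "ISO-8859-1",
      "utf-8",     "UTF-8",
      "big-5",     "Big5",
      "gb2312",    "GB2312");

   static Charset charset(String codepage)
   {
      String name = CODEPAGES.get(codepage.toLowerCase(Locale.ROOT));
      if (name == null)
      {
         throw new IllegalArgumentException("unsupported codepage: " + codepage);
      }
      return Charset.forName(name);
   }

   /** ASC-like: value of the first character in the target codepage, or -1. */
   static long asc(String s, String codepage)
   {
      if (s.isEmpty())
      {
         return -1;
      }
      // take the whole first code point (a surrogate pair counts as one character)
      int end = Character.charCount(s.codePointAt(0));
      CharsetEncoder enc = charset(codepage).newEncoder()
                                            .onMalformedInput(CodingErrorAction.REPORT)
                                            .onUnmappableCharacter(CodingErrorAction.REPORT);
      try
      {
         ByteBuffer bytes = enc.encode(CharBuffer.wrap(s, 0, end));
         long value = 0;
         while (bytes.hasRemaining())
         {
            value = (value << 8) | (bytes.get() & 0xFF);   // lead byte is most significant
         }
         return value;
      }
      catch (CharacterCodingException e)
      {
         return -1;                                         // not representable in codepage
      }
   }

   /** CHR-like: the character whose codepage value is 'value', or "" if invalid. */
   static String chr(long value, String codepage)
   {
      if (value < 0)
      {
         return "";
      }
      int n = 1;                                            // minimal number of bytes needed
      while (n < 8 && (value >>> (8 * n)) != 0)
      {
         n++;
      }
      byte[] bytes = new byte[n];
      for (int i = 0; i < n; i++)
      {
         bytes[n - 1 - i] = (byte) (value >>> (8 * i));
      }
      CharsetDecoder dec = charset(codepage).newDecoder()
                                            .onMalformedInput(CodingErrorAction.REPORT)
                                            .onUnmappableCharacter(CodingErrorAction.REPORT);
      try
      {
         return dec.decode(ByteBuffer.wrap(bytes)).toString();
      }
      catch (CharacterCodingException e)
      {
         return "";
      }
   }

   public static void main(String[] args)
   {
      System.out.println(asc("\u00E9", "iso8859-1"));                   // 233
      System.out.println(asc("\u20AC", "utf-8"));                       // 14844588 (0xE282AC)
      System.out.println(asc("\u20AC", "iso8859-1"));                   // -1
      System.out.println(chr(0xE282AC, "utf-8").equals("\u20AC"));      // true
   }
}
```

Using CodingErrorAction.REPORT instead of the default replacement makes unconvertible characters visible to the caller rather than silently turning them into '?', which is the kind of validation the list above notes as incomplete.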
Ah, as a side note, we've found a strange issue with conversion: the backslash escape in strings present in source code simply disappears in the generated Java code. Otherwise, the 4GL escape character (tilde) seems to work just fine.
#122 Updated by Greg Shah almost 4 years ago
Marian Edu wrote on #3753-121:
Ah, as a side note, we've found a strange issue with conversion: the backslash escape in strings present in source code simply disappears in the generated Java code. Otherwise, the 4GL escape character (tilde) seems to work just fine.
In cfg/p2j.cfg.xml we have the opsys parameter. If it is set to UNIX, then we act like the 4GL compiler on Linux/UNIX and honor both \ and ~ as escape chars. If it is set to WIN32, then we act like the 4GL compiler on Windows and only honor ~. Perhaps this is what you are seeing.
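A minimal sketch of that rule, assuming a simplified string-literal unescaper: ~ is always an escape character, while \ is one only when opsys is UNIX. The StringEscapes class and the OpSys enum are hypothetical stand-ins, not the FWD parser, and only a subset of the 4GL escape sequences is handled (octal ~nnn, for example, is omitted).

```java
public class StringEscapes
{
   /** Hypothetical stand-in for the opsys setting in cfg/p2j.cfg.xml. */
   enum OpSys { UNIX, WIN32 }

   /** Resolve escape sequences in the body of a 4GL string literal (subset only). */
   static String unescape(String body, OpSys opsys)
   {
      StringBuilder sb = new StringBuilder(body.length());
      for (int i = 0; i < body.length(); i++)
      {
         char c = body.charAt(i);
         // ~ always escapes; \ escapes only when emulating the UNIX compiler
         boolean isEscape = c == '~' || (c == '\\' && opsys == OpSys.UNIX);
         if (!isEscape || i + 1 == body.length())
         {
            sb.append(c);                      // ordinary char, or trailing escape kept as-is
            continue;
         }
         char next = body.charAt(++i);
         switch (next)
         {
            case 'n': sb.append('\n'); break;
            case 't': sb.append('\t'); break;
            case 'r': sb.append('\r'); break;
            case 'b': sb.append('\b'); break;
            case 'f': sb.append('\f'); break;
            case 'E': sb.append((char) 27); break;   // ESC
            default:  sb.append(next);               // e.g. ~~ -> ~, \x -> x
         }
      }
      return sb.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(unescape("path\\x", OpSys.UNIX));    // pathx
      System.out.println(unescape("path\\x", OpSys.WIN32));   // path\x
   }
}
```

With opsys set to UNIX, the 4GL literal path\x becomes pathx, which is consistent with the "disappearing backslash" reported above; with WIN32 the backslash is kept and only ~ sequences are resolved.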
#123 Updated by Greg Shah almost 4 years ago
- Related to Feature #4761: I18N phase 3 added
#124 Updated by Greg Shah almost 4 years ago
- % Done changed from 0 to 100
- Status changed from WIP to Closed
The support from this task is already in trunk. The remaining items will be tracked in #4761.
#125 Updated by Greg Shah almost 2 years ago
- Related to Feature #6451: I18N phase 4 added