Bug #4766
fix CHR and ASC
Related issues
History
#1 Updated by Greg Shah almost 4 years ago
As noted during #4384 work, Marian's team found that CHR and ASC were not double byte enabled, they use the wrong default codepage and validation is incomplete. This task is for tracking the cleanup work on those built-ins.
#2 Updated by Greg Shah about 3 years ago
Please see #4761 for more details.
#3 Updated by Ovidiu Maxiniuc about 3 years ago
CHR and ASC functions. Both of them will handle 'normal' parameters as expected (tested with UTF-8, 8859-1, 8859-15, 1252). The problems are the exceptions, because the 4GL is a bit chaotic. Here are some issues I discovered:
- if a character code is not defined/supported for 8859-x or 1252, a non-empty character is still returned. In the case of UTF-8, an empty string is returned instead;
- the functions are multi-byte, meaning that they will successfully encode/decode characters like € in UTF-8, which occupies 3 bytes (14844588 / 0x00E282AC);
- a funny thing: running message asc("www", "UTF-8", "UTF-8"). will print 7829367. In hexadecimal this number is 0x00777777. Looking at the character map, we can see that asc("w") = 119 = 0x77. Running message chr(7829367, "UTF-8", "UTF-8") will print back www! Of course, this is valid not only for the w character, but it seems limited to multi-byte codepages;
- chr(-1, "1252", "1252") will actually return a value, but it is meaningless (ÿÿÿÿÿÿÿÿ1). Also, it seems to change over time;
- executing chr(188, "ISO8859-15", "ISO8859-15") with the default CPINTERNAL of "ISO8859-1" will print ¼. However, ¼ is not part of the target CP "ISO8859-15"; Œ should have been printed instead. But this character is not part of the CPINTERNAL, so the character with the same code is printed;
- "1252" defines the character Œ (\u0152) at code 140. The same character can be found in 8859-15 at position 188 (as noted in the previous item). Theoretically, the character should have been printed by message chr(140, "ISO8859-15", "1252"). But it is not; instead, errors 6063 and 1586 are issued.
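The multi-byte behaviour in the list is consistent with ASC returning the encoded bytes read as one big-endian number (0xE2 0x82 0xAC for €, 0x77 0x77 0x77 for "www"), with CHR doing the reverse. Below is a minimal Java sketch under that assumption; MultiByteCodes, asc(), chr() and recode() are hypothetical names, not FWD's I18nOps API, and the sketch ignores CPINTERNAL, negative codes and the 4GL error handling. recode() shows the 1252 code 140 to 8859-15 code 188 mapping for Œ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: an integer character code is the big-endian
// concatenation of the character's encoded bytes.
class MultiByteCodes
{
   /** Packs the encoded bytes of s into one number, most significant byte first. */
   static long asc(String s, Charset cs)
   {
      long code = 0;
      for (byte b : s.getBytes(cs))
      {
         code = (code << 8) | (b & 0xFF);
      }
      return code;
   }

   /** Inverse of asc(): splits the code into bytes, skips leading zero bytes, decodes. */
   static String chr(long code, Charset cs)
   {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      for (int shift = 56; shift >= 0; shift -= 8)
      {
         int b = (int) ((code >>> shift) & 0xFF);
         if (b != 0 || out.size() > 0)
         {
            out.write(b);
         }
      }
      return new String(out.toByteArray(), cs);
   }

   /**
    * Re-maps a single-byte code from one code page to another via Unicode.
    * Unmappable characters come back as the encoder's replacement byte ('?').
    */
   static int recode(int code, Charset from, Charset to)
   {
      String ch = new String(new byte[] { (byte) code }, from);
      return ch.getBytes(to)[0] & 0xFF;
   }

   public static void main(String[] args)
   {
      Charset utf8 = StandardCharsets.UTF_8;
      System.out.println(asc("\u20AC", utf8));     // 14844588 (0xE282AC)
      System.out.println(asc("www", utf8));        // 7829367 (0x777777)
      System.out.println(chr(7829367L, utf8));     // www
      System.out.println(recode(140, Charset.forName("windows-1252"),
                                Charset.forName("ISO-8859-15")));   // 188
   }
}
```

Packing into a long rather than an int leaves room for codes of up to 8 bytes; a 4GL-compatible version would still need to decide what to do with codes that do not decode to a valid character in the target code page.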
#4 Updated by Greg Shah about 3 years ago
chr(-1, "1252", "1252") will actually return a value but it is meaningless (ÿÿÿÿÿÿÿÿ1). Also it seems to alter over time;
Yikes! I wonder if this is a memory overflow/underflow problem. If the 4GL directly adds the value (e.g. -1) to a C/C++ pointer (memory address), then a negative value might be looking outside of the conversion tables.
Unless this proves to be stable in some way that an application can use, we will probably consider this an unimplemented "quirk".
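For contrast with the pointer hypothesis, here is a minimal Java sketch of a bounds-checked lookup. It assumes a hypothetical table indexed directly by the character code (nothing here describes how the 4GL or FWD actually stores its tables); out-of-range codes return null, mirroring the ascCode < 0 check in I18nOps.java quoted in note #7.

```java
// Hypothetical illustration: a code-to-character table indexed by the code.
class CheckedLookup
{
   static String lookup(String[] table, int code)
   {
      // An unchecked C pointer add (table + code) with code == -1 would read the
      // memory just before the table. Java would throw instead, so reject
      // out-of-range codes up front and keep the result deterministic.
      if (code < 0 || code >= table.length)
      {
         return null;
      }
      return table[code];
   }
}
```

With such a guard, chr(-1, ...) has a stable answer, which fits treating the 4GL's varying output as an unimplemented quirk.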
#5 Updated by Marian Edu over 2 years ago
This is probably still a work in progress, but I mention it here since we found some of our tests in the OO implementation were failing, and that turned out to be because of some CHR-related changes. Previously, CHR returned an empty string, as it should (or at least this is what the 4GL does); now it returns a space (%20). There was previously a condition in I18nOps that returned null for codes less than or equal to zero; that was removed. Not sure about the high watermark test (65535), though.
#6 Updated by Ovidiu Maxiniuc over 2 years ago
Marian, please provide the (isolated) testcases you refer to in note #4766-5. Please specify the active CP you are working with.
#7 Updated by Ovidiu Maxiniuc over 2 years ago
Marian Edu wrote:
[...] There was previously a condition in I18nOps that returned null for codes less or equal to zero, that was removed
It was not, see I18nOps.java:570
    private static String get4glCharacter(int ascCode, Charset charset)
    {
       if (ascCode < 0)
       {
          return null;
       }
       // ... (rest of the method not shown)
    }
not sure about the high watermark test (65535) though.
It returns the empty string. I do not think the new implementation will return a space except when the parameter is 0x20 / 32 decimal.
#8 Updated by Marian Edu over 2 years ago
Ovidiu Maxiniuc wrote:
Marian Edu wrote:
[...] There was previously a condition in
I18nOps
that returned null for codes less or equal to zero, that was removedIt was not, see
I18nOps.java
:570
The test there is for less than zero; I was (trying) to refer to the case when the code number is actually zero. Sorry for not being clear in my message :(
CHR now gives " " (space), while in the 4GL it gives "" (empty).
#9 Updated by Ovidiu Maxiniuc over 2 years ago
Nice catch. The problem is not actually in I18nOps (which converts chr(0) to "\0") but in the character constructor. Before the value is assigned to its internal field, it is processed a bit by SpacifyNull() in Text.java. Apparently, instances of the character data type are spacified in the 4GL. Since there is a single \0 character, it is converted to a space. This is new to me. I will address this issue in the next commit.
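The interaction can be reduced to a few lines. This is a hypothetical Java sketch, not the actual 3821c fix: spacifyNull() here is a stand-in that only reproduces the behaviour described above (a value consisting of a single \0 becomes a space), and chrNaive()/chrFixed() are illustrative names that handle only BMP codes.

```java
// Hypothetical sketch of the chr(0) regression and one way to avoid it.
class ChrZero
{
   /** Stand-in for the spacifying step: a lone NUL character becomes a space. */
   static String spacifyNull(String s)
   {
      return "\0".equals(s) ? " " : s;
   }

   /** Naive path: "\0" from the charset layer flows into the spacifying step. */
   static String chrNaive(int code)
   {
      return spacifyNull(String.valueOf((char) code));
   }

   /** Guarded path: code 0 short-circuits to "", which is what the 4GL returns. */
   static String chrFixed(int code)
   {
      if (code == 0)
      {
         return "";
      }
      return spacifyNull(String.valueOf((char) code));
   }
}
```

Special-casing code 0 at the CHR level leaves the spacifying behaviour of the character type untouched, which matters if, as the note suggests, that behaviour is genuine 4GL semantics.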
#10 Updated by Ovidiu Maxiniuc over 2 years ago
The fix for chr(0) was committed in revision 12898/3821c.
#11 Updated by Greg Shah over 2 years ago
Code Review Task Branch 3821c Revision 12898
The changes look good.
#12 Updated by Greg Shah about 1 year ago
What is left to do in this task? Please note that in #6428 Joe is fixing an issue related to lead-byte processing in CHR. I don't know what else is needed but would like to list the items here.
#13 Updated by Greg Shah about 1 year ago
- Related to Feature #6428: implement IS-LEAD-BYTE() built-in function added