Feature #1884

add some of the v10 data types and core built-ins

Added by Greg Shah over 11 years ago. Updated over 7 years ago.

Status:
Closed
Priority:
Normal
Start date:
08/19/2013
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD

ail_upd20121206a.zip (165 KB) Adrian Lungu, 12/06/2012 03:37 PM

ca_upd20130128a.zip (167 KB) Constantin Asofiei, 01/28/2013 05:23 AM

ca_upd20130129a.zip - P2J sources (312 KB) Constantin Asofiei, 01/29/2013 10:35 AM

ca_upd20130129b.zip - testcases (3.61 KB) Constantin Asofiei, 01/29/2013 10:35 AM

ca_upd20130204b.zip (319 KB) Constantin Asofiei, 02/04/2013 07:08 AM

ca_upd20130206a.zip (323 KB) Constantin Asofiei, 02/06/2013 01:40 PM

ca_upd20130206b.zip (324 KB) Constantin Asofiei, 02/06/2013 02:04 PM

ca_upd20130207b.zip (319 KB) Constantin Asofiei, 02/07/2013 02:04 AM

ca_upd20130207e.zip (333 KB) Constantin Asofiei, 02/07/2013 10:23 AM

ca_upd20130208a.zip (341 KB) Constantin Asofiei, 02/08/2013 04:39 AM

ca_upd20130226e.zip - fix for LENGTH(var, type) (22.7 KB) Constantin Asofiei, 02/26/2013 03:25 PM

ca_upd20130306h.zip (18.3 KB) Constantin Asofiei, 03/06/2013 01:30 PM


Subtasks

Feature #2167: add longchar and codepage support New Ovidiu Maxiniuc


Related issues

Related to Base Language - Feature #1584: add conversion and runtime support for INT64 and DATETIME data types Closed 12/17/2012 05/10/2013
Blocks Base Language - Feature #1646: implement BASE64-ENCODE/BASE64-DECODE built-in functions Closed 09/27/2013 10/04/2013

History

#1 Updated by Greg Shah over 11 years ago

types: DATETIME, LONGCHAR, INT64 (needed for DB support of _lock-recid and for sequences); DATETIME, NOW, ADD-INTERVAL, MTIME

#2 Updated by Greg Shah over 11 years ago

  • Target version changed from Milestone 7 to Milestone 3

#3 Updated by Greg Shah over 11 years ago

  • Start date set to 11/01/2012
  • Assignee set to Adrian Lungu

The LONGCHAR support provides character data in 4GL that can exceed 32KB in size. Our character class already can handle that basic feature, however someone will have to look at all the functionality that can be done with a LONGCHAR to see if we should just create a subclass of character with some extra capabilities (but which stores its data the same way) OR if we need a sibling class that has a different internal data structure (e.g. usage of a char[]). Either way, we will have a com/goldencode/p2j/util/longchar.java class added.

Investigate the 4GL features specific to LONGCHAR and document them here. Then we will decide on an approach.

#4 Updated by Adrian Lungu over 11 years ago

  • Status changed from New to WIP

#5 Updated by Greg Shah over 11 years ago

Some other thoughts on LONGCHAR:

0. Look through the com/goldencode/p2j/util/character class carefully. Also look at BaseDataType.
1. Search through the Progress 4GL Language Reference to find all the unique ways that LONGCHAR is used. Look carefully at those features.
2. Write testcases to explore these features.
3. Expand the testcases to explore how LONGCHAR is interoperable with CHARACTER. In what ways can they be used interchangeably? For example, most code (like built-in functions) that can take an INTEGER can also take a DECIMAL because there are implicit conversions between the two types. Likewise, arithmetic and comparison operators work on both numeric types. This kind of "hidden" behavior is important to consider in our design.
4. Are there any ways to directly access specific characters by index position that are NOT supported for CHARACTER?

Document your findings here.

#6 Updated by Adrian Lungu over 11 years ago

LONGCHAR data type:
  • CHARACTER data that is not limited to 32K in size. LONGCHAR variables can be any size (limited by system resources).
  • metadata:
    - code page information (defaults to -cpinternal)
    - other information (whether the codepage is fixed or not)
Code page manipulation:
  • FIX-CODEPAGE statement
  • CODEPAGE-CONVERT function
  • COPY-LOB statement
Character functions applied to LONGCHAR:
  • the LONGCHAR is converted to -cpinternal
  • most functions which alter the string only apply to LONGCHAR values that are in -cpinternal
  • all functions which deal with LONGCHAR values and have offset and length input parameters use character offsets and
    lengths, not byte values.
Defining LONGCHAR
  • default initial value "" (like character)
  • the documentation states that you cannot use the INITIAL option to specify an initial value for this data type as part of the definition of a variable, procedure parameter, or class-based property. In practice, however, specifying an initial value does assign that value to the LONGCHAR variable.
  • defining a LONGCHAR variable supports the same options as a CHARACTER variable, except for the FORMAT option and all VIEW-AS options except VIEW-AS EDITOR LARGE.
Other restrictions:
  • EXPORT statement: a LONGCHAR is allowed only if it is the only field in the list. The data is saved as UTF-8.
  • "Remote functions". If you invoke a function on an AppServer, the function cannot return a value as a LONGCHAR.
  • The expression COLUMN is not valid for a LONGCHAR variable (to be tested).

#7 Updated by Greg Shah over 11 years ago

My only question so far is related to the codepage info. Is there a difference between how character and longchar handle codepages? The points about codepages don't make this clear. For example, as far as I understand, the character data is stored and processed in the 4GL using cpinternal. Only input and output of character values is done in a different codepage. Of course, my understanding may be defective. :)

#8 Updated by Adrian Lungu over 11 years ago

There are no differences between CHAR and LONGCHAR related to code page.
  • CODEPAGE-CONVERT works the same way for CHAR and LONGCHAR.
  • FIX-CODEPAGE and COPY-LOB cannot take CHAR parameters.
Another point stressed a lot in the documentation is the fact that characters are used as measuring units and not bytes. For example for the SUBSTRING function:
  • for a CHAR parameter there are CHARACTER, FIXED, COLUMN, and RAW as length/position types
  • for LONGCHAR only CHARACTER is a valid length/position type.

Operators:

Comparison
  • LT,GT,LE,GE,EQ,NE between two LONGCHARS or a LONGCHAR and a CHARACTER -> values are converted to -cpinternal
  • MATCHES: only the first parameter can be a LONGCHAR. The LONGCHAR value is converted to -cpinternal prior to
    comparison.
  • BEGINS: only the first parameter can be a LONGCHAR. The LONGCHAR value is converted to -cpinternal prior to
    comparison. BEGINS always uses the -cpcoll collation.
Concatenation (a + b)
  • if a or b is LONGCHAR the result is LONGCHAR.
  • the result has the codepage of the expression on the left
Assignment (longchar_var = expr)
  • there are default assignment character conversions (CHAR <-> LONGCHAR <-> CLOB ...)
  • expr is converted to -cpinternal, or to the fixed code page if the code page of longchar_var is fixed
  • char_var = longchar_var. Works if the LONGCHAR value fits in a CHAR (32K?). For a larger LONGCHAR value there will be a runtime error.
  • a LONGCHAR cannot be used in a place where a CHAR is required. There will be a compile time error. Also for functions taking only LONGCHAR parameters passing a CHAR will result in a compile time error.

Note:
The undocumented function _cbit() works only for CHAR data type and not for LONGCHARS.

#9 Updated by Greg Shah over 11 years ago

From Adrian:

I believe that this longchar type could be implemented on the existing character class. I didn't find any case (a function, operator or just a phrase in the documentation) that would suggest that longchar is a different type from char, apart from the size and some emphasis on codepage (protecting this with fix-codepage).

We need to consider this carefully. Your descriptions of the features do note that in the 4GL, there are cases where there are runtime limits (e.g. errors that can occur) and runtime behaviors that are specific to LONGCHAR. We may need to duplicate these features exactly. For example, the SUBSTRING function has very specific behavior with LONGCHAR and different behavior with CHARACTER.

It also seems like there is very specific codepage behavior when using operators on LONGCHAR. I don't think CHARACTER has any such behavior like that.

Some of the other limits could possibly be implemented at conversion time, but I think we might want to implement with a real LONGCHAR class to get it exactly right.

#10 Updated by Adrian Lungu over 11 years ago

I reviewed the 4GL operators, functions and procedures implemented by the character class, and also the ones taking LONGCHARs as parameters or having a LONGCHAR result type.

Functions that take only CHARACTER parameters. These will remain implemented only in the character class.

_cbit
asc

Functions/operators having a main parameter of type LONGCHAR or CHARACTER and the result of the function determined by the type of this parameter(c/lc means CHARACTER or LONGCHAR):

c = c + c                lc = lc + c    lc = c + lc    lc = lc + lc 
c = substitute(c,c/lc)   lc = substitute(lc,c/lc)
c = substring(c,..)      lc = substring(lc,..)
c = caps(c)              lc = caps(lc)              
c = replace(c,c/lc,c/lc) lc = replace(lc,c/lc,c/lc)
c = right-trim(c)        lc = right-trim(lc)
c = left-trim(c)         lc = left-trim(lc)
c = trim(c)              lc = trim(lc)
c = entry(,c)            lc = entry(,lc)

Functions that could take a combination of LONGCHAR and CHARACTER variables:

compare(c/lc,c/lc)
index(c/lc,c/lc)
r-index(c/lc,c/lc)
overlay(c/lc, , ) = c/lc
num-entries(c/lc)
lookup(c,c/lc)

Functions with parameters of the same type( no mixing):

minimum(c,c) minimum(lc,lc)
maximum(c,c) maximum(lc,lc)

Restrictions on the type specification:

length(c,["CHARACTER" | "RAW" | "COLUMN"])
length(lc,"CHARACTER")

substring(c,pos,len,["CHARACTER" | "FIXED" | "RAW" | "COLUMN"])
substring(lc,pos,len,"CHARACTER")

Functions related to LONGCHAR:

FIX-CODEPAGE(lc) = codepage
IS-CODEPAGE-FIXED(lc)
GET-CODEPAGE(lc)
CODEPAGE-CONVERT(c/lc,target_codepage,source_codepage) - also for CHARACTER type but not implemented
BASE64-DECODE(lc)
lc = BASE64-ENCODE()
COPY-LOB    
SHA1-DIGEST(c/lc,c/lc)
ENCRYPT(c/lc,,,)
MD5-DIGEST(c/lc,c/lc)
GENERATE-PBE-KEY(c/lc)
NORMALIZE(c/lc)

A lot of the functions/procedures listed above could end up being replicated in both character and longchar classes(probably with all parameter type combinations).

#11 Updated by Adrian Lungu over 11 years ago

In order to go further with implementing the LONGCHAR data type I created the longchar class as a copy of the character class and (successfully)tested the base64-encode and base64-decode functions.

The next steps will be to add two new fields to this class:
  • codepage
  • fixedcodepage
and implement methods related to codepage:
  • CODEPAGE-CONVERT
  • FIX-CODEPAGE
  • GET-CODEPAGE

For example, as far as I understand, the character data is stored and processed in the 4GL using cpinternal. Only input and output of character values is done in a different codepage. Of course, my understanding may be defective. :)

I arrived at the same conclusion here. I tried to find a function that would give more information about the internal representation, but most of the functions convert the longchar value to:
  • cpinternal (PUT-STRING)
  • utf-8 (EXPORT)
  • cpstream (COPY-LOB TO FILE)

Whatever the 4GL internal representation (and it is probably cpinternal ...) is for a LONGCHAR, the value will always be converted to one codepage or another. The same will be in P2J, just care should be taken to make the right conversion and produce the same output. The same for assignment of values to a LONGCHAR.

Please advise if I should go on with this class and implement all the methods needed.

#12 Updated by Greg Shah over 11 years ago

1. We don't want to duplicate code. BUT
2. We need to limit the operators, built-in functions and so forth as needed to exactly match longchar.

We definitely don't want to substantially duplicate the character class in longchar. Before going further with the current approach, let's consider/discuss the following. Can we meet all objectives using one of these options:

A. Create longchar as a subclass of character (and overriding methods where needed). It seems we might encounter issues with implementing proper runtime limits on usage of longchar in this case. But it would be easy to maximize the common code.

B. Create a common base class for both character and longchar. Move common code there and then specialize as needed.

In both cases, we must consider that most operators and built-in functions are implemented using static methods. Both options above may encounter some difficulties in that regard.

Please consider these and offer your thoughts (and other options too should you have any).

#13 Updated by Adrian Lungu over 11 years ago

I admit that a new longchar class with a lot of duplicated methods is not a solution but I went on this route to have a start and see what else is to be done like:
  • rules needed for adding a new type
  • rules needed to set the result type to CHAR/LONGCHAR on functions that will take both type of parameters and the result is based on the type of the parameter
I also have only pieces of the big picture and it is hard for me to envision the final result when the current way of action for me is:
  • make the change ( like adding a new function)
  • fix the errors ( check what rules need to be changed / added)
  • verify if all functionality is in place ( and take into consideration if there will be undesired functionality)

I considered that making longchar class a subclass of character or extracting common functionality in a base class could be done as a refactoring at the end (wishful thinking) or, more probably, on the first big issue encountered that will suggest(or impose) a solution or another. If I'll start with a subclass of the character class the same reasoning could be applied.

Apart from a subclass or a common base class as final implementation solutions I do not see what else could be done.

#14 Updated by Greg Shah over 11 years ago

Go forward with this approach:

1. Make longchar a subclass of character. Override only as minimally necessary. It is OK to make the value, unknown and/or caseSens members of character protected if longchar needs direct access.

2. To the extent that the static methods in character need to be aware of both classes, then it is best to move those methods to a new class TextOps. I don't want to make the methods in character have references to the child class longchar. That is poor design. Make a list of the stuff that MUST move out. We may want to just move all function and operator support out to be consistent.

Is there anything about longchar that cannot be cleanly supported using the above approach?

#15 Updated by Adrian Lungu over 11 years ago

I recreated the longchar class as a subclass of character and added only the constructors.
I created the new class TextOps and moved the following methods
  • numEntriesOf (NUM-ENTRIES 4GL function)
  • toUpperCase (CAPS 4GL function)
    along with utility methods fixupRegexChar and safeDelimiter

The problem:
Progress built-in function num-entries(c/lc,c). The first parameter could be CHARACTER or LONGCHAR but the second only CHARACTER.

The current corresponding method signature is:
public static integer numEntries(character list, character delimit)

This will allow using both character and longchar as parameters in P2J but in 4GL there will be an error on using a longchar for the second parameter.
It is like the hierarchy is upside down. If the character class will be derived from longchar then:
public static integer numEntries(longchar list, character delimit)
will limit the second parameter to the character class.

The rule in 4GL seems to be that you can use a CHARACTER as a function parameter defined to take a LONGCHAR.

a LONGCHAR cannot be used in a place where a CHAR is required. There will be a compile time error. Also for functions taking only LONGCHAR parameters passing a CHARACTER will result in a compile time error.

An update of this statement:

1. compile time error when using a LONGCHAR where a CHARACTER is required (as a function parameter).
Cases:
  • _CBIT
  • Asc
  • MINIMUM and MAXIMUM functions where both parameters should be of the same type
  • User defined function parameters( and statements ).
2. auto widening when using a CHARACTER for a LONGCHAR parameter
Exceptions:
  • FIX-CODEPAGE
  • IS-CODEPAGE-FIXED
  • GET-CODEPAGE
  • INPUT-OUTPUT parameters (for INPUT parameters works)
    The first three built-in functions are LONGCHAR specifics.

I'm thinking of inverting the hierarchy and making longchar the parent class.
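
The typing problem described in this comment can be shown with a small Java sketch. The class names `CharacterValue` and `LongcharValue` are hypothetical stand-ins for the P2J `character` and `longchar` classes, and the entry counting is deliberately simplified; only the shape of the `numEntries` signature is taken from the text above.

```java
import java.util.regex.Pattern;

public class HierarchySketch
{
   /** Hypothetical stand-in for the P2J character class. */
   static class CharacterValue
   {
      final String value;

      CharacterValue(String value)
      {
         this.value = value;
      }
   }

   /** Hypothetical stand-in for longchar implemented as a subclass of character. */
   static class LongcharValue extends CharacterValue
   {
      LongcharValue(String value)
      {
         super(value);
      }
   }

   /** Same parameter shape as numEntries(character list, character delimit). */
   static int numEntries(CharacterValue list, CharacterValue delimit)
   {
      if (list.value.isEmpty())
      {
         return 0;
      }

      // limit -1 keeps trailing empty entries; simplified compared to the 4GL rules
      return list.value.split(Pattern.quote(delimit.value), -1).length;
   }

   public static void main(String[] args)
   {
      LongcharValue list = new LongcharValue("a,b,c");

      // This compiles because LongcharValue is-a CharacterValue, although the
      // 4GL rejects a LONGCHAR as the delimiter at compile time.
      System.out.println(numEntries(list, new LongcharValue(",")));
   }
}
```

With this hierarchy the Java compiler cannot reject the second argument, which is the "upside down" effect noted above.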

#16 Updated by Adrian Lungu over 11 years ago

This is a list of static fields/methods that should be moved from the character class to the new TextOps class.

Static fields to be moved to TextOps (all of these are used by the compare method, which was moved to TextOps)
CASE_INSENSITIVE
CASE_SENSITIVE
CHARACTER
COLUMN
EQ
FIXED
GE
GT
LE
LT
NE
RAW

Static methods to be moved to TextOps:
_begins
begins
_matches
matches
byteLength
byteLengthOf
compare
entry
fill
indexOf
lastIndexOf
trim
leftTrim
rightTrim
length
lengthOf
lookup
matches
matchesList
numEntries
replaceAll
substitute
substring
toLowerCase
toUpperCase

fixupRegexChar
genNegativeIndexError
genOutsideRangeError
safeDelimiter

Static methods NOT to be moved to TextOps:
testBitAt
asc
chr
maximum
minimum
concat

convertToRegEx
convertToSQLLike
convertToSQLLike
convertToSQLBegins
convertToSQLBegins
fromExportString

progressToJavaString
progressSpacifyNull
javaSpacifyNull
javaTruncateNull
preprocessFormatString
postprocessStringLiteral
readchar

valueOf

This refactoring will cause changes in:
com.goldencode.p2j.convert.ExpressionConversionWorker ( toUpperCase )
com.goldencode.p2j.persist.DMOValidator ( toUpperCase )
com.goldencode.p2j.persist.FieldReference ( toUpperCase )
com.goldencode.p2j.persist.Persistence ( rightTrim )
com.goldencode.p2j.persist.pl.Functions ( begins, entry, indexOf,... )
com.goldencode.p2j.preproc.Preprocessor ( lengthOf )
com.goldencode.p2j.uast.ExpressionEvaluator ( begins, matches)
com.goldencode.p2j.ui.client.FillIn ( lengthOf )

Note:
value, unknown and caseSens fields remained private as I modified direct field access with getValue(), isUnknown(), isCaseSensitive()

#17 Updated by Adrian Lungu over 11 years ago

TextOps class
I'm attaching an update containing:
- the TextOps class including fields and (static and some utility)methods transferred from the character class.
- the new stripped down character class.
- a basic longchar class derived from character including only the constructors.
- files updated as a result of moving the methods from the character class.

base64 encode/decode functions
This update also include the base64-encode and base64-decode current implementation:
- SecurityOps.java
- builtin-function.rules

If these files should not be included with this update, remove SecurityOps.java and the following lines from
builtin-function.rules :

<rule>ftype == prog.kw_base64_e
      <action>methodText = "SecurityOps.base64Encode"</action>
</rule>

<rule>ftype == prog.kw_base64_d
      <action>methodText = "SecurityOps.base64Decode"</action>
</rule>
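
For readers unfamiliar with what the two rules map to, here is a minimal sketch of a Base64 encode/decode pair in Java. It is not the code from the attached SecurityOps.java: the method names are borrowed from the rules above, but the `byte[]`/`String` signatures and the use of `java.util.Base64` are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Sketch
{
   /** Hypothetical signature: encodes raw bytes as Base64 text. */
   static String base64Encode(byte[] data)
   {
      return Base64.getEncoder().encodeToString(data);
   }

   /** Hypothetical signature: decodes Base64 text back into raw bytes. */
   static byte[] base64Decode(String text)
   {
      return Base64.getDecoder().decode(text);
   }

   public static void main(String[] args)
   {
      String encoded = base64Encode("foo".getBytes(StandardCharsets.US_ASCII));
      System.out.println(encoded);

      String decoded = new String(base64Decode(encoded), StandardCharsets.US_ASCII);
      System.out.println(decoded);
   }
}
```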

#18 Updated by Eric Faulhaber over 11 years ago

  • Target version changed from Milestone 3 to Milestone 4

#19 Updated by Greg Shah over 11 years ago

  • Assignee changed from Adrian Lungu to Constantin Asofiei

#20 Updated by Constantin Asofiei over 11 years ago

After looking over Adrian's findings, to me it looks like the best solution is to have two classes - character and longchar - both inheriting the same base class. This way, we can enforce (at compile time) the parameter types for all functions which expect a parameter to be only char or longchar (like minimum/maximum or num-entries case).
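
A minimal sketch of how sibling classes under a common base let the Java compiler enforce these restrictions. All class names here (`Text`, `Char`, `Longchar`) are hypothetical, and the comparison uses plain `String.compareTo`, ignoring 4GL collation and case rules.

```java
import java.util.regex.Pattern;

public class BaseClassSketch
{
   /** Hypothetical common base class. */
   static abstract class Text
   {
      final String value;

      Text(String value)
      {
         this.value = value;
      }
   }

   static final class Char extends Text
   {
      Char(String value)
      {
         super(value);
      }
   }

   static final class Longchar extends Text
   {
      Longchar(String value)
      {
         super(value);
      }
   }

   /** The list may be either type; the delimiter must be a Char. */
   static int numEntries(Text list, Char delimit)
   {
      if (list.value.isEmpty())
      {
         return 0;
      }

      return list.value.split(Pattern.quote(delimit.value), -1).length;
   }

   /** Both operands must have the same type, so there is one overload per type. */
   static Char minimum(Char a, Char b)
   {
      return a.value.compareTo(b.value) <= 0 ? a : b;
   }

   static Longchar minimum(Longchar a, Longchar b)
   {
      return a.value.compareTo(b.value) <= 0 ? a : b;
   }

   public static void main(String[] args)
   {
      System.out.println(numEntries(new Longchar("a,b,c"), new Char(",")));
      System.out.println(minimum(new Char("a"), new Char("b")).value);

      // Neither of these would compile:
      // numEntries(new Char("a,b"), new Longchar(","));
      // minimum(new Char("a"), new Longchar("b"));
   }
}
```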

Something else to consider, about concatenation. The following example:
def var ch as char.
def var l as longchar.
def var i as int.
do i = 1 to 17000:
   ch = ch + "1".
end.
l = "".

l = ch + ch. /* case 1 */
l = (l + ch) + ch. /* case 2 */
l = l + (ch + ch). /* case 3 */

will produce different results:
  1. as the resulting length is greater than the maximum length for a character value (31984), this produces a runtime error: when concatenating ch with itself, the result is still a character value, which cannot hold the combined string (of length 34000).
  2. the parentheses are there to emphasize that concatenating a longchar with a char produces a longchar. Here, the concatenation succeeds, with l holding a 34000-char string.
  3. produces the same runtime error as case 1: the concatenation in the parentheses is done first, which produces a character value that cannot hold the resulting string of length 34000.
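
The three cases can be reproduced with a small Java sketch. The classes are hypothetical, the limit is the 31984 value quoted above, and an `IllegalStateException` stands in for the 4GL runtime error; none of this is taken from the actual P2J sources.

```java
import java.util.Arrays;

public class ConcatSketch
{
   /** Maximum CHARACTER length quoted in the comment above. */
   static final int MAX_CHARACTER_LENGTH = 31984;

   /** Hypothetical common base class. */
   static abstract class Text
   {
      final String value;

      Text(String value)
      {
         this.value = value;
      }
   }

   /** Length-limited type; the exception stands in for the 4GL runtime error. */
   static final class Char extends Text
   {
      Char(String value)
      {
         super(value);
         if (value.length() > MAX_CHARACTER_LENGTH)
         {
            throw new IllegalStateException("too long for character: " + value.length());
         }
      }
   }

   static final class Longchar extends Text
   {
      Longchar(String value)
      {
         super(value);
      }
   }

   /** The result is a Longchar if either operand is one, otherwise a Char. */
   static Text concat(Text a, Text b)
   {
      String joined = a.value + b.value;
      if (a instanceof Longchar || b instanceof Longchar)
      {
         return new Longchar(joined);
      }

      return new Char(joined);
   }

   /** Builds a string of n '1' characters, like the 4GL loop above. */
   static String ones(int n)
   {
      char[] chars = new char[n];
      Arrays.fill(chars, '1');
      return new String(chars);
   }

   public static void main(String[] args)
   {
      Char ch = new Char(ones(17000));
      Longchar l = new Longchar("");

      // case 2: (l + ch) + ch stays a Longchar at every step
      System.out.println(concat(concat(l, ch), ch).value.length());

      // case 3: l + (ch + ch) fails while evaluating the inner Char + Char
      try
      {
         concat(l, concat(ch, ch));
      }
      catch (IllegalStateException e)
      {
         System.out.println("runtime error: " + e.getMessage());
      }
   }
}
```

Case 1 fails the same way as case 3, because `concat(ch, ch)` is evaluated before the assignment to the longchar target is considered.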

Finally, please confirm that all the longchar functions mentioned until now will need to be supported.

#21 Updated by Constantin Asofiei about 11 years ago

Merged ail_upd20121206a.zip with latest bazaar files (the APIs moved from character to TextOps were re-moved to TextOps, to not lose other changes).

#22 Updated by Greg Shah about 11 years ago

to me it looks like the best solution is to have two classes - character and longchar - both inheriting the same base class

Yes, I completely agree.

 please confirm that all the longchar functions mentioned until now will need to be supported.

Yes, but only for conversion purposes. You can and should defer runtime implementation where possible.

#23 Updated by Constantin Asofiei about 11 years ago

  • Assignee changed from Constantin Asofiei to Stanislav Lomany

More notes about the longchar and character classes. In 4GL, if we have an user-defined function like this:

function f0 returns longchar(lc as longchar):
   return lc.
end.

following are valid calls:
def var ch as char.
def var lc as char.
ch = "ch".
lc = "lc".

message string(f0("foo")). /* shows foo */
message string(f0(ch)). /* shows ch */
message string(f0(lc)). /* shows lc */

This shows that character values can be passed to a longchar parameter, but only as long as that parameter is an INPUT parameter. If the parameter is defined OUTPUT or INPUT-OUTPUT, in 4GL there are compile-time errors when character variables are passed. But, these compile-time errors appear only for simple function calls. In case DYNAMIC-FUNCTION is used, like in:
function f0 returns longchar(output lc as longchar):
   message string(lc). /* shows empty string */
   lc = "bar".
   return lc.
end.
def var ch as char.
ch = "foo".
message dynamic-function("f0", output ch). /* shows "bar" */
message ch. /* shows empty string */

even if the parameter is defined as output and a character variable is passed, no runtime errors appear; the weird part is that the value contained by ch is "lost" by dynamic-function when the function body is executed and also after dynamic-function ends, ch is set to empty string.

Checking how a character parameter behaves for user-defined functions:

function f0 returns char(output ch as char):
   message ch. /* shows empty string */
   ch = "bar".
   return ch.
end.
def var lc as longchar.
lc = "foo".
message f0(output lc). /* shows bar */
message string(lc). /* shows bar */
message dynamic-function("f0", output lc). /* shows "bar" */
message string(lc). /* shows bar */

shows that data is lost when passing a longchar variable to a character output parameter, but after function invocation the longchar variable gets updated.

The fact that the character and longchar are interchangeable, to me it looks similar to how the DYNAMIC-FUNCTION "auto-casts" the returned value to the type of the left-value in an assignment. Regardless of how we will implement the hierarchy, I think we will need to emit some special code to handle these cases.

Testing how the builtin function work when a parameter can accept both char and longchar, it seems that they are a special case, as nothing is lost when this is done (at least for the REPLACE function).

Finally, some more thoughts about the possible ways we could implement longchar and character:
  1. If a function has a longchar parameter and we use the "longchar extends character" hierarchy, the generated Java code will get compile errors when a character is passed, as the function expects a longchar, but the character class is not a longchar. One way to solve this is to wrap (at the function call) each expression passed to a longchar parameter in a new longchar(<character>) c'tor. This will be fine for INPUT parameter cases, but for OUTPUT/INPUT-OUTPUT there will be problems, as we will not be able to alter the variable.
  2. If we use a common base class for character and longchar, and looking at how longchar/character are interchangeable, the same solution of wrapping the expression in a c'tor appears.
For both cases, it will be difficult to implement the OUTPUT/INPUT-OUTPUT cases presented above: we will not be able to change the variable, as we will not be able to pass the variable reference to the parameter (unless we declare the parameter as the base class, and not as explicit longchar/character types). Declaring the parameter as the base type instead of the explicit longchar/character type will mean that we will need to emit special code in cases when:
  • the real parameter type is longchar, the mode is OUTPUT/INPUT-OUTPUT and a character variable is passed. In this case, it will reset the character variable to empty string, for both function body and after the function call.
  • the real parameter type is character, the mode is OUTPUT/INPUT-OUTPUT and a longchar variable is passed. In this case, it will reset the variable, but any changes will survive after the function body ends.
    This special code would look like (in Block.init):
    <parameter> = character.prepare(<parameter>); /* for character parameters */
    <parameter> = longchar.prepare(<parameter>); /* for longchar parameters */
    

If we use a base class, the decision we need to make is whether we will emit the parameters as the base class or as their explicit type.

#24 Updated by Constantin Asofiei about 11 years ago

  • Assignee changed from Stanislav Lomany to Constantin Asofiei

I've changed the assignee back to me; I had changed it by mistake.

#25 Updated by Greg Shah about 11 years ago

Normally, we always define parameters as the exact match to the type originally specified. I prefer to stay with that approach as it is more clear to the reader of the code, what the intent was.

I think using a base class for the hierarchy makes sense. This parameter case is messy, but it also should be rare, since it has little possible useful purpose. So if there is a bit of extra messiness in this case, I can accept that.

How about using a technique similar to the decimal.PrecisionResetter inner class? This is a case where in downstream procedure, the precision of a shared decimal variable can be temporarily forced to a new value and then reset upon leaving the nested scope. In this longchar/char param case, it seems like for OUTPUT/INPUT-OUTPUT parms we could save a reference to the original var in an inner class instance and then reset the original parm values as needed based on the scope notifications. A special wrapping constructor or static factory method would be used to prepare the parm before the call. It can setup the new instance with the proper type and the inner class instance prepped. The registration would have to be done inside the function, although we can make a facility similar to TransactionManager.registerNextExternal() to hide this away.
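
A rough sketch of the idea, for the case where a longchar variable is passed to a character OUTPUT parameter. Everything here is hypothetical: the `CharOutputBridge` name, the classes, and the use of try-with-resources in place of the scope notifications mentioned above. It only illustrates saving a reference to the original variable and copying the value back when the call scope ends.

```java
public class OutputParamSketch
{
   /** Hypothetical common base class holding a mutable value. */
   static abstract class Text
   {
      String value = "";

      void assign(String newValue)
      {
         value = newValue;
      }
   }

   static final class Char extends Text { }

   static final class Longchar extends Text { }

   /**
    * Hypothetical bridge: keeps a reference to the caller's longchar variable
    * and copies the parameter value back into it when the call scope ends.
    */
   static final class CharOutputBridge implements AutoCloseable
   {
      private final Longchar original;

      /** OUTPUT parameters start from the default value, not the caller's value. */
      final Char param = new Char();

      CharOutputBridge(Longchar original)
      {
         this.original = original;
      }

      @Override
      public void close()
      {
         original.assign(param.value);
      }
   }

   /** Stand-in for the converted body of function f0 (output ch as char). */
   static void f0(Char ch)
   {
      ch.assign("bar");
   }

   /** Call site: wrap the mismatched argument, call, then copy back on exit. */
   static void callF0(Longchar lc)
   {
      try (CharOutputBridge bridge = new CharOutputBridge(lc))
      {
         f0(bridge.param);
      }
   }

   public static void main(String[] args)
   {
      Longchar lc = new Longchar();
      lc.assign("foo");
      callF0(lc);
      System.out.println(lc.value);
   }
}
```

The function itself keeps its exact `Char` parameter type, so the extra handling appears only at call sites where the argument type does not match.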

#26 Updated by Constantin Asofiei about 11 years ago

OK, I think I understand your approach. As we will use a base class (maybe named "text"?), all the built-in functions which can accept both character and longchar for a parameter will have that parameter set to the base class; the character/longchar types will be set only when it is sure the parameter can be only of that type.

#27 Updated by Greg Shah about 11 years ago

Actually, I was still wanting the parameter set to the specific type. The reason is that this parameter mismatch case should be rare. So instead of optimizing for that case, I want to just put in special handling only when needed.

Yes, I like the idea of the Text base class. But I don't want to force all parameters to that type.

#28 Updated by Constantin Asofiei about 11 years ago

I refer to the built-in functions which can accept both character and longchar for a parameter, like replace. As it was moved to TextOps, we can set its parameter to Text instead of character. If we don't do this, we will have to create a TextOps.replaceAll API for each element of the (String, character, longchar)x(String, character, longchar)x(String, character, longchar) cartesian product (because replace accepts 3 parameters and each can be set to either string constant, character or longchar).
If you worry about their return type, this will still need to be Text, because as far as I can tell, the type of the returned value is dependent on the type of some parameter. This applies to the concatenation operator too: the returned value can be either character or longchar.

#29 Updated by Greg Shah about 11 years ago

Sorry, I missed the "builtin-functions" part. Yes, I agree that the built-ins should take the base class.

In regards to the return type, I think using generics <T extends Text> is a good solution, especially for the cases where the return type matches one of the inputs.
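
A minimal sketch of the generics idea for a function whose return type follows its input, using CAPS as the example. The classes and the `newInstance` factory method are hypothetical and are not the actual `Text`/`TextOps` API.

```java
import java.util.Locale;

public class GenericReturnSketch
{
   /** Hypothetical common base class. */
   static abstract class Text
   {
      final String value;

      Text(String value)
      {
         this.value = value;
      }

      /** Creates a new instance of the same concrete type. */
      abstract Text newInstance(String newValue);
   }

   static final class Char extends Text
   {
      Char(String value)
      {
         super(value);
      }

      @Override
      Char newInstance(String newValue)
      {
         return new Char(newValue);
      }
   }

   static final class Longchar extends Text
   {
      Longchar(String value)
      {
         super(value);
      }

      @Override
      Longchar newInstance(String newValue)
      {
         return new Longchar(newValue);
      }
   }

   /** The return type follows the argument type, so callers need no cast. */
   @SuppressWarnings("unchecked")
   static <T extends Text> T toUpperCase(T source)
   {
      // Safe as long as every subclass returns its own type from newInstance().
      return (T) source.newInstance(source.value.toUpperCase(Locale.ROOT));
   }

   public static void main(String[] args)
   {
      Char c = toUpperCase(new Char("abc"));
      Longchar lc = toUpperCase(new Longchar("xyz"));
      System.out.println(c.value + " " + lc.value);
   }
}
```

The unchecked cast is confined to one place; a single method body serves both types without duplicating the logic per class.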

#30 Updated by Constantin Asofiei about 11 years ago

Attached update (although it misses javadoc for a lot of the new APIs) has the following changes:
  1. added base class Text, from which longchar and character are derived. I didn't invest much in separating the "internal representation" of character and longchar, but I think we will be able to use the java.lang.String as a backend, limiting the length for character class and allowing codepage support for longchar.
  2. TextOps is cleaned up and contains all APIs common to character and longchar (the code-page related APIs are stubbed).
  3. SecurityOps is added for misc security-related built-ins
  4. conversion support for longchar IMO is done (please take a look at the longchar_functions.p and longchar1.p from attached 29b.zip update - is not yet in bazaar). Let me know if you can think of something else I should test related to longchar/char.

#31 Updated by Constantin Asofiei about 11 years ago

The attached update has the P2J sources merged with bzr revision 10160; javadocs are added, can be reviewed.

#32 Updated by Greg Shah about 11 years ago

Feedback:

1. The Progress documentation states that the default initializer for longchar is unknown value. But notes above state otherwise. If the Progress documentation is right, then longchar needs to be added to function is_unknown_init_var_def in common-progress.rules and the default initializer in variable_definitions.rules would not be "". Please confirm this and if the Progress documentation is wrong, put comments in both of those rulesets to note that fact.

2. Copyright date updates are needed in expressions.rules.

3. Should byteLength() remain in character since longchar can't be used on a byte basis?

Otherwise, I think this is ready for testing.

#33 Updated by Constantin Asofiei about 11 years ago

1 - although documentation states longchar inits to unknown, practice says different - testing showed that initial value is the same as character, empty string.
2 - ok, changed.
3 - actually, 4GL allows this code:

def var l as longchar.
message length(l, "raw").

So, byteLength must be in TextOps.
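
To illustrate why a byte-based length differs from a character-based one, here is a small Java sketch. The method names echo `lengthOf` and `byteLength` from the lists above, but the signatures and the explicit `Charset` parameter are assumptions; the real TextOps methods may choose the codepage differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteLengthSketch
{
   /** Length in characters (Unicode code points). */
   static int lengthOf(String value)
   {
      return value.codePointCount(0, value.length());
   }

   /** Length in bytes once the text is encoded in the given codepage. */
   static int byteLength(String value, Charset codepage)
   {
      return value.getBytes(codepage).length;
   }

   public static void main(String[] args)
   {
      // "naive" with a diaeresis on the i, written as an escape so the
      // source file stays pure ASCII
      String text = "na\u00efve";

      System.out.println(lengthOf(text));
      System.out.println(byteLength(text, StandardCharsets.UTF_8));
      System.out.println(byteLength(text, StandardCharsets.ISO_8859_1));
   }
}
```

The same five-character value occupies six bytes in UTF-8 and five in ISO-8859-1, so a "raw" length depends on the codepage used for the conversion.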

BTW, I didn't check all involved character functions for new parameters, so I can't tell if there is anything new in OE 10.

#34 Updated by Greg Shah about 11 years ago

The code changes look good. My only concern is that the issues 1 (default init) and 3 (byte length) should have the javadoc changed to explain that the Progress documentation is incorrect. Can you do that before you leave?

Eugenie is making a change on his update too, so we have a bit of time...

#35 Updated by Constantin Asofiei about 11 years ago

For byteLength, OK, I've changed the javadoc. But I don't understand what javadoc I should change related to default init - in converted code, a longchar variable is emitted and initialized as:

longchar l = new longchar("");

The idea is that the default c'tor is not used (this is the same way character vars get instantiated).
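
A simplified sketch of the distinction, using a hypothetical stand-in class rather than the real longchar: it assumes (this is not confirmed above) that a no-argument c'tor would produce the unknown value, which is why emitting an explicit "" initializer matters.

```java
// Hypothetical stand-in for a text wrapper type; not the P2J longchar class.
final class LongcharInitSketch
{
   static final class TextValue
   {
      private final String value; // null models the unknown value in this sketch

      /** Assumed behavior: the default c'tor creates an unknown value. */
      TextValue()
      {
         this.value = null;
      }

      /** Explicit initializer, as emitted by conversion: new TextValue(""). */
      TextValue(String value)
      {
         this.value = value;
      }

      boolean isUnknown()
      {
         return value == null;
      }
   }

   public static void main(String[] args)
   {
      System.out.println(new TextValue().isUnknown());   // true
      System.out.println(new TextValue("").isUnknown()); // false
   }
}
```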

#36 Updated by Greg Shah about 11 years ago

Just put it in the following:

1. In the TRPL rules where it emits "" as the default.
2. In the class level Javadoc for longchar.

That should be sufficient.

#37 Updated by Constantin Asofiei about 11 years ago

OK, I've made the changes, the update is attached.

#38 Updated by Greg Shah about 11 years ago

I have applied it. There are some compile warnings:

[javac] /mnt/san/sata/gc/20121029/p2j/src/com/goldencode/p2j/util/TextOps.java:3361: warning: unmappable character for encoding UTF8
[javac] * &lt;li&gt; NFD � Canonical Decomposition
[javac] ^
[javac] /mnt/san/sata/gc/20121029/p2j/src/com/goldencode/p2j/util/TextOps.java:3362: warning: unmappable character for encoding UTF8
[javac] * &lt;li&gt; NFC � Canonical Decomposition, followed by Canonical Composition
[javac] ^
[javac] /mnt/san/sata/gc/20121029/p2j/src/com/goldencode/p2j/util/TextOps.java:3363: warning: unmappable character for encoding UTF8
[javac] * &lt;li&gt; NFKD � Compatibility Decomposition
[javac] ^
[javac] /mnt/san/sata/gc/20121029/p2j/src/com/goldencode/p2j/util/TextOps.java:3364: warning: unmappable character for encoding UTF8
[javac] * &lt;li&gt; NFKC � Compatibility Decomposition, followed by Canonical Composition
[javac] ^
[javac] /mnt/san/sata/gc/20121029/p2j/src/com/goldencode/p2j/util/TextOps.java:3365: warning: unmappable character for encoding UTF8
[javac] * &lt;li&gt; NONE � Returns the source string unchanged

This is just in the javadoc, so I think it can be fixed before check-in but without further testing.
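
For anyone wanting to catch this before compiling, a minimal sketch of a strict UTF-8 check follows. It assumes the dash was saved in a single-byte encoding such as windows-1252 (byte 0x96), which is a guess; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether a byte sequence is well-formed UTF-8.
final class Utf8CheckSketch
{
   static boolean isValidUtf8(byte[] data)
   {
      try
      {
         // REPORT makes the decoder throw instead of silently substituting U+FFFD.
         StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(data));
         return true;
      }
      catch (CharacterCodingException e)
      {
         return false;
      }
   }

   public static void main(String[] args)
   {
      // "NFD <dash> x" with the dash as the single byte 0x96 (assumed windows-1252).
      byte[] singleByteDash = { 'N', 'F', 'D', ' ', (byte) 0x96, ' ', 'x' };
      // The same text with a real en dash, encoded as UTF-8.
      byte[] utf8Dash = "NFD \u2013 x".getBytes(StandardCharsets.UTF_8);

      System.out.println(isValidUtf8(singleByteDash)); // false
      System.out.println(isValidUtf8(utf8Dash));       // true
   }
}
```

Replacing the dash with a plain ASCII hyphen in the javadoc avoids the question of source encoding altogether.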

#39 Updated by Constantin Asofiei about 11 years ago

I've fixed the javadoc problems in this update (you are right, no special testing is needed for these changes, as they are only in javadoc).

#40 Updated by Constantin Asofiei about 11 years ago

Besides the MAJIC-related changes (btw, should the MAJIC update be attached here or to #1973?), there are a few other problems in generated code:
  1. a bug in operator.rules which leaked the chp_wrapper to other nodes on line 452 (needed to add protection for prog.plus)
  2. TextOps.concat can be set as parameter for lots of builtin functions/methods, so for now I think it is best to let it return character, not Text. When we add full longchar support, we will either have to find each and every case of builtin function/method which has a character parameter and change it to accept Text parameter instead (if it accepts longchar values too) or we will have to wrap each expression which returns Text (passed to a character parameter) with a new character c'tor.
  3. fields defined as character can be set to expressions which are evaluated using Text APIs (which return Text) or longchar variables, i.e.:
    def var ch as char.
    def temp-table tt1 field f1 as char field f2 as int.
    def var lc as longchar.
    
    overlay(tt1.f1, 1) = ch.
    tt1.f1 = lc.
    

    Thus, we will either have to wrap the parameter depending on the field's type, using new longchar or new character c'tor, or (the easiest way) would be to add setters which accept a Text value, for each character field.

#41 Updated by Greg Shah about 11 years ago

 I think it is best to let it return character

That is fine for milestone 4.

After that:

we will either have to find each and every case of builtin function/method which has a character parameter and change it to accept Text parameter instead (if it accepts longchar values too)

Yes, this is the way.

And for longchar passed to user defined functions using character (and vice versa), we will just wrap in the proper type.

or (the easiest way) would be to add setters which accept a Text value, for each character field.

Yes and no. We would change the setter to take Text (just like the setter for both integer and decimal fields is always NumberType). We don't want to add new setters, the Text setter should be sufficient.
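
A minimal sketch of the single-setter approach, using simplified stand-ins for Text, character and longchar (the record class and its members are hypothetical and do not reflect the generated DMO code):

```java
// Simplified stand-ins for illustration; not the P2J classes.
final class TextSetterSketch
{
   static abstract class Text
   {
      protected final String value;

      Text(String value)
      {
         this.value = value;
      }
   }

   static final class character extends Text
   {
      character(String value)
      {
         super(value);
      }
   }

   static final class longchar extends Text
   {
      longchar(String value)
      {
         super(value);
      }
   }

   /** Hypothetical record for temp-table tt1 with a character field f1. */
   static final class Tt1Record
   {
      private character f1 = new character("");

      /** One setter taking the common parent type covers character and longchar. */
      void setF1(Text t)
      {
         this.f1 = new character(t.value);
      }

      String getF1()
      {
         return f1.value;
      }
   }

   public static void main(String[] args)
   {
      Tt1Record rec = new Tt1Record();
      rec.setF1(new longchar("from longchar"));
      System.out.println(rec.getF1());
      rec.setF1(new character("from character"));
      System.out.println(rec.getF1());
   }
}
```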

I've spoken with Eric on this and he agrees. If you need to make the change now, do it. Otherwise, we will make the related changes after milestone 4.

#42 Updated by Constantin Asofiei about 11 years ago

Update contains P2J fixes for:
  1. setter for character fields emit as Text
  2. operator.rules bug
  3. TextOps.concat returns character.

I will apply this to staging (plus the Majic changes) and run conversion there.

#43 Updated by Constantin Asofiei about 11 years ago

Fixed another problem, related to the fact that Text was missing a Text(Text) c'tor, to initialize it properly (character.duplicate is calling new character(character) which in turn called Text(BaseDataType), which sets the instance to unknown).
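
The overload resolution behind this bug can be reproduced with a small sketch. The classes below are simplified stand-ins, and the assumption that the BaseDataType c'tor yields unknown is taken from the description above; everything else is hypothetical.

```java
// Simplified stand-ins showing how a missing copy c'tor changes overload resolution.
final class CopyCtorSketch
{
   static class BaseDataType
   {
   }

   /** Before the fix: only a String c'tor and a generic BaseDataType c'tor. */
   static class TextBroken extends BaseDataType
   {
      final String value; // null models the unknown value

      TextBroken(String value)
      {
         this.value = value;
      }

      TextBroken(BaseDataType other)
      {
         this.value = null; // generic path: instance ends up unknown
      }
   }

   /** After the fix: a same-type c'tor is more specific and wins. */
   static class TextFixed extends BaseDataType
   {
      final String value;

      TextFixed(String value)
      {
         this.value = value;
      }

      TextFixed(BaseDataType other)
      {
         this.value = null;
      }

      TextFixed(TextFixed other)
      {
         this.value = other.value; // proper copy
      }
   }

   public static void main(String[] args)
   {
      System.out.println(new TextBroken(new TextBroken("abc")).value); // null
      System.out.println(new TextFixed(new TextFixed("abc")).value);   // abc
   }
}
```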

Applied to staging.

#44 Updated by Constantin Asofiei about 11 years ago

Passed regression testing, committed to bzr revision 10165.

#45 Updated by Constantin Asofiei about 11 years ago

  • % Done changed from 0 to 40

#46 Updated by Greg Shah about 11 years ago

  • Target version changed from Milestone 4 to Milestone 7

#47 Updated by Constantin Asofiei about 11 years ago

The server project has a case of:

def temp-table tt1 field f1 as char extent 10 field tt1-txt as char.

for each tt1:
   message tt1 f1[1] length(tt1,f1[1]).
end.

I think the intent was length(tt1.f1[1]) and this is just a typo, but as the code compiles OK in 4GL, I've added support for the second parameter of the LENGTH function.

#48 Updated by Constantin Asofiei about 11 years ago

Passed conversion regression testing, committed to bzr revision 10215.

#49 Updated by Constantin Asofiei about 11 years ago

Added missing TextOps.numEntries(String, String). I don't see a reason to put this through conversion regression testing. I think this is a side-effect of the EXPRESSION node problem, as this works:

message num-entries("aa", "bb"). /* both params get converted to character instances */

but this didn't:
def var i as int.
i = num-entries("aa", "bb"). /* both params get converted to string literals */
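
A sketch of the overload pair involved, with a hypothetical wrapper class standing in for character. The counting logic is a simplification made for this example (only the first delimiter character is used, and an empty list has zero entries) and has not been checked against 4GL behavior.

```java
// Hypothetical stand-ins; not the P2J TextOps implementation.
final class NumEntriesSketch
{
   static final class character
   {
      final String value;

      character(String value)
      {
         this.value = value;
      }
   }

   /** Overload used when the arguments are emitted as wrapper instances. */
   static int numEntries(character list, character delim)
   {
      return numEntries(list.value, delim.value);
   }

   /** Overload used when the arguments are emitted as Java string literals. */
   static int numEntries(String list, String delim)
   {
      if (list.isEmpty())
      {
         return 0;
      }

      // Simplification: use the first delimiter character, defaulting to a comma.
      char d = delim.isEmpty() ? ',' : delim.charAt(0);
      int count = 1;
      for (int i = 0; i < list.length(); i++)
      {
         if (list.charAt(i) == d)
         {
            count++;
         }
      }
      return count;
   }

   public static void main(String[] args)
   {
      System.out.println(numEntries("a,b,c", ","));
      System.out.println(numEntries(new character("a,b,c"), new character(",")));
   }
}
```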

#50 Updated by Greg Shah about 11 years ago

I agree. The code is fine. Check it in and distribute it.

#51 Updated by Constantin Asofiei about 11 years ago

Committed to bzr revision 10259.

#52 Updated by Constantin Asofiei about 11 years ago

  • File ca_upd20130311c.zip added

Added more missing APIs. The commit to bzr revision 10276 was for task #1640.

#53 Updated by Constantin Asofiei about 11 years ago

  • File deleted (ca_upd20130311c.zip)

#54 Updated by Constantin Asofiei about 11 years ago

This one is even more obscure than the IN WINDOW clause. The main issues are:
  1. character/longchar interchangeability. Note that this is related to integer/int64 and date/datetime/datetime-tz, when OUTPUT parameters are used (all cases should use some common approach). See notes 23 and 25.
  2. implement the character limitation related to length.
  3. the hard part will be to determine how the codepage and -cpinternal can be implemented (I suspect we could use "file.encoding" property to some extent).
  4. implement the codepage-related statements/functions:
    FIX-CODEPAGE(lc) = codepage
    IS-CODEPAGE-FIXED(lc)
    GET-CODEPAGE(lc)
    CODEPAGE-CONVERT(c/lc,target_codepage,source_codepage) - also for CHARACTER type but not implemented
    NORMALIZE(c/lc)
    
  • COPY-LOB is a challenge on its own.

For the above, an initial estimate is 60 hours.
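
As a rough starting point for the codepage items, the JDK already covers the normalization forms and byte-level charset conversion. The sketch below is an assumption about how these could map, not a design for P2J; the class and method names are hypothetical, and using the JVM default charset as a stand-in for -cpinternal is only one possible choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helpers built on JDK classes; not the P2J implementation.
final class CodepageSketch
{
   /** NORMALIZE-like helper: form is one of NFD, NFC, NFKD, NFKC or NONE. */
   static String normalize(String s, String form)
   {
      if ("NONE".equalsIgnoreCase(form))
      {
         return s; // source string returned unchanged
      }
      // Normalizer.Form.valueOf throws IllegalArgumentException for other names.
      return Normalizer.normalize(s, Normalizer.Form.valueOf(form.toUpperCase(Locale.ROOT)));
   }

   /** CODEPAGE-CONVERT-like helper working on raw bytes. */
   static byte[] codepageConvert(byte[] data, Charset target, Charset source)
   {
      return new String(data, source).getBytes(target);
   }

   /** Assumed stand-in for -cpinternal: the JVM default charset. */
   static Charset internalCodepage()
   {
      return Charset.defaultCharset();
   }

   public static void main(String[] args)
   {
      String composed = normalize("e\u0301", "NFC"); // e + combining acute
      System.out.println(composed.length());         // 1

      byte[] latin1 = { (byte) 0xE9 };               // e-acute in ISO-8859-1
      byte[] utf8 = codepageConvert(latin1, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
      System.out.println(utf8.length);               // 2

      System.out.println(internalCodepage().name());
   }
}
```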

Should we make a separate task for the SecurityOps stuff? I refer to these statements/functions (estimate is 8 hours for them):

ENCRYPT
GENERATE-PBE-KEY
GENERATE-PBE-SALT
SHA1-DIGEST
MD5-DIGEST

#55 Updated by Greg Shah about 11 years ago

We will deal with COPY-LOB in #2135, which is not needed for milestone 7.

#56 Updated by Greg Shah about 11 years ago

We will handle the hashing and encryption functions in #2136, which is not needed for milestone 7.

#57 Updated by Greg Shah about 11 years ago

  • Estimated time changed from 64.00 to 140.00
  • % Done changed from 40 to 60

The remaining work on this task is summarized in points 1 through 4 in note 54. It is estimated to take 60 hours.

#58 Updated by Eric Faulhaber almost 11 years ago

  • Assignee changed from Constantin Asofiei to Ovidiu Maxiniuc
  • Due date set to 08/01/2013

Assigned to Ovidiu to finish longchar support, beginning July 23.

#59 Updated by Greg Shah over 10 years ago

Is this work complete? If not, what exactly is left?

#60 Updated by Ovidiu Maxiniuc over 10 years ago

As far as I have observed, 54.1 should already be at the same level as int/int64 and date/datetime.
54.3 must be investigated against 4GL/OE.
From a quick look over the code, I see that code-page support is mainly missing (including conversion of some parts).

I will start working on this issue tomorrow.

#61 Updated by Greg Shah over 10 years ago

Please just add a new task for the remaining work. Do not start work on it. We will schedule it for later.

#62 Updated by Greg Shah over 10 years ago

  • Status changed from WIP to Closed

#63 Updated by Greg Shah over 7 years ago

  • Target version changed from Milestone 7 to Runtime Support for Server Features
