Project

General

Profile

Feature #1603

enhance name conversion to support internal procedure names that contain a '.' character

Added by Greg Shah over 11 years ago. Updated over 7 years ago.

Status:
New
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:
4.00 h
billable:
No
vendor_id:
GCD
version_reported:
version_resolved:

ca_upd20130308b.zip (13.8 KB) Constantin Asofiei, 03/08/2013 04:40 AM

evl_upd20130313a.zip - Frame and variable names collision fix first drop (35.5 KB) Eugenie Lyzenko, 03/13/2013 05:48 PM

evl_upd20130314a.zip - The modified ruleset to fix name duplication (35.5 KB) Eugenie Lyzenko, 03/14/2013 10:31 AM

diffs_1603.txt Magnifier (564 KB) Greg Shah, 03/14/2013 12:41 PM

evl_upd20130314b.zip - The new drop to fix name collisions (35.7 KB) Eugenie Lyzenko, 03/14/2013 04:38 PM

evl_upd20130315a.zip - The fix candidate for name collisions (35.7 KB) Eugenie Lyzenko, 03/15/2013 04:04 PM

evl_upd20130318a.zip - Name conflict fix with dictionary usage (36.3 KB) Eugenie Lyzenko, 03/18/2013 12:13 PM

History

#1 Updated by Greg Shah over 11 years ago

  • Target version set to Milestone 4

#2 Updated by Greg Shah over 11 years ago

  • Assignee set to Constantin Asofiei

#3 Updated by Constantin Asofiei over 11 years ago

I'm not sure what is needed here. The following converts nicely in P2J:

procedure proc.0.
   message "a".
end.

procedure p.r.o.c.0.
   message "b".
end.

run proc.0.

converted to:
   public void proc_0()
   public void p_r_o_c_0()
   ControlFlowOps.invoke("proc.0", (handle) null);

  <class-mapping jname="ProcNames" pname="proc_names.p">
    <method-mapping jname="proc_0" pname="proc.0" type="PROCEDURE"/>
    <method-mapping jname="p_r_o_c_0" pname="p.r.o.c.0" type="PROCEDURE"/>
  </class-mapping>

Is there something I'm missing ?

#4 Updated by Greg Shah over 11 years ago

Interesting. I wonder what is going on there. I can duplicate your result as well.

But when I used the testcases/uast/ project to run the main() in NameConverter:

java -classpath p2j/build/lib/p2j.jar com.goldencode.p2j.convert.NameConverter "my.proc" 2 true
my.proc ---> my.proc (Java method)

In this case '.' is completely ignored (and passed through). It seems to be working during the full conversion processing when it doesn't through the NameConverter directly.

...

OK, I found the place where we "patched" the name (in rules/annotations/procedures.rules):

         <action>
            javaname = names.convert(child.text, names.method, null)
         </action>
         <rule>javaname.indexOf(".") &gt;= 0
            <action>javaname = javaname.replaceAll("\.", "_")</action>
         </rule>

I guess this should do for now. I would have preferred the NameConverter to handle this for us.

Are there any implications in regards to the use of such names in event specifications (like for SUBSCRIBE)? Or will our runtime mapping processing be completely happy with the way we have things working now?

#5 Updated by Constantin Asofiei over 11 years ago

The only real problem is when converting the 4GL proc name, to make the name compatbile with Java syntax. As we don't emit direct procedure calls anymore, all hard-coded procedure names in RUN or SUBSCRIBE ... RUN-PROCEDURE emit as string literals, which represent the legacy 4GL name; the mapping knows both the legacy and converted name, so the converted name will be identified.

#6 Updated by Greg Shah over 11 years ago

  • Start date deleted (10/19/2012)
  • Target version changed from Milestone 4 to 24

OK, for now this is done. I am going to move this to another milestone. The only remaining work is to remove the hack in procedures.rules and make the NameConverter handle the '.' as a standard part of the syllable tokenization. This usage of '.' in a proc name is reasonably common so we want the name converter to handle it in a first class way.

#7 Updated by Constantin Asofiei over 11 years ago

This one did turn and bit us, because the server project has files named like "foo.bar.p", which get converted to "Foo.bar.java" (and "Foo.bar" class name). I guess we need to move the "." to NameConverter.

#8 Updated by Constantin Asofiei over 11 years ago

OK, looks like this addition in NamepConverte's static c'tor:

mappings.put(".", "dot");

would solve this. A question: are you sure it's safe to "eat" the "-", "_" and "/" character from the names? Because without collision protection, names may collide once these are removed.

#9 Updated by Greg Shah over 11 years ago

Go ahead with this change.

A question: are you sure it's safe to "eat" the "-", "_" and "/" character from the names? Because without collision protection, names may collide once these are removed.

Yes, we know this is "lossy". So far, we have taken the approach that we will fix naming collisions as we encounter them.

The real solution that I plan to have at some point: a centralized and project-wide name conversion and disambiguation facility. The idea is that this facility would have different namespaces (one for each of the target Java namespaces like labels, method names, variables and class names). Where there are potential collisions, this facility will detect these across the whole project. Examples of collisions:

1. Multiple 4GL namespaces are being collapsed into a single Java namepsace (e.g. streams, buffers, frames and variables are all in the Java variable namespace). Today we often handle this by brute force. For example, we always postfix stream (like rptStream) because it MIGHT be conflicting with other resources named "rpt".
2. The 4GL allows characters in names that are not allowed in Java (hyphen is the most obvious and ridiculous). This means that name conversion is lossy and can easily generate conflicts.
3. Filesystem case-sensitivity on the source system can yield multiple input files that have the same target output file.

This would have to be done in multiple steps, where all names of all resources are collected and stored in a first phase. These would need to be stored in a way that is easy to get back to the node that needs to get the converted name. Then the next phase would deterministically convert all the names, taking hints into account and disambiguating everything project wide. These names would then get pushed back into the nodes across the project.

Anyway, this is something I was trying to avoid for milestone 4.

#10 Updated by Constantin Asofiei over 11 years ago

This converts dots in names to "dot" string. I will put this through conversion.

#11 Updated by Constantin Asofiei over 11 years ago

Passed conversion regression testing, committed to bzr revision 10265.

#12 Updated by Constantin Asofiei over 11 years ago

I'm putting this case here, because is a name collision:

form i with frame f-1.
form i with frame f1.

must produce two distinct frames.

#13 Updated by Eugenie Lyzenko over 11 years ago

The question. Why adding replacement:

mappings.put("-", "hyphen");

in NameConverter.java map does not work? The name already modified at the time the name convertion worker is called? It is happening on annotation level, correct?

#14 Updated by Eugenie Lyzenko over 11 years ago

OK. I've found the answer for note #13. The matchlist.xml config file overrides the NameConverter.java internals. The question - do we prefer to adjust in config file or P2J engine?

Another question is do we need this modification only for frame and variable names or globally for all names encountered?

#15 Updated by Greg Shah over 11 years ago

If we have a substitution that we want to be the default, then we would always want to encode it in P2J and not in the configuration file. The config is for overrides only.

BUT, in the case of the "-" character, it is MUCH too common for us to implement "hyphen" as a replacement. It is used constantly and by far the best default is to use it to separate syllables (for camel-casing) but to delete it from the result.

In this case, we want to implement a disambiguation of the two frame names only. We don't want to affect all other name conversions or even all other frame name conversions.

Please write a solution in TRPL where the frame names are being converted. Detect when 2 (or more) frames in the same program that start with different 4GL names and end up with the same javaname. Then append an index number to the end of the names to differentiate them (like FrameF1_1 and FrameF1_2).

#16 Updated by Constantin Asofiei over 11 years ago

Another name collision, when vars are used:

def var c-1 as char.
def var c1 as char.

Both get converted to the same name, c1.

#17 Updated by Eugenie Lyzenko over 11 years ago

Another word we need to decide what are the exact cases when we need special processing for "-" character. For now I see the cases:
1. frame names.
2. variable names(and looks like widgets too)

What about the source files to be run as procedires like proc01.p and proc0-1.p?

#18 Updated by Greg Shah over 11 years ago

The strange thing is that I thought we already did some of this kind of disambiguation.

We don't want this to be specific to the hyphen character. There are other characters that can be dropped and which would cause name collisions. The code should be generic: any name collision should be fixed.

For now, just implement the variable and frame cases.

#19 Updated by Eugenie Lyzenko over 11 years ago

The places to modify:
for frames: rule file annotation/frame_scoping.rules, internal function "process_frame"

...
            <!-- convert the instance name -->
            <action>
               iname = names.convert(fname, names.variable, null)
            </action>
...

for variables: rule file annotation/variable_definitions.rules
...
         <action>
            javaname = names.convert(getNoteString("name"),
                                     names.variable,
                                     null)
         </action>
...

The suggested approach(per one *.p file processing):
1. We create temporary HashSet or HashMap (if we need to store original 4GL name).
2. Every generated java will be checked in this set to find if we already have the same.
3. If found, add new suffix "_X" where X - from 1 until first available new one. Check again if new suggested name already used. Repeat this step until new name is generated.
4. Store new name in the set for future checks.
5. Use the same approach for both frames and variables.

Is this OK?

#20 Updated by Eugenie Lyzenko over 11 years ago

The suggested fix has been uploaded. It properly converts the sample:

def var c1 as char.
def var c-1 as char.
def var i as int.

form i c1 with frame f1.
form i c-1 with frame f-1.

Converting c1 as c1 and c-1 as c1_1, f1 as NameMap0F1 f1Frame, f-1 as NameMap0F1_1 f1_1Frame.
The fix is working only for variable and frame names.

The only note is the first encountered name (not depending on has or dows not have the eaten character) will always be without suffix, so the order of the f1 and f-1 will define which one will get the suffix.

#21 Updated by Greg Shah over 11 years ago

Code Review:

Overall, it is very good. Some minor things:

1. In both files, please start with namesfx = 2. I think this is more natural for the 2nd instance of that var.

2. Why is this code in the variable_definitions.rules?

<action>putNote("javaname", sprintf("%s",javaname))</action>

Is the sprintf() needed? I'm not sure why it was added.

Make the changes, upload the final version and put this through conversion regression testing.

#22 Updated by Eugenie Lyzenko over 11 years ago

1. In both files, please start with namesfx = 2. I think this is more natural for the 2nd instance of that var.

OK.

2. Why is this code in the variable_definitions.rules?

> <action>putNote("javaname", sprintf("%s",javaname))</action>
> 

Is the sprintf() needed? I'm not sure why it was added.

Yes, this is not needed. I've forgot to remove this code when I detected the places to be modified, sorry. Fixed in update.

#23 Updated by Greg Shah over 11 years ago

Do you want me to conversion test this on devsrv01? It is much faster than lightning and will not impact the conversion running there.

#24 Updated by Eugenie Lyzenko over 11 years ago

Do you want me to conversion test this on devsrv01?

Yes, please run this on devsrv01.

#25 Updated by Greg Shah over 11 years ago

There are many diffs in Majic. Please review the following and determine if these are correct or not.

#26 Updated by Eugenie Lyzenko over 11 years ago

There are many diffs in Majic. Please review the following and determine if these are correct or not.

I think because in MAJIC we have no name collisions - there should be no differences in conversion. Right? The main issue with approach currenly implemented is: two same (in 4GL) local variables in the same file will be converted with different java names. For example in local function declaration. Generally this is not a problem but I do not like this result. So we need to store and check the 4GL name too to be sure the new artificial name with suffix will be constructed only for really different 4GL name. I'll prepare the new drop update shortly.

The question. The conversion test is the comparison for MAJIC converted result before and after update. Is it correct?

#27 Updated by Greg Shah over 11 years ago

The conversion test is the comparison for MAJIC converted result before and after update. Is it correct?

Yes.

I think because in MAJIC we have no name collisions - there should be no differences in conversion. Right?

Probably.

The main issue with approach currenly implemented is: two same (in 4GL) local variables in the same file will be converted with different java names. For example in local function declaration. Generally this is not a problem but I do not like this result. So we need to store and check the 4GL name too to be sure the new artificial name with suffix will be constructed only for really different 4GL name. I'll prepare the new drop update shortly.

I think the core issue is that we need to keep a ScopedSymbolDictionary not just a simple Map. A ScopedSymbolDictionary is basically a stack of HashMap instances. When we enter a new scope, we push a new map on the stack and do all processing inside that scope only on that map. When we leave the scope, we pop the map off.

I think the following rules can be used as an example (they are from scope_promotion.rules):

   <init-rules>
      <!-- initialize our main dictionary -->
      <rule>deleteDictionary("var_defs")</rule>

   </init-rules>  

   <walk-rules>

      <!-- add a scope for functions, internal procedures and triggers -->
      <rule>evalLib("name_scope_boundary") 
         <action>addDictionaryString("var_defs", "dname", dname)</action>
         <action>addScope("var_defs")</action>
      </rule>               

   </walk-rules>

   <ascent-rules>

      <!-- delete the scope for functions, internal procs and triggers -->
      <rule>evalLib("name_scope_boundary")                  
         <action>deleteScope("var_defs")</action>
         <action>dname = lookupDictionaryString("var_defs", "dname")</action>
      </rule>         

This example uses a dictionary. The dictionary is basically a stack of maps, so it is a pretty good fit. The only trick is that you should use the *AtScope() versions of the lookup methods. For example, lookupDictionaryStringAtScope(dictionary_name, scope, keyname) where scope == 0 (for the current scope). I think this will solve the problem, because these conflicts will no longer exist.

#28 Updated by Eugenie Lyzenko over 11 years ago

OK. I'll investigate the ScopedSymbolDictionary keeping approach. The next drop implements the Map usage and some minor logic changes.

#29 Updated by Eugenie Lyzenko over 11 years ago

The remaining conversion issue with recent changes is related to the name case sensitivity issue. Consider the following two lines:
dc/bl-jtcp.p.cache:

  define input  parameter  ipin_BatchId  as integer    no-undo.
  define input  parameter  ipin_BatchID  as integer    no-undo.

The conversion engine trying to convert both to the same:
            <annotation datatype="java.lang.String" key="javaname" value="ipinBatchId"/>

So these two names are considered as the same one. The question is do we need to ignore 4GL case sensitivity? Another word, are these names ipin_BatchId and ipin_BatchID the same for TRPL in source code? Or we need to make differences?

#30 Updated by Constantin Asofiei over 11 years ago

The question is do we need to ignore 4GL case sensitivity?

Please test your case in 4GL environment, and you will see that 4GL is not case sensitive. The two names represent the same parameter and 4GL will get you a compile error, if you use them in the same procedure.

Looking at the source you mentioned, I see that these are input parameters provided in two distinct internal procedures. What troubles you? The fact that the procedure's parameter gets promoted and collides with another proc's parameter, with the same name? If this is the case, you need to disambiguate between them, when the parameters/local variables get promoted.

#31 Updated by Eugenie Lyzenko over 11 years ago

What troubles you? The fact that the procedure's parameter gets promoted and collides with another proc's parameter, with the same name?

No. The case when we consider the names as different when they are really the same. This is what I have worried about.

OK. Making P2J conversion case insensitive gives conversion passing result. The new drop has been uploaded for review. And I would prefer not to complicate the code by adding Dictionary. This fix is just lexical handling, leaving other logic untouched, promote nothing etc and does nothing for really same 4GL identifiers.

#32 Updated by Constantin Asofiei over 11 years ago

Ok, I just lost a comment due to "redmine wants me to login again" so here is the short part. Your approach is too aggressive, as in this example:

def var i-1 as int.
message i-1.
procedure proc0.
   def var i1 as int.
   message i1.
end.

proc0's i1 var is renamed to i1_2. Please use the info provided by Greg on note 27 and add your namesused map to a scoped dictionary. The name_scope_boundary function should return true if current node is an internal procedure, function, trigger or block.

#33 Updated by Eugenie Lyzenko over 11 years ago

def var i-1 as int.
message i-1.
procedure proc0.
   def var i1 as int.
   message i1.
end.

The question is: do we really want in example you have provided i1 and i-1 will be converted to the absolutely same java name i1? In two different places.

#34 Updated by Greg Shah over 11 years ago

This is the reason we use the dictionary. The example in note 33 has 2 scopes. It will result in a parameter for the external procedure of the same name as a parameter for the internal procedure. That is OK. The internal procedure has its own scope which hides the outer parameter of the same name. We have to use the dictionary to duplicate the same scoping behavior. Please implement the dictionary approach.

#35 Updated by Eugenie Lyzenko over 11 years ago

I'm testing the implemented approach with dictionary usage. And I have a question. Do we need the same code to be implemented for frame names handling?

#36 Updated by Greg Shah over 11 years ago

Yes, frames are scoped too.

#37 Updated by Eugenie Lyzenko over 11 years ago

The new update with scoping support has been added for review. The case when we has separate handling for button, rectangle or browse is now also included in name resolution conflict. The conversion testing is now in progress.

#38 Updated by Eugenie Lyzenko over 11 years ago

The update 20130318a has passed the conversion testing, compared directories are /ddl and /src/aero/timco/majic.

#39 Updated by Greg Shah over 11 years ago

Check it in and distribute it.

#40 Updated by Eugenie Lyzenko over 11 years ago

The evl_upd20130318a.zip has been committed in bzr as 10296. As I note before it passed the conversion testing.

#41 Updated by Greg Shah over 7 years ago

  • Target version deleted (24)

Also available in: Atom PDF