Feature #6457

finish COPY-LOB support

Added by Eric Faulhaber almost 2 years ago. Updated over 1 year ago.

Status:
WIP
Priority:
Normal
Target version:
-
Start date:
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD

Related issues

Related to Base Language - Bug #6623: validate the memptr bytes in "copy-lob from memptr to clob/longchar" statement New

History

#1 Updated by Eric Faulhaber almost 2 years ago

The remaining work:

  • Encoding behavior needs better testing (at least #6623 must be fixed).
  • BOM handling for file source/target is not implemented yet.
  • Some other issues remain (see #4768-8).

#2 Updated by Greg Shah almost 2 years ago

  • Related to Bug #6623: validate the memptr bytes in "copy-lob from memptr to clob/longchar" statement added

#3 Updated by Greg Shah almost 2 years ago

  • Assignee set to Stanislav Lomany

#4 Updated by Greg Shah over 1 year ago

The BOM handling has a testcase already written. See testcases/copy_lob/tests/bom.p.

#5 Updated by Stanislav Lomany over 1 year ago

Greg Shah wrote:

The BOM handling has a testcase already written. See testcases/copy_lob/tests/bom.p.

OK, the first issue I hit in bom.p was a surprise: in 4GL, the following sequence produces invalid output file content, apparently because the convert option triggers a conversion. FWD does not perform that conversion, and its output file is valid UTF-8.

inputLongchar = ?.
fix-codepage(inputLongchar) = 'UTF-8'.
inputLongchar = "A" + CHR(14844588, "UTF-8", "UTF-8") + "A".
copy-lob from inputLongchar to file 'copy_lob/output/out.tmp' convert target codepage 'UTF-8' no-error.

#6 Updated by Stanislav Lomany over 1 year ago

  • Status changed from New to WIP

Guys, I don't quite understand how 4GL produces output. Consider the following UTF-16 output example:

inputLongchar = ?.
fix-codepage(inputLongchar) = 'utf-8'.
inputLongchar = "A" + CHR(14844588, "UTF-8", "UTF-8") + "A".
copy-lob from inputLongchar to file 'copy_lob/output/out16.tmp' convert target codepage 'UTF-16' no-error.

4GL produces this for A€A:

ff fe 41 00 e2 00 82 00 ac 00 41 00   
BOM   A     Euro????          A  

While normal UTF-16 output is

ff fe 41 00 ac 20 41 00   
BOM   A     Euro  A

Do you have any idea what and how 4GL produces?

#7 Updated by Constantin Asofiei over 1 year ago

Stanislav, the value 14844588 is e282ac in hex, which is the UTF-8 byte sequence for the euro sign. See this: https://community.progress.com/s/article/P181822 Please experiment with -cpstream and see what happens. Also, the docs at https://documentation.progress.com/output/ua/OpenEdge_latest/pdsoe/PLUGINS_ROOT/com.openedge.pdt.langref.help/rfi1424920632352.html state:

If either the source or the target object is a file, the target's code page defaults to -cpstream.

On a side note, the test abends in OE on Windows if the target path does not exist on disk.

#8 Updated by Ovidiu Maxiniuc over 1 year ago

Stanislav Lomany wrote:

Do you have any idea what and how 4GL produces?

Try launching the 4GL with the -cpinternal utf8 command-line parameter. This may shed some light on how it works. In my opinion, it assigns the UTF-8 representation to the longchar variable through some UTF-to-default-CP conversion, which breaks the content and ignores the previous fix-codepage statement.
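The byte pattern from note #6 can be reproduced outside 4GL. The Python sketch below is only a model of the hypothesis in this note, not a description of what OpenEdge does internally: it assumes the three UTF-8 bytes of the euro sign are each treated as a separate single-byte character (ISO8859-1 is used here as a stand-in, which is an assumption) and are then widened to little-endian UTF-16 with a BOM. The names `hexdump`, `expected` and `misread` are illustrative only.

```python
import codecs

CODE = 14844588  # the value passed to CHR() in the testcase

# 14844588 == 0xE282AC, i.e. the three UTF-8 bytes of U+20AC (euro sign).
raw = CODE.to_bytes(3, "big")
euro = raw.decode("utf-8")
text = "A" + euro + "A"


def hexdump(data):
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


# Regular UTF-16 (little-endian, with BOM) for the string.
expected = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Assumed model of the 4GL result: every UTF-8 byte is read as one
# single-byte character (ISO8859-1 stand-in), then written as UTF-16.
as_single_bytes = text.encode("utf-8").decode("latin-1")
misread = codecs.BOM_UTF16_LE + as_single_bytes.encode("utf-16-le")

print(hex(CODE))          # 0xe282ac
print(hexdump(expected))  # ff fe 41 00 ac 20 41 00
print(hexdump(misread))   # ff fe 41 00 e2 00 82 00 ac 00 41 00
```

The second dump matches the bytes reported for 4GL. A Windows-1252 stand-in would not match, because it maps byte 82 to U+201A and would give 1a 20 instead of 82 00, so the conversion looks like a plain byte-to-code-unit widening.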

#9 Updated by Stanislav Lomany over 1 year ago

Guys, I've been experimenting with the -cpstream and -cpinternal parameters. When it comes to UTF-16/UTF-32, the only setting that works is -cpstream set to UTF-16. Every other case either produces the error message "Case table for code page UTF-16 and case name BASIC was not found in convmap.cp (1038)" or silently abends.
Do you have any idea how to fix it?

#10 Updated by Ovidiu Maxiniuc over 1 year ago

I know, I had the same problems :(.
Try setting the same value for all these cp parameters, unless having them different is your goal.

#11 Updated by Stanislav Lomany over 1 year ago

Try setting the same value for all these cp parameters

Errors are the same for this case too.

#12 Updated by Stanislav Lomany over 1 year ago

I cannot change -cpstream/-cpinternal parameters on a customer's VM either, so I cannot experiment with it.
Moreover, testcases/copy_lob/tests/bom.p fails on this VM because the output produced for UTF-8 is different (using the default -cpstream/-cpinternal).

Greg, what should we do about these parameters if we cannot be sure how they work?

#13 Updated by Greg Shah over 1 year ago

I cannot change -cpstream/-cpinternal parameters on a customer's VM either, so I cannot experiment with it.

Is the testing issue caused by the virtual machine not having the needed codepages installed? Otherwise I don't understand why OE would not honor these command line specifications.

#14 Updated by Stanislav Lomany over 1 year ago

Is the testing issue caused by the virtual machine not having the needed codepages installed?

In some cases the error says that the "case table" for that particular code page is missing from convmap.cp (a quick search didn't tell me how to add one). Some cases abend, and some lead to artifacts in the screen output.

#15 Updated by Greg Shah over 1 year ago

Please provide the list of codepages/scenarios which need testing (and which you cannot test).

Marian: I think we need your team to do this testing.

#16 Updated by Marian Edu over 1 year ago

Greg Shah wrote:

Please provide the list of codepages/scenarios which need testing (and which you cannot test).

Marian: I think we need your team to do this testing.

Not sure what needs to be tested here. The codepage/collation support might differ from one version to another, and some customisation can be done, but we've made some procedures that list:
  • the supported codepages/collation combinations: i18n/4gl/cp_collation.xml
  • the accepted conversion between codepages for codepage-convert: i18n/4gl/cp_conversion.xml
  • the accepted conversion between codepages for chr: i18n/4gl/chr_conversion.xml
  • the accepted conversion between codepages for input stream: i18n/4gl/is_conversion.xml

#17 Updated by Stanislav Lomany over 1 year ago

i18n/4gl/cp_collation.xml i18n/4gl/cp_conversion.xml i18n/4gl/chr_conversion.xml i18n/4gl/is_conversion.xml

Marian, these are theoretically allowed conversions. The problem is that I cannot set -cpstream/-cpinternal parameters to test how they work. I either get an error that "case table" for this particular case is missing in convmap.cp or it just abends.
Do you think you'll be able to set -cpstream/-cpinternal with a working result? If so, I'll make a testcase to be tested with all -cpstream/-cpinternal combinations.

#18 Updated by Marian Edu over 1 year ago

Stanislav Lomany wrote:

i18n/4gl/cp_collation.xml i18n/4gl/cp_conversion.xml i18n/4gl/chr_conversion.xml i18n/4gl/is_conversion.xml

Marian, these are theoretically allowed conversions. The problem is that I cannot set -cpstream/-cpinternal parameters to test how they work. I either get an error that "case table" for this particular case is missing in convmap.cp or it just abends.

What are the values you're trying to use for the codepages/collation tables - for cpstream and cpinternal?

Do you think you'll be able to set -cpstream/-cpinternal with a working result? If so, I'll make a testcase to be tested with all -cpstream/-cpinternal combinations.

Yes, the idea is that if the combination is valid then there shouldn't be any error. That said, it is quite rare to have different codepages, precisely because of the conversion overhead: the client uses the codepage appropriate for the user, and the server often uses the same codepage as the client, or at times some UTF codepage if the database needs to support various codepages.

#19 Updated by Stanislav Lomany over 1 year ago

What are the values you're trying to use for the codepages/collation tables - for cpstream and cpinternal?

I want to test these:
UTF-8
UTF-16
UTF-16LE
UTF-16BE
UTF-32
UTF-32LE
UTF-32BE
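
As a rough aid for planning that test run, the Python sketch below enumerates every -cpstream/-cpinternal pairing of the code pages listed above and prints reference bytes for the A€A string. The helper names (`startup_matrix`, `reference_bytes`, `hexdump`) are hypothetical, and the reference bytes come from Python's own codecs, not from 4GL. Whether OpenEdge writes a BOM or matches these bytes is exactly what the testing would have to establish.

```python
from itertools import product

# Code pages listed in this note.
CODEPAGES = ["UTF-8", "UTF-16", "UTF-16LE", "UTF-16BE",
             "UTF-32", "UTF-32LE", "UTF-32BE"]

SAMPLE = "A\u20acA"  # the same A-euro-A string used in the thread


def startup_matrix(codepages):
    """Return every (cpstream, cpinternal) pair, equal pairs included."""
    return list(product(codepages, repeat=2))


def reference_bytes(text, codepage):
    """Encode text with the Python codec of the same name (not 4GL output)."""
    return text.encode(codepage)


def hexdump(data):
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    matrix = startup_matrix(CODEPAGES)
    print(len(matrix), "combinations")
    for cp in CODEPAGES:
        # Python's plain UTF-16/UTF-32 codecs prepend a BOM and use the
        # platform byte order; the LE/BE variants emit no BOM.
        print(f"{cp:9}", hexdump(reference_bytes(SAMPLE, cp)))
```

Seven code pages give 49 pairings, so a single parameterised testcase run once per pairing is probably easier to manage than separate test files.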
