Project

General

Profile

Database Collation

At the time of this writing, the database most heavily tested with FWD is PostgreSQL, so this discussion will focus on a comparison of the Progress 4GL basic collation with the default collation of a PostgreSQL database, given the same combination of language and code page. To the author's knowledge, the default database collation in the 4GL (i.e., "basic" collation) for some of the most commonly used code pages does not match any default collation supported by the PostgreSQL RDBMS for those code pages.

Methodology

The comparison of collation behavior between the 4GL and PostgreSQL is based on collated output generated programmatically for a given code page in a simple table of the following form:

Field/Column Description General Type 4GL Type PostgreSQL Type
codepoint integer integer int
encoded character (case-sensitive) text character (case sensitive) text
encoded character (case-insensitive) text character citext

A database is created in each environment, using the same code page and default collation for a particular language ("basic" collation in the 4GL case). The integer code points 1 through 255 are iterated and inserted into the codepoint field/column. For each record, the code point's corresponding, encoded character representation is inserted into both the case-sensitive and case-insensitive columns. 1

Several queries are then executed against this data to produce comparable output. The queries are executed in pairs, sorted as follows:

  • primary sort by encoded character field (whether case-sensitive or case-insensitive) ascending; secondary sort by codepoint ascending;
  • primary sort by encoded character field (whether case-sensitive or case-insensitive) ascending; secondary sort by codepoint descending;

The purpose of the primary sort is fairly obvious. The secondary sort serves two purposes. It makes the sort deterministic for characters which are collated with identical weight; otherwise, the order of these will be indeterminate. Using the secondary sort in opposite directions between the two queries allows one to identify those groups, or "bands" of encoded characters which are collated with identical weight. These bands become apparent when comparing the output of the two queries, as the records within those bands will have their order inverted in the second query, compared to the first. Unless specifically presenting the inverted sort to illustrate the identification of a band of identically weighted characters, the samples of output shown in the discussions below will be those with an ascending, secondary sort.

Notes:
  1. In the 4GL case, each encoded character actually was injected between the constant characters a and z, so that the SPACE character would not be treated as a trailing space and trimmed when compared to other records. Thus, the true sort of the character between the constant characters could be observed. On output, the surrounding constants were stripped away, so only the encoded character was written.

Comparison of OE ISO-8859-1 basic collation vs. PostgreSQL ISO-8859-1 default collation (en_US)

Case-Sensitive

4GL

The 4GL basic collation for this codepage sorts in codepoint order up through code point 64:

 codepoint src | codepoint tgt | hex codepoint | character 
---------------+---------------+---------------+-----------
             0 |             0 |            00 | <NULL>    
             1 |             1 |            01 | <?>       
             2 |             2 |            02 | <?>       
             3 |             3 |            03 | <?>       
             4 |             4 |            04 | <?>       
             5 |             5 |            05 | <?>       
             6 |             6 |            06 | <?>       
             7 |             7 |            07 | <?>       
             8 |             8 |            08 | <?>       
             9 |             9 |            09 | <?>       
            10 |            10 |            0A | <LF>      
            11 |            11 |            0B | <?>       
            12 |            12 |            0C | <?>       
            13 |            13 |            0D | <CR>      
            14 |            14 |            0E | <?>       
            15 |            15 |            0F | <?>       
            16 |            16 |            10 | <?>       
            17 |            17 |            11 | <?>       
            18 |            18 |            12 | <?>       
            19 |            19 |            13 | <?>       
            20 |            20 |            14 | <?>       
            21 |            21 |            15 | <?>       
            22 |            22 |            16 | <?>       
            23 |            23 |            17 | <?>       
            24 |            24 |            18 | <?>       
            25 |            25 |            19 | <?>       
            26 |            26 |            1A | <?>       
            27 |            27 |            1B | <?>       
            28 |            28 |            1C | <?>       
            29 |            29 |            1D | <?>       
            30 |            30 |            1E | <?>       
            31 |            31 |            1F | <?>       
            32 |            32 |            20 | <SP>      
            33 |            33 |            21 | !         
            34 |            34 |            22 | "         
            35 |            35 |            23 | #         
            36 |            36 |            24 | $         
            37 |            37 |            25 | %         
            38 |            38 |            26 | &         
            39 |            39 |            27 | '         
            40 |            40 |            28 | (         
            41 |            41 |            29 | )         
            42 |            42 |            2A | *         
            43 |            43 |            2B | +         
            44 |            44 |            2C | ,         
            45 |            45 |            2D | -         
            46 |            46 |            2E | .         
            47 |            47 |            2F | /         
            48 |            48 |            30 | 0         
            49 |            49 |            31 | 1         
            50 |            50 |            32 | 2         
            51 |            51 |            33 | 3         
            52 |            52 |            34 | 4         
            53 |            53 |            35 | 5         
            54 |            54 |            36 | 6         
            55 |            55 |            37 | 7         
            56 |            56 |            38 | 8         
            57 |            57 |            39 | 9         
            58 |            58 |            3A | :         
            59 |            59 |            3B | ;         
            60 |            60 |            3C | <         
            61 |            61 |            3D | =         
            62 |            62 |            3E | >         
            63 |            63 |            3F | ?         
            64 |            64 |            40 | @         

This includes many non-printing/control characters, punctuation, symbols, and digits 0-9.

The 4GL sorts the capital letters of the alphabet next, including all alphabetical characters with diacritical marks:

            65 |            65 |            41 | A         
           192 |           192 |            C0 | À         
           193 |           193 |            C1 | Á         
           194 |           194 |            C2 | Â         
           195 |           195 |            C3 | Ã         
           196 |           196 |            C4 | Ä         
            66 |            66 |            42 | B         
            67 |            67 |            43 | C         
           199 |           199 |            C7 | Ç         
            68 |            68 |            44 | D         
            69 |            69 |            45 | E         
           200 |           200 |            C8 | È         
           201 |           201 |            C9 | É         
           202 |           202 |            CA | Ê         
           203 |           203 |            CB | Ë         
            70 |            70 |            46 | F         
            71 |            71 |            47 | G         
            72 |            72 |            48 | H         
            73 |            73 |            49 | I         
           204 |           204 |            CC | Ì         
           205 |           205 |            CD | Í         
           206 |           206 |            CE | Î         
           207 |           207 |            CF | Ï         
            74 |            74 |            4A | J         
            75 |            75 |            4B | K         
            76 |            76 |            4C | L         
            77 |            77 |            4D | M         
            78 |            78 |            4E | N         
           209 |           209 |            D1 | Ñ         
            79 |            79 |            4F | O         
           210 |           210 |            D2 | Ò         
           211 |           211 |            D3 | Ó         
           212 |           212 |            D4 | Ô         
           213 |           213 |            D5 | Õ         
           214 |           214 |            D6 | Ö         
            80 |            80 |            50 | P         
            81 |            81 |            51 | Q         
            82 |            82 |            52 | R         
            83 |            83 |            53 | S         
           223 |           223 |            DF | ß         
            84 |            84 |            54 | T         
            85 |            85 |            55 | U         
           217 |           217 |            D9 | Ù         
           218 |           218 |            DA | Ú         
           219 |           219 |            DB | Û         
           220 |           220 |            DC | Ü         
            86 |            86 |            56 | V         
            87 |            87 |            57 | W         
            88 |            88 |            58 | X         
            89 |            89 |            59 | Y         
           221 |           221 |            DD | Ý         
            90 |            90 |            5A | Z         

It is worth noting that for each letter, variants with accents/diacritical marks sort with the same weight. Comparing the opposite, secondary sorts by codepoint, we can see this. For example, sorting primarily by character ascending, and secondarily by codepoint ascending yields the following for variants of A:

            65 |            65 |            41 | A         
           192 |           192 |            C0 | À         
           193 |           193 |            C1 | Á         
           194 |           194 |            C2 | Â         
           195 |           195 |            C3 | Ã         
           196 |           196 |            C4 | Ä         

Reversing the secondary sort to descending by codepoint yields:

           196 |           196 |            C4 | Ä         
           195 |           195 |            C3 | Ã         
           194 |           194 |            C2 | Â         
           193 |           193 |            C1 | Á         
           192 |           192 |            C0 | À         
            65 |            65 |            41 | A         

Next, there is a small block of symbols:

            91 |            91 |            5B | [         
            92 |            92 |            5C | \         
            93 |            93 |            5D | ]         
            94 |            94 |            5E | ^         
            95 |            95 |            5F | _         
            96 |            96 |            60 | `         

This is followed by the lowercase alphabet. The same rule regarding accented characters applies within this run of characters:

            97 |            97 |            61 | a         
           224 |           224 |            E0 | à         
           225 |           225 |            E1 | á         
           226 |           226 |            E2 | â         
           227 |           227 |            E3 | ã         
           228 |           228 |            E4 | ä         
            98 |            98 |            62 | b         
            99 |            99 |            63 | c         
           231 |           231 |            E7 | ç         
           100 |           100 |            64 | d         
           101 |           101 |            65 | e         
           232 |           232 |            E8 | è         
           233 |           233 |            E9 | é         
           234 |           234 |            EA | ê         
           235 |           235 |            EB | ë         
           102 |           102 |            66 | f         
           103 |           103 |            67 | g         
           104 |           104 |            68 | h         
           105 |           105 |            69 | i         
           236 |           236 |            EC | ì         
           237 |           237 |            ED | í         
           238 |           238 |            EE | î         
           239 |           239 |            EF | ï         
           106 |           106 |            6A | j         
           107 |           107 |            6B | k         
           108 |           108 |            6C | l         
           109 |           109 |            6D | m         
           110 |           110 |            6E | n         
           241 |           241 |            F1 | ñ         
           111 |           111 |            6F | o         
           242 |           242 |            F2 | ò         
           243 |           243 |            F3 | ó         
           244 |           244 |            F4 | ô         
           245 |           245 |            F5 | õ         
           246 |           246 |            F6 | ö         
           112 |           112 |            70 | p         
           113 |           113 |            71 | q         
           114 |           114 |            72 | r         
           115 |           115 |            73 | s         
           116 |           116 |            74 | t         
           117 |           117 |            75 | u         
           249 |           249 |            F9 | ù         
           250 |           250 |            FA | ú         
           251 |           251 |            FB | û         
           252 |           252 |            FC | ü         
           118 |           118 |            76 | v         
           119 |           119 |            77 | w         
           120 |           120 |            78 | x         
           121 |           121 |            79 | y         
           253 |           253 |            FD | ý         
           255 |           255 |            FF | ÿ         
           122 |           122 |            7A | z         

This is followed by a jumble of additional symbols (including currency), non-printing characters, a handful more alphabetic characters with accents, and numeric fractions and subscripts in no discernable order.

           123 |           123 |            7B | {         
           124 |           124 |            7C | |         
           125 |           125 |            7D | }         
           126 |           126 |            7E | ~         
           127 |           127 |            7F |          
           229 |           229 |            E5 | å         
           197 |           197 |            C5 | Å         
           230 |           230 |            E6 | æ         
           198 |           198 |            C6 | Æ         
           248 |           248 |            F8 | ø         
           163 |           163 |            A3 | £         
           216 |           216 |            D8 | Ø         
           159 |           159 |            9F | <?>       
           215 |           215 |            D7 | ×         
           191 |           191 |            BF | ¿         
           189 |           189 |            BD | ½         
           188 |           188 |            BC | ¼         
           161 |           161 |            A1 | ¡         
           171 |           171 |            AB | «         
           187 |           187 |            BB | »         
           128 |           128 |            80 | <?>       
           129 |           129 |            81 | <?>       
           130 |           130 |            82 | <?>       
           131 |           131 |            83 | <?>       
           132 |           132 |            84 | <?>       
           169 |           169 |            A9 | ©         
           133 |           133 |            85 | <?>       
           134 |           134 |            86 | <?>       
           135 |           135 |            87 | <?>       
           136 |           136 |            88 | <?>       
           162 |           162 |            A2 | ¢         
           165 |           165 |            A5 | ¥         
           137 |           137 |            89 | <?>       
           138 |           138 |            8A | <?>       
           139 |           139 |            8B | <?>       
           140 |           140 |            8C | <?>       
           141 |           141 |            8D | <?>       
           142 |           142 |            8E | <?>       
           143 |           143 |            8F | <?>       
           174 |           174 |            AE | ®         
           172 |           172 |            AC | ¬         
           144 |           144 |            90 | <?>       
           145 |           145 |            91 | <?>       
           146 |           146 |            92 | <?>       
           147 |           147 |            93 | <?>       
           148 |           148 |            94 | <?>       
           149 |           149 |            95 | <?>       
           150 |           150 |            96 | <?>       
           164 |           164 |            A4 | ¤         
           240 |           240 |            F0 | ð         
           208 |           208 |            D0 | Ð         
           151 |           151 |            97 | <?>       
           152 |           152 |            98 | <?>       
           153 |           153 |            99 | <?>       
           154 |           154 |            9A | <?>       
           155 |           155 |            9B | <?>       
           166 |           166 |            A6 | ¦         
           156 |           156 |            9C | <?>       
           170 |           170 |            AA | ª         
           186 |           186 |            BA | º         
           181 |           181 |            B5 | µ         
           254 |           254 |            FE | þ         
           222 |           222 |            DE | Þ         
           175 |           175 |            AF | ¯         
           180 |           180 |            B4 | ´         
           173 |           173 |            AD | ­         
           177 |           177 |            B1 | ±         
           157 |           157 |            9D | <?>       
           190 |           190 |            BE | ¾         
           182 |           182 |            B6 | ¶         
           167 |           167 |            A7 | §         
           247 |           247 |            F7 | ÷         
           184 |           184 |            B8 | ¸         
           176 |           176 |            B0 | °         
           168 |           168 |            A8 | ¨         
           183 |           183 |            B7 | ·         
           185 |           185 |            B9 | ¹         
           179 |           179 |            B3 | ³         
           178 |           178 |            B2 | ²         
           158 |           158 |            9E | <?>

Non-breaking space is always sorted last.

           160 |           160 |            A0 | <NBSP>    

To summarize the 4GL case-sensitive, basic collation for ISO-8859-1 at a high level:

  1. NULL first
  2. Non-printing/control characters, punctuation, space, symbols, and digits in ASCII order through codepoint 64
  3. Uppercase alphabetic letters, including those with accents/diacritical marks (which each have the same sorting weight for the same, base letter)
  4. A few more symbols
  5. Lowercase alphabetic letters, including those with accents/diacritical marks (which each have the same sorting weight for the same, base letter)
  6. Additional symbols (including currency) and non-printing characters
  7. Non-breaking space last

PostgreSQL

The database was created in a cluster built using the en_US.iso88591 locale provided by Ubuntu 16.04:

   Name    |  Owner   | Encoding |    Collate     |     Ctype      |   Access privileges   |  Size   | Tablespace |                Description                 
-----------+----------+----------+----------------+----------------+-----------------------+---------+------------+--------------------------------------------
 fwd       | postgres | LATIN1   | en_US.iso88591 | en_US.iso88591 |                       | 7104 kB | pg_default | 

LATIN1 is an alias for ISO-8859-1. Note that "Collate" is reported as en_US.iso88591, meaning that the collation behavior in this case comes from how the libc library works with the operating system's collation, defined by the en_US.iso88591 locale.

The default PostgreSQL collation for this codepage sorts all of the following characters before codepoint 1:

 codepoint src | codepoint tgt | hex codepoint | character 
---------------+---------------+---------------+-----------
           180 |           180 |          00B4 | ´         
            96 |            96 |          0060 | `         
            94 |            94 |          005E | ^         
           168 |           168 |          00A8 | ¨         
           126 |           126 |          007E | ~         
           247 |           247 |          00F7 | ÷         
           215 |           215 |          00D7 | ×         
            60 |            60 |          003C | <         
            61 |            61 |          003D | =         
            62 |            62 |          003E | >         
           172 |           172 |          00AC | ¬         
           124 |           124 |          007C | |         
           166 |           166 |          00A6 | ¦         
           176 |           176 |          00B0 | °         
           181 |           181 |          00B5 | µ         
            32 |            32 |          0020 | <SP>      
            95 |            95 |          005F | _         
           175 |           175 |          00AF | ¯         
           173 |           173 |          00AD | ­         
            45 |            45 |          002D | -         
            44 |            44 |          002C | ,         
            59 |            59 |          003B | ;         
            58 |            58 |          003A | :         
            33 |            33 |          0021 | !         
           161 |           161 |          00A1 | ¡         
            63 |            63 |          003F | ?         
           191 |           191 |          00BF | ¿         
            47 |            47 |          002F | /         
            46 |            46 |          002E | .         
           183 |           183 |          00B7 | ·         
           184 |           184 |          00B8 | ¸         
            39 |            39 |          0027 | '         
            34 |            34 |          0022 | "         
           171 |           171 |          00AB | «         
           187 |           187 |          00BB | »         
            40 |            40 |          0028 | (         
            41 |            41 |          0029 | )         
            91 |            91 |          005B | [         
            93 |            93 |          005D | ]         
           123 |           123 |          007B | {         
           125 |           125 |          007D | }         
           167 |           167 |          00A7 | §         
           182 |           182 |          00B6 | ¶         
           169 |           169 |          00A9 | ©         
           174 |           174 |          00AE | ®         
            64 |            64 |          0040 | @         
           164 |           164 |          00A4 | ¤         
           162 |           162 |          00A2 | ¢         
            36 |            36 |          0024 | $         
           163 |           163 |          00A3 | £         
           165 |           165 |          00A5 | ¥         
            42 |            42 |          002A | *         
            92 |            92 |          005C | \         
            38 |            38 |          0026 | &         
            35 |            35 |          0023 | #         
            37 |            37 |          0025 | %         
            43 |            43 |          002B | +         
           177 |           177 |          00B1 | ±         

These are punctuation marks and symbols (apologies, cutting and pasting into Redmine is losing some of the correct glyphs), including currency symbols.

Next, we have a block of non-printing/control characters in codepoint order from codepoint 1 through 31:

             1 |             1 |          0001 | <?>       
             2 |             2 |          0002 | <?>       
             3 |             3 |          0003 | <?>       
             4 |             4 |          0004 | <?>       
             5 |             5 |          0005 | <?>       
             6 |             6 |          0006 | <?>       
             7 |             7 |          0007 | <?>       
             8 |             8 |          0008 | <?>       
             9 |             9 |          0009 | <?>       
            10 |            10 |          000A | <LF>      
            11 |            11 |          000B | <?>       
            12 |            12 |          000C | <?>       
            13 |            13 |          000D | <CR>      
            14 |            14 |          000E | <?>       
            15 |            15 |          000F | <?>       
            16 |            16 |          0010 | <?>       
            17 |            17 |          0011 | <?>       
            18 |            18 |          0012 | <?>       
            19 |            19 |          0013 | <?>       
            20 |            20 |          0014 | <?>       
            21 |            21 |          0015 | <?>       
            22 |            22 |          0016 | <?>       
            23 |            23 |          0017 | <?>       
            24 |            24 |          0018 | <?>       
            25 |            25 |          0019 | <?>       
            26 |            26 |          001A | <?>       
            27 |            27 |          001B | <?>       
            28 |            28 |          001C | <?>       
            29 |            29 |          001D | <?>       
            30 |            30 |          001E | <?>       
            31 |            31 |          001F | <?>       

This is followed by another block of non-printing characters in codepoint order from 127 through 159:

           127 |           127 |          007F |          
           128 |           128 |          0080 | <?>       
           129 |           129 |          0081 | <?>       
           130 |           130 |          0082 | <?>       
           131 |           131 |          0083 | <?>       
           132 |           132 |          0084 | <?>       
           133 |           133 |          0085 | <?>       
           134 |           134 |          0086 | <?>       
           135 |           135 |          0087 | <?>       
           136 |           136 |          0088 | <?>       
           137 |           137 |          0089 | <?>       
           138 |           138 |          008A | <?>       
           139 |           139 |          008B | <?>       
           140 |           140 |          008C | <?>       
           141 |           141 |          008D | <?>       
           142 |           142 |          008E | <?>       
           143 |           143 |          008F | <?>       
           144 |           144 |          0090 | <?>       
           145 |           145 |          0091 | <?>       
           146 |           146 |          0092 | <?>       
           147 |           147 |          0093 | <?>       
           148 |           148 |          0094 | <?>       
           149 |           149 |          0095 | <?>       
           150 |           150 |          0096 | <?>       
           151 |           151 |          0097 | <?>       
           152 |           152 |          0098 | <?>       
           153 |           153 |          0099 | <?>       
           154 |           154 |          009A | <?>       
           155 |           155 |          009B | <?>       
           156 |           156 |          009C | <?>       
           157 |           157 |          009D | <?>       
           158 |           158 |          009E | <?>       
           159 |           159 |          009F | <?>       

Next are digits 0-9, but interleaved with fractions and subscripts in a sensible order:

            48 |            48 |          0030 | 0         
           188 |           188 |          00BC | ¼         
           189 |           189 |          00BD | ½         
           190 |           190 |          00BE | ¾         
            49 |            49 |          0031 | 1         
           185 |           185 |          00B9 | ¹         
            50 |            50 |          0032 | 2         
           178 |           178 |          00B2 | ²         
            51 |            51 |          0033 | 3         
           179 |           179 |          00B3 | ³         
            52 |            52 |          0034 | 4         
            53 |            53 |          0035 | 5         
            54 |            54 |          0036 | 6         
            55 |            55 |          0037 | 7         
            56 |            56 |          0038 | 8         
            57 |            57 |          0039 | 9         

This is followed by the alphabet, including accented characters and those with diacritical marks:

            97 |            97 |          0061 | a         
            65 |            65 |          0041 | A         
           225 |           225 |          00E1 | á         
           193 |           193 |          00C1 | Á         
           224 |           224 |          00E0 | à         
           192 |           192 |          00C0 | À         
           226 |           226 |          00E2 | â         
           194 |           194 |          00C2 | Â         
           229 |           229 |          00E5 | å         
           197 |           197 |          00C5 | Å         
           228 |           228 |          00E4 | ä         
           196 |           196 |          00C4 | Ä         
           227 |           227 |          00E3 | ã         
           195 |           195 |          00C3 | Ã         
           170 |           170 |          00AA | ª         
           230 |           230 |          00E6 | æ         
           198 |           198 |          00C6 | Æ         
            98 |            98 |          0062 | b         
            66 |            66 |          0042 | B         
            99 |            99 |          0063 | c         
            67 |            67 |          0043 | C         
           231 |           231 |          00E7 | ç         
           199 |           199 |          00C7 | Ç         
           100 |           100 |          0064 | d         
            68 |            68 |          0044 | D         
           240 |           240 |          00F0 | ð         
           208 |           208 |          00D0 | Ð         
           101 |           101 |          0065 | e         
            69 |            69 |          0045 | E         
           233 |           233 |          00E9 | é         
           201 |           201 |          00C9 | É         
           232 |           232 |          00E8 | è         
           200 |           200 |          00C8 | È         
           234 |           234 |          00EA | ê         
           202 |           202 |          00CA | Ê         
           235 |           235 |          00EB | ë         
           203 |           203 |          00CB | Ë         
           102 |           102 |          0066 | f         
            70 |            70 |          0046 | F         
           103 |           103 |          0067 | g         
            71 |            71 |          0047 | G         
           104 |           104 |          0068 | h         
            72 |            72 |          0048 | H         
           105 |           105 |          0069 | i         
            73 |            73 |          0049 | I         
           237 |           237 |          00ED | í         
           205 |           205 |          00CD | Í         
           236 |           236 |          00EC | ì         
           204 |           204 |          00CC | Ì         
           238 |           238 |          00EE | î         
           206 |           206 |          00CE | Î         
           239 |           239 |          00EF | ï         
           207 |           207 |          00CF | Ï         
           106 |           106 |          006A | j         
            74 |            74 |          004A | J         
           107 |           107 |          006B | k         
            75 |            75 |          004B | K         
           108 |           108 |          006C | l         
            76 |            76 |          004C | L         
           109 |           109 |          006D | m         
            77 |            77 |          004D | M         
           110 |           110 |          006E | n         
            78 |            78 |          004E | N         
           241 |           241 |          00F1 | ñ         
           209 |           209 |          00D1 | Ñ         
           111 |           111 |          006F | o         
            79 |            79 |          004F | O         
           243 |           243 |          00F3 | ó         
           211 |           211 |          00D3 | Ó         
           242 |           242 |          00F2 | ò         
           210 |           210 |          00D2 | Ò         
           244 |           244 |          00F4 | ô         
           212 |           212 |          00D4 | Ô         
           246 |           246 |          00F6 | ö         
           214 |           214 |          00D6 | Ö         
           245 |           245 |          00F5 | õ         
           213 |           213 |          00D5 | Õ         
           248 |           248 |          00F8 | ø         
           216 |           216 |          00D8 | Ø         
           186 |           186 |          00BA | º         
           112 |           112 |          0070 | p         
            80 |            80 |          0050 | P         
           113 |           113 |          0071 | q         
            81 |            81 |          0051 | Q         
           114 |           114 |          0072 | r         
            82 |            82 |          0052 | R         
           115 |           115 |          0073 | s         
            83 |            83 |          0053 | S         
           223 |           223 |          00DF | ß         
           116 |           116 |          0074 | t         
            84 |            84 |          0054 | T         
           117 |           117 |          0075 | u         
            85 |            85 |          0055 | U         
           250 |           250 |          00FA | ú         
           218 |           218 |          00DA | Ú         
           249 |           249 |          00F9 | ù         
           217 |           217 |          00D9 | Ù         
           251 |           251 |          00FB | û         
           219 |           219 |          00DB | Û         
           252 |           252 |          00FC | ü         
           220 |           220 |          00DC | Ü         
           118 |           118 |          0076 | v         
            86 |            86 |          0056 | V         
           119 |           119 |          0077 | w         
            87 |            87 |          0057 | W         
           120 |           120 |          0078 | x         
            88 |            88 |          0058 | X         
           121 |           121 |          0079 | y         
            89 |            89 |          0059 | Y         
           253 |           253 |          00FD | ý         
           221 |           221 |          00DD | Ý         
           255 |           255 |          00FF | ÿ         
           122 |           122 |          007A | z         
            90 |            90 |          005A | Z         

Note that for each letter (including each accented form), the lowercase character sorts first, then the uppercase form of the same character sorts next. The accented forms sort consistently whether the secondary sort by codepoint is ascending or descending.

In fact, it should be noted that no two codepoints in this codepage sort with the same weight; the collation of every code point w.r.t. the others is deterministic.

Next, there is a small block of two forms of the same character:

           254 |           254 |          00FE | þ         
           222 |           222 |          00DE | Þ         

As with the 4GL collation, this collation ends with non-breaking space:

           160 |           160 |          00A0 | <NBSP>    

Please note that the NULL character (codepoint 0x00) is missing from the PostgreSQL test output, because inserting this character from JDBC raised an error:

java.sql.BatchUpdateException: Batch entry 0 insert into tt1(codepoint_ori, encoded_cs, encoded_ci) values (0, '', '') was aborted: ERROR: invalid byte sequence for encoding "UTF8": 0x00  Call getNextException to see other errors in the batch.
    at org.postgresql.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:166)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:492)
    at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:840)
    at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1538)
    at sandbox.CollationTest.main(CollationTest.java:424)
Caused by: org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
    at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1299)
    at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1324)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:467)
    ... 3 more

To summarize this collation:

  1. Punctuation marks and symbols (including currency symbols and the SPACE character)
  2. Non-printing/control characters in codepoint order from codepoint 1 through 31
  3. Non-printing characters in codepoint order from 127 through 159
  4. Digits 0-9, interleaved with fractions and subscripts
  5. Alphabetic characters, including accent/diacritical marks, sorting lowercase first for each letter variant
  6. Two forms of the "thorn" character
  7. Non-breaking space

Comparison Notes

Looking at the summaries for each collation provided above, we can see that major categories of characters are organized very differently between the basic 4GL collation and the default PostgreSQL collation.

The 4GL collates accented forms of a letter with the same weight as every other form of that letter, including the base letter.

All uppercase letters sort before all lowercase letters in the 4GL collation. PostgreSQL sorts each lowercase letter before each associated uppercase letter. So, it is important to note that in case-sensitive collations, these two systems sort uppercase and lowercase in the opposite directions.

One thing that cannot be seen here is that SQL NULL (i.e., not the NULL character 0x00, but the absence of data in the SQL sense or unknown value in the 4GL sense), sorts before non-NULL data in both systems.

Case-Insensitive

4GL

Just as the case-sensitive collation for the same code page, the 4GL case-insensitive, basic collation for this codepage sorts in codepoint order up through code point 64:

 codepoint src | codepoint tgt | hex codepoint | character 
---------------+---------------+---------------+-----------
             0 |             0 |            00 | <NULL>    
             1 |             1 |            01 | <?>       
             2 |             2 |            02 | <?>       
             3 |             3 |            03 | <?>       
             4 |             4 |            04 | <?>       
             5 |             5 |            05 | <?>       
             6 |             6 |            06 | <?>       
             7 |             7 |            07 | <?>       
             8 |             8 |            08 | <?>       
             9 |             9 |            09 | <?>       
            10 |            10 |            0A | <LF>      
            11 |            11 |            0B | <?>       
            12 |            12 |            0C | <?>       
            13 |            13 |            0D | <CR>      
            14 |            14 |            0E | <?>       
            15 |            15 |            0F | <?>       
            16 |            16 |            10 | <?>       
            17 |            17 |            11 | <?>       
            18 |            18 |            12 | <?>       
            19 |            19 |            13 | <?>       
            20 |            20 |            14 | <?>       
            21 |            21 |            15 | <?>       
            22 |            22 |            16 | <?>       
            23 |            23 |            17 | <?>       
            24 |            24 |            18 | <?>       
            25 |            25 |            19 | <?>       
            26 |            26 |            1A | <?>       
            27 |            27 |            1B | <?>       
            28 |            28 |            1C | <?>       
            29 |            29 |            1D | <?>       
            30 |            30 |            1E | <?>       
            31 |            31 |            1F | <?>       
            32 |            32 |            20 | <SP>      
            33 |            33 |            21 | !         
            34 |            34 |            22 | "         
            35 |            35 |            23 | #         
            36 |            36 |            24 | $         
            37 |            37 |            25 | %         
            38 |            38 |            26 | &         
            39 |            39 |            27 | '         
            40 |            40 |            28 | (         
            41 |            41 |            29 | )         
            42 |            42 |            2A | *         
            43 |            43 |            2B | +         
            44 |            44 |            2C | ,         
            45 |            45 |            2D | -         
            46 |            46 |            2E | .         
            47 |            47 |            2F | /         
            48 |            48 |            30 | 0         
            49 |            49 |            31 | 1         
            50 |            50 |            32 | 2         
            51 |            51 |            33 | 3         
            52 |            52 |            34 | 4         
            53 |            53 |            35 | 5         
            54 |            54 |            36 | 6         
            55 |            55 |            37 | 7         
            56 |            56 |            38 | 8         
            57 |            57 |            39 | 9         
            58 |            58 |            3A | :         
            59 |            59 |            3B | ;         
            60 |            60 |            3C | <         
            61 |            61 |            3D | =         
            62 |            62 |            3E | >         
            63 |            63 |            3F | ?         
            64 |            64 |            40 | @         

After this, we start to see differences from the case-sensitive collation. The next block is the alphabetic characters:

            65 |            65 |            41 | A         
            97 |            97 |            61 | a         
           192 |           192 |            C0 | À         
           193 |           193 |            C1 | Á         
           194 |           194 |            C2 | Â         
           195 |           195 |            C3 | Ã         
           196 |           196 |            C4 | Ä         
           224 |           224 |            E0 | à         
           225 |           225 |            E1 | á         
           226 |           226 |            E2 | â         
           227 |           227 |            E3 | ã         
           228 |           228 |            E4 | ä         
            66 |            66 |            42 | B         
            98 |            98 |            62 | b         
            67 |            67 |            43 | C         
            99 |            99 |            63 | c         
           199 |           199 |            C7 | Ç         
           231 |           231 |            E7 | ç         
            68 |            68 |            44 | D         
           100 |           100 |            64 | d         
           208 |           208 |            D0 | Ð         
           240 |           240 |            F0 | ð         
            69 |            69 |            45 | E         
           101 |           101 |            65 | e         
           200 |           200 |            C8 | È         
           201 |           201 |            C9 | É         
           202 |           202 |            CA | Ê         
           203 |           203 |            CB | Ë         
           232 |           232 |            E8 | è         
           233 |           233 |            E9 | é         
           234 |           234 |            EA | ê         
           235 |           235 |            EB | ë         
            70 |            70 |            46 | F         
           102 |           102 |            66 | f         
            71 |            71 |            47 | G         
           103 |           103 |            67 | g         
            72 |            72 |            48 | H         
           104 |           104 |            68 | h         
            73 |            73 |            49 | I         
           105 |           105 |            69 | i         
           204 |           204 |            CC | Ì         
           205 |           205 |            CD | Í         
           206 |           206 |            CE | Î         
           207 |           207 |            CF | Ï         
           236 |           236 |            EC | ì         
           237 |           237 |            ED | í         
           238 |           238 |            EE | î         
           239 |           239 |            EF | ï         
            74 |            74 |            4A | J         
           106 |           106 |            6A | j         
            75 |            75 |            4B | K         
           107 |           107 |            6B | k         
            76 |            76 |            4C | L         
           108 |           108 |            6C | l         
            77 |            77 |            4D | M         
           109 |           109 |            6D | m         
            78 |            78 |            4E | N         
           110 |           110 |            6E | n         
           209 |           209 |            D1 | Ñ         
           241 |           241 |            F1 | ñ         
            79 |            79 |            4F | O         
           111 |           111 |            6F | o         
           210 |           210 |            D2 | Ò         
           211 |           211 |            D3 | Ó         
           212 |           212 |            D4 | Ô         
           213 |           213 |            D5 | Õ         
           214 |           214 |            D6 | Ö         
           242 |           242 |            F2 | ò         
           243 |           243 |            F3 | ó         
           244 |           244 |            F4 | ô         
           245 |           245 |            F5 | õ         
           246 |           246 |            F6 | ö         
            80 |            80 |            50 | P         
           112 |           112 |            70 | p         
            81 |            81 |            51 | Q         
           113 |           113 |            71 | q         
            82 |            82 |            52 | R         
           114 |           114 |            72 | r         
           223 |           223 |            DF | ß         
            83 |            83 |            53 | S         
           115 |           115 |            73 | s         
            84 |            84 |            54 | T         
           116 |           116 |            74 | t         
            85 |            85 |            55 | U         
           117 |           117 |            75 | u         
           217 |           217 |            D9 | Ù         
           218 |           218 |            DA | Ú         
           219 |           219 |            DB | Û         
           220 |           220 |            DC | Ü         
           249 |           249 |            F9 | ù         
           250 |           250 |            FA | ú         
           251 |           251 |            FB | û         
           252 |           252 |            FC | ü         
            86 |            86 |            56 | V         
           118 |           118 |            76 | v         
            87 |            87 |            57 | W         
           119 |           119 |            77 | w         
            88 |            88 |            58 | X         
           120 |           120 |            78 | x         
            89 |            89 |            59 | Y         
           121 |           121 |            79 | y         
           221 |           221 |            DD | Ý         
           253 |           253 |            FD | ý         
           255 |           255 |            FF | ÿ         
            90 |            90 |            5A | Z         
           122 |           122 |            7A | z         

All variants of a base letter sort with the same weight, regardless of case or accent marks. We can determine this by comparing, for example, the band of characters based on the letter a/A with the secondary sort in ascending order:

            65 |            65 |            41 | A         
            97 |            97 |            61 | a         
           192 |           192 |            C0 | À         
           193 |           193 |            C1 | Á         
           194 |           194 |            C2 | Â         
           195 |           195 |            C3 | Ã         
           196 |           196 |            C4 | Ä         
           224 |           224 |            E0 | à         
           225 |           225 |            E1 | á         
           226 |           226 |            E2 | â         
           227 |           227 |            E3 | ã         
           228 |           228 |            E4 | ä         

...with the same sample with the secondary sort in descending order:

           228 |           228 |            E4 | ä         
           227 |           227 |            E3 | ã         
           226 |           226 |            E2 | â         
           225 |           225 |            E1 | á         
           224 |           224 |            E0 | à         
           196 |           196 |            C4 | Ä         
           195 |           195 |            C3 | Ã         
           194 |           194 |            C2 | Â         
           193 |           193 |            C1 | Á         
           192 |           192 |            C0 | À         
            97 |            97 |            61 | a         
            65 |            65 |            41 | A         

After all the alphabetic characters, we have the same block of symbols which sorted between all the uppercase and all the lowercase in the case-sensitive collation:

            91 |            91 |            5B | [         
            92 |            92 |            5C | \         
            93 |            93 |            5D | ]         
            94 |            94 |            5E | ^         
            95 |            95 |            5F | _         
            96 |            96 |            60 | `         

Next, we have a jumble of symbols, similar to the case-sensitive collation:

           123 |           123 |            7B | {         
           124 |           124 |            7C | |         
           125 |           125 |            7D | }         
           126 |           126 |            7E | ~         
           127 |           127 |            7F |          
           197 |           197 |            C5 | Å         
           229 |           229 |            E5 | å         
           198 |           198 |            C6 | Æ         
           230 |           230 |            E6 | æ         
           163 |           163 |            A3 | £         
           216 |           216 |            D8 | Ø         
           248 |           248 |            F8 | ø         
           159 |           159 |            9F | <?>       
           215 |           215 |            D7 | ×         
           191 |           191 |            BF | ¿         
           189 |           189 |            BD | ½         
           188 |           188 |            BC | ¼         
           161 |           161 |            A1 | ¡         
           171 |           171 |            AB | «         
           187 |           187 |            BB | »         
           128 |           128 |            80 | <?>       
           129 |           129 |            81 | <?>       
           130 |           130 |            82 | <?>       
           131 |           131 |            83 | <?>       
           132 |           132 |            84 | <?>       
           169 |           169 |            A9 | ©         
           133 |           133 |            85 | <?>       
           134 |           134 |            86 | <?>       
           135 |           135 |            87 | <?>       
           136 |           136 |            88 | <?>       
           162 |           162 |            A2 | ¢         
           165 |           165 |            A5 | ¥         
           137 |           137 |            89 | <?>       
           138 |           138 |            8A | <?>       
           139 |           139 |            8B | <?>       
           140 |           140 |            8C | <?>       
           141 |           141 |            8D | <?>       
           142 |           142 |            8E | <?>       
           143 |           143 |            8F | <?>       
           172 |           172 |            AC | ¬         
           174 |           174 |            AE | ®         
           144 |           144 |            90 | <?>       
           145 |           145 |            91 | <?>       
           146 |           146 |            92 | <?>       
           147 |           147 |            93 | <?>       
           148 |           148 |            94 | <?>       
           149 |           149 |            95 | <?>       
           150 |           150 |            96 | <?>       
           164 |           164 |            A4 | ¤         
           151 |           151 |            97 | <?>       
           152 |           152 |            98 | <?>       
           153 |           153 |            99 | <?>       
           154 |           154 |            9A | <?>       
           155 |           155 |            9B | <?>       
           166 |           166 |            A6 | ¦         
           222 |           222 |            DE | Þ         
           254 |           254 |            FE | þ         
           156 |           156 |            9C | <?>       
           170 |           170 |            AA | ª         
           186 |           186 |            BA | º         
           181 |           181 |            B5 | µ         
           175 |           175 |            AF | ¯         
           180 |           180 |            B4 | ´         
           173 |           173 |            AD | ­         
           177 |           177 |            B1 | ±         
           157 |           157 |            9D | <?>       
           190 |           190 |            BE | ¾         
           182 |           182 |            B6 | ¶         
           167 |           167 |            A7 | §         
           247 |           247 |            F7 | ÷         
           184 |           184 |            B8 | ¸         
           176 |           176 |            B0 | °         
           168 |           168 |            A8 | ¨         
           183 |           183 |            B7 | ·         
           185 |           185 |            B9 | ¹         
           179 |           179 |            B3 | ³         
           178 |           178 |            B2 | ²         
           158 |           158 |            9E | <?>       

For the handful of these that could be considered alphabetic in nature, and which have lowercase and uppercase representations, those pairs sort with the same weight in the case-insensitive collation, as one would expect. For example, each of the characters in these pairs sort identically (i.e., non-deterministically when the secondary sort is omitted):

           197 |           197 |            C5 | Å         
           229 |           229 |            E5 | å         
           198 |           198 |            C6 | Æ         
           230 |           230 |            E6 | æ         

As expected, the same four characters sort deterministically in the case-sensitive collation:

           229 |           229 |            E5 | å         
           197 |           197 |            C5 | Å         
           230 |           230 |            E6 | æ         
           198 |           198 |            C6 | Æ         

In an odd quirk, though, the Pound Sterling symbol sorts before the pair of upper/lower case Ø characters in the case-insensitive collation:

           163 |           163 |            A3 | £         
           216 |           216 |            D8 | Ø         
           248 |           248 |            F8 | ø         

However, it sorts between them in the case-sensitive collation:

           248 |           248 |            F8 | ø         
           163 |           163 |            A3 | £         
           216 |           216 |            D8 | Ø         

After this group of symbols, we again have the non-breaking space character sorting last:

           160 |           160 |            A0 | <NBSP>    

To summarize the 4GL case-insensitive, basic collation for ISO-8859-1 at a high level:

  1. NULL first
  2. Non-printing/control characters, punctuation, space, symbols, and digits in ASCII order through codepoint 64
  3. All upper- and lowercase alphabetic letters, including those with accents/diacritical marks
  4. Additional symbols (including currency) and non-printing characters
  5. Non-breaking space last

It is worth noting that there is no deterministic sorting among the upper- and lowercase variants, and all accented variants of a base letter. For example, a, A, and all accented forms of either sort with identical weight; followed by b, B (sorted identically); then c, C and all accented/diacritically marked forms of either (sorted with identical weight); and so on.

Other than some minor quirks (such as the placement of the Pound Sterling symbol), the case-sensitive and case-insensitive collations compare with one another as one might expect.

PostgreSQL

The case-insensitive collation for this code page is nearly identical to the case-sensitive collation described above, except that each pair of upper- and lowercase characters sorts non-deterministically for the former and deterministically for the latter. All non-alphabetic characters sort deterministically and in the same order as described above for the case-sensitive collation.

Each letter, whether the base letter, or an accented variant has exactly two representations: an uppercase and a lowercase. The pairs of these letters sort deterministically relative to other pairs, but the upper- and lowercase forms within each pair sort non-deterministically. For example, the pair:

            65 |            65 |          0041 | A         
            97 |            97 |          0061 | a         

sorts before the pair:

           193 |           193 |          00C1 | Á         
           225 |           225 |          00E1 | á         

...which sorts before the pair:

           192 |           192 |          00C0 | À         
           224 |           224 |          00E0 | à         

...and so on...

However, the letters within each of those pairs are weighted the same as one another.

Other than this difference from the case-sensitive collation, the other characters are sorted and grouped the same as described in the case-sensitive section above.

Note that the "banding" of characters which sort non-deterministically is very different than that of the 4GL case-insensitive collation for the same code page, as the 4GL includes all cases and accented forms of a base letter in the same band, whereas PostgreSQL includes only the pairs as described above.

UTF-8

[TODO: document]

Practical Effects of Incompatible Collation

FWD implements compatible collation between the legacy environment and the FWD environment, because allowing incompatibilities in collation can change application behavior in significant and unexpected ways. Given the substantial differences in the default collations of the 4GL and PostgreSQL for the same code pages and languages, compatible behavior of converted 4GL programs running in FWD would not be guaranteed in the absence of a strategy to emulate the 4GL's collation in a PostgreSQL database.

Consider, for example, the following table definition:

ADD TABLE "simple" 
  AREA "Schema Area" 
  DUMP-NAME "simple" 

ADD FIELD "number" OF "simple" AS integer
  FORMAT "->,>>>,>>9" 
  INITIAL "0" 
  LABEL "Number" 
  MAX-WIDTH 4
  ORDER 10
  MANDATORY

ADD FIELD "description" OF "simple" AS character
  FORMAT "x(24)" 
  INITIAL "" 
  LABEL "Description" 
  MAX-WIDTH 48
  ORDER 20

ADD INDEX "pi-number" ON "simple" 
  AREA "Schema Area" 
  UNIQUE
  PRIMARY
  INDEX-FIELD "number" ASCENDING

ADD INDEX "si-desc" ON "simple" 
  AREA "Schema Area" 
  INDEX-FIELD "description" ASCENDING

The following program operates on the simple table by:

  1. clearing it;
  2. populating it with some data;
  3. searching for records by the positions of the description fields relative to some constant text.
define var i as int init 0 no-undo.

do transaction:

   for each simple:
      delete simple.
   end.

   create simple.
   assign number = 1 description = "aaa".
   create simple.
   assign number = 2 description = "bbb".
   create simple.
   assign number = 3 description = "CCC".
   create simple.
   assign number = 4 description = "zzz".
   create simple.
   assign number = 5 description = "£20".
   create simple.
   assign number = 6 description =  "¡".
   create simple.
   assign number = 7 description = "¥4500".

end.

/* sort by ascending index on description */
for each simple no-lock use-index si-desc:
   display simple with frame f0 10 down.
end.

find first simple where description > "zzz" no-error.
message "First after 'zzz':" if avail simple then description else "[not found]".
pause.

find first simple where description < "¡" no-error.
message "First before '¡':" if avail simple then description else "[not found]".
pause.

find last simple where description < "¡" no-error.
message "Last before '¡':" if avail simple then description else "[not found]".
pause.

find first simple where description > "¡" no-error.
message "First after '¡':" if avail simple then description else "[not found]".
pause.

for each simple no-lock where description < "¡":
   i = i + 1.
end.
message "Records found before '¡':" string(i).
pause.

i = 0.
for each simple no-lock where description > "¡":
   i = i + 1.
end.
message "Records found after '¡':" string(i).

The 4GL database uses basic collation and the ISO-8859-1 code page. The output of the program (with some whitespace and pause messages removed) is as follows:

┌───────────────────────────────────┐
│    Number Description             │
│────────── ────────────────────────│
│         1 aaa                     │
│         2 bbb                     │
│         3 CCC                     │
│         4 zzz                     │
│         5 £20                     │
│         6 ¡                       │
│         7 ¥4500                   │
│                                   │
│                                   │
│                                   │
└───────────────────────────────────┘

First after 'zzz': £20
First before '¡': aaa
Last before '¡': £20
First after '¡': ¥4500
Records found before '¡': 5
Records found after '¡': 1

Running the converted form of this program with a schematically compatible simple table in a PostgreSQL database built upon the en_US@iso88591_fwd_basic custom locale produces identical output. This custom locale is built to collate text data in a compatible way with the legacy 4GL database for which it was originally written (i.e., basic collation and the ISO-8859-1 code page).

Running the same converted program against the same table in a PostgreSQL database built upon the en_US.iso88591 locale, but which is otherwise identical, produces markedly different output. This standard locale ships with Ubuntu Linux. As such, it does not contain the FWD implementation of the backward compatible, basic collation for the ISO-8859-1 character set. The program output is as follows:

┌───────────────────────────────────┐
│    Number Description             │
│────────── ────────────────────────│
│         6 ¡                       │
│         5 £20                     │
│         7 ¥4500                   │
│         1 aaa                     │
│         2 bbb                     │
│         3 CCC                     │
│         4 zzz                     │
│                                   │
│                                   │
│                                   │
└───────────────────────────────────┘

First after 'zzz': [not found]
First before '¡': [not found]
Last before '¡': [not found]
First after '¡': £20
Records found before '¡': 0
Records found after '¡': 6

In comparing the two outputs, it is immediately apparent in the down frames that the records are sorted in different order. The sorting of the alphabetic data is compatible; however, the two currency symbols and the inverted exclamation mark sort differently relative to the alphabetic characters, and even relative to each other.

If the results fetched from the database were always the same, and the difference in behavior were limited only to sort order, this might be an acceptable deviation in behavior for some applications. No data is missing from the initial FOR loop results; the order may or may not be important for a particular use case.

However, differences in sorting can be more disruptive. This becomes apparent, for example, when the criteria of WHERE clauses depend upon how the database collates its text. All of the FIND FIRST and FIND LAST statements in this program, as well as the final two FOR loops, use range matches which compare a record's description with a particular, textual constant.

The FINDs either do not find a record at all, or find a different record than the original program. This can lead to confusion at best, and to following completely different logic paths at worst. The final two FOR loops find entirely different sets of records (in one case the empty set). Clearly, this is not desirable.

Admittedly, the logic of this example program is contrived. Nevertheless, it illustrates the importance of compatible collation behavior. The program actually is based on idioms we have encountered in real, production applications. Not all applications use these types of idioms; encoding logic to assumptions of sort behavior is not a recommended practice. Nevertheless, there is a significant amount of 4GL code in the wild that relies upon such assumptions, thus necessitating FWD's implementation of compatible collation.

An alternative to leveraging the compatible collation infrastructure for PostgreSQL databases within FWD would be to choose a standard locale which was deemed "close enough" to the original 4GL collation. If such an approach were chosen, there would need to be a process to identify and remove problematic idioms/dependencies from the original 4GL application. Even so, incompatibilities in the converted application's behavior likely would still occur, since "close enough" is not "identical". These incompatibilities would have to be identified and differentiated from possible defects in FWD or in the 4GL application's logic, and then either corrected or further evaluated to determine whether they were acceptable to affected stakeholders.