Project

General

Profile

Feature #2949

Alternative approach to handling case insensitivity

Added by Paul E over 8 years ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD
version:

Related issues

Related to Database - Bug #6837: prevent unnecessary upper/rtrim injection in queries if the data type doesn't require it WIP

History

#1 Updated by Paul E over 8 years ago

We understand how your P2J implementations of this work for PostgreSQL and SQL Server, and we understand why these choices were made. There is no functional concern about this for most use-cases: the P2J version of our application will behave as it should; our main approach to reporting will be fine (we push all of the data into a SQL-server database and reports run from that DB). However, there are some use cases involving customers reading directly from the DB. There are also some administration use cases to do things like data fixes. In these cases, the extra effort to ensure you're using functional indexes or including calculated fields in your queries would be hard for us to justify and enforce.

As such, we'd like an alternative implementation to support use of DB-native case insensitive features rather than the current approach (perhaps as a toggle-able option if you prefer). This alternative implementation would use case insensitive text fields for PostgreSQL (citext) and a case insensitive collation for MS SQL Server. We would do right trimming pre-import (so your import routines wouldn't need to change).

We see this as something we'd prioritise along with the rest of the enhancements to be considered once the main functional milestones have been achieved.

#2 Updated by Paul E over 8 years ago

Some benefits of this approach:
  • easier to write ad-hoc queries
  • no difference between queries written for PostgreSQL or SQL Server
  • more similarity in PostgreSQL and SQL Server schemas
  • performance benefits of not having to execute these functions in where clauses
Drawbacks:
  • Implementation and support effort - to Quote Eric from an email:

... and I am concerned about support especially. Consider that if problems arise from the change (i.e., an incompatibility with Progress behavior), we won't know whether it's due to this (and thus, presumably acceptable as a divergence from the original behavior, or a customer's responsibility to change/fix), or a "real" bug, until we have taken the hit to diagnose any such problems. This is a primary driver as to why we have been so slavish in maintaining compatibility as closely as possible.

#3 Updated by Greg Shah over 7 years ago

  • Target version deleted (24)

#4 Updated by Greg Shah over 1 year ago

  • Related to Bug #6837: prevent unnecessary upper/rtrim injection in queries if the data type doesn't require it added

#5 Updated by Greg Shah over 1 year ago

  • Start date deleted (01/08/2016)

We are considering changes to the PostgreSQL schema. One possible is to use a case-insensitive text type (e.g. something like CITEXT) instead of TEXT. As Eric notes above, this will require intrusive changes in the persistence layer and at conversion time. We are already doing this for use in MariaDB. Note that at least inMariaDB, there are issues with fixed length fields (the type used there truncates data). It doesn't look like CITEXT is fixed length. On the other hand, we will have some other issues:

  • It uses lower instead of upper for case-insensitivity. This is known to break certain use cases (upper matches 4GL behavior better).
  • We still have an issue with RTRIM() for ignoring trailing whitespace in comparisons.
  • CITEXT is now a trusted modules which means it can be loaded by non-superusers and it should exist everywhere (e.g. Amazon RDS or Aurora). But it may still require an extra setup step.
  • Based on the documentation, the performance of CITEXT may be better in some cases and worse in others.

#6 Updated by Eric Faulhaber over 1 year ago

Greg Shah wrote:

It uses lower instead of upper for case-insensitivity. This is known to break certain use cases (upper matches 4GL behavior better).

Do you recall where these cases are documented?

#7 Updated by Greg Shah over 1 year ago

It uses lower instead of upper for case-insensitivity. This is known to break certain use cases (upper matches 4GL behavior better).

Do you recall where these cases are documented?

See #1587-235. And we more recently found issues with uppercasing. See #5085. We need testcases to explore the full extent of the problem.

Also available in: Atom PDF