Classes required by user defined, server-side, database functions
implemented in Java using the PL/Java
framework.
Author
|
Eric Faulhaber |
Date
|
March 13, 2007
|
Access Control
|
CONFIDENTIAL
|
Contents
Introduction
Supporting Technology -
PL/Java
Implementation
Introduction
This package contains classes which support the creation and
execution of user defined functions, implemented in Java, within a
database backend server process. The current implementation will
work within a PostgreSQL backend process using PL/Java. However,
it is anticipated that this implementation can be adapted to work
within other database servers which support Java as a User Defined
Function (UDF) language. Specifically, both DB2 and Oracle
support Java
in this capacity.
Supporting Technology
- PL/Java
The technology which makes the current implementation of this package
possible is PL/Java.
PL/Java is a framework, implemented in Java and C (supported through
JNI on Linux and Windows), which integrates with the UDF architecture
within PostgreSQL to permit the definition
and installation of Java UDFs in PostgreSQL. PL/Java is not a
core part of the PostgreSQL codebase, but rather is a separate, open
source project, released under the Berkley/BSD license. It is worth
noting that this project has a very small membership and all evidence
suggests that it is actually maintained by a single developer.
While this poses some continuity and support risk for the project, this
risk is mitigated by its open source nature.
PL/Java works by running Java UDFs in a separate JVM which is started
within the context of the backend database server process. The
version of PL/Java (1.3.0) currently used by P2J starts a new JVM
instance for each active database connection. While this carries
with it some overhead, this overhead is amortized over the life of the
connection, which should survive multiple application-level
connections, because of the connection pooling used by P2J. Note
that the roadmap for PL/Java calls for a future version which shares a
single JVM across multiple connections, which would further amortize
this overhead.
Installation
To install PL/Java, it is necessary to make modifications to the
PostgreSQL start-up script and configuration, and to "bootstrap"
PL/Java within each database instance for which it will be used.
This "bootstrap" installation needs to be performed only once per
database instance. During this phase, PL/Java installs its own
schema (sqlj
), used to support
its internal operations, and defines its own set of UDFs as an API
which we use to implement additional P2J-specific, Java UDFs.
Unpack PL/Java
Unpack the PL/Java binary distribution to a well-known location, such
as /usr/share/pljava
(the remaining instructions assume
this directory is used). The distribution file used by P2J at the
time of writing is pljava-i686-pc-linux-gnu-pg8.1-1.3.0.tar.gz
,
and it is part of the P2J project (in the p2j/pl
directory).
PostgreSQL Launch/Config Changes
- Copy the
pljava.so
file into PostgreSQL's lib
subdirectory (e.g., /usr/local/pgsql/lib
if PostgreSQL is
installed in /usr/local/pgsql
).
- Modify the shell configuration (e.g.,
.bashrc
if
using bash) for the postgres
user, to ensure the
following
environment variables are set for the postmaster
process:
CLASSPATH=
- this is used to ensure the JVM launched by the PostgreSQL
backend does not inherit a default CLASSPATH setting for the machine,
but rather relies solely upon the classpath set by the p2jpl.jar
installer
LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/i386:$JAVA_HOME/jre/lib/i386/server
- this is used to ensure that the PostgreSQL backend process
can find the libraries necessary to launch JVM instances
- note that we use the HotSpot server compiler rather than the
client compiler; this is intended to increase performance and
assumes that a JVM, once launched, is long-lived, due to the connection
pooling noted above
- Add the following settings to the
postgresql.conf
file (customarily located in the CUSTOMIZED OPTIONS
section):
custom_variable_classes = 'pljava'
pljava.classpath = '/usr/share/pljava/pljava.jar'
- Restart the backend process
Bootstrap PL/Java
The following must be executed against each database which will use
Java UDFs:
cd /usr/share/pljava
psql -U <super_user> -f install.sql
<database_name>
, where:
super_user
is a user with rights to modify the
target database
database_name
is the target database
Uninstallation
The following must be executed against each database into which PL/Java
was installed:
cd /usr/share/pljava
psql -U <super_user> -f uninstall.sql
<database_name>
, where:
super_user
is a user with rights to modify the
target database
database_name
is the target database
- Back out the changes made in the PostgreSQL
Launch/Config Changes section
- Restart the backend process
Implementation
For each Java method which must be supported as a database server
function, a user defined function ("UDF") is created, using the SQL create
function
statement. This registers our function with the
database server backend. For example:
create function myFunc(int, text,
date)
returns boolean as
'java.lang.Boolean
com.goldencode.p2j.persist.pl.Functions.myFuncImpl(
java.lang.Integer,
java.lang.String,
java.sql.Date)'
language java immutable
registers a database function myFunc
for the currently
connected database. This function accepts SQL integer
,
text
, and date
values as its parameters, and
returns a boolean
value to the backend. The myFunc
database function is mapped to a static method myFuncImpl
in
the com.goldencode.p2j.persist.pl.Functions
class.
The
method accepts parameters of type java.lang.Integer
, java.lang.String
,
and java.sql.Date
and returns a value of type java.lang.Boolean
.
PL/Java manages the type conversions from SQL types to J2SE wrappers on
input, and from J2SE wrapper to SQL type on return.
[Note: due to a parser defect in the 1.3.0 version of PL/Java,
the current P2J implementation actually embeds an equals (=
)
symbol within the Java method signature of the create function
statement, between the return type and the function name. This is
a temporary workaround for the parser defect, which has been reported
to the PL/Java developer. This symbol should be removed when a
fixed version of PL/Java becomes available.]
Forwarding API
Generally, the classes within this package do not define the
implementations of the worker methods themselves. Rather, they
provide methods which define a thin, forwarding API layer. The
typical method implementation in this API converts parameters from J2SE
wrapper types to P2J wrapper data types, delegates to the actual worker
method in a separate P2J class, and converts the return value from a
P2J wrapper type to the corresponding J2SE wrapper type. The myFuncImpl()
static method therefore might be implemented as follows:
package
com.goldencode.p2j.persist.pl;
import java.sql.*;
public final class Functions
{
public static final Boolean myFuncImpl(Integer i,
String s, Date d)
{
return SomeWorkerClass.myFuncWorker(new
integer(i),
new
character(s),
new
date(d)).toJavaType();
}
...
}
This API creates an additional layer of code to maintain, and the type
conversions performed in every method are duplicative. While it
would have been preferable in theory to map the create function
statements directly to the backing worker methods in the P2J runtime,
this was not possible in most cases, for several reasons:
- The database functions created by the
create function
statement must use data types known to the database backend. The
database schema of a P2J application uses the "primary" SQL types
(e.g., integer
, text
, boolean
,
etc.) for its data. PL/Java currently offers no way to map these
primary SQL types to custom Java classes (such as the P2J data wrapper
types). This is necessary by PL/Java's design, because type
mappings to Java classes actually are registered with the PostgreSQL
backend, which relies heavily on the integrity and stability of its
core types for internal processing. While PL/Java provides
services to map some SQL types to Java classes, the primary SQL types
are reserved; they have a hard-coded mapping to the J2SE wrapper
types (e.g., java.lang.Integer
, java.lang.String
,
java.lang.Boolean
, etc.). As such, they CAN NOT
be re-mapped to other Java classes. These primary types receive
special consideration in PL/Java's internal implementation, such that
they do not compromise the integrity of the backend.
- PL/Java permits the creation of user defined types (UDTs) within
the backend, mapped to Java classes. This might have been the
basis for an alternate implementation. However, to maximize
portability, the backing data itself is not defined to use UDTs and is
instead mapped directly to the primary SQL types. Thus, using
UDTs just for the server-side function support layer this would have
resulted in an overly complicated and impractical implementation.
Therefore, a thin, forwarding API layer was deemed an appropriate
implementation.
Function
Resolution and Data Types
Special care must be taken when defining functions to ensure the
PostgreSQL backend is able to resolve the proper function
implementation. In particular, certain functions must be
polymorphic to support a mixture of integral and arbitrary precision
numeric data, particularly since it will not necessarily be known until
runtime which data type will be used for a particular query. If a
function whose signature matches the parameter data types provided at
runtime cannot be found, an error will occur and the query will fail.
Fortunately, the backend will attempt to coerce a runtime datum to the
necessary type, if a function signature is available whose data type is
compatible with, but wider than that of a particular runtime
argument. Thus, it is not necessary to define exhaustively every
possible combination of parameter data types for each function, and in
fact doing so would create ambiguity problems during function
resolution. The following data type mappings have been chosen to
minimize ambiguity and to enable function resolution:
SQL
Input Type
|
UDF
Type (also SQL)
|
J2SE
Wrapper Type
|
P2J
Wrapper Type
|
integer
|
integer
|
java.lang.Integer
|
com.goldencode.p2j.util.integer
|
integer
numeric
double precision
|
double precision
|
java.lang.Double
|
com.goldencode.p2j.util.decimal
|
text
varchar
|
text
|
java.lang.String
|
com.goldencode.p2j.util.character
|
date
|
date
|
java.sql.Date
|
com.goldencode.p2j.util.date
|
boolean
|
boolean
|
java.lang.Boolean
|
com.goldencode.p2j.util.logical
|
bytea
|
bytea
|
byte[]
|
com.goldencode.p2j.util.raw
|
PostgreSQL Implementation Note:
due to complications handling timezone offsets, the PostgreSQL JDBC
driver binds a prepared statement date
parameter as the
string representation of that date. This implementation requires
that some additional care be taken when defining functions.
Namely, it is not advisable to create polymorphic UDFs which accept a date
parameter in one case and a text
parameter in another
case, because the text
version always will be resolved in
preference to the date
version. Clearly, this can
cause unintended consequences and errors. A workaround for this
problem is to refactor the polymorphic UDF into separately named
UDFs. An example in P2J is the polymorphic com.goldencode.p2j.persist.pl.Functions.toInt()
method. One version accepts a java.lang.String
parameter, another accepts a java.sql.Date
parameter. However, the UDFs are defined as toInt(text)
and julianDayInt(date)
, respectively, to avoid the
resolution conflict.
Packaging - p2jpl.jar
All of the code necessary to support the UDFs defined for P2J is stored
in the p2jpl.jar
file, which is built automatically with ant
jar
(or any ant targets dependent upon P2J's build.xml
JAR target). This jar file contains the subset of the full P2J
codebase which is necessary to support the functionality required by
the Progress-specific operator and built-in function implementations
used within where clauses. This jar file is used only by the JVMs
launched by the PostgreSQL backend process; the P2J server does
not use this file. Consequently, it is important to remember that
despite the fact that the codebase is common, there is no communication
or state-sharing between the PL/Java-enabled functions and the active
P2J server, except insofar as the former are used to restrict the
contents of database query result sets requested by the latter.
Installing p2jpl.jar
PL/Java installs Java as a trusted language in PostgreSQL, which means
that no special privileges are required to invoke P2J UDFs. It
also means that PL/Java does not access p2jpl.jar
from
the file system (and that P2J UDFs also cannot access the file
system; an untrusted mode also is available, but the trusted mode
suffices for our purposes). Instead, PL/Java serializes jar files
to a byte array and stores these within a database table in its sqlj
schema. At runtime, it uses a custom ClassLoader
implementation to deserialize these byte arrays back into classes and
load them into the JVM.
Consequently, some additional steps are required to install jar files
for PL/Java's use. We use the PL/Java API to install (and
uninstall) p2jpl.jar
, and to set the classpath to allow
the JVM to find our classes. A special deployment descriptor (p2j/pl/p2jpl.ddr
)
is embedded within the jar file, which executes SQL statements to
create all of our UDFs on install or to drop all of our UDFs on
uninstall.
The following statements must be executed against each target database
which is to support P2J server-side functions. This is typically
done from within psql.
select
sqlj.install_jar('file:///<project_path>/p2j/build/lib/p2jpl.jar',
'p2j', true)
, where:
<project_path>
is the directory in which the
P2J project is installed
- the second parameter,
'p2j'
, is used in future
statements as a handle for the serialized p2jpl.jar
- the third parameter,
true
, indicates that the p2jpl.jar
contains a deployment descriptor which should be executed upon
installation
select sqlj.set_classpath('public', 'p2j')
Uninstalling p2jpl.jar
The following statement uninstalls the p2jpl.jar
, and
executes its deployment descriptor to drop all of our UDFs:
select sqlj.remove_jar('p2j', true)