Feature #4681

prepared statement caching/pooling

Added by Eric Faulhaber almost 4 years ago. Updated almost 4 years ago.

Status:
WIP
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD
version:

Related issues

Related to Database - Feature #4011: database/persistence layer performance improvements Closed

History

#1 Updated by Eric Faulhaber almost 4 years ago

This issue is to discuss the implementation of PreparedStatement caching/pooling as a potential performance optimization.

#2 Updated by Ovidiu Maxiniuc almost 4 years ago

My concern is related to the frequent closing of sessions (they have a short life), which will invalidate the cached statements associated with the respective connection. I am not saying that we will not benefit from this, but I estimate the gain will be marginal once the advantage of reusing an already prepared statement is weighed against the extra management overhead and the resources occupied while the items are cached.

In the majority of cases, if not all of them, we use prepared statements like this:

    // typical current pattern: prepare, bind, execute, hydrate, then discard the statement
    Connection conn = session.getConnection();
    checkClosed(conn);
    PreparedStatement stmt = conn.prepareStatement(sql);
    setParameters(stmt);
    ResultSet resultSet = stmt.executeQuery();
    hydrate(resultSet);
    stmt.close();

There are about 20-30 places where something similar is done, and I guess we should create a manager to cover them all. Then, to cache the statements, we would need to rewrite the above code like this:
    PreparedStatement stmt = PreparedStatementManager.get().getPreparedStatement(session.getConnection(), sql);
    setParameters(stmt);
    ResultSet resultSet = stmt.executeQuery();
    hydrate(resultSet);
    // no stmt.close() here; the statement stays cached in the manager for reuse

The manager will probably store the statements doubly mapped: by connection and by their SQL text. getPreparedStatement() will make sure the connection is valid and, if it is new, add a new entry. We will also need a connection-close callback, which will be called when a session is closed. Here is a draft of such a class:
    private static Map<Connection, Map<String, PreparedStatement>> data = new HashMap<>();

    public PreparedStatement getPreparedStatement(Connection conn, String sql)
    throws SQLException
    {
       if (conn.isClosed())
       {
          connectionClosed(conn);
          return null; // or throw something
       }

       Map<String, PreparedStatement> connMap = data.computeIfAbsent(conn, k -> new HashMap<>());

       // prepareStatement throws the checked SQLException, which cannot escape a
       // computeIfAbsent lambda, so the statement lookup uses an explicit get/put
       PreparedStatement stmt = connMap.get(sql);
       if (stmt == null)
       {
          stmt = conn.prepareStatement(sql);
          connMap.put(sql, stmt);
       }
       return stmt;
    }

    public void connectionClosed(Connection conn)
    throws SQLException
    {
       Map<String, PreparedStatement> map = data.remove(conn);
       if (map == null)
       {
          return;
       }
       for (PreparedStatement stmt : map.values())
       {
          stmt.close();
       }
    }

Of course, the naive map can be replaced with a more intelligent cache we already have, but when a connection is dropped because of cache constraints (time or space), the associated statements should be closed using the connectionClosed() method.
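
As a minimal sketch of that eviction behavior (the existing cache class is not referenced here by name, so java.util.LinkedHashMap is used as a hypothetical stand-in), a size-limited, access-ordered map can close the evicted connection's statements from its removeEldestEntry() hook:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.LinkedHashMap;
    import java.util.Map;

    /** Hypothetical LRU stand-in for the smarter cache mentioned above. */
    public class EvictingStatementCache
    extends LinkedHashMap<Connection, Map<String, PreparedStatement>>
    {
       /** Illustrative limit on the number of connections tracked at once. */
       private static final int MAX_CONNECTIONS = 64;

       public EvictingStatementCache()
       {
          // access order, so the least recently used connection is evicted first
          super(16, 0.75f, true);
       }

       @Override
       protected boolean removeEldestEntry(Map.Entry<Connection, Map<String, PreparedStatement>> eldest)
       {
          if (size() <= MAX_CONNECTIONS)
          {
             return false;
          }

          // close the evicted connection's statements, mirroring connectionClosed() above
          for (PreparedStatement stmt : eldest.getValue().values())
          {
             try
             {
                stmt.close();
             }
             catch (SQLException exc)
             {
                // best-effort cleanup; keep closing the remaining statements
             }
          }

          return true;
       }
    }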

Some synchronization is necessary if we use static data as above; alternatively, the instance could be made context-local. I estimate slightly better performance with the static approach.
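
As a minimal illustration of that synchronization for the static variant, both public methods of the draft above could serialize access to the shared map through one static lock. Only connectionClosed() is shown rewritten here; getPreparedStatement() would wrap its map access in the same synchronized block:

    private static final Object LOCK = new Object();

    public void connectionClosed(Connection conn)
    throws SQLException
    {
       Map<String, PreparedStatement> map;
       synchronized (LOCK)
       {
          map = data.remove(conn);
       }
       if (map == null)
       {
          return;
       }
       // close outside the lock so slow JDBC cleanup does not block other callers
       for (PreparedStatement stmt : map.values())
       {
          stmt.close();
       }
    }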

#3 Updated by Eric Faulhaber almost 4 years ago

I think we can simplify this idea a bit. I don't see the value in making a static cache. A PreparedStatement instance can only work with the connection that created it, so if we were to create a statement cache, it makes sense for it to live close to its connection and for it to be cleared/discarded when that connection is closed (or returned to the pool). As such, I think it makes sense to store it within the Session object itself, since the Session object's lifespan also is tied to that of the connection. As long as you have a reference to the current Session instance when you need to prepare/re-use a statement, this avoids any need for synchronization or context-local lookup. It also avoids an extra layer of lookup by connection, since the relationship between statement cache and connection becomes implicit.
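
A rough sketch of this idea follows; the class below is a simplified, hypothetical stand-in for the real Session (names and structure are illustrative only), with the statement cache living and dying with the wrapped connection:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    /** Simplified, hypothetical stand-in for the real Session class. */
    public class SessionSketch
    {
       private final Connection conn;

       /** Statements prepared on this session's connection, keyed by SQL text. */
       private final Map<String, PreparedStatement> statements = new HashMap<>();

       public SessionSketch(Connection conn)
       {
          this.conn = conn;
       }

       /** Reuse a previously prepared statement for this SQL, or prepare a new one. */
       public PreparedStatement prepare(String sql)
       throws SQLException
       {
          PreparedStatement stmt = statements.get(sql);
          if (stmt == null)
          {
             stmt = conn.prepareStatement(sql);
             statements.put(sql, stmt);
          }
          return stmt;
       }

       /** Close all cached statements, then the connection itself. */
       public void close()
       throws SQLException
       {
          for (PreparedStatement stmt : statements.values())
          {
             stmt.close();
          }
          statements.clear();
          conn.close();
       }
    }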

However, many production-ready connection pools, including c3p0, offer statement pooling as a feature. I think it makes the most sense to investigate this first, before rolling our own solution. Currently, our temporary table data source provider does not use c3p0 as a backing data source, but we should implement this instead of the current DriverManager.getConnection approach, which is very slow when using many short-lived sessions.
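
For illustration only, a minimal sketch of backing the temp table provider with c3p0 might look like the following; the driver class, URL, credentials and pool sizes are placeholders, not the actual FWD configuration, and statement pooling is enabled via maxStatementsPerConnection to mirror the directory setting quoted below:

    import java.beans.PropertyVetoException;
    import java.sql.Connection;

    import com.mchange.v2.c3p0.ComboPooledDataSource;

    public class TempTablePoolSketch
    {
       public static ComboPooledDataSource createDataSource()
       throws PropertyVetoException
       {
          ComboPooledDataSource cpds = new ComboPooledDataSource();

          // placeholders; the real provider would use its own driver/URL/credentials
          cpds.setDriverClass("org.h2.Driver");
          cpds.setJdbcUrl("jdbc:h2:mem:temp_tables");

          // connection pooling replaces the per-session DriverManager.getConnection() calls
          cpds.setMinPoolSize(1);
          cpds.setMaxPoolSize(10);

          // enable c3p0's statement pooling, mirroring the directory setting below
          cpds.setMaxStatementsPerConnection(100);

          return cpds;
       }

       public static void main(String[] args) throws Exception
       {
          ComboPooledDataSource ds = createDataSource();
          try (Connection conn = ds.getConnection())
          {
             // statements prepared on this connection can now be pooled by c3p0
             conn.prepareStatement("select 1").executeQuery();
          }
          ds.close();
       }
    }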

Although I already have this c3p0 setting in the directory for use with 4011a:

<node class="integer" name="maxStatementsPerConnection">
  <node-attribute name="value" value="100"/>
</node>

...I'm not sure that is quite enough to leverage this feature properly. There are some things to figure out and some benchmark testing to be done to make sure we are actually gaining the advantage of the pooling. At minimum, we have to actually be using c3p0 behind the temp table provider, but I think there may also be some tuning we need to do. For instance, we currently set prepareThreshold to 1 in directory.xml, but it looks like I didn't carry over support for this JDBC parameter/setting to 4011a, so it is being set to whatever the default value is (3?). Furthermore, the c3p0 documentation and an independent article suggest that statement pooling could actually be detrimental to performance in some cases and should be disabled.
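
For illustration, the PostgreSQL JDBC driver accepts prepareThreshold either as a URL parameter or as a connection property; the host, database and credentials below are placeholders, and this sketch only shows the driver-level mechanism, not how the setting would be wired through directory.xml:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.Properties;

    public class PrepareThresholdSketch
    {
       public static Connection connect()
       throws SQLException
       {
          // option 1: pass prepareThreshold directly in the JDBC URL
          String url = "jdbc:postgresql://localhost:5432/testdb?prepareThreshold=1";

          // option 2: pass it as a connection property instead
          Properties props = new Properties();
          props.setProperty("user", "fwd");           // placeholder credentials
          props.setProperty("password", "secret");
          props.setProperty("prepareThreshold", "1"); // use a server-side prepared plan from the first execution

          return DriverManager.getConnection(url, props);
       }
    }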

See:

#4 Updated by Eric Faulhaber almost 4 years ago

  • Related to Feature #4011: database/persistence layer performance improvements added
