Project

General

Profile

Bug #3032

running ETF fails because some postgresql server process gets killed

Added by Ovidiu Maxiniuc about 8 years ago. Updated about 8 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD
case_num:

History

#1 Updated by Ovidiu Maxiniuc about 8 years ago

I encountered difficulties running ETF after upgrading my OS to Ubuntu 15.10 and completely dropping java7 support.

Apparently the test environment starts and for a couple of minutes everything goes well. After a number of testcases (their number reflected in the number of UUIDs logged in server.log differ, but the figure is about 100, usually more), the environments halt: no CPU and no disk activity.

The BPM displays some exceptions reflected in server.log:

[03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:WARN) SQL Error: 0, SQLState: 08006
[03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:ERROR) An I/O error occurred while sending to the backend.
[03/18/2016 19:06:34 GMT] (com.mchange.v2.log.slf4j.Slf4jMLog$Slf4jMLogger$WarnLogger:WARNING) [c3p0] A PooledConnection that has already signalled a Connection error is still in use!
[03/18/2016 19:06:34 GMT] (com.mchange.v2.log.slf4j.Slf4jMLog$Slf4jMLogger$WarnLogger:WARNING) [c3p0] Another error has occurred [ org.postgresql.util.PSQLException: This connection has been closed. ] which will not be reported to listeners!
org.postgresql.util.PSQLException: This connection has been closed.
        at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:750)
        at org.postgresql.jdbc.PgConnection.getAutoCommit(PgConnection.java:722)
        at com.mchange.v2.c3p0.impl.NewProxyConnection.getAutoCommit(NewProxyConnection.java:1200)
        at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:392)
        at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.afterNonTransactionalQuery(TransactionCoordinatorImpl.java:195)
        at org.hibernate.internal.SessionImpl.afterOperation(SessionImpl.java:597)
        at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1789)
        at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:231)
        at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:156)
        at org.hibernate.internal.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:905)
        at com.goldencode.p2j.persist.id.JdbcKeyRetriever.retrieveFreeKeys(JdbcKeyRetriever.java:183)
        at com.goldencode.p2j.persist.id.IdentityPool$KeyScanner.run(IdentityPool.java:576)
        at java.lang.Thread.run(Thread.java:745)

Followed by:
[03/18/2016 19:06:34 GMT] (com.goldencode.p2j.persist.id.IdentityPool$KeyScanner:SEVERE) Identity pool has been stopped because of an exception:
org.hibernate.exception.JDBCConnectionException: could not inspect JDBC autocommit mode
        at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:131)
        at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
        at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
        at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:110)
        at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:395)
        at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.afterNonTransactionalQuery(TransactionCoordinatorImpl.java:195)
        at org.hibernate.internal.SessionImpl.afterOperation(SessionImpl.java:597)
        at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1789)
        at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:231)
        at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:156)
        at org.hibernate.internal.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:905)
        at com.goldencode.p2j.persist.id.JdbcKeyRetriever.retrieveFreeKeys(JdbcKeyRetriever.java:183)
        at com.goldencode.p2j.persist.id.IdentityPool$KeyScanner.run(IdentityPool.java:576)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
        at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:750)
        at org.postgresql.jdbc.PgConnection.getAutoCommit(PgConnection.java:722)
        at com.mchange.v2.c3p0.impl.NewProxyConnection.getAutoCommit(NewProxyConnection.java:1200)
        at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:392)
        at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.afterNonTransactionalQuery(TransactionCoordinatorImpl.java:195)
        at org.hibernate.internal.SessionImpl.afterOperation(SessionImpl.java:597)
        at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1789)
        at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:231)
        at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:156)
        at org.hibernate.internal.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:905)
        at com.goldencode.p2j.persist.id.JdbcKeyRetriever.retrieveFreeKeys(JdbcKeyRetriever.java:183)
        at com.goldencode.p2j.persist.id.IdentityPool$KeyScanner.run(IdentityPool.java:576)
        at java.lang.Thread.run(Thread.java:745)
[03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:WARN) SQL Error: 0, SQLState: 08006
[03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:ERROR) An I/O error occurred while sending to the backend.
[03/18/2016 19:06:34 GMT] (com.goldencode.p2j.util.Agent:WARNING) Agent encountered an error while executing a command for appserver app_server
com.goldencode.p2j.persist.DatabaseConnectionException: org.hibernate.exception.JDBCConnectionException: An I/O error occurred while sending to the backend.
        at com.goldencode.p2j.persist.DBUtils.handleException(DBUtils.java:593)
        at com.goldencode.p2j.persist.Persistence.scroll(Persistence.java:1472)
        at com.goldencode.p2j.persist.Persistence.scroll(Persistence.java:1410)
        at com.goldencode.p2j.persist.ProgressiveResults.getResults(ProgressiveResults.java:1036)
        at com.goldencode.p2j.persist.ProgressiveResults.getResults(ProgressiveResults.java:997)
        at com.goldencode.p2j.persist.ProgressiveResults.moveTo(ProgressiveResults.java:845)
        at com.goldencode.p2j.persist.ProgressiveResults.moveTo(ProgressiveResults.java:719)
        at com.goldencode.p2j.persist.ProgressiveResults.next(ProgressiveResults.java:334)
        at com.goldencode.p2j.persist.ResultsAdapter.next(ResultsAdapter.java:119)
        at com.goldencode.p2j.persist.AdaptiveQuery.next(AdaptiveQuery.java:1813)
        at com.goldencode.p2j.persist.CompoundQuery.processComponent(CompoundQuery.java:2345)
        at com.goldencode.p2j.persist.CompoundQuery.retrieveImpl(CompoundQuery.java:2042)
   [...]

Looking in postgresql logs:

2016-03-18 21:06:34 EET LOG:  server process (PID 19634) was terminated by signal 11: Segmentation fault
2016-03-18 21:06:34 EET DETAIL:  Failed process was running: select ((select count(id) from dvrepcat where id >= $1 and id <= $2) + (select count(id) from jprpstat where id >= $3 and id <= $4) + (select count(id) from...
2016-03-18 21:06:34 EET LOG:  terminating any other active server processes
2016-03-18 21:06:34 EET WARNING:  terminating connection because of crash of another server process
2016-03-18 21:06:34 EET DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-03-18 21:06:34 EET HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-03-18 21:06:34 EET WARNING:  terminating connection because of crash of another server process
2016-03-18 21:06:34 EET DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2016-03-18 21:06:34 EET HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2016-03-18 21:06:34 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:34 EET LOG:  all server processes terminated; reinitializing
2016-03-18 21:06:34 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:34 EET LOG:  database system was interrupted; last known up at 2016-03-18 21:04:04 EET
2016-03-18 21:06:34 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:35 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:35 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:35 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:36 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:36 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:36 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:37 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:37 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:37 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:38 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:38 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:38 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:39 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:39 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:39 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:40 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:40 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:40 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:41 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:41 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:41 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:42 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:42 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:42 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:43 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:43 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:43 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:44 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:44 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:44 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:45 EET LOG:  database system was not properly shut down; automatic recovery in progress
2016-03-18 21:06:45 EET LOG:  redo starts at 3A/86722F48
2016-03-18 21:06:45 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:45 EET LOG:  record with zero length at 3A/867BD618
2016-03-18 21:06:45 EET LOG:  redo done at 3A/867BD5E8
2016-03-18 21:06:45 EET LOG:  last completed transaction was at log time 2016-03-18 21:06:34.48673+02
2016-03-18 21:06:45 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:45 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:46 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:46 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:46 EET FATAL:  the database system is in recovery mode
2016-03-18 21:06:47 EET LOG:  autovacuum launcher started
2016-03-18 21:06:47 EET LOG:  database system is ready to accept connections

I was not able to identify the process faulty PID 19634, but I think that it may be a (short-lived) intermediary process, may be a JVM helper for pl/java?

#2 Updated by Paul E about 8 years ago

Ovidiu - what is your DB version?

This looks like the IdentityPool scan problem I saw back when I upgraded to PostgreSQL 9.4.

#3 Updated by Ovidiu Maxiniuc about 8 years ago

I am still using PostgreSQL 9.2 Server.

I used the debugger and investigated the issue. As seen in postgresql log it was related to a P2J daemon thread that scans for unused keys. I cannot explain the exact mechanism, but after forcing it (tricking it that there are no more keys to scan) to end everything came back to normal. I am now able to run ETF normally.

#4 Updated by Paul E about 8 years ago

Was your trick to do the following in directory.xml?

<node class="container" name="identity_pool">
  <node class="string" name="scan_method">
    <node-attribute name="value" value="NONE"/>
  </node>
</node>

in the server/standard/database/<db_name> container?

LE: GES removed the database name in this entry.

#5 Updated by Ovidiu Maxiniuc about 8 years ago

Actually, no, not exactly. The identity pool method remained the default, JDBC. But the things are related. That thread automatically quits when it finished scanning the whole database. But as long as it is alive it does some asynchronous queries that might overload PostgreSQL and made it crash. I don't have other explanation for the moment, I did not investigate further once it started working because of higher task priorities.

Also available in: Atom PDF