Bug #3032
running ETF fails because some postgresql server process gets killed
0%
History
#1 Updated by Ovidiu Maxiniuc about 8 years ago
I encountered difficulties running ETF after upgrading my OS to Ubuntu 15.10 and completely dropping java7 support.
Apparently the test environment starts and for a couple of minutes everything goes well. After a number of testcases (their number reflected in the number of UUIDs logged in server.log differ, but the figure is about 100, usually more), the environments halt: no CPU and no disk activity.
The BPM displays some exceptions reflected in server.log
:
[03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:WARN) SQL Error: 0, SQLState: 08006 [03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:ERROR) An I/O error occurred while sending to the backend. [03/18/2016 19:06:34 GMT] (com.mchange.v2.log.slf4j.Slf4jMLog$Slf4jMLogger$WarnLogger:WARNING) [c3p0] A PooledConnection that has already signalled a Connection error is still in use! [03/18/2016 19:06:34 GMT] (com.mchange.v2.log.slf4j.Slf4jMLog$Slf4jMLogger$WarnLogger:WARNING) [c3p0] Another error has occurred [ org.postgresql.util.PSQLException: This connection has been closed. ] which will not be reported to listeners! org.postgresql.util.PSQLException: This connection has been closed. at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:750) at org.postgresql.jdbc.PgConnection.getAutoCommit(PgConnection.java:722) at com.mchange.v2.c3p0.impl.NewProxyConnection.getAutoCommit(NewProxyConnection.java:1200) at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:392) at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.afterNonTransactionalQuery(TransactionCoordinatorImpl.java:195) at org.hibernate.internal.SessionImpl.afterOperation(SessionImpl.java:597) at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1789) at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:231) at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:156) at org.hibernate.internal.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:905) at com.goldencode.p2j.persist.id.JdbcKeyRetriever.retrieveFreeKeys(JdbcKeyRetriever.java:183) at com.goldencode.p2j.persist.id.IdentityPool$KeyScanner.run(IdentityPool.java:576) at java.lang.Thread.run(Thread.java:745)
Followed by:
[03/18/2016 19:06:34 GMT] (com.goldencode.p2j.persist.id.IdentityPool$KeyScanner:SEVERE) Identity pool has been stopped because of an exception: org.hibernate.exception.JDBCConnectionException: could not inspect JDBC autocommit mode at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:131) at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49) at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125) at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:110) at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:395) at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.afterNonTransactionalQuery(TransactionCoordinatorImpl.java:195) at org.hibernate.internal.SessionImpl.afterOperation(SessionImpl.java:597) at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1789) at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:231) at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:156) at org.hibernate.internal.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:905) at com.goldencode.p2j.persist.id.JdbcKeyRetriever.retrieveFreeKeys(JdbcKeyRetriever.java:183) at com.goldencode.p2j.persist.id.IdentityPool$KeyScanner.run(IdentityPool.java:576) at java.lang.Thread.run(Thread.java:745) Caused by: org.postgresql.util.PSQLException: This connection has been closed. at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:750) at org.postgresql.jdbc.PgConnection.getAutoCommit(PgConnection.java:722) at com.mchange.v2.c3p0.impl.NewProxyConnection.getAutoCommit(NewProxyConnection.java:1200) at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:392) at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.afterNonTransactionalQuery(TransactionCoordinatorImpl.java:195) at org.hibernate.internal.SessionImpl.afterOperation(SessionImpl.java:597) at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1789) at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:231) at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:156) at org.hibernate.internal.AbstractQueryImpl.uniqueResult(AbstractQueryImpl.java:905) at com.goldencode.p2j.persist.id.JdbcKeyRetriever.retrieveFreeKeys(JdbcKeyRetriever.java:183) at com.goldencode.p2j.persist.id.IdentityPool$KeyScanner.run(IdentityPool.java:576) at java.lang.Thread.run(Thread.java:745) [03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:WARN) SQL Error: 0, SQLState: 08006 [03/18/2016 19:06:34 GMT] (org.hibernate.engine.jdbc.spi.SqlExceptionHelper:ERROR) An I/O error occurred while sending to the backend. [03/18/2016 19:06:34 GMT] (com.goldencode.p2j.util.Agent:WARNING) Agent encountered an error while executing a command for appserver app_server com.goldencode.p2j.persist.DatabaseConnectionException: org.hibernate.exception.JDBCConnectionException: An I/O error occurred while sending to the backend. at com.goldencode.p2j.persist.DBUtils.handleException(DBUtils.java:593) at com.goldencode.p2j.persist.Persistence.scroll(Persistence.java:1472) at com.goldencode.p2j.persist.Persistence.scroll(Persistence.java:1410) at com.goldencode.p2j.persist.ProgressiveResults.getResults(ProgressiveResults.java:1036) at com.goldencode.p2j.persist.ProgressiveResults.getResults(ProgressiveResults.java:997) at com.goldencode.p2j.persist.ProgressiveResults.moveTo(ProgressiveResults.java:845) at com.goldencode.p2j.persist.ProgressiveResults.moveTo(ProgressiveResults.java:719) at com.goldencode.p2j.persist.ProgressiveResults.next(ProgressiveResults.java:334) at com.goldencode.p2j.persist.ResultsAdapter.next(ResultsAdapter.java:119) at com.goldencode.p2j.persist.AdaptiveQuery.next(AdaptiveQuery.java:1813) at com.goldencode.p2j.persist.CompoundQuery.processComponent(CompoundQuery.java:2345) at com.goldencode.p2j.persist.CompoundQuery.retrieveImpl(CompoundQuery.java:2042) [...]
Looking in postgresql logs:
2016-03-18 21:06:34 EET LOG: server process (PID 19634) was terminated by signal 11: Segmentation fault 2016-03-18 21:06:34 EET DETAIL: Failed process was running: select ((select count(id) from dvrepcat where id >= $1 and id <= $2) + (select count(id) from jprpstat where id >= $3 and id <= $4) + (select count(id) from... 2016-03-18 21:06:34 EET LOG: terminating any other active server processes 2016-03-18 21:06:34 EET WARNING: terminating connection because of crash of another server process 2016-03-18 21:06:34 EET DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2016-03-18 21:06:34 EET HINT: In a moment you should be able to reconnect to the database and repeat your command. 2016-03-18 21:06:34 EET WARNING: terminating connection because of crash of another server process 2016-03-18 21:06:34 EET DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2016-03-18 21:06:34 EET HINT: In a moment you should be able to reconnect to the database and repeat your command. 2016-03-18 21:06:34 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:34 EET LOG: all server processes terminated; reinitializing 2016-03-18 21:06:34 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:34 EET LOG: database system was interrupted; last known up at 2016-03-18 21:04:04 EET 2016-03-18 21:06:34 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:35 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:35 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:35 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:36 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:36 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:36 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:37 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:37 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:37 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:38 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:38 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:38 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:39 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:39 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:39 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:40 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:40 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:40 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:41 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:41 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:41 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:42 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:42 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:42 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:43 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:43 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:43 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:44 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:44 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:44 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:45 EET LOG: database system was not properly shut down; automatic recovery in progress 2016-03-18 21:06:45 EET LOG: redo starts at 3A/86722F48 2016-03-18 21:06:45 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:45 EET LOG: record with zero length at 3A/867BD618 2016-03-18 21:06:45 EET LOG: redo done at 3A/867BD5E8 2016-03-18 21:06:45 EET LOG: last completed transaction was at log time 2016-03-18 21:06:34.48673+02 2016-03-18 21:06:45 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:45 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:46 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:46 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:46 EET FATAL: the database system is in recovery mode 2016-03-18 21:06:47 EET LOG: autovacuum launcher started 2016-03-18 21:06:47 EET LOG: database system is ready to accept connections
I was not able to identify the process faulty PID 19634, but I think that it may be a (short-lived) intermediary process, may be a JVM helper for pl/java?
#2 Updated by Paul E about 8 years ago
Ovidiu - what is your DB version?
This looks like the IdentityPool scan problem I saw back when I upgraded to PostgreSQL 9.4.
#3 Updated by Ovidiu Maxiniuc about 8 years ago
I am still using PostgreSQL 9.2 Server.
I used the debugger and investigated the issue. As seen in postgresql log it was related to a P2J daemon thread that scans for unused keys. I cannot explain the exact mechanism, but after forcing it (tricking it that there are no more keys to scan) to end everything came back to normal. I am now able to run ETF normally.
#4 Updated by Paul E about 8 years ago
Was your trick to do the following in directory.xml?
<node class="container" name="identity_pool"> <node class="string" name="scan_method"> <node-attribute name="value" value="NONE"/> </node> </node>
in the server/standard/database/<db_name>
container?
LE: GES removed the database name in this entry.
#5 Updated by Ovidiu Maxiniuc about 8 years ago
Actually, no, not exactly. The identity pool method remained the default, JDBC
. But the things are related. That thread automatically quits when it finished scanning the whole database. But as long as it is alive it does some asynchronous queries that might overload PostgreSQL and made it crash. I don't have other explanation for the moment, I did not investigate further once it started working because of higher task priorities.