Feature #4020

reduce work Hibernate does when flushing the session

Added by Eric Faulhaber about 5 years ago. Updated over 3 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Target version:
-
Start date:
Due date:
% Done:

0%

billable:
No
vendor_id:
GCD
version:

History

#1 Updated by Eric Faulhaber about 5 years ago

Each time Hibernate does something that relies on the state of the database being consistent with the state of the DMOs it manages, there is a chance it will perform a managed flush. A flush synchronizes these two states via DML statements executed on the database. For instance, a managed flush might be performed to insert, update, or delete records before a query is executed which relies on the data in a table being current, or before a transaction is committed.
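
To illustrate where those flush points fall, here is a minimal sketch using Hibernate's default AUTO flush mode. Customer is a hypothetical DMO used only for this example, not an actual FWD class:

```java
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class FlushTriggerSketch {
   // Rough sketch of the points at which Hibernate performs a managed flush
   // under the default AUTO flush mode; Customer is a hypothetical DMO.
   public static void run(SessionFactory factory) {
      Session session = factory.openSession();
      Transaction tx = session.beginTransaction();

      Customer dmo = (Customer) session.get(Customer.class, 1L);
      dmo.setName("acme");   // change is held in memory; no SQL is issued yet

      // The query depends on current table contents, so Hibernate flushes the
      // pending UPDATE here, before executing the SELECT.
      List<?> rows = session.createQuery("from Customer where name = :n")
                            .setParameter("n", "acme")
                            .list();

      tx.commit();           // any remaining pending changes are flushed here
      session.close();
   }
}
```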

The more objects present in the Hibernate session at the time of a managed flush, the more expensive it will be. For each managed object, Hibernate must determine whether the state of any of its properties (i.e., columns/fields) has changed, and if so, it must execute an update statement. Or, if the object itself is transient (i.e., has not yet been saved as a record), it must perform an insert. Or, if the object has been deleted, it must delete the associated record in the database.

To minimize the number of statements it executes, and to execute them most efficiently, Hibernate does not execute statements to change the database the moment a DMO changes. Instead, it allows changes to coalesce for as long as possible in the object model, then collects a snapshot of the changes it absolutely must write to the database at the last possible moment. This way, it can potentially skip temporary changes which are superseded by later changes, and it can use SQL batching to execute the final set of statements most efficiently. The term Hibernate uses for this practice is "transparent write-behind", IIRC.
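
The batching side of this is controlled by standard Hibernate settings. A minimal sketch (the batch size is illustrative) of building a SessionFactory with JDBC batching enabled, so the statements collected at flush time go out in batches:

```java
import java.util.Properties;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class BatchingConfigSketch {
   // Enables JDBC batching for the DML Hibernate defers until flush time.
   public static SessionFactory build() {
      Properties props = new Properties();
      props.setProperty("hibernate.jdbc.batch_size", "50");  // group DML into batches of 50
      props.setProperty("hibernate.order_inserts", "true");  // order inserts by entity type so they batch
      props.setProperty("hibernate.order_updates", "true");  // likewise for updates
      return new Configuration().addProperties(props).buildSessionFactory();
   }
}
```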

We have already done extensive work to minimize the expense of the managed flush and to defer it as long as possible. The first way we do this is to aggressively evict DMOs from the session if they are not actively in use (i.e., not currently stored in a RecordBuffer instance).

The second way is to track which DMO types are dirty (i.e., have pending changes). By tracking this, we can avoid a flush if, say, we are executing a query against a specific table or set of tables and we know there has been no change made to them in the current user's session since the last flush event, even if other tables may have changed in the interim.
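
A minimal sketch of that idea, with illustrative names that are not the actual FWD implementation: record which DMO types have pending changes and, when a query touches none of them, let it run without triggering a managed flush.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.hibernate.FlushMode;
import org.hibernate.Query;
import org.hibernate.Session;

public class DirtyTypeTracker {
   // DMO types with changes pending since the last flush (illustrative structure).
   private final Set<Class<?>> dirtyTypes = new HashSet<>();

   public void markDirty(Object dmo) {
      dirtyTypes.add(dmo.getClass());
   }

   public void flushed() {
      dirtyTypes.clear();
   }

   // Create a query that skips the managed flush when none of the queried
   // DMO types have pending changes.
   public Query createQuery(Session session, String hql, Set<Class<?>> queriedTypes) {
      Query query = session.createQuery(hql);
      if (Collections.disjoint(dirtyTypes, queriedTypes)) {
         query.setFlushMode(FlushMode.MANUAL);
      }
      return query;
   }
}
```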

Yet another way is to leverage hooks provided by Hibernate to assist it in its effort to check which DMOs are dirty. We have done this by providing both a custom SessionInterceptor and a CustomEntityDirtinessStrategy, each of which assists in different ways.
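
For illustration, a CustomEntityDirtinessStrategy has roughly this shape; the DirtyTracker collaborator below is hypothetical and stands in for whatever change tracking is available, it is not the actual FWD implementation:

```java
import org.hibernate.CustomEntityDirtinessStrategy;
import org.hibernate.Session;
import org.hibernate.persister.entity.EntityPersister;

public class DirtinessStrategySketch implements CustomEntityDirtinessStrategy {
   // Hypothetical collaborator that knows which DMOs have pending changes.
   public interface DirtyTracker {
      boolean isTracked(Object dmo);
      boolean hasPendingChanges(Object dmo);
      void clear(Object dmo);
   }

   private final DirtyTracker tracker;

   public DirtinessStrategySketch(DirtyTracker tracker) {
      this.tracker = tracker;
   }

   @Override
   public boolean canDirtyCheck(Object entity, EntityPersister persister, Session session) {
      // Only take over dirty checking for DMOs we actually track.
      return tracker.isTracked(entity);
   }

   @Override
   public boolean isDirty(Object entity, EntityPersister persister, Session session) {
      // Answering false lets Hibernate skip its snapshot comparison entirely;
      // answering true still leaves Hibernate to work out which fields changed.
      return tracker.hasPendingChanges(entity);
   }

   @Override
   public void resetDirty(Object entity, EntityPersister persister, Session session) {
      tracker.clear(entity);   // invoked once the changes have been written out
   }

   @Override
   public void findDirty(Object entity, EntityPersister persister, Session session,
                         DirtyCheckContext dirtyCheckContext) {
      // Could report the exact dirty attributes; left to Hibernate in this sketch.
   }
}
```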

I think there is still room for improvement in this area.

DMOs are constantly being added to the Hibernate session. Whenever we execute a non-projection (i.e., full record) query using HQL, any record found is loaded into the Hibernate session. I have tried to find all these places and evict those objects immediately after their use if they are otherwise not needed, but I suspect there are still places where some of these remain to be found and cleaned up.
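
The pattern in question looks roughly like this; the query and entity name are hypothetical, the point is only the evict-after-use shape:

```java
import org.hibernate.Session;

public class EvictAfterUseSketch {
   // Sketch of the evict-after-use pattern: load a full record with HQL, use it,
   // then evict it so it does not linger in the session and add to the cost of
   // later flushes. "Customer" is a hypothetical DMO name.
   public static Object findAndRelease(Session session, long id) {
      Object dmo = session.createQuery("from Customer where id = :id")
                          .setParameter("id", id)
                          .uniqueResult();
      if (dmo != null) {
         // ... use the record while it is needed ...
         session.evict(dmo);   // detach it; Hibernate will no longer dirty-check it
      }
      return dmo;
   }
}
```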

Furthermore, with our SessionInterceptor and CustomEntityDirtinessStrategy implementations, we currently only tell Hibernate whether an object is definitely not dirty or whether it might be dirty. The former case allows Hibernate to short circuit any further attempts to detect change on its own, but in the latter case, we still leave Hibernate to do the grunt work of figuring out exactly what has changed in a particular record, if anything, by comparing a snapshot it took of the DMO when it was first associated with the session (either through creation by the application or retrieval from the database) with the DMO's current state.

We already have all of this information stored in various data structures throughout the persistence layer for undo purposes, but currently it is not in a form that is conveniently accessible to answer the question of "what changed since the last flush?". By rearranging this data so that it can be used both for undo purposes and for dirty checking, we might further alleviate Hibernate's load. This is a tricky proposition, because today the data is managed within transaction and sub-transaction scopes, not according to flush activity. However we do this, we have to be sure it is done in such a way that it is actually less expensive for us to do the work than for Hibernate to do it.
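
To give a flavor of what that could look like, here is a sketch of an interceptor that answers "what changed since the last flush?" from our own tracking data instead of Hibernate's snapshot comparison. The DirtyTracker interface and its semantics are hypothetical; the findDirty signature is Hibernate's standard Interceptor hook.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Set;
import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

public class ChangeAwareInterceptorSketch extends EmptyInterceptor {
   // Hypothetical store of the property names changed per DMO since the last flush.
   public interface DirtyTracker {
      // null = unknown, empty = definitely clean, otherwise the changed property names
      Set<String> changedProperties(Object dmo);
   }

   private final DirtyTracker tracker;

   public ChangeAwareInterceptorSketch(DirtyTracker tracker) {
      this.tracker = tracker;
   }

   @Override
   public int[] findDirty(Object entity, Serializable id, Object[] currentState,
                          Object[] previousState, String[] propertyNames, Type[] types) {
      Set<String> changed = tracker.changedProperties(entity);
      if (changed == null) {
         return null;            // unknown: let Hibernate compare snapshots itself
      }
      if (changed.isEmpty()) {
         return new int[0];      // definitely clean: Hibernate skips the comparison
      }
      // Report the exact dirty property indices so Hibernate skips the field-by-field diff.
      int[] dirty = new int[changed.size()];
      int n = 0;
      for (int i = 0; i < propertyNames.length; i++) {
         if (changed.contains(propertyNames[i])) {
            dirty[n++] = i;
         }
      }
      return Arrays.copyOf(dirty, n);
   }
}
```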

#2 Updated by Eric Faulhaber over 3 years ago

  • Status changed from New to Rejected

This task is moot with the removal of Hibernate in trunk rev 11348.
