Feature #4690
re-integrate dirty share logic into new persistence implementation
History
#1 Updated by Eric Faulhaber almost 4 years ago
- Related to Feature #4011: database/persistence layer performance improvements added
#2 Updated by Eric Faulhaber almost 4 years ago
Opening a separate issue for this, as it is a substantial topic on its own...
4011a_dirty is the branch in use for this effort.
#3 Updated by Eric Faulhaber almost 4 years ago
Some new requirements have arisen for the dirty share feature set. Although we would like to eliminate this feature altogether for reasons of performance and maintainability, backward compatibility and the implementation of some quirky behavior of the 4GL require that we keep it around in the new persistence implementation. However, we want to be able to minimize its use as much as possible, such that it is only enabled for those tables for which business logic is written which absolutely requires these quirks to operate properly.
To minimize regressions when moving to the new persistence implementation, backward compatibility (i.e., enabling the feature as it was before) should be the default. Since we know there is a performance penalty associated with this, we need to do two things:
- provide a means to identify dependencies on the dirty share features in business logic, so that these dependencies can either be eliminated or accommodated by using the dirty share feature set;
- provide several modes of operation:
- all tables tracked for dirty changes;
- a subset of tables tracked for dirty changes;
- dirty share feature disabled entirely.
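A minimal sketch of how these three operating modes might be modeled and consulted per table. All names here are illustrative assumptions, not FWD's actual configuration API:

```java
import java.util.Set;

// Hypothetical representation of the three dirty-share operating modes.
enum DirtyShareMode
{
   ALL_TABLES,      // backward-compatible default: every table tracked
   SELECTED_TABLES, // only tables explicitly marked for dirty sharing
   DISABLED         // feature off entirely
}

final class DirtyShareConfig
{
   private final DirtyShareMode mode;
   private final Set<String> trackedTables;
   
   DirtyShareConfig(DirtyShareMode mode, Set<String> trackedTables)
   {
      this.mode = mode;
      this.trackedTables = trackedTables;
   }
   
   /** Decide whether a given table participates in dirty sharing. */
   boolean isTracked(String table)
   {
      switch (mode)
      {
         case ALL_TABLES:      return true;
         case SELECTED_TABLES: return trackedTables.contains(table);
         default:              return false;
      }
   }
}
```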
We can achieve the identification requirement through optional tracking of when the use of the dirty share feature has caused a different query result to be returned. As we currently only use the dirty share context to change the result of FINDs, it may make sense to add this tracking to RandomAccessQuery, where the logic resides to determine whether "dirty" information should be used to override the result (or lack thereof) found in the primary database. The tracking itself could support several levels of detail:
- disabled (meant for production use -- as close to 0 overhead as possible);
- lightly enabled (suitable for heavy testing -- limited data is captured, some overhead is ok);
- fully enabled (suitable for debugging -- maximum data is captured, may be considerable overhead).
The tracking would not necessarily be real-time logging to the server log. It might be more like the previous database statistics feature or the aspect-enabled profiling added in trunk rev 11323, where data is collected in memory and dumped in response to a signal sent to the server.
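As an illustration of that approach, the collector below keeps override counts in memory and exposes them as a snapshot for dumping on demand. The TrackingLevel enum and all other names are hypothetical, not existing FWD classes:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical tracking levels matching the modes described above.
enum TrackingLevel { DISABLED, LIGHT, FULL }

final class DirtyOverrideStats
{
   private final TrackingLevel level;
   private final Map<String, LongAdder> overridesByTable = new ConcurrentHashMap<>();
   
   DirtyOverrideStats(TrackingLevel level)
   {
      this.level = level;
   }
   
   /** Called when the dirty context changed a FIND result for a table. */
   void recordOverride(String table)
   {
      if (level == TrackingLevel.DISABLED)
      {
         return; // as close to zero overhead as possible
      }
      overridesByTable.computeIfAbsent(table, t -> new LongAdder()).increment();
   }
   
   /** Dump the collected counts, e.g., in response to a server signal. */
   Map<String, Long> snapshot()
   {
      Map<String, Long> out = new TreeMap<>();
      overridesByTable.forEach((t, n) -> out.put(t, n.sum()));
      return out;
   }
}
```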
We can add more detail to this as we go. For now, I just wanted to record the high level requirements.
#4 Updated by Eric Faulhaber almost 4 years ago
I'm reading through the changes, and this is not a full review yet, but I have some general questions about the 4011a_dirty implementation (rev 11475-6):
- There are lots of new checks for dirtyContext != null. The purpose of NopDirtyContext was to avoid the need for these. Why has this changed?
- Can you help me understand the mechanism to acquire a dirty share context instance? We had discussed changing the granularity of the decision to use the default context vs. the no-op context from database level to table level. It still seems to be driven by database, rather than by table.
- Why are we duplicating a lot of the dirty property tracking now done in BaseRecord in RecordBuffer? I know it was in RecordBuffer before (e.g., dirtyProps, etc.), but the goal was to centralize this in the DMO, rather than duplicate the bitsets in RecordBuffer.
#5 Updated by Ovidiu Maxiniuc almost 4 years ago
Eric Faulhaber wrote:
I'm reading through the changes, and this is not a full review yet, but I have some general questions about the 4011a_dirty implementation (rev 11475-6):
- There are lots of new checks for dirtyContext != null. The purpose of NopDirtyContext was to avoid the need for these. Why has this changed?
- Can you help me understand the mechanism to acquire a dirty share context instance? We had discussed changing the granularity of the decision to use the default context vs. the no-op context from database level to table level. It still seems to be driven by database, rather than by table.
A dirtyContext is acquired using DirtyShareFactory.getContextInstance() in two places: the RecordBuffers for active usage and BufferManager$TxWrapper for end-of-transaction cleanup. When a RecordBuffers requests a context (line 6263), it first checks the dirtyRead flag. When this is not active, a special Database instance with the PRIMARY_NON_DIRTY type is used. The factory will return the true DirtyShareContext only for a PRIMARY database, but not for a _TEMP one. As a result, only the tables marked with dirtyRead will get a (the same) context, while the others will get null. While in the former case we are aware of the tables and, implicitly, of the useDirty flag, the cleanup in TxWrapper is done at database level. It will receive the same context as all the dirty-enabled record buffers.
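A condensed sketch of the acquisition logic described above. The real DirtyShareFactory and Database types in FWD are more involved; the names and signatures here are simplified assumptions:

```java
// Hypothetical database type tags; stands in for FWD's Database instance.
enum DbType { PRIMARY, PRIMARY_NON_DIRTY, TEMP }

interface DirtyShareContext { }

final class DirtyShareFactory
{
   // one shared context for all dirty-enabled buffers of the database
   private static final DirtyShareContext SHARED = new DirtyShareContext() { };
   
   /** Only a true PRIMARY database gets the shared context; others get null. */
   static DirtyShareContext getContextInstance(DbType type)
   {
      return type == DbType.PRIMARY ? SHARED : null;
   }
   
   /**
    * Buffer-side acquisition: a table without the dirtyRead flag asks with
    * the PRIMARY_NON_DIRTY type and therefore receives no context.
    */
   static DirtyShareContext forBuffer(boolean dirtyRead)
   {
      return getContextInstance(dirtyRead ? DbType.PRIMARY
                                          : DbType.PRIMARY_NON_DIRTY);
   }
}
```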
The choice to use a null context is due to performance. This might not be the most elegant OOP-wise, but a simple null test is faster than NOP calls. However, this is a minor issue; I can revert to the Nop class.
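The two alternatives under discussion can be illustrated with hypothetical types, purely to show the call-site difference:

```java
// Minimal stand-in for the dirty share context interface.
interface Ctx
{
   void onUpdate(String table);
}

// No-op implementation: call sites stay unconditional, but every call is
// still a virtual dispatch even when the feature is disabled.
final class NopCtx implements Ctx
{
   @Override
   public void onUpdate(String table) { /* intentionally empty */ }
}

// Counting implementation, used here only to observe that calls go through.
final class CountingCtx implements Ctx
{
   int calls = 0;
   
   @Override
   public void onUpdate(String table) { calls++; }
}

final class Buffer
{
   private final Ctx ctx; // null when disabled, in the null-check variant
   
   Buffer(Ctx ctx) { this.ctx = ctx; }
   
   // variant 1: simple null test; no call at all when disabled
   void updateWithNullCheck(String table)
   {
      if (ctx != null)
      {
         ctx.onUpdate(table);
      }
   }
   
   // variant 2: unconditional virtual call into a NopCtx when disabled
   void updateWithNop(String table)
   {
      ctx.onUpdate(table);
   }
}
```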
- Why are we duplicating a lot of the dirty property tracking now done in BaseRecord in RecordBuffer? I know it was in RecordBuffer before (e.g., dirtyProps, etc.), but the goal was to centralize this in the DMO, rather than duplicate the bitsets in RecordBuffer.
The cumulativeDirtyProps is meant to help track changes since the last time shareDirty() was invoked.
The validationFailed can be dropped, as it is not used.
I think metadata should remain in RecordBuffer for fast access. Especially if there is no record in the buffer, accessing it is costly. Maybe it should be set in the constructor and be final?
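For clarity, a hypothetical sketch of the relationship between a per-change dirtyProps bitset and a cumulative one accumulated between shareDirty() calls, as described above. These are not the actual RecordBuffer fields:

```java
import java.util.BitSet;

// Illustrative tracker: both bitsets record changed property indices, but
// the cumulative one is only reset when the changes are shared.
final class DirtyTracker
{
   private final BitSet dirtyProps = new BitSet();
   private final BitSet cumulativeDirtyProps = new BitSet();
   
   /** Mark a DMO property as changed. */
   void propertyChanged(int propIndex)
   {
      dirtyProps.set(propIndex);
      cumulativeDirtyProps.set(propIndex);
   }
   
   /** Report and reset the properties changed since the last share. */
   BitSet shareDirty()
   {
      BitSet changed = (BitSet) cumulativeDirtyProps.clone();
      cumulativeDirtyProps.clear();
      return changed;
   }
}
```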
#6 Updated by Eric Faulhaber almost 4 years ago
Ovidiu Maxiniuc wrote:
The choice to use a null context is due to performance. This might not be the most elegant OOP-wise, but a simple null test is faster than NOP calls. However, this is a minor issue; I can revert to the Nop class.
No, that's a good reason.
- Why are we duplicating a lot of the dirty property tracking now done in BaseRecord in RecordBuffer? I know it was in RecordBuffer before (e.g., dirtyProps, etc.), but the goal was to centralize this in the DMO, rather than duplicate the bitsets in RecordBuffer.
The cumulativeDirtyProps is meant to help track changes since the last time shareDirty() was invoked.
The validationFailed can be dropped, as it is not used.
I think metadata should remain in RecordBuffer for fast access. Especially if there is no record in the buffer, accessing it is costly. Maybe it should be set in the constructor and be final?
I think the initialization of metadata is ok as written, but I do not feel strongly either way.
I must have misread the other bitset code on my first pass, because I thought we were tracking the dirty properties in RecordBuffer directly, but on second look, I see it is the cumulative dirty properties. I am wondering, however, whether cumulativeDirtyProps also belongs in BaseRecord, because as I'm getting more into the flushing regressions, this information looks to be useful for validation/flushing as well. However, don't change it yet.
Thanks for the other explanations.
I think the changes overall are ok, though it is a lot to digest to know if there are any regressions in there. How well tested is this change set? Is there a reason not to merge it back to 4011a? I want to consolidate the branch as soon as possible, so we are not managing this separately.
There is one thing I want to mention, not necessarily limited to this branch. I made a conscious decision to go a certain way for performance, but through a series of updates to 4011a, this has been unraveled. Namely, within the orm package, I decided to use direct field access across classes for things like RecordMeta.uniqueIndices, nonuniqueIndices, etc. However, these uses are being rewritten to make these fields private and to route access through getter methods. I understand best practices and encapsulation, but at this low level, where performance is critical, I don't see the advantage of adding virtual method calls over direct field access. If a field needs to be accessed from other packages, we can use getters, but I would like to stop the rewriting of direct field access within the orm package. I realize it is not a lot of overhead being saved, but why add it at all, when the main purpose of 4011a is to improve performance?
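The convention being argued for might look like the following hypothetical example (not actual orm package code): package-private fields read directly by collaborators in the same package, with getters reserved for callers outside it:

```java
// Hypothetical version of RecordMeta illustrating the convention: fields
// are package-private so other classes in the same (orm) package can read
// them directly on hot paths; a getter exists only for external callers.
final class RecordMeta
{
   // package-private: direct field access from within the package
   final int[] uniqueIndices;
   final int[] nonuniqueIndices;
   
   RecordMeta(int[] uniqueIndices, int[] nonuniqueIndices)
   {
      this.uniqueIndices = uniqueIndices;
      this.nonuniqueIndices = nonuniqueIndices;
   }
   
   // getter intended for use from outside the package only
   public int[] getUniqueIndices()
   {
      return uniqueIndices;
   }
}
```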