Bug #3896

improve performance of the _lock metadata implementation

Added by Eric Faulhaber over 5 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Igor Skornyakov
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
case_num:
version:

_lock.diff (43.3 KB) Igor Skornyakov, 03/24/2022 03:07 PM

_lock1.diff (62.8 KB) Igor Skornyakov, 03/28/2022 06:53 AM

_lock2.diff (63.5 KB) Igor Skornyakov, 04/08/2022 05:03 PM

_lock3.diff (66.6 KB) Igor Skornyakov, 04/13/2022 04:15 AM

History

#1 Updated by Eric Faulhaber over 5 years ago

FWD's implementation of the _lock metadata is too slow. The current implementation updates an in-memory H2 database table with lock changes published by the lock manager. Although lock events are processed and the database is updated in a dedicated worker thread, so as not to slow down the user sessions which actually change locks' status, these events can pile up in an active system. Flushing them to the database constantly can consume server resources and can delay response time to a user session which actually needs to query this data (all events as of the time a query is made are flushed before the query is executed). In practical terms, this means a _lock query in an active system is almost guaranteed to produce a stale result by the time it completes.

The way I have been planning to improve the implementation for the _lock table is by maintaining a "current" state of in-memory data which reflects the same state as the H2 database table, plus a "diff" state of in-memory data that is updated/coalesced from lock events that come from the lock manager in user threads. This coalescing should be fast and thus will occur inline in the thread that published the lock status change.

Upon a flush triggered by a user query of the _lock table (as it is today), the diff data would be used with the current, in-memory data to execute SQL on the H2 database table. Upon successful flushing of the diff data to the database, the "current" state would be updated to again be in sync with the database.

While maintaining the in-memory current state and the H2 database itself seems redundant, the idea behind this approach (as opposed to an approach which eliminates the diff tracking and instead clears the database table and creates all records based on the current state) is to minimize the number of updates to the database. The assumption is that there will likely be some records that do not need to change, and these SQL statements can be avoided. Also, in a system that actually executes a lot of queries against the _lock table, it will be much more efficient than recreating the entire table's contents for each query.

I haven't worked out the exact design of the data structures for the current and diff in-memory data yet.
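Roughly, though, the coalescing could look something like this (a purely illustrative sketch; all names and shapes are placeholders, not the final design):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: coalesce lock events into a pending "diff" map keyed by
// record id. Repeated changes to the same lock collapse into one entry, and a
// change back to the current state cancels the pending entry, so only the net
// difference reaches the H2 table at flush time.
public class LockDiffSketch
{
   enum LockType { NONE, SHARE, EXCLUSIVE }

   private final Map<Long, LockType> current = new ConcurrentHashMap<>();
   private final Map<Long, LockType> diff    = new ConcurrentHashMap<>();

   // called inline from the user thread that publishes the lock status change
   public void coalesce(long recId, LockType newType)
   {
      if (newType == current.get(recId))
      {
         diff.remove(recId);       // net change is zero; no SQL needed at flush
      }
      else
      {
         diff.put(recId, newType); // overwrite any earlier pending change
      }
   }
}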

#2 Updated by Eric Faulhaber over 5 years ago

Another tweak on this idea is periodically (maybe once every minute or few minutes?) to flush the diff data to the database in the secondary thread we are using today for individual database updates. This would allow lock status change events (possibly many of them on a busy system) to coalesce before hitting the database, lessening the number of database updates needed. The assumption is that this would lighten the ongoing load on the database, while still allowing a relatively fast response time when a query does come in (because there is presumably less backlog to process to get the database sync'd for the query).
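For illustration, a minimal sketch of such a periodic flush (flushDiffToDatabase is a hypothetical placeholder for the sync code described above):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: run the diff-to-database sync on a fixed period instead
// of per lock event, letting many events coalesce between flushes.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> flushDiffToDatabase(), 1, 1, TimeUnit.MINUTES);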

#3 Updated by Greg Shah over 5 years ago

Most applications will almost never access the _lock table. When it is used, generally it relates to a custom utility the developers wrote that is used to figure out who has a specific lock that is causing other users issues. As such, it is so rare that we should not be spending any time worrying about merging this into the database periodically.

Can't we just enhance our current lock manager data structures to maintain enough information such that the _lock table can be recreated in the very rare case that it is ever needed? The overhead for this should approach 0 while the rare case would be slower. That is a good trade off in my opinion.

#4 Updated by Eric Faulhaber over 5 years ago

Can't we just enhance our current lock manager data structures to maintain enough information such that the _lock table can be recreated in the very rare case that it is ever needed?

I explained above why I do not like this approach.

You may be right about the common uses of the _lock table in particular, but we are finding more VSTs that need to reflect changing runtime state in a queryable form, and we can't assume they all will always be for some secondary utility use. Some (e.g., _connect, _trans) are used at login or for other runtime purposes. With my approach above, I am trying to come up with a reasonable solution that can be a model for these. It may be that the periodic merge is overkill for the _lock case or even all cases, but that is just a tweak that involves running the sync code in a timer loop instead of applying it JIT, all-at-once.

#5 Updated by Greg Shah over 5 years ago

All of these cases involve resources that only ever exist in limited numbers.

  • transactions - one per user that is in an open transaction (I don't think this is also per database since there is only ever a single transaction in a Progress session)
  • connect - 0+ per user but always a small number (I think 7-8 is the largest we've seen)
  • lock - one per row that has been accessed via share or exclusive lock, only exists while in buffer scope

A large system might have 1000 users, which would be an upper end of 1000 transactions and 5000-10000 connections. Even on such a system, the number of times this data is used is very rare. Also, isn't the _connect data user-specific? If I recall correctly, you can't see all the other users' connections, so this one is in fact very small. If _connect is accessed at login, it is still not being used often, and simulating this with a JIT approach will be very quick.

I like Ovidiu's idea of using a temp-table for these cases. Let the specific user's session take the hit for whatever usage is needed.

Locks can occur in higher numbers but as noted above they are the least likely to be used for a real production use case. Designing the general approach based on locks may not be a good match.

My opinion is that we should keep things as simple as possible while biasing them towards the cases that happen 99.99999% of the time.

#6 Updated by Eric Faulhaber over 5 years ago

Greg Shah wrote:

Also, isn't the _connect user-specific? If I recall, you can't see all the other user's connections so this one is in fact very small if I recall correctly.

That's the _my-connection table. The common query idiom we see is to find the _connect record where _connect.connect-usr = _my-connection._myconn-userid. You get 1 _connect record per user, per persistent database to which they're connected (including those that are auto-connected). Overwriting the entire _connect table in JIT fashion for every user login (plus other use cases -- e.g., on logout, cross-referencing with the _trans table) in a large system creates an unnecessary bottleneck.

My opinion is that we should keep things as simple as possible while biasing them towards the cases that happen 99.99999% of the time.

That's what I am doing :-)

#7 Updated by Igor Skornyakov about 3 years ago

Maybe it makes sense to implement the _Lock support in the same way it is done for the _UserTableStat? I mean keeping all data in memory and flushing it to the real table only when an application tries to access it?
Of course, the efficiency of this approach depends on how often real applications access _Lock.

#8 Updated by Greg Shah about 3 years ago

I don't think this is a radical idea. It is exactly what I've been suggesting for years. See #3896-3.

#9 Updated by Igor Skornyakov about 3 years ago

Greg Shah wrote:

I don't think this is a radical idea. It is exactly what I've been suggesting for years. See #3896-3.

Well, it is not difficult to implement based on my experience with _UserTableStat. It is even simpler since all updates are already in place. Maybe I've missed some places where UserTableStatUpdater.persist() should be invoked for data flush, but I believe there are at most one or two places where this call has to be added. The _Lock table persist should be invoked in the same places (with a check for a different table).

#10 Updated by Greg Shah about 3 years ago

This approach should be generalized. I think there are many other tables that would be better implemented this way.

#11 Updated by Igor Skornyakov about 3 years ago

Greg Shah wrote:

This approach should be generalized. I think there are many other tables that would be better implemented this way.

I agree. However, I believe the general approach is better implemented by incremental refactoring as more tables are supported, since different tables may need (slightly) different support. It is difficult to design a general approach from the very beginning.

#12 Updated by Igor Skornyakov about 2 years ago

  • Assignee set to Igor Skornyakov
  • Status changed from New to WIP

#13 Updated by Igor Skornyakov about 2 years ago

Does it make sense to move the LockTableUpdater.UpdateWorker.fileNums map to the MetadataManager (as a value of a map with the database as a key, like MetadataManager.FILE_NUM)?
This can potentially increase memory consumption, but the map will be populated as part of the _File table population and there will be no need for a database lookup on RecordLockEvent processing.
What do you think?
Thank you.

#14 Updated by Igor Skornyakov about 2 years ago

LockTableUpdater is re-worked in the same way as UserTableStatUpdater was re-worked some time ago. No separate thread for updating the _Lock VST is required now.

The locks' data is kept in a map and is persisted to the _Lock VST only when the VST is accessed for read in a query.

For every lock we keep versions of the lock data in a map and in the meta database to avoid unnecessary database accesses. We also keep "global" versions of the map and the _Lock VST to avoid unnecessary map iteration.

ConcurrentSkipListMap (a concurrent version of TreeMap) is used to guarantee O(log(n)) access time. Although ConcurrentHashMap has O(1) access time in typical cases, the real access time can be as bad as O(n) in case of collisions. Here n is the size of the map.
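To make the versioning scheme concrete, here is a rough sketch (field and helper names are illustrative, not the actual FWD code):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: each entry remembers the version last flushed to the VST;
// a global map version lets a persist request return immediately when nothing
// has changed since the previous flush.
public class VersionedLocksSketch
{
   static class LockData
   {
      volatile long version;          // bumped on every lock state change
      volatile long persistedVersion; // version last written to the _Lock VST
   }

   private final Map<Long, LockData> locks = new ConcurrentSkipListMap<>();
   private final AtomicLong mapVersion = new AtomicLong();
   private volatile long vstVersion;

   public void noteLockChange(long recId)
   {
      locks.computeIfAbsent(recId, k -> new LockData()).version =
         mapVersion.incrementAndGet();
   }

   public void persistIfStale()
   {
      long v = mapVersion.get();
      if (v == vstVersion) return;    // VST already in sync; skip the iteration
      for (LockData data : locks.values())
      {
         if (data.version > data.persistedVersion)
         {
            writeToVst(data);         // hypothetical helper: INSERT/UPDATE/DELETE
            data.persistedVersion = data.version;
         }
      }
      vstVersion = v;
   }

   private void writeToVst(LockData data) { /* database work elided */ }
}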

The changes need additional testing, but the core logic is in place.

Please review. The .diff is for 3821c/13684.
Thank you.

#15 Updated by Igor Skornyakov about 2 years ago

After fixing an obvious mistake with adding final to the Empty class declaration, the smoke test of the large customer application did not reveal any problems.

I've realized that we need to add logic for cleaning the locks' map and the _Lock table.

I'm planning to use a SingleThreadExecutor shared by all instances of the LockTableUpdater for the background _Lock table cleanup, and to add JMX instrumentation for monitoring the state of the map and the table, and the LockTableUpdater activity.

#16 Updated by Greg Shah about 2 years ago

I'm planning to use a SingleThreadExecutor shared by all instances of the LockTableUpdater for the background _Lock table cleanup

I prefer to avoid executors or Java thread pools. It adds hidden logic to manage threads which makes the code more complicated to support in my opinion. Please describe the threading model that is needed and what security context is required.

#17 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

I prefer to avoid executors or Java thread pools. It adds hidden logic to manage threads which makes the code more complicated to support in my opinion. Please describe the threading model that is needed and what security context is required.

The SingleThreadExecutor uses a single thread for its lifetime, unless some submitted task causes this thread to terminate due to an uncaught exception, which can be easily avoided. The advantage of using an executor is that the core application code does not have to manage queues or suspend/resume this thread. Actually, in this case we just use a single thread and a queue, but with a very simple interface. Of course, we need to provide a ThreadFactory which will create this thread as a daemon.
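For illustration, a minimal daemon single-thread executor along these lines (the thread name is made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: the worker thread is a daemon, so it never blocks JVM
// shutdown; submitted cleanup tasks run one at a time, in submission order.
ExecutorService executor = Executors.newSingleThreadExecutor(runnable ->
{
   Thread t = new Thread(runnable, "lock-vst-cleanup");
   t.setDaemon(true);
   return t;
});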

I expect that instances of the LockTableUpdater will have an additional Persistence instance that will be used by a task performing the cleanup.
What is unclear for me is whether the thread used for a cleanup should be an instance of the AssociatedThread. The thread used in the UpdateWorker is such an instance, but it seems to perform a job that is common for all active sessions working with a particular database.
Have I missed something?
Thank you.

#18 Updated by Igor Skornyakov about 2 years ago

Please note that the _Lock VST cleanup in a separate thread should be a rare event. It will happen only if the lock data was persisted (because there was a read access to this VST while the lock was active) and the lock was released before the next read.

#19 Updated by Igor Skornyakov about 2 years ago

Added JMX instrumentation and cleanup logic to LockTableUpdater.
However, I have a strange problem with finding persisted records in the _Lock VST when I need to update/delete them. This was not apparent w/o the JMX instrumentation.
Continue debugging...

#20 Updated by Igor Skornyakov about 2 years ago

  • File _lock1.diff added

Finished.
Please review.
Thank you.

#21 Updated by Igor Skornyakov about 2 years ago

  • File deleted (_lock1.diff)

#22 Updated by Igor Skornyakov about 2 years ago

Sorry, forgot to add new classes to the .diff file.

#23 Updated by Igor Skornyakov about 2 years ago

Eric,
Can you please review this? I cannot commit other changes, e.g. for the DatabaseManager, since these changes are not committed.
Thank you.

#24 Updated by Eric Faulhaber about 2 years ago

Igor Skornyakov wrote:

Eric,
Can you please review this? I cannot commit other changes, e.g. for the DatabaseManager, since these changes are not committed.
Thank you.

Is the latest patch still based on 3821c/13684?

#25 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

Is the latest patch still based on 3821c/13684?

Please find the patch based on 3821c/13756 attached.

#26 Updated by Eric Faulhaber about 2 years ago

_lock2.diff patch code review:

Typo at FwdJMX:173: faild -> failed.

LockTableUpdater:

Please remove the unused imports:

import javax.management.loading.*;
import org.apache.jasper.tagplugins.jstl.core.*;

The class-level javadoc no longer represents the actual implementation. It needs to be rewritten entirely. The way we have chosen to implement this for performance is not obvious/intuitive, so please be very detailed in explaining the implementation and why it is done this way. Review the methods' javadoc as well and update these to reflect the actual implementation. For instance, the addEvent method still discusses the old implementation, which is confusing.

It seems that nothing invokes UpdateWorker.flush() anymore. I think it can be removed.

Please add missing javadoc to the LockRecId inner class.

UpdateWorker.doPersist() persists individual records to the meta database using the Persistence.save() API in a loop, with each operation in its own transaction. This still seems very inefficient. The original implementation either finds and updates or deletes a DMO, or creates and persists a new DMO through the Persistence.save() API. This has a lot of unnecessary overhead and is not ideal. This core functionality is still there.

However, refactoring this to use a more efficient, direct use of INSERT/UPDATE/DELETE statements in batches is quite a bit more work. Since this performance hit is only taken on a _Lock table query, and the main purpose of this task is to speed up the lock state changes, it is ok to leave this aspect alone for now. But we at least should combine multiple updates/inserts/deletes into a single database transaction. This should be a fairly easy change.

Query.java line 392 seems out of place:

         UserTableStatUpdater.persist(db);

Maybe a cut/paste error?

What testing has been done with this code?

#27 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

_lock2.diff patch code review:

Typo at FwdJMX:173: faild -> failed.

LockTableUpdater:

Please remove the unused imports:

[...]

The class-level javadoc no longer represents the actual implementation. It needs to be rewritten entirely. The way we have chosen to implement this for performance is not obvious/intuitive, so please be very detailed in explaining the implementation and why it is done this way. Review the methods' javadoc as well and update these to reflect the actual implementation. For instance, the addEvent method still discusses the old implementation, which is confusing.

It seems that nothing invokes UpdateWorker.flush() anymore. I think it can be removed.

Please add missing javadoc to the LockRecId inner class.

UpdateWorker.doPersist() persists individual records to the meta database using the Persistence.save() API in a loop, with each operation in its own transaction. This still seems very inefficient. The original implementation either finds and updates or deletes a DMO, or creates and persists a new DMO through the Persistence.save() API. This has a lot of unnecessary overhead and is not ideal. This core functionality is still there.

However, refactoring this to use a more efficient, direct use of INSERT/UPDATE/DELETE statements in batches is quite a bit more work. Since this performance hit is only taken on a _Lock table query, and the main purpose of this task is to speed up the lock state changes, it is ok to leave this aspect alone for now. But we at least should combine multiple updates/inserts/deletes into a single database transaction. This should be a fairly easy change.

Query.java line 392 seems out of place:

[...]

Maybe a cut/paste error?

I see. Thank you. Will be done.

Regarding the batch processing of persist operations:
I was thinking about this, of course. Unfortunately, it is not a very simple change: right now I work directly with the map, so it is not a problem to keep the map and the database in sync in case of database operation failures. For a batch, the logic becomes trickier. Please note, however, that in real life we will need to persist only information about the currently active locks, and I assume that will not be a very big number. In my test only a couple of records were persisted at a time, while the number of lock events was 3000+. Anyway, I will think about how to improve the current logic.

What testing has been done with this code?

I was running a big customer application with a debugger and forced the 'persist' operation via JMX. The meta database was accessible via TCP from an external database client, so I was able to check the correctness of the operations with the database. There is no need to test the correctness of the 'persist' triggered by the _Lock queries since this logic is exactly the same as for the _UserTableStat VST, which was thoroughly tested before.

#28 Updated by Eric Faulhaber about 2 years ago

Igor Skornyakov wrote:

[...]
Anyway, I will think about how to improve the current logic.

As I noted, it is ok to limit the change to combining multiple updates into a single transaction. Let's leave the more complex changes for another time. I don't want to spend more time on this right now. The changes you've already made should get us most of the performance benefit we wanted from this task.

#29 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

Igor Skornyakov wrote:

[...]
Anyway, I will think about how to improve the current logic.

As I noted, it is ok to limit the change to combining multiple updates into a single transaction. Let's leave the more complex changes for another time. I don't want to spend more time on this right now. The changes you've already made should get us most of the performance benefit we wanted from this task.

OK. Thank you.
BTW, UpdateWorker.flush() is used by the JMX MBean.

#30 Updated by Eric Faulhaber about 2 years ago

  • Status changed from WIP to Review
  • % Done changed from 0 to 90

#31 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

Igor Skornyakov wrote:

[...]
Anyway, I will think about how to improve the current logic.

As I noted, it is ok to limit the change to combining multiple updates into a single transaction. Let's leave the more complex changes for another time. I don't want to spend more time on this right now. The changes you've already made should get us most of the performance benefit we wanted from this task.

The problem with combining multiple updates into a single transaction is that we need to implement commit/rollback for the lock data that are kept in the map to properly handle database errors. I've implemented this, but it is a little tricky and I prefer to review and test it with a "fresh" head tomorrow.

#32 Updated by Eric Faulhaber about 2 years ago

Igor Skornyakov wrote:

The problem with combining multiple updates into a single transaction is that we need to implement commit/rollback for the lock data that are kept in the map to properly handle database errors. I've implemented this, but it is a little tricky and I prefer to review and test it with a "fresh" head tomorrow.

Yes, good point. I guess the way I would implement this would be to do the database updates in one iteration through the map, but do not modify the map in that pass. Only if that succeeds, modify the map in a subsequent pass. That way, there is no need for a "rollback" per se of the map changes; they are only "committed" if the database updates first succeed in full.
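A hypothetical sketch of that two-pass shape (the transaction and helper names are made up, not the actual Persistence API):

// Illustrative only: pass 1 writes all pending changes to the database in one
// transaction without touching the map; pass 2 folds the changes into the map
// only after the commit succeeds, so no in-memory rollback is ever needed.
List<PendingChange> pending = collectPending(locks);  // read-only pass over the map
try
{
   persistence.beginTransaction();
   for (PendingChange change : pending)
   {
      applyToDatabase(change);                        // INSERT/UPDATE/DELETE on the VST
   }
   persistence.commit();
}
catch (PersistenceException exc)
{
   persistence.rollback();
   return;                                            // map untouched, still consistent
}
for (PendingChange change : pending)
{
   applyToMap(locks, change);                         // "commit" the in-memory state
}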

#33 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

Igor Skornyakov wrote:

The problem with combining multiple updates into a single transaction is that we need to implement commit/rollback for the lock data that are kept in the map to properly handle database errors. I've implemented this, but it is a little tricky and I prefer to review and test it with a "fresh" head tomorrow.

Yes, good point. I guess the way I would implement this would be to do the database updates in one iteration through the map, but do not modify the map in that pass. Only if that succeeds, modify the map in a subsequent pass. That way, there is no need for a "rollback" per se of the map changes; they are only "committed" if the database updates first succeed in full.

Yes, I've implemented something like this. But there are some subtle details, such as the VST records' counter. Not rocket science, but it requires attention. I'm sure I will provide a final version tomorrow.

#34 Updated by Igor Skornyakov about 2 years ago

Reworked based on the code review and tested with a large customer app.
Please review the final version (attached).

#35 Updated by Eric Faulhaber about 2 years ago

Code review _lock3.diff (applied to 3821c/13813):

1) I didn't notice this when I reviewed an earlier version of this update, but I am a bit concerned about this code in LockTableUpdater$UpdateWorker.addEvent:

            if (event.getNewType() == LockType.NONE )
            {
               MinimalLockImpl ml = locks.get(lrId);
               if (ml != null && ml.persisted.get() >= 0) 
               {
                  // lock was already persisted, VST cleanup is required
                  ml.guardedUpdate(event);
                  executor.execute(() -> {
                     try
                     {
                        if (ml.persist())
                        {
                           locks.remove(lrId);
                           deletesFromVST.incrementAndGet();
                           persistedLocks.decrementAndGet();
                        }
                        ...

Why are we hitting the VST on a state change event? Presumably, this only will happen after a persist was triggered by a previous query against the VST, but thereafter, the release of every lock that was recorded by that persist operation will cause its own transaction and database update. This might be dozens, hundreds, (thousands?) of locks in a typical application. The intention was that persist operations against the VST only occur in response to a query, not on lock state change events. Can't the state of NO-LOCK just be another state associated with a previously persisted record stored in the locks map, until the next, query-driven persist operation is needed?

2) What is the rationale behind the choice of ConcurrentSkipListMap for the LockTableUpdater$UpdateWorker.locks variable? I'm not suggesting this is a problem, I just want to understand the choice.

3) Why are we running

         executor.execute(() -> {}); // start executor

in the LockTableUpdater$UpdateWorker c'tor? It seems like this will allocate a daemon thread that does nothing.

4) Are the calls to persistence.lock still needed in LockTableUpdater$UpdateWorker.createRecord, findRecord, and doPersist? I think we originally used them as concurrency management, but with the read/write guards you are using now, are they still necessary, or just unneeded overhead? Please determine whether it is possible in the 4GL to acquire a SHARE or EXCLUSIVE lock on a record in the _Lock VST itself. I would think not, since that seems like it would lead to circular references in the VST, but I could be wrong. If it is not possible, then assuming the new read/write protection is sufficient, I think we should remove these persistence.lock calls.

#36 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

Code review _lock3.diff (applied to 3821c/13813):

1) I didn't notice this when I reviewed an earlier version of this update, but I am a bit concerned about this code in LockTableUpdater$UpdateWorker.addEvent:

[...]

Why are we hitting the VST on a state change event? Presumably, this only will happen after a persist was triggered by a previous query against the VST, but thereafter, the release of every lock that was recorded by that persist operation will cause its own transaction and database update. This might be dozens, hundreds, (thousands?) of locks in a typical application. The intention was that persist operations against the VST only occur in response to a query, not on lock state change events. Can't the state of NO-LOCK just be another state associated with a previously persisted record stored in the locks map, until the next, query-driven persist operation is needed?

We are not hitting the VST immediately. We enqueue the corresponding task for a background cleanup, but only if the previous state of the lock was persisted. This allows us to remove the corresponding entry from the map without waiting for the next query against the _Lock table (which may never happen). This allows us to keep both the map and the VST small. It is possible, of course, to wait for the next query against the _Lock VST and keep information about the obsolete locks until that moment, but the larger the map, the more expensive the operations on it.

2) What is the rationale behind the choice of ConcurrentSkipListMap for the LockTableUpdater$UpdateWorker.locks variable? I'm not suggesting this is a problem, I just want to understand the choice.

As I mentioned in #3896-14, I decided to use ConcurrentSkipListMap (a concurrent version of TreeMap) to guarantee O(log(n)) access time. Although ConcurrentHashMap has O(1) access time in typical cases, the real access time can be as bad as O(n) in case of collisions. Here n is the size of the map. In addition, this version of the Map does not require resizing of the underlying array, which is an expensive operation. I performed a thorough comparison of these two types of map some time ago, when I was working on a telecom project with near-real-time performance requirements, and the tests showed that ConcurrentSkipListMap delivers more stable performance.

3) Why are we running

[...]

in the LockTableUpdater$UpdateWorker c'tor? It seems like this will allocate a daemon thread that does nothing.

This is because we want to create a correct instance of the AssociatedThread. Without pre-allocating the thread, the persist call initiated via JMX fails.

4) Are the calls to persistence.lock still needed in LockTableUpdater$UpdateWorker.createRecord, findRecord, and doPersist? I think we originally used them as concurrency management, but with the read/write guards you are using now, are they still necessary, or just unneeded overhead? Please determine whether it is possible in the 4GL to acquire a SHARE or EXCLUSIVE lock on a record in the _Lock VST itself. I would think not, since that seems like it would lead to circular references in the VST, but I could be wrong. If it is not possible, then assuming the new read/write protection is sufficient, I think we should remove these persistence.lock calls.

To be honest, I've just re-used the existing code here. Most likely you're right and this is not required. I will check this tomorrow.

#37 Updated by Igor Skornyakov about 2 years ago

I see some strange results with a simple 4GL test.
Consider the following program:

DEF var u as int no-undo init 0.
DEF var l as int no-undo init 0.
OUTPUT TO locks.txt.
FOR EACH item EXCLUSIVE-LOCK:
    u = u + 1.
    l = 0.
    FOR EACH _Lock NO-LOCK:
       message u l.
       export _Lock.
       l = l + 1.
    END.
END.
OUTPUT CLOSE.

Here item is a table with a single record.
The output shows 8192 records where all fields have the UNKNOWN value. The output is the same if we replace EXCLUSIVE-LOCK with NO-LOCK.

I'm totally confused.
Do we have a test where a meaningful information is retrieved from the _Lock VST?
Thank you.

#38 Updated by Eric Faulhaber about 2 years ago

Igor Skornyakov wrote:

I see some strange results with a simple 4GL test.
Consider the following program:
[...]
Here item is a table with a single record.
The output shows 8192 records where all fields have the UNKNOWN value. The output is the same if we replace EXCLUSIVE-LOCK with NO-LOCK.

Well, the block nesting of the test seems a little odd to me, but it should work. You must be connecting to the database in single-user mode; I guess the lock behavior in single-user mode is undefined, since record locking doesn't have much meaning with a single user. Make sure you connect in multi-user mode before you run your test.

#39 Updated by Eric Faulhaber about 2 years ago

Igor Skornyakov wrote:

Eric Faulhaber wrote:

Why are we hitting the VST on a state change event? Presumably, this only will happen after a persist was triggered by a previous query against the VST, but thereafter, the release of every lock that was recorded by that persist operation will cause its own transaction and database update. This might be dozens, hundreds, (thousands?) of locks in a typical application. The intention was that persist operations against the VST only occur in response to a query, not on lock state change events. Can't the state of NO-LOCK just be another state associated with a previously persisted record stored in the locks map, until the next, query-driven persist operation is needed?

We are not hitting the VST immediately. We enqueue the corresponding task for a background cleanup, but only if the previous state of the lock was persisted.

I understand this and noted it in my question.

This allows us to remove the corresponding entry from the map without waiting for the next query against the _Lock table (which may never happen). This allows us to keep both the map and the VST small. It is possible, of course, to wait for the next query against the _Lock VST and keep information about the obsolete locks until that moment, but the larger the map, the more expensive the operations on it.

I agree that we want to keep the map small, and I agree background cleanup is a good way to ensure this. However, it seems that as currently written, the background cleanup itself can become resource-intensive in a busy system. This is because each delete from the VST is its own separate operation, in its own transaction. There is no batching of these events. In practice, each will occur individually, as soon as its executor thread gets some CPU cycles (these are each separate, short-lived threads, right?).

I am concerned that this can have a performance profile not too different than the old implementation, but only under certain circumstances; those being:

  1. many locks are acquired by multiple users through normal system use;
  2. queries are made periodically against the _Lock VST, causing all that state to be persisted;
  3. locks are released through normal operation of the system.

In this scenario, the lock release for each record previously persisted hits the VST in its own transaction, until all those locks are released.

My concern is not based purely on imagination. While load testing at one customer with the old implementation, we had only 15-20 active users hitting the system hard for less than an hour. We saw the lock updater thread running at 100% CPU throughout testing, and then for another 15+ minutes after all other activity had stopped. The load this put on the server (considering the relative importance of this one, little-used feature) was a problem.

Now, I realize this implementation overall is much more efficient than the old one AND it would be an uncommon case that an application regularly queries the VST and forces persistence of the in-memory state AND even if we got into this scenario, it still is not fully the equivalent of the load imposed by the old implementation. So, maybe this is nothing to worry about.

Nevertheless, the above scenario could happen (who knows how developers have instrumented some kind of lock monitoring logic into their applications?). I don't propose we rework this now, as it seems it would be a fairly invasive change to the implementation. I want to get the benefit of the new improvements into the code base, and we don't know how much of a load this would actually put on the system until we stress test. But it is a departure from the original intent and I am a bit concerned about this one scenario getting us into trouble eventually.

#40 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

Well, the block nesting of the test seems a little odd to me, but it should work. You must be connecting to the database in single-user mode; I guess the lock behavior in single-user mode is undefined, since record locking doesn't have much meaning with a single user. Make sure you connect in multi-user mode before you run your test.

Oh, I see. Thank you!

#41 Updated by Igor Skornyakov about 2 years ago

Eric Faulhaber wrote:

Nevertheless, the above scenario could happen (who knows how developers have instrumented some kind of lock monitoring logic into their applications?). I don't propose we rework this now, as it seems it would be a fairly invasive change to the implementation. I want to get the benefit of the new improvements into the code base, and we don't know how much of a load this would actually put on the system until we stress test. But it is a departure from the original intent and I am a bit concerned about this one scenario getting us into trouble eventually.

I understand your concern. Actually, I was thinking about performing the cleanup in batches but could not come up with a good solution. However, right now I've recalled a trick I invented a long time ago for a similar situation. As far as I remember, the implementation is pretty simple. I will try to implement it.
The idea is to have two (configurable?) parameters - a batch size and a max wait time. We collect pending operations until either the batch size is reached or the max wait time has expired. After that we submit the batch. The max wait time parameter guarantees that we will not wait for the batch to start forever (or too long).
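A minimal sketch of that size-or-timeout batching (all names hypothetical):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: drain the queue until either BATCH_SIZE operations are
// collected or MAX_WAIT_MS elapses, then submit the whole batch at once.
static final int BATCH_SIZE = 64;
static final long MAX_WAIT_MS = 500;

List<CleanupOp> nextBatch(BlockingQueue<CleanupOp> queue) throws InterruptedException
{
   List<CleanupOp> batch = new ArrayList<>(BATCH_SIZE);
   long deadline = System.currentTimeMillis() + MAX_WAIT_MS;
   while (batch.size() < BATCH_SIZE)
   {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) break;                 // max wait time expired
      CleanupOp op = queue.poll(remaining, TimeUnit.MILLISECONDS);
      if (op == null) break;                     // timed out waiting for work
      batch.add(op);
   }
   return batch;                                 // may be empty; caller skips empty batches
}

A common variant blocks indefinitely for the first operation and only starts the wait window from there, so an idle system costs nothing.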

#42 Updated by Greg Shah about 2 years ago

Are we really, really sure that this incremental maintenance approach is needed? The _lock data is expected to be infrequently used and is not expected to be performance critical. Real end user code does not access that table. We can easily just drop and recreate the table completely whenever it is queried. It seems to me that:

  • The incremental approach is significantly more complicated than just recreating the table contents from scratch off the current in-memory lock data whenever it is queried.
  • The incremental approach requires constant maintenance which takes real CPU cycles and seems completely unnecessary for 99.99999% of the time.

I suspect that _lock is only used for diagnostic purposes and under the most common case it is probably never queried during the lifetime of the running application. We are twisting ourselves into a pretzel to optimize for something that does not matter. Can we please take a simple and fast approach to this?

#43 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

Are we really, really sure that this incremental maintenance approach is needed? The _lock data is expected to be infrequently used and is not expected to be performance critical. Real end user code does not access that table. We can easily just drop and recreate the table completely whenever it is queried. It seems to me that:

  • The incremental approach is significantly more complicated than just recreating the table contents from scratch off the current in-memory lock data whenever it is queried.
  • The incremental approach requires constant maintenance which takes real CPU cycles and seems completely unnecessary for 99.99999% of the time.

I suspect that _lock is only used for diagnostic purposes and under the most common case it is probably never queried during the lifetime of the running application. We are twisting ourselves into a pretzel to optimize for something that does not matter. Can we please take a simple and fast approach to this?

In fact, I'm not really sure that it is really required, at least at this moment. After all, it is possible to see (via JMX) how intensive the background cleanup is in real life. So, although I've recalled the details of the trick I mentioned before for the batch cleanup, and it is not a big deal to implement it, it will make the logic more complicated.
Maybe it makes sense to commit what we have now and implement the optimization once it is clear that it will have a significant positive effect in real life?
What do you think?
Thank you.

#44 Updated by Greg Shah about 2 years ago

Eric controls this decision.

I am not suggesting leaving the code as you have written it. I am suggesting that the only time we ever insert into the _lock table would be at the moment that some converted 4GL code does query it. At that moment, we should just create a fresh instance of the table for the session and fill it from the in-memory lock state. I would create a user-specific instance of the table. It does not matter if that is slow. Any background maintenance of the table is too much CPU usage in my opinion.
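For illustration, a hypothetical sketch of that JIT rebuild (all helper and type names are made up):

// Illustrative only: no background maintenance at all; on each _lock query,
// rebuild the (session-specific) table from the live lock-manager state in a
// single transaction.
void rebuildLockTable()
{
   persistence.beginTransaction();
   deleteAllLockRows();                          // hypothetical: clear the table
   for (LockSnapshot lock : lockManager.snapshotActiveLocks())
   {
      insertLockRow(lock);                       // hypothetical: one INSERT per lock
   }
   persistence.commit();
}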

#45 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

Eric controls this decision.

I am not suggesting leaving the code as you have written it. I am suggesting that the only time we ever insert into the _lock table would be at the moment that some converted 4GL code does query it. At that moment, we should just create a fresh instance of the table for the session and fill it from the in-memory lock state. I would create a user-specific instance of the table. It does not matter if that is slow. Any background maintenance of the table is too much CPU usage in my opinion.

I'm not sure how it will work if two clients access the _Lock table concurrently. I also don't understand how easy it would be to implement user-specific versions of this table, if I understand your idea correctly.

After all, if we expect that user access to the _Lock table will be rare, then the cost of the background maintenance of it will be negligible.

#46 Updated by Greg Shah about 2 years ago

After all, if we expect that user access to the _Lock table will be rare, then the cost of the background maintenance of it will be negligible.

Is the background maintenance done if the _lock table is never accessed? That would be an improvement over the current approach.

It seems to me that most maintenance is wasted CPU cycles. It is likely that even when queried, the _lock table will be checked infrequently. But in a normal system locks are acquired and released very frequently. In a busy system, in the rare event that the table is queried, we may have been maintaining millions of records in that table when only thousands of inserts would be needed at the moment of the query. This "Just In Time" (JIT) approach to building the table will almost always be significantly less work than maintaining everything all the time.

#47 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

After all, if we expect that user access to the _Lock table will be rare, then the cost of the background maintenance of it will be negligible.

Is the background maintenance done if the _lock table is never accessed? That would be an improvement over the current approach.

No, in this case only the map will be updated. Moreover, we submit a cleanup task only if we see that the lock data was persisted. It is unclear what the typical number of locks persisted in this situation will be, but after these locks are released and the cleanup is performed, there will be zero maintenance activity until the next access to _Lock.

It seems to me that most maintenance is wasted CPU cycles. It is likely that even when queried, the _lock table will be checked infrequently. But in a normal system locks are acquired and released very frequently. In a busy system, in the rare event that the table is queried, we may have been maintaining millions of records in that table when only thousands of inserts would be needed at the moment of the query. This "Just In Time" (JIT) approach to building the table will almost always be significantly less work than maintaining everything all the time.

This is correct. But, as I've mentioned before, the background maintenance will happen only after an access to the _Lock table. And I still do not understand how exactly this JIT approach will work to correctly support concurrent access to the VST.

#48 Updated by Greg Shah about 2 years ago

Moreover, we submit a cleanup task only if we see that the lock data was persisted.

What do you mean by "persisted"? Are you talking about converted 4GL code that edits the _lock table?

#49 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

Moreover, we submit a cleanup task only if we see that the lock data was persisted.

What do you mean by "persisted"? Are you talking about converted 4GL code that edits the _lock table?

As far as I understand, 4GL code cannot modify this VST directly. By "persisted" I mean the update of this table by the LockTableUpdater based on the map content right before the converted 4GL code reads it. At this moment the content of the map and of the VST will be synchronized.
When the persisted locks are released, the VST will be updated by background tasks (at this moment this is done one by one) to reduce the size of both the map and the VST.

#50 Updated by Greg Shah about 2 years ago

The moment any access to the _lock table occurs, we start expending CPU cycles maintaining that table. Do I understand correctly?

Does that maintenance end at some point? If so, what causes this maintenance to end?

#51 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

The moment any access to the _lock table occurs, we start expending CPU cycles maintaining that table. Do I understand correctly?

This is correct, well almost. The actual maintenance starts when the first lock persisted because of the access is released.

Does that maintenance end at some point? If so, what causes this maintenance to end?

The maintenance ends when the last lock persisted because of the access is released and removed from the VST.

#52 Updated by Greg Shah about 2 years ago

The actual maintenance starts when the first lock persisted because of the access is released.

By "access is released", do you mean that the lock is released by whatever converted 4GL code was previously holding it as a share or exclusive lock?

The maintenance ends when the last lock persisted because of the access is released and removed from the VST.

The system is constantly adding new locks and changing the state of locks. How can the maintenance ever really end?

#53 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

The actual maintenance starts when the first lock persisted because of the access is released.

By "access is released", do you mean that the lock is released by whatever converted 4GL code was previously holding it as a share or exclusive lock?

This is correct.

The maintenance ends when the last lock persisted because of the access is released and removed from the VST.

The system is constantly adding new locks and changing the state of locks. How can the maintenance ever really end?

Indeed, the locks are constantly added and released, but as long as the application (converted 4GL code) does not explicitly access the _Lock VST, only the map is updated and there is no need for VST maintenance since it is not affected.

Actually, what Eric suggests (the batch VST cleanup) can reduce the duration of the maintenance periods and the corresponding overhead.

#54 Updated by Greg Shah about 2 years ago

The moment that the _lock table is ever accessed by converted 4GL code, maintenance starts. At that point, the _lock table will be maintained until the server is shutdown.

Is that correct? That would mean that we bear the cost of maintenance for some indefinite time just because someone decided to run some tool to look at the locks. This is like the Heisenberg uncertainty principle except it hurts performance instead of destroying information.

#55 Updated by Eric Faulhaber about 2 years ago

Igor Skornyakov wrote:

Actually, what Eric suggests (the batch VST cleanup) can reduce the duration of the maintenance periods and the corresponding overhead.

As noted earlier, I don't want to address this unless we find it's needed. Let's go with the approach as you've currently written it. Under most circumstances, we should be working only with the map.

If/once you feel you have sufficiently tested it, please commit it to 3821c.

#56 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

The moment that the _lock table is ever accessed by converted 4GL code, maintenance starts. At that point, the _lock table will be maintained until the server is shutdown.

Is that correct? That would mean that we bear the cost of maintenance for some indefinite time just because someone decided to run some tool to look at the locks. This is like the Heisenberg uncertainty principle except it hurts performance instead of destroying information.

This is not correct. After the last persisted lock is released and the corresponding record is removed from the VST, there will be no maintenance overhead at all, because the executor used for this will be idle until the next explicit access to the VST.

#57 Updated by Igor Skornyakov about 2 years ago

  • % Done changed from 90 to 100

Eric Faulhaber wrote:

As noted earlier, I don't want to address this unless we find it's needed. Let's go with the approach as you've currently written it. Under most circumstances, we should be working only with the map.

If/once you feel you have sufficiently tested it, please commit it to 3821c.

Thank you.
Committed to 3821c/13818.

#58 Updated by Greg Shah about 2 years ago

Is this complete?

#59 Updated by Igor Skornyakov about 2 years ago

Greg Shah wrote:

Is this complete?

I think so.

#60 Updated by Eric Faulhaber about 2 years ago

  • Status changed from Review to Closed

#61 Updated by Constantin Asofiei about 2 years ago

Igor, please test what happens if you try to lock a meta _user record or do an explicit CREATE on the _user table.

There is this abend when doing a create dictdb._user.:

java.lang.IllegalStateException: Table 'meta_user' not found in metadata [Lock event #151: ca meta_user:280004; NONE->SHARE]
    at com.goldencode.p2j.persist.meta.LockTableUpdater$UpdateWorker.getFileNum(LockTableUpdater.java:559) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.meta.LockTableUpdater$UpdateWorker.access$500(LockTableUpdater.java:241) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.meta.LockTableUpdater$UpdateWorker$MinimalLockImpl.<init>(LockTableUpdater.java:686) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.meta.LockTableUpdater$UpdateWorker.lambda$addEvent$4(LockTableUpdater.java:509) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at java.util.concurrent.ConcurrentSkipListMap.computeIfAbsent(ConcurrentSkipListMap.java:1699) ~[na:1.8.0_312]
    at com.goldencode.p2j.persist.meta.LockTableUpdater$UpdateWorker.addEvent(LockTableUpdater.java:509) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.meta.LockTableUpdater.lockChanged(LockTableUpdater.java:223) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.lock.InMemoryLockManager.notifyLockChange(InMemoryLockManager.java:1511) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.lock.InMemoryLockManager.lockImpl(InMemoryLockManager.java:1210) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.lock.InMemoryLockManager.lock(InMemoryLockManager.java:592) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.Persistence.lock(Persistence.java:2427) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.lock.RecordLockContext$Perm.lock(RecordLockContext.java:504) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.RecordBuffer.create(RecordBuffer.java:9757) ~[p2j.jar:4.0.0_undefined_undefined_undefined]
    at com.goldencode.p2j.persist.BufferImpl.create(BufferImpl.java:1411) ~[p2j.jar:4.0.0_undefined_undefined_undefined]

#62 Updated by Igor Skornyakov about 2 years ago

Constantin Asofiei wrote:

Igor, please test what happens if you try to lock a meta _user record or do an explicit CREATE on the _user table.

There is this abend when doing a create dictdb._user.:
[...]

Constantin,
If you're talking about my changes in the LockTableUpdater, there is no reason to access the _user table in any way here. The code works with lock events and accesses only the _lock table.
On the other hand, the error looks strange, because LockTableUpdater uses the map created by the MetadataManager on metadata initialization.

#63 Updated by Constantin Asofiei about 2 years ago

Igor, most likely the abend is from the LockTableUpdater changes - please take a look, this abend is seen in a customer application.

Note that the _user meta table can be used in a CREATE and records in this table can have EXPLICIT-LOCK; this table is not read-only from the application's point of view.

#64 Updated by Igor Skornyakov about 2 years ago

Constantin Asofiei wrote:

Igor, most likely the abend is from the LockTableUpdater changes - please take a look, this abend is seen in a customer application.

Note that the _user meta table can be used in a CREATE and records in this table can have EXPLICIT-LOCK; this table is not read-only from the application's point of view.

OK. I will check. This can be a problem with the MetadataManager or with an incorrect target on the lock event (wrong database), not with the LockTableUpdater itself.
It would help a lot if I were able to run the application in question and reproduce the problem with a debugger.
Is it possible?
Thank you.

#65 Updated by Constantin Asofiei about 2 years ago

In a standalone test use a create _user. and a find first _user exclusive-lock.

#66 Updated by Igor Skornyakov about 2 years ago

Constantin Asofiei wrote:

In a standalone test use a create _user. and a find first _user exclusive-lock.

OK. Thank you.

#67 Updated by Igor Skornyakov about 2 years ago

Indeed, the problem is that meta_user is included neither in the FILE_NUM nor in the FILE_NUM_BY_TBL_NAME maps in the MetadataManager.
Working on the fix.

#68 Updated by Constantin Asofiei about 2 years ago

Igor Skornyakov wrote:

Indeed, the problem is that meta_user is included neither in the FILE_NUM nor in the FILE_NUM_BY_TBL_NAME maps in the MetadataManager.
Working on the fix.

What other meta tables can be used with CREATE, EXCLUSIVE-LOCK or SHARE-LOCK? I mean, maybe all meta tables should be included in those maps.

#69 Updated by Igor Skornyakov about 2 years ago

Constantin Asofiei wrote:

Igor Skornyakov wrote:

Indeed, the problem is that meta_user is included neither in the FILE_NUM nor in the FILE_NUM_BY_TBL_NAME maps in the MetadataManager.
Working on the fix.

What other meta tables can be used with CREATE, EXCLUSIVE-LOCK or SHARE-LOCK? I mean, maybe all meta tables should be included in those maps.

The problem is that the meta tables are very poorly documented and it is difficult to answer this question. However, the _User and _SEC_AUTHENTICATION_* tables are "special" because they are part of the primary database, not meta. We define _SEC_AUTHENTICATION_* in the primary database and they are processed by the MetadataManager, but _User is defined in standard.df.
If I understand correctly, LockTableUpdater is not created for the meta database, so I suggest fixing the issue with _User at this moment and postponing the question about other meta tables to a separate task.

#70 Updated by Constantin Asofiei about 2 years ago

Igor Skornyakov wrote:

If I understand correctly, LockTableUpdater is not created for the meta database, so I suggest fixing the issue with _User at this moment

For now, I think any meta table should not go through LockTableUpdater.

... and postponing the question about other meta tables to a separate task.

Another question is: do the _user record locks (or any other meta record locks) appear as records in the meta _lock table?

#71 Updated by Igor Skornyakov about 2 years ago

Constantin Asofiei wrote:

Igor Skornyakov wrote:

If I understand correctly, LockTableUpdater is not created for the meta database, so I suggest fixing the issue with _User at this moment

For now, I think any meta table should not go through LockTableUpdater.

AFAIK FWD creates a separate instance of the LockTableUpdater for all primary databases, but not for the meta databases. So I do not understand what you mean exactly.

... and postponing the question about other meta tables to a separate task.

Another question is: do the _user record locks (or any other meta record locks) appear as records in the meta _lock table?

This is a good question, but it is not directly related to my optimization which affects only the _Lock table management and processes lock events regardless of their source. Maybe it makes sense to create a separate task for this?
Eric, what do you think?
Thank you.

#72 Updated by Constantin Asofiei about 2 years ago

Igor Skornyakov wrote:

Constantin Asofiei wrote:

Igor Skornyakov wrote:

If I understand correctly, LockTableUpdater is not created for the meta database, so I suggest fixing the issue with _User at this moment

For now, I think any meta table should not go through LockTableUpdater.

AFAIK FWD creates a separate instance of the LockTableUpdater for all primary databases, but not for the meta databases. So I do not understand what you mean exactly.

What I meant is for LockTableUpdater.lockChanged to do nothing and log a WARNING if the event is for a meta table. The fix will be (if needed) in a separate task, because this may not be just about the _user table.

#73 Updated by Igor Skornyakov about 2 years ago

Constantin Asofiei wrote:

What I meant is for LockTableUpdater.lockChanged to do nothing and log a WARNING if the event is for a meta table. The fix will be (if needed) in a separate task, because this may not be just about the _user table.

Got it. Thank you.
So, at this moment I will just fix the MetadataManager by adding the _User table to the internal maps for the primary database.

#74 Updated by Igor Skornyakov about 2 years ago

Constantin Asofiei wrote:

Igor, please test what happens if you try to lock a meta _user record or do an explicit CREATE on the _user table.

There is this abend when doing a create dictdb._user.:
[...]

Fixed in 3821c/13854.

#75 Updated by Constantin Asofiei about 2 years ago

Igor Skornyakov wrote:

Constantin Asofiei wrote:

Igor, please test what happens if you try to lock a meta _user record or do an explicit CREATE on the _user table.

There is this abend when doing a create dictdb._user.:
[...]

Fixed in 3821c/13854.

Thanks, the regression is fixed.

#76 Updated by Igor Skornyakov about 2 years ago

Fixed getLockTableUpdater for meta tables.
Committed to 3821c/13875.
