Feature #3758
misc database features
History
#1 Updated by Greg Shah over 5 years ago
Functions
- DBTASKID
- RECORD-LENGTH
Attributes
- SERIALIZE-HIDDEN
Methods
- BUFFER:FIND-CURRENT
- BUFFER/TT:READ-JSON
- BUFFER/TT:WRITE-JSON
- QUERY:FIRST-OF
- QUERY:LAST-OF
#2 Updated by Eric Faulhaber over 5 years ago
- Status changed from New to WIP
- Assignee set to Eric Faulhaber
Created task branch 3758a.
#3 Updated by Greg Shah over 5 years ago
- Related to Feature #3813: misc DB features part deux added
#4 Updated by Eric Faulhaber over 5 years ago
- % Done changed from 0 to 30
Task branch 3758b contained conversion support and partial/stubbed runtime support for DBTASKID, RECORD-LENGTH builtin functions; FIND-CURRENT, READ/WRITE-JSON methods. It passed conversion testing, was merged to trunk, and committed as revision 11296. Runtime testing was not performed, since the changes were additive only.
#5 Updated by Eric Faulhaber over 5 years ago
The SERIALIZE-HIDDEN attribute runtime support was not implemented at this time. See #3750-41.
#6 Updated by Greg Shah over 5 years ago
Eric: Please summarize what needs to be done for this task.
#7 Updated by Eric Faulhaber over 5 years ago
Greg Shah wrote:
Eric: Please summarize what needs to be done for this task.
DBTASKID Function
Full runtime implementation is needed. Several variants of ConnectionManager.dbTaskId are stubbed and currently call the UnimplementedFeature.missing method. Currently, there is no infrastructure to track a current transaction identifier, so TransactionManager updates will be needed. Since the use of this function should be very rare, the overhead of maintaining a transaction identifier must absolutely be minimized.
RECORD-LENGTH Function
Full runtime implementation is needed. BufferImpl.recordLength is stubbed and currently calls the UnimplementedFeature.missing method. The implementation should be added to the RecordBuffer class. The following looks useful for the implementation: https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/gsdbe%2Fformulas-for-calculating-field-storage.html%23
SERIALIZE-HIDDEN Attribute
Not needed at this time, as documented previously.
BUFFER:FIND-CURRENT Method
Needs to be wired to the existing, backing support in the persistence runtime. It did not fit with the common implementation used by BUFFER:FIND-{FIRST|NEXT|PREVIOUS|LAST}, which all delegate to the BufferImpl._find worker method, because that method converts a dynamic query predicate. This is not appropriate for FIND-CURRENT, so some refactoring is needed.
BUFFER/TT:{READ|WRITE}-JSON Methods
Runtime implementation is needed. I began an implementation based on the similar XML methods, but mostly just to the extent needed to support conversion. That is, I created a JsonData interface, which Buffer extends and BufferImpl implements. However, the implementation is stubbed: the less specific BufferImpl.{read|write}Json methods delegate to more specific variants with appropriate default parameters, but the most specific worker method is not implemented; it calls UnimplementedFeature.missing.
QUERY:{FIRST|LAST}-OF Methods
Fully implemented.
#8 Updated by Ovidiu Maxiniuc over 5 years ago
Status & notes:
BUFFER:FIND-CURRENT Method
Needs to be wired to the existing, backing support in the persistence runtime. It did not fit with the common implementation used by BUFFER:FIND-{FIRST|NEXT|PREVIOUS|LAST}, which all delegate to the BufferImpl._find worker method, because that method converts a dynamic query predicate. This is not appropriate for FIND-CURRENT, so some refactoring is needed.
Implemented. However, the reference does differ from the observed behaviour. I also added support for the related CURRENT-CHANGED attribute. Where should I commit? My update is based on 3750a.
DBTASKID Function
Full runtime implementation is needed. Several variants of ConnectionManager.dbTaskId are stubbed and currently call the UnimplementedFeature.missing method. Currently, there is no infrastructure to track a current transaction identifier, so TransactionManager updates will be needed. Since the use of this function should be very rare, the overhead of maintaining a transaction identifier must absolutely be minimized.
ETA: 25%. I did some research here. The documentation is scarce. There is some support in TransactionManager (see nextTransId and BlockDefinition.transId), but this does not seem correct. Here are some notes from my investigations:
- transaction IDs are independent for each database. They increment separately; two or more databases can have the same transaction ID at different moments in time or even at the same time;
- apparently, the next id is persistent for each database (probably in a VST field);
- the transaction ID is only assigned when the first buffer is EXCLUSIVE-locked in a TRANSACTION block, not when the block is pushed on the stack;
- even if a database is connected, the transaction ID is not set for it if no buffers are EXCLUSIVE-locked for the respective database;
- the transaction ID is set at the moment of the EXCLUSIVE lock. It does not matter whether the record is actually updated or otherwise touched.
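The lazy, per-database id assignment described in these notes can be sketched roughly as follows. All class and method names here are hypothetical illustrations, not the actual FWD types; in the real system the "next id" counter would be persisted per database, probably in a VST field.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: per-database transaction ids, assigned lazily
// on the first EXCLUSIVE lock inside a transaction block.
public class TaskIdSketch
{
   // "next id" counter, one per database (persisted per database in the real system)
   private final Map<String, Long> nextId = new HashMap<>();

   // id assigned to the current transaction, per database; absent means unknown (?)
   private final Map<String, Long> current = new HashMap<>();

   /** Called when a buffer is first EXCLUSIVE-locked inside a transaction block. */
   public long onExclusiveLock(String database)
   {
      return current.computeIfAbsent(database, db -> {
         long id = nextId.getOrDefault(db, 0L) + 1;
         nextId.put(db, id);
         return id;
      });
   }

   /** DBTASKID semantics: null plays the role of the unknown value (?). */
   public Long dbTaskId(String database)
   {
      return current.get(database);
   }

   /** Called when the full transaction block ends; subtransactions do not reset ids. */
   public void onTransactionEnd()
   {
      current.clear();
   }

   public static void main(String[] args)
   {
      TaskIdSketch tx = new TaskIdSketch();
      System.out.println(tx.dbTaskId("db1"));        // null: no EXCLUSIVE lock yet
      System.out.println(tx.onExclusiveLock("db1")); // 1: first lock assigns the id
      System.out.println(tx.onExclusiveLock("db1")); // 1: same transaction, same id
      System.out.println(tx.onExclusiveLock("db2")); // 1: counters are per-database
      tx.onTransactionEnd();
      System.out.println(tx.dbTaskId("db1"));        // null again after the block ends
   }
}
```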
#9 Updated by Ovidiu Maxiniuc over 5 years ago
Continuation:
The DBTASKID function is related to the _Trans VST. However, the relation is not 1:1. If, in a transaction, any record from a database is involved (updated or FIND, regardless of the lock level; this includes the _Trans VST, too), then the task id is incremented and the new value is returned by the DBTASKID function. Otherwise (no records of the database are accessed in any way, or there is only read access to buffers loaded before the transaction started), the _Trans._Trans-Num of the database is NOT incremented and the DBTASKID function returns ? (the unknown value).
#10 Updated by Greg Shah over 5 years ago
Where should I commit? My update is based on 3750a.
Please use 3750b. It is based on trunk 11298 (which is 3750a).
#11 Updated by Ovidiu Maxiniuc over 5 years ago
- added support for CURRENT-CHANGED, FIND-CURRENT;
- added support for the DBTASKID ABL function;
- improved _Trans VST metadata support;
- fixed locking in FIND_* methods;
- fixed the DBPARAM function.
Notes related to this update:
The new implementation is largely based on the old code of TransactionUpdater, but the data processed is rather different because the transaction ids are generated at a different moment, not at the beginning of the transaction block. The data is 'flushed' to the H2 database only if there is a query on the _Trans VST; otherwise, the data structure used by the DBTASKID function is kept in memory, so it is fast. DBTASKID works fine now.
When a _Trans buffer is needed in a query, the internal data is requested to be 'flushed' to the H2 meta database so it can be accessed in SQL form by the about-to-be-executed query. The problem is that the fresh _Trans data, although saved at this moment, is not really committed until the end of the transaction, so queries on this VST table will not be able to access it yet. The data is available only after the transaction ends and the commit is performed. Needless to say, at that moment the data is invalid, as the transaction has actually ended.
I wonder how we can force the commit of the in-memory data when it is needed in SQL form? Initially I thought that auto-commit would be a solution, but this only works outside of transactions. As a test, I manually called commit on the persistence (and then reopened the SQL/Hibernate transaction, to leave things apparently unchanged). It worked for my isolated test case, but this is not going to work for a whole application.
An interesting note: you cannot get a lock on _Trans buffers. I 'successfully' acquired EXCLUSIVE-LOCKs on the same record from multiple ABL clients. In fact, I think it is not possible to hold a lock on any of the VS tables (this assertion remains to be tested). As I understand them now, they are not real rows, but pieces of information extracted in real time from the ABL db engine.
#12 Updated by Eric Faulhaber over 5 years ago
Ovidiu Maxiniuc wrote:
The new implementation is largely based on the old code of TransactionUpdater, but the data processed is rather different because the transaction ids are generated at a different moment, not at the beginning of the transaction block.
What is the actual requirement in this regard? When does the transaction id need to be generated? I understand that the single id we were generating before is not appropriate; we need a unique transaction id per connected database, right?
It is 'flushed' to the H2 database only if there is a query on the _Trans VST, otherwise the data structure used by the DBTASKID function is stored in memory, so it's fast. The DBTASKID works fine now.
It was not so much the speed of the DBTASKID function I was concerned with; this will be used rarely. It's good that it is fast, but it is really the overhead of the transaction id management that I am concerned with. I think we have added more overhead than necessary with this implementation. If I read the code correctly, you are checking whether the transaction id is set every time we load a record into a buffer and any time we call a setter on a DMO.
Don't we potentially need to update transaction ids on the following events?
- when starting a full transaction;
- when ending a full transaction;
- when connecting a database (if a transaction is active);
- when disconnecting a database (if a transaction is active).
This should result in far fewer checks (which are somewhat expensive in the multiple levels of lookup they do) than each time a record is set into a buffer and every time a DMO setter is called.
Are there any other events that would require a check? Is the transaction id changed at a subtransaction boundary, or just full transaction?
I'm not thrilled that so much database code has moved into TM. Why was it necessary to dismantle the listener model? The point of this was to provide a looser coupling between the TM and the VST implementation. Perhaps events need to be fired from different (or more) locations now, but I think the listener model is still appropriate.
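For illustration, the loose coupling a listener model provides might be sketched roughly like this. All names here are hypothetical, not the actual TransactionManager API: the point is only that the TM fires events while the VST/metadata layer subscribes, so neither depends on the other's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a listener model: the transaction manager fires events, and the
// VST/metadata layer subscribes. Interface and class names are illustrative only.
public class ListenerSketch
{
   /** Events the metadata (VST) layer may care about. */
   interface TransactionListener
   {
      void transactionStarted(String database, long taskId);
      void transactionEnded(String database, long taskId);
   }

   static class TransactionManager
   {
      private final List<TransactionListener> listeners = new ArrayList<>();

      void addListener(TransactionListener l) { listeners.add(l); }

      void begin(String db, long id)
      {
         for (TransactionListener l : listeners) { l.transactionStarted(db, id); }
      }

      void end(String db, long id)
      {
         for (TransactionListener l : listeners) { l.transactionEnded(db, id); }
      }
   }

   public static void main(String[] args)
   {
      TransactionManager tm = new TransactionManager();
      List<String> log = new ArrayList<>();

      // the VST layer subscribes; the TM knows nothing about _Trans internals
      tm.addListener(new TransactionListener()
      {
         public void transactionStarted(String db, long id) { log.add("start " + db + " " + id); }
         public void transactionEnded(String db, long id)   { log.add("end " + db + " " + id); }
      });

      tm.begin("db1", 1);
      tm.end("db1", 1);
      System.out.println(log); // [start db1 1, end db1 1]
   }
}
```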
When a _Trans buffer is needed in a query, the internal data is requested to be 'flushed' to the H2 meta database so it could be accessed in SQL form by the about-to-be-executed query. The problem is that, because the fresh _Trans data, although saved at this moment, is not really committed until the end of transaction, the queries on this VST table will not be able to access them yet. The data is available only after the transaction ends and commit is performed. Needless to say that at that moment, the data is invalid as the transaction has actually ended.
I wonder how can we force the commit of the in-memory data when it is needed in SQL form? Initially I thought that the auto-commit would be a solution, but this only works outside the transactions. As a test I've manually called commit on the persistence (and then reopened the SQL/Hibernate transaction - to let things apparently unchanged). It worked for my isolated testcase, but this is not going to work for a whole application.
If we currently are managing the metadata databases using application-level transactions, that is a bug. Since, as you noted, VST records cannot be locked or changed by application code, there is no point in applying application-level transactions to them. The Persistence$Context object for the metadata database should have transactions that are independent of the application's transactions for the primary and temp-table databases. Changes to the meta_trans table (and meta_connect, etc.) must be committed independently, so that information is available to all contexts regardless of application-level transaction scopes.
I don't think the flush point you added to Persistence.load is appropriate (where you have the TODO asking if it's appropriate). load is called after the query has already executed; at that point, we're just retrieving the found DMO, so doing the flush then is too late. Did you find that the metadata flush in RecordBuffer.flush was not enough? This represents the moment when we have determined a record (any record) would be flushed to the database.
Outside of the _trans changes, the rest of the update looks good to me.
#13 Updated by Ovidiu Maxiniuc over 5 years ago
Eric Faulhaber wrote:
What is the actual requirement in this regard? When does the transaction id need to be generated? I understand that the single id we were generating before is not appropriate; we need a unique transaction id per connected database, right?
The transaction id is generated only when a database access occurs (i.e., a FIND or a field update is sufficient). If there are no database accesses in a TRANSACTION block, the transaction ID is not generated and DBTASKID returns unknown at that moment. Each database keeps a transaction counter; it is incremented and assigned for any user that accesses the database in a valid transaction. Two different databases can have transactions with the same transaction id, for the same or for different users, at the same time.
It was not so much the speed of the DBTASKID function I was concerned with; this will be used rarely. It's good that it is fast, but it is really the overhead of the transaction id management that I am concerned with. I think we have added more overhead than necessary with this implementation. If I read the code correctly, you are checking whether the transaction id is set every time we load a record into a buffer and any time we call a setter on a DMO.
Yes, this is true. But it all happens in memory, and the accesses are hash-mapped. The overhead is caused by only two things:
- the synchronization on the data member;
- the access to the SecurityManager's context to identify the session id.
These are far faster than any H2 access. The H2 access is only performed if the _Trans table is accessed in read mode. The ABL programmer does not have write access to (this) VST. When _Trans is queried, I try to flush (or, better said, synchronize) the in-memory content for the respective database with H2. If no such operations are used, then everything happens in memory.
Don't we potentially need to update transaction ids on the following events?
- when starting a full transaction;
- when ending a full transaction;
- when connecting a database (if a transaction is active);
- when disconnecting a database (if a transaction is active).
As noted above, the IDs are not generated when the transaction block is pushed to (or popped from) the stack, and the transaction counters are independent for each database. If a database is connected during a transaction, it will be assigned a new transaction id the first time a record is fetched or updated. If that is the first ever transaction, the counter will start at 1; otherwise it will increment to N+1, where N is the id of a possibly still-running transaction in another connection/user.
This should result in far fewer checks (which are somewhat expensive in the multiple levels of lookup they do) than each time a record is set into a buffer and every time a DMO setter is called.
I know, but I based my implementation on the results and observations made on native ABL.
Are there any other events that would require a check? Is the transaction id changed at a subtransaction boundary, or just full transaction?
Subtransactions are not special here. If the current transaction has an ID, it will be kept during the subtransaction and until the parent transaction ends. If the subtransaction is the first place the db is accessed, the parent transaction gets the ID and keeps it until it ends, regardless of whether there are any more db accesses after the subtransaction block has ended. It looks just like any other normal block in the transaction block. I.e., the _Trans record is dropped (and dbtaskid returns ?) when the parent transaction block ends.
I'm not thrilled that so much database code has moved into TM. Why was it necessary to dismantle the listener model? The point of this was to provide a looser coupling between the TM and the VST implementation. Perhaps events need to be fired from different (or more) locations now, but I think the listener model is still appropriate.
I preferred that because the data was there, and a listener mechanism is not needed. However, de-coupling the TM and the VST persistence is good. I will fix this. In fact, this is required, since the TM is used at conversion time, too.
If we currently are managing the metadata databases using application-level transactions, that is a bug. Since, as you noted, VST records cannot be locked or changed by application code, there is no point in applying application-level transactions to them. The Persistence$Context object for the metadata database should have transactions that are independent of the application's transactions for the primary and temp-table databases. Changes to the meta_trans table (and meta_connect, etc.) must be committed independently, so that information is available to all contexts regardless of application-level transaction scopes.
I think we do. For the moment, I do not know how to do this. I noticed that some of the management of the _meta tables (not sure if VST only) is performed using native H2 calls via executeSQL(). This could be a solution, but I do not like the idea very much because, even though we now use only H2 for the meta and temp databases, the native code would tie us to H2 and we would lose flexibility here.
I don't think the flush point you added to Persistence.load is appropriate (where you have the TODO asking if it's appropriate). load is called after the query has already executed; at that point, we're just retrieving the found DMO, so doing the flush then is too late. Did you find that the metadata flush in RecordBuffer.flush was not enough? This represents the moment when we have determined a record (any record) would be flushed to the database.
Yes, that is a remnant of my work on finding the most appropriate places to check and flush the in-memory data to H2. I spent a lot of time moving these calls so that the impact on performance would be minimal. I probably failed to notice it in the final clean-up. I will review and re-test this occurrence, and most likely I will remove the call.
Outside of the _trans changes, the rest of the update looks good to me.
Thank you very much for the review.
#14 Updated by Eric Faulhaber over 5 years ago
3750b/11317 adds basic runtime support for the {READ|WRITE}-JSON methods. Basic JSON import and export work with some limitations and a few known issues:
- the following types are supported:
  - types assignable from int64
  - types assignable from Text
  - types assignable from date
  - types assignable from decimal
  - types assignable from logical
I need to look at what needs to be done to handle the other data types that can be used with temp-tables.
I've tested the following:
- integer
- character
- decimal - the format (precision) does not always match ABL output
- date - the format does not match ABL output; we produce MM/DD/YY text, while ABL outputs YYYY-MM-DD
- logical
Unformatted JSON is identical to ABL's output. Formatted JSON does not look identical (some differences in line breaks and indents).
Like with the XML export, not all options are supported for JSON export yet. Currently, we support FILE and LONGCHAR target types, but not yet STREAM, STREAM-HANDLE, MEMPTR, JsonArray, or JsonObject. We currently ignore the omit-initial-values, omit-outer-object, and write-before-image options. I believe the last two are related to datasets.
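For reference, the outer shape of the unformatted output (an object keyed by the table name, holding an array of record objects) can be sketched with plain string building. This is an illustration of the shape only, with hypothetical names and naive concatenation; the actual runtime uses a proper JSON writer and handles escaping, unknown values, and all supported field types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the unformatted JSON shape produced for a temp-table:
// {"<table-name>": [ {record}, {record}, ... ]}
public class JsonShapeSketch
{
   static String writeJson(String tableName, List<Map<String, Object>> records)
   {
      StringBuilder sb = new StringBuilder();
      sb.append("{\"").append(tableName).append("\": [");
      for (int i = 0; i < records.size(); i++)
      {
         if (i > 0) sb.append(',');
         sb.append('{');
         int j = 0;
         for (Map.Entry<String, Object> field : records.get(i).entrySet())
         {
            if (j++ > 0) sb.append(',');
            sb.append('"').append(field.getKey()).append("\":");
            Object v = field.getValue();
            // naive: quote strings, emit everything else as-is (no escaping)
            sb.append(v instanceof String ? "\"" + v + "\"" : String.valueOf(v));
         }
         sb.append('}');
      }
      sb.append("]}");
      return sb.toString();
   }

   public static void main(String[] args)
   {
      Map<String, Object> rec = new LinkedHashMap<>();
      rec.put("custNum", 1);
      rec.put("name", "n1");
      List<Map<String, Object>> recs = new ArrayList<>();
      recs.add(rec);
      System.out.println(writeJson("ttCust", recs));
      // {"ttCust": [{"custNum":1,"name":"n1"}]}
   }
}
```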
Some refactoring of the JSON (and XML) import/export code will be needed once we have a better grasp on the requirements for dataset integration, as the APIs currently operate on an instance of TemporaryBuffer.
Despite the format issues, the data imports correctly from the given output, and iterating through the temp-tables and DISPLAYing the records gives the same results.
As part of these changes, I did some refactoring of the XML import/export runtime code. Hopefully, I did not break that, but some regression testing is required to be sure.
#15 Updated by Ovidiu Maxiniuc over 5 years ago
Eric,
Do we need to find the exact formula for RECORD-LENGTH?
I have implemented a formula that works well for a small number of fields in a record. However, as the number of fields in the record increases, ABL adds supplementary increments of 5 bytes. I could not find a pattern for when they are added.
#16 Updated by Eric Faulhaber over 5 years ago
Ovidiu Maxiniuc wrote:
Do we need to find the exact formula for RECORD-LENGTH?
I have implemented a formula that works well for a small number of fields in a record. However, as the number of fields in the record increases, ABL adds supplementary increments of 5 bytes. I could not find a pattern for when they are added.
Did you base your implementation on https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/gsdbe%2Fformulas-for-calculating-field-storage.html%23?
I'll review tomorrow...
#17 Updated by Ovidiu Maxiniuc over 5 years ago
Eric Faulhaber wrote:
Did you base your implementation on https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/gsdbe%2Fformulas-for-calculating-field-storage.html%23?
Yes, that table provides a rough estimation for a single field only. However, as the number of fields in a record increases, the 4GL adds extra bytes. After further investigation I was able to come up with some patterns:
For example, take a record with a single logical field with an extent, all components set to the unknown value. Here is the variation table:
extent | size of record | notes |
1 | 20 | computed as 15 + 4 + 1 |
2-15 | 21-34 | +1 for each component |
16 | 40 | +1 +additional 5 bytes are added |
17-31 | 41-55 | +1 for each component |
32 | 58 | +1 +additional 2 bytes are added |
33-47 | 59-73 | +1 for each component |
48 | 76 | +1 +additional 2 bytes are added |
But that is not all. The size also depends on the number of fields in the record. Let's add some simple logical fields to the initial record with an extent of 1.
no of simple fields | size of record | notes |
1 | 21 | computed as 15 + (4 + 1) + 1 |
2-12 | 22-32 | +1 for each field |
13 | 38 | +1 +additional 5 bytes are added |
14-28 | 39-53 | +1 for each field |
29 | 56 | +1 +additional 2 bytes are added |
30-44 | 57-71 | +1 for each field |
45 | 74 | +1 +additional 2 bytes are added |
etc..
So it seems that there is a threshold at every 16 components/fields. The fact that for simple fields the first threshold is at 13 implies that the extra constant 15 bytes are occupied by 2 hidden fields that form some kind of header (most likely one of them is the recid/rowid). It is important to note that at the first threshold 5 bytes are added, while at the subsequent ones only 2.
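The pattern from the first variation table can be sketched as follows, assuming the observed rule holds (+5 extra bytes at the first threshold, +2 at each subsequent one). This is an illustration of the pattern only, not the actual BufferImpl.recordLength() implementation.

```java
// Sketch reproducing the observed sizes for a record containing a single
// logical field of the given extent, with all components set to unknown.
public class RecordLengthSketch
{
   static int logicalExtentRecordSize(int extent)
   {
      int size = 15 + 4 + 1;            // extent 1: 15-byte hidden header + 4 + 1
      for (int e = 2; e <= extent; e++)
      {
         size += 1;                     // +1 for each additional component
         if (e % 16 == 0)
         {
            size += (e == 16) ? 5 : 2;  // +5 at the first threshold, +2 afterwards
         }
      }
      return size;
   }

   public static void main(String[] args)
   {
      // values match the first variation table above
      System.out.println(logicalExtentRecordSize(1));  // 20
      System.out.println(logicalExtentRecordSize(15)); // 34
      System.out.println(logicalExtentRecordSize(16)); // 40
      System.out.println(logicalExtentRecordSize(32)); // 58
      System.out.println(logicalExtentRecordSize(48)); // 76
   }
}
```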
I'll review tomorrow...
I have not committed yet; I don't know where to: 3750b or the new 3898a? The code is really additive, so it's safe. I will update it in light of my new discoveries and let you know when it is ready.
#18 Updated by Eric Faulhaber over 5 years ago
3750b
#19 Updated by Greg Shah over 5 years ago
- Related to Feature #3549: implement RAW-TRANSFER statement added
#20 Updated by Greg Shah over 5 years ago
Please see #3549 for my previous research into the layout of a 4GL serialized database record. It has findings that overlap with your work. As you can imagine, one important use of RECORD-LENGTH would be to allocate a buffer that can hold the entire serialized record. I believe that RECORD-LENGTH returns the exact size of what would be serialized.
#21 Updated by Ovidiu Maxiniuc over 5 years ago
I added (full) implementation of RECORD-LENGTH to 3750b. Committed revision 11340.
The central point of the solution is in BufferImpl.recordLength(), which computes the extra bytes as described in note 17. However, each individual field needs to report the size of the data it holds, so I added a new method in BDT: getSize(). I have done extensive testing to identify how each of the fields is serialized. The main set of tests was saved as testcases/uast/raw_transfer/record-length.p.
Please review.
#22 Updated by Greg Shah about 5 years ago
Code Review 3750b Revision 11340
I'm good with the changes. I like the approach with the BDT getSize(). When we implement the fully compatible RAW-TRANSFER, we will do it the same way (putting the type-specific serialization there).
#23 Updated by Greg Shah about 5 years ago
Can this task be closed?
#24 Updated by Ovidiu Maxiniuc about 5 years ago
Greg Shah wrote:
Can this task be closed?
RECORD-LENGTH: I think I covered all datatypes which can be serialized. We cannot test this at this time.
DBTASKID: All my tests were successful for this function. There is a clear delimitation from the related _Trans VST, which is not 100% done, but that is being fixed along with the other VSTs.
BUFFER:FIND-CURRENT: Fully functional.
I believe Eric handled the others. In conclusion, yes, it can be closed.
#25 Updated by Greg Shah about 5 years ago
- % Done changed from 30 to 100
- Status changed from WIP to Closed
#26 Updated by Greg Shah over 4 years ago
Ovidiu: DBTASKID() is marked as a runtime stub in gap marking. I think this is incorrect. Based on the comments above, it possibly should be "full", though there is some work on the _trans table where it is not clear whether or not it impacts DBTASKID() behavior.
#27 Updated by Ovidiu Maxiniuc over 4 years ago
Indeed, DBTASKID should be marked as having a full implementation. Where do I commit this?
#28 Updated by Greg Shah over 4 years ago
Ovidiu Maxiniuc wrote:
Indeed, DBTASKID should be marked as having a full implementation. Where do I commit this?
4069a, thanks.