Project

General

Profile

Feature #7064

Analyze the performance impact of the implicit meta schema in H2

Added by Alexandru Lungu over 1 year ago. Updated 10 months ago.

Status:
Rejected
Priority:
Normal
Assignee:
Target version:
-
Start date:
Due date:
% Done:

100%

billable:
No
vendor_id:
GCD
version_reported:
version_resolved:

History

#1 Updated by Alexandru Lungu over 1 year ago

There is a so called INFORMATION_SCHEMA in H2 which is implicitly built: once for a shared temp-database and multiple times if private temp-databases are used. Its goal is to deliver meta information of the database.
This is a nice thing to have as we determine certain metadata very fast. However, it is usable only internally, as the converted code relies on an explicit meta database.

On the same topic, I recall several places where the Database.meta is locked and used. AFAIK, meta itself is a table, so it isn't locked if LOCK_MODE is 0. For private temp-databases we shouldn't worry about this. However, there are many places where this implicit meta table is actually used (entries are inserted/removed/updated). I need some better understanding of this standalone system table and how we use it.

As long as we don't use INFORMATION_SCHEMA or Database.meta, or at least we don't heavily rely on them, we should consider disabling. From my slim investigation, we use INFORMATION_SCHEMA only to retrieve the current value of a sequence and check if a sequence exists. Therefore, we can leave the schema table to do just the sequence work. Please let me know if there are other use-cases in FWD for these I am missing here.

#2 Updated by Igor Skornyakov over 1 year ago

Sorry, I do not understand what you mean. As far as I remember the META database contains information about permanent tables only, at least for now.
The relational databases do provide some metadata (which is typically database-specific) and there is JDBC API for access to e.g. ResultSet metadata, but it is a completely different story. I can recall a few places in our code where we use JDBC or database metadata internally, but the META database tables can be used by an application (so far for read access only)
Or you mean something else?
Thank you.

#3 Updated by Alexandru Lungu over 1 year ago

Igor Skornyakov wrote:

Sorry, I do not understand what you mean. As far as I remember the META database contains information about permanent tables only, at least for now.
The relational databases do provide some metadata (which is typically database-specific) and there is JDBC API for access to e.g. ResultSet metadata, but it is a completely different story. I can recall a few places in our code where we use JDBC or database metadata internally, but the META database tables can be used by an application (so far for read access only)

Now that I am reading #7064-1 again, it needs some rephrasing indeed.
I am mostly referring to some H2 internals:
  • H2 generates an implicit INFORMATION_SCHEMA schema on database creation. This schema contains metadata, and for example, one can use SELECT * FROM INFORMATION_SCHEMA.TABLES to retrieve all tables from H2. This same schema is used to resolve the metadata required by the H2 JDBC Driver (through the DatabaseMetaData interface). My concern here is that such internal H2 metadata management can cost us performance/memory. Basically, I want to avoid an INSERT into INFORMATION_SCHEMA.TABLES each time a table is created or such. I think it is worth to take a look if we can avoid doing extra work for INFORMATION_SCHEMA. Maybe we can live without DatabaseMetaData for the sake of performance.
  • There is a Dorg.h2.engine.Database.meta attribute which seems to be used several times. It is unrelated to INFORMATION_SCHEMA, but it looks like a system table which stores all objects created inside the database (tables, indexes, views, etc.). Again, my concern is that this internal H2 table may be altered each time a database object is created/removed/updated etc. which may cost us performance.

#4 Updated by Igor Skornyakov over 1 year ago

Alexandru Lungu wrote:

Now that I am reading #7064-1 again, it needs some rephrasing indeed.
I am mostly referring to some H2 internals:
  • H2 generates an implicit INFORMATION_SCHEMA schema on database creation. This schema contains metadata, and for example, one can use SELECT * FROM INFORMATION_SCHEMA.TABLES to retrieve all tables from H2. This same schema is used to resolve the metadata required by the H2 JDBC Driver (through the DatabaseMetaData interface). My concern here is that such internal H2 metadata management can cost us performance/memory. Basically, I want to avoid an INSERT into INFORMATION_SCHEMA.TABLES each time a table is created or such. I think it is worth to take a look if we can avoid doing extra work for INFORMATION_SCHEMA. Maybe we can live without DatabaseMetaData for the sake of performance.
  • There is a Dorg.h2.engine.Database.meta attribute which seems to be used several times. It is unrelated to INFORMATION_SCHEMA, but it looks like a system table which stores all objects created inside the database (tables, indexes, views, etc.). Again, my concern is that this internal H2 table may be altered each time a database object is created/removed/updated etc. which may cost us performance.

I see. So it is unrelated to our META database.
I have very rudimentary understanding of H2 internals, but I understand that DatabaseMetaData is used by the H2 internals and do not understand how we can "live without it". This seems to be in the core of H2 functionality and trying to get rid of it can be very risky if possible at all. We can minimize access to this data from the FWD code but I doubt that it will have a noticeable positive impact on performance since H2 will continue to work with it internally. Of course I can be wrong.
Have estimated the overhead of maintaining this data?
Thank you.

#5 Updated by Radu Apetrii over 1 year ago

  • % Done changed from 0 to 80
  • Assignee set to Radu Apetrii
  • Status changed from New to WIP

Alexandru Lungu wrote:

  • H2 generates an implicit INFORMATION_SCHEMA schema on database creation. This schema contains metadata, and for example, one can use SELECT * FROM INFORMATION_SCHEMA.TABLES to retrieve all tables from H2. This same schema is used to resolve the metadata required by the H2 JDBC Driver (through the DatabaseMetaData interface). My concern here is that such internal H2 metadata management can cost us performance/memory. Basically, I want to avoid an INSERT into INFORMATION_SCHEMA.TABLES each time a table is created or such. I think it is worth to take a look if we can avoid doing extra work for INFORMATION_SCHEMA. Maybe we can live without DatabaseMetaData for the sake of performance.

From my findings, I noticed that INFORMATION_SCHEMA.TABLES does contain information about the temp-tables we store, but, this is done in a lazy manner. To be more precise, the information about the newly created temp-table does not get stored in INFORMATION_SCHEMA.TABLES unless I explicitly execute a select * from INFORMATION_SCHEMA.TABLES statement. AFAIK, we do not use this kind of statements in fwd code so this should not be a problem. No temp-table (or any other thing, e.g. sequences) will be stored there.

  • There is a Dorg.h2.engine.Database.meta attribute which seems to be used several times. It is unrelated to INFORMATION_SCHEMA, but it looks like a system table which stores all objects created inside the database (tables, indexes, views, etc.). Again, my concern is that this internal H2 table may be altered each time a database object is created/removed/updated etc. which may cost us performance.

org.h2.engine.Database.meta only stores information from non-temporary sessions, which we do not use. For our use case, the information about temp-tables gets stored in LocalTempTablesManager.localTempTables, which is necessary for the program.

As a conclusion, I don't think there is something we could do to boost the performance by refactoring the storage of metadata in H2.

#6 Updated by Greg Shah over 1 year ago

org.h2.engine.Database.meta only stores information from non-temporary sessions, which we do not use

Although we do not recommend it for production use, we do support H2 as a non-temp database. This is really helpful in test/demo environments (e.g. Hotel GUI) or any small application which would not need a non-embedded database.

I'm not saying we should work on this. We probably should leave it. But it is important for you to know that it actually is in use at times.

#7 Updated by Radu Apetrii 10 months ago

Can/should this task be rejected? IMO, there is nothing left to investigate/implement here.

#8 Updated by Alexandru Lungu 10 months ago

Radu Apetrii wrote:

Can/should this task be rejected? IMO, there is nothing left to investigate/implement here.

I recently met with some metadata "stuff" from inside FWD-H2. For _temp database, there was no hit, but from the _meta database there were several hits. I will need to do a double-check to understand how is the FWD-H2 meta schema used from inside our meta database. In _meta, objects are not temporary, so they happen to flush sometimes information on disk / generate meta data to be used cross-session. I am not sure if these are actually required.

I will check this first thing tomorrow morning and let you know.

#9 Updated by Alexandru Lungu 10 months ago

  • When you open a session and create a database, you reach Database.open, which calls Database.addDatabaseObject. This will eventually call addMeta and will save the database in the meta table.
    • creating a database will also create a LOBS table, storing data about the lobs you save (INFORMATION_SCHEMA.LOB_DATA, INFORMATION_SCHEMA.LOB_MAP, INFORMATION_SCHEMA.LOBS and INFORMATION_SCEMA.INDEX_LOB_TABLE). The indexes and constraints are also saved in the meta schema.
    • creating a database will also do some user related work (e.g. Database.setMasterUser)
  • All SET commands run Set.addOrUpdateSetting, which does meta work.
  • CALL DATABASE() does some meta work.
  • Defining function aliases does meta work.
  • Sometimes when you commit, analyze is triggered. This does some meta work regarding index selectivity.

However, all of the above happen only at server start-up when initializing the meta table. Therefore, I don't think this should be prioritized in any way, as the total time of doing the meta work is very low comparing to the total server start-up time.
The only run-time meta update was caused by Sequence.flush, but was fixed already in #7454.

Although we do not recommend it for production use, we do support H2 as a non-temp database. This is really helpful in test/demo environments (e.g. Hotel GUI) or any small application which would not need a non-embedded database.

Even so, the optimizations here are related to the removal of the unused meta information from inside FWD-H2. If a H2 database is used for persistence, the removal of meta schema is not recommended, as one can interoperate with the database externally and may need the metadata. For _temp, this is not the case.

Taking these in consideration, we can reject this.

#10 Updated by Greg Shah 10 months ago

  • Status changed from WIP to Rejected
  • % Done changed from 80 to 100

Also available in: Atom PDF