Feature #7064
Analyze the performance impact of the implicit meta schema in H2
100%
History
#1 Updated by Alexandru Lungu over 1 year ago
There is a so called INFORMATION_SCHEMA
in H2 which is implicitly built: once for a shared temp-database and multiple times if private temp-databases are used. Its goal is to deliver meta information of the database.
This is a nice thing to have as we determine certain metadata very fast. However, it is usable only internally, as the converted code relies on an explicit meta database.
On the same topic, I recall several places where the Database.meta
is locked and used. AFAIK, meta
itself is a table, so it isn't locked if LOCK_MODE
is 0. For private temp-databases we shouldn't worry about this. However, there are many places where this implicit meta table is actually used (entries are inserted/removed/updated). I need some better understanding of this standalone system table and how we use it.
As long as we don't use INFORMATION_SCHEMA
or Database.meta
, or at least we don't heavily rely on them, we should consider disabling. From my slim investigation, we use INFORMATION_SCHEMA
only to retrieve the current value of a sequence and check if a sequence exists. Therefore, we can leave the schema table to do just the sequence work. Please let me know if there are other use-cases in FWD for these I am missing here.
#2 Updated by Igor Skornyakov over 1 year ago
Sorry, I do not understand what you mean. As far as I remember the META database contains information about permanent tables only, at least for now.
The relational databases do provide some metadata (which is typically database-specific) and there is JDBC API for access to e.g. ResultSet metadata, but it is a completely different story. I can recall a few places in our code where we use JDBC or database metadata internally, but the META database tables can be used by an application (so far for read access only)
Or you mean something else?
Thank you.
#3 Updated by Alexandru Lungu over 1 year ago
Igor Skornyakov wrote:
Now that I am reading #7064-1 again, it needs some rephrasing indeed.Sorry, I do not understand what you mean. As far as I remember the META database contains information about permanent tables only, at least for now.
The relational databases do provide some metadata (which is typically database-specific) and there is JDBC API for access to e.g. ResultSet metadata, but it is a completely different story. I can recall a few places in our code where we use JDBC or database metadata internally, but the META database tables can be used by an application (so far for read access only)
I am mostly referring to some H2 internals:
- H2 generates an implicit
INFORMATION_SCHEMA
schema on database creation. This schema contains metadata, and for example, one can useSELECT * FROM INFORMATION_SCHEMA.TABLES
to retrieve all tables from H2. This same schema is used to resolve the metadata required by the H2 JDBC Driver (through theDatabaseMetaData
interface). My concern here is that such internal H2 metadata management can cost us performance/memory. Basically, I want to avoid anINSERT
intoINFORMATION_SCHEMA.TABLES
each time a table is created or such. I think it is worth to take a look if we can avoid doing extra work forINFORMATION_SCHEMA
. Maybe we can live withoutDatabaseMetaData
for the sake of performance. - There is a
Dorg.h2.engine.Database.meta
attribute which seems to be used several times. It is unrelated toINFORMATION_SCHEMA
, but it looks like a system table which stores all objects created inside the database (tables, indexes, views, etc.). Again, my concern is that this internal H2 table may be altered each time a database object is created/removed/updated etc. which may cost us performance.
#4 Updated by Igor Skornyakov over 1 year ago
Alexandru Lungu wrote:
Now that I am reading #7064-1 again, it needs some rephrasing indeed.
I am mostly referring to some H2 internals:
- H2 generates an implicit
INFORMATION_SCHEMA
schema on database creation. This schema contains metadata, and for example, one can useSELECT * FROM INFORMATION_SCHEMA.TABLES
to retrieve all tables from H2. This same schema is used to resolve the metadata required by the H2 JDBC Driver (through theDatabaseMetaData
interface). My concern here is that such internal H2 metadata management can cost us performance/memory. Basically, I want to avoid anINSERT
intoINFORMATION_SCHEMA.TABLES
each time a table is created or such. I think it is worth to take a look if we can avoid doing extra work forINFORMATION_SCHEMA
. Maybe we can live withoutDatabaseMetaData
for the sake of performance.- There is a
Dorg.h2.engine.Database.meta
attribute which seems to be used several times. It is unrelated toINFORMATION_SCHEMA
, but it looks like a system table which stores all objects created inside the database (tables, indexes, views, etc.). Again, my concern is that this internal H2 table may be altered each time a database object is created/removed/updated etc. which may cost us performance.
I see. So it is unrelated to our META database.
I have very rudimentary understanding of H2 internals, but I understand that DatabaseMetaData
is used by the H2 internals and do not understand how we can "live without it". This seems to be in the core of H2 functionality and trying to get rid of it can be very risky if possible at all. We can minimize access to this data from the FWD code but I doubt that it will have a noticeable positive impact on performance since H2 will continue to work with it internally. Of course I can be wrong.
Have estimated the overhead of maintaining this data?
Thank you.
#5 Updated by Radu Apetrii over 1 year ago
- % Done changed from 0 to 80
- Assignee set to Radu Apetrii
- Status changed from New to WIP
Alexandru Lungu wrote:
- H2 generates an implicit
INFORMATION_SCHEMA
schema on database creation. This schema contains metadata, and for example, one can useSELECT * FROM INFORMATION_SCHEMA.TABLES
to retrieve all tables from H2. This same schema is used to resolve the metadata required by the H2 JDBC Driver (through theDatabaseMetaData
interface). My concern here is that such internal H2 metadata management can cost us performance/memory. Basically, I want to avoid anINSERT
intoINFORMATION_SCHEMA.TABLES
each time a table is created or such. I think it is worth to take a look if we can avoid doing extra work forINFORMATION_SCHEMA
. Maybe we can live withoutDatabaseMetaData
for the sake of performance.
From my findings, I noticed that INFORMATION_SCHEMA.TABLES
does contain information about the temp-tables we store, but, this is done in a lazy manner. To be more precise, the information about the newly created temp-table does not get stored in INFORMATION_SCHEMA.TABLES
unless I explicitly execute a select * from INFORMATION_SCHEMA.TABLES
statement. AFAIK, we do not use this kind of statements in fwd code so this should not be a problem. No temp-table (or any other thing, e.g. sequences) will be stored there.
- There is a
Dorg.h2.engine.Database.meta
attribute which seems to be used several times. It is unrelated toINFORMATION_SCHEMA
, but it looks like a system table which stores all objects created inside the database (tables, indexes, views, etc.). Again, my concern is that this internal H2 table may be altered each time a database object is created/removed/updated etc. which may cost us performance.
org.h2.engine.Database.meta
only stores information from non-temporary sessions, which we do not use. For our use case, the information about temp-tables gets stored in LocalTempTablesManager.localTempTables
, which is necessary for the program.
As a conclusion, I don't think there is something we could do to boost the performance by refactoring the storage of metadata in H2.
#6 Updated by Greg Shah over 1 year ago
org.h2.engine.Database.meta only stores information from non-temporary sessions, which we do not use
Although we do not recommend it for production use, we do support H2 as a non-temp database. This is really helpful in test/demo environments (e.g. Hotel GUI) or any small application which would not need a non-embedded database.
I'm not saying we should work on this. We probably should leave it. But it is important for you to know that it actually is in use at times.
#7 Updated by Radu Apetrii 10 months ago
Can/should this task be rejected? IMO, there is nothing left to investigate/implement here.
#8 Updated by Alexandru Lungu 10 months ago
Radu Apetrii wrote:
Can/should this task be rejected? IMO, there is nothing left to investigate/implement here.
I recently met with some metadata "stuff" from inside FWD-H2. For _temp database, there was no hit, but from the _meta database there were several hits. I will need to do a double-check to understand how is the FWD-H2 meta schema used from inside our meta database. In _meta
, objects are not temporary, so they happen to flush sometimes information on disk / generate meta data to be used cross-session. I am not sure if these are actually required.
I will check this first thing tomorrow morning and let you know.
#9 Updated by Alexandru Lungu 10 months ago
- When you open a session and create a database, you reach
Database.open
, which callsDatabase.addDatabaseObject
. This will eventually calladdMeta
and will save the database in the meta table.- creating a database will also create a
LOBS
table, storing data about the lobs you save (INFORMATION_SCHEMA.LOB_DATA
,INFORMATION_SCHEMA.LOB_MAP
,INFORMATION_SCHEMA.LOBS
andINFORMATION_SCEMA.INDEX_LOB_TABLE
). The indexes and constraints are also saved in the meta schema. - creating a database will also do some user related work (e.g.
Database.setMasterUser
)
- creating a database will also create a
- All
SET
commands runSet.addOrUpdateSetting
, which does meta work. CALL DATABASE()
does some meta work.- Defining function aliases does meta work.
- Sometimes when you commit, analyze is triggered. This does some meta work regarding index selectivity.
However, all of the above happen only at server start-up when initializing the meta table. Therefore, I don't think this should be prioritized in any way, as the total time of doing the meta work is very low comparing to the total server start-up time.
The only run-time meta update was caused by Sequence.flush
, but was fixed already in #7454.
Although we do not recommend it for production use, we do support H2 as a non-temp database. This is really helpful in test/demo environments (e.g. Hotel GUI) or any small application which would not need a non-embedded database.
Even so, the optimizations here are related to the removal of the unused meta information from inside FWD-H2. If a H2 database is used for persistence, the removal of meta schema is not recommended, as one can interoperate with the database externally and may need the metadata. For _temp, this is not the case.
Taking these in consideration, we can reject this.