Multi-Tenancy

Initial thoughts on design and areas of FWD affected...

Database-Per-Tenant

The initial support for multi-tenancy in FWD will follow a database-per-tenant model. That is, each tenant will have a distinct database instance, and a single, logical FWD server will manage connections to the correct database, based on the context of the user. This approach appears more scalable and more secure than multi-tenancy within a single database, and it is simpler to implement in FWD.

Given a database-per-tenant approach, the runtime implementation is largely a matter of:

  • managing connection pools and data sources for multiple tenant databases with identical schemas;
  • providing a connection from the correct pool to service a request, based on an implicit identifier inherent in the user context;
  • ensuring all state, caches, etc. in the persistence framework manage resources separately by database/tenant.
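As a rough illustration of the first two points, the following is a minimal sketch, not FWD code. The names TenantPoolRegistry and TenantContext are hypothetical, the generic type P stands in for whatever pool or data source type FWD actually uses, and a ThreadLocal stands in for the real user context, whose mechanism is still an open question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: one pool per tenant, looked up by an implicit tenant ID.
final class TenantPoolRegistry<P> {
    // Pools keyed by tenant ID; all tenant databases share the same schema.
    private final Map<String, P> pools = new ConcurrentHashMap<>();
    // Assumed factory which builds a pool from the tenant's configuration.
    private final Function<String, P> poolFactory;

    TenantPoolRegistry(Function<String, P> poolFactory) {
        this.poolFactory = poolFactory;
    }

    // Returns the pool for the tenant, creating it on first use.
    P poolFor(String tenantId) {
        if (tenantId == null) {
            throw new IllegalStateException("no tenant ID in the current context");
        }
        return pools.computeIfAbsent(tenantId, poolFactory);
    }
}

// Hypothetical stand-in for the tenant ID carried by the user context.
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String tenantId) { CURRENT.set(tenantId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}
```

A request handler would call something like registry.poolFor(TenantContext.get()) instead of using a single, server-wide pool. Failing fast on a missing tenant ID is a deliberate choice in this sketch, so that a request can never fall through to another tenant's database.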

On the last point, this should already be the case, since FWD supports multiple databases today. That is, even before implementing multi-tenancy, it would already be considered a bug if state were not managed per database. Nevertheless, a thorough review will be conducted to confirm this.

Areas to review to ensure state is maintained by database:

  • dynamic database caches:
    • DynamicQueryHelper
    • DynamicValidationHelper
  • FQLPreprocessor
  • FQLHelperCache
  • FastFindCache (already being addressed with #6812)
  • DatabaseManager
  • SortCriterion
  • SortIndex
  • Query cache in Persistence
  • Prepared statement cache (inherently associated with a connection, so shouldn't be an issue)
  • ChangeSet (stores an instance of database Session)
  • IndexMetadataCollector (creates and stores instances of Session in a Map)
  • Others?
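The property this review looks for can be sketched as follows. This is an illustrative example only: PerDatabaseCache is a hypothetical class, not one of the FWD caches listed above, and the String database identifier is an assumption. The point is that every lookup is scoped by database first, so identical keys from two tenants never share an entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a cache which keeps separate state per database.
final class PerDatabaseCache<K, V> {
    // Outer key: database (tenant) identifier; inner map: the actual cache.
    private final Map<String, Map<K, V>> byDatabase = new ConcurrentHashMap<>();

    // Looks up the key within one database only, loading the value on a miss.
    V get(String databaseId, K key, Function<K, V> loader) {
        return byDatabase
            .computeIfAbsent(databaseId, id -> new ConcurrentHashMap<>())
            .computeIfAbsent(key, loader);
    }

    // Drops all cached state for one database, leaving the others untouched.
    void evictDatabase(String databaseId) {
        byDatabase.remove(databaseId);
    }
}
```

A cache which is keyed only by query text or table name, with no database component, is the kind of thing the review should flag.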

Storage

Databases may reside in the same server/cluster, or may be distributed across multiple physical locations. In either case, a stable, performant connection from the FWD server to each database is required. The same database vendor and version should be used for all tenant databases for a particular schema, since the schema can only be guaranteed to be defined and to behave the same way within a given dialect and version (TODO: there may be flexibility in the version).

Configuration

Configuration (i.e., the directory) will specify the JDBC URL which uniquely identifies the physical database for a particular tenant. Each tenant database will need to be configured separately, since tenants may require different connection pool and other settings, depending on their usage patterns.

Design/Implementation

Review the data source provider code in the persist.orm package, specifically PooledDataSourceProvider and JDBCDataSource. I expect modifications in these areas to serve the correct connection per tenant, based on a tenant ID in the context.

Review the security package to consider how we manage and access a tenant ID.

Open Questions

How do we implement this with zero overhead for the non-multi-tenant case? Most installations are not multi-tenant, and there may be some overhead extracting the context-local tenant identifier, mapping to connection pools, etc. It may not be a lot per operation, but these operations will occur millions of times.
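One possible direction, offered only as a sketch and not as an answer to this question, is to decide once at startup which lookup strategy to use, so that a single-database installation never consults the context at all. The names PoolSelector and PoolSelectors are hypothetical, P stands in for the real pool type, and the Supplier stands in for whatever mechanism ends up extracting the tenant ID.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: the lookup strategy is fixed at server startup.
interface PoolSelector<P> {
    P select();
}

final class PoolSelectors {
    private PoolSelectors() { }

    static <P> PoolSelector<P> create(Map<String, P> poolsByTenant,
                                      Supplier<String> tenantId) {
        if (poolsByTenant.isEmpty()) {
            throw new IllegalArgumentException("at least one database is required");
        }
        if (poolsByTenant.size() == 1) {
            // Non-multi-tenant case: return the only pool without ever
            // reading the tenant ID from the context.
            P only = poolsByTenant.values().iterator().next();
            return () -> only;
        }
        // Multi-tenant case: resolve the tenant ID on every call.
        Map<String, P> pools = Map.copyOf(poolsByTenant);
        return () -> {
            String id = tenantId.get();
            P pool = (id == null) ? null : pools.get(id);
            if (pool == null) {
                throw new IllegalStateException("no pool for tenant: " + id);
            }
            return pool;
        };
    }
}
```

In the single-database case the remaining cost is one interface call, which the JIT can often inline; whether that is close enough to zero would need to be measured.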

Should we collapse areas of database configuration which are common across tenants (e.g., driver, dialect, schema, etc.)?

How do we associate a tenant ID with a user context? How do we extract the ID securely for persistence purposes, while not exposing it externally?