Review of Firebird 2.0

To Firebird Home Page   To Engine Development page
Some time ago, many of our users were surprised by the number of features released in Firebird 1.5. Honestly, before preparing this paper, I didn't count new features in Firebird 2.0 and I don't have a clue whether it's superior to the version 1.5 in this aspect or not. But, in my opinion, the major benefit of the version 2.0 is not its feature set.
"What then?", - you ask. I'd describe Firebird 2.0 as "the version which removes the annoying limits". Sounds not so modest, I know. Let me explain better. No doubt that Firebird has a nice multi-generational architecture and a rich SQL language, an embedded usage and a good performance. But I'm sure almost everyone stepped on some internal limitations that worried or even shocked you. To name a few (in no particular order):

  • Undocumented table limit of about 35GB, an overflow may cause data corruption
  • Removing (garbage collecting) nodes from a non-selective index is extremely slow
  • Bigger page cache often means slower performance
  • Optimizer is not able to choose a good plan in many cases
  • International support is weak, a lot of bugs in the Unicode/MBCS handling
  • Lack of a fast backup/restore mechanism
  • Weak security and many known vulnerabilities
  • Requirement of exclusive database access for referential integrity declarations
  • Too few built-in functions
  • Unreliable database shutdown

Some of them could be of critical priority for your business, others are just annoying. Anyway, I'm glad to report that Firebird 2.0 eliminates most of the aforementioned issues and significantly reduces the impact of others. As for me, this is more important than declaring new language features. But considering also tons of bugfixes and enough new features, Firebird 2.0 definitely represents a major release of your favourite RDBMS. More robust, more feature complete, faster and much more friendly to non-ASCII users.

Of course, there are limits that still exist and many features we don't support yet. But we should have some room for future improvements, shouldn't we? We'll talk about the future development a bit later.

Well, for those who're interested in numbers, let's read the WhatsNew document or Release Notes and make a summary of total changes per version:

  • Version 1.0: 32 improvements, 55 bugfixes
  • Version 1.5: 58 improvements, 94 bugfixes
  • Version 2.0: 82 improvements, 140 bugfixes

Firebird Vulcan

Our primary goal is to merge two codebases (Firebird 2.0 and Vulcan) in order to release Firebird version 3.0. It will be based on the Vulcan tree and will contain its modular architecture and new features, as well as all improvements made in Firebird 2.0. The key features of the Vulcan codebase are:

  • Globally refactored code
  • Fine-grained multithreading
  • Unified provider based architecture
  • Flexible configuration mechanism
  • Database level authentication and enhanced security management
  • Internal DSQL implementation
Roadmap Header 2006

If you'd ask me to outline the generic development priorities, they would be:

  1. Reliability and safety (bugfixing, guaranteed recovery, security improvements)
  2. Administration and monitoring facilities
  3. Compliance with the SQL specification
  4. Performance (both algorithmical and optimizer decisions)
  5. Language enhancements

The timetable should look a lot like this:

2005:

  • Release 2.0 RC and fork the 2.0 HEAD to create the release branch
  • Port some changes from independent trees to HEAD
  • Fork the Vulcan HEAD to create the 3.0 development branch

2006, 1st quarter:

  • Release Firebird 2.0 Final and Firebird Vulcan Final

2006, 2nd quarter:

  • Release Firebird 3.0 Beta
  • Fork the 3.0 HEAD to create the 3.0+ development branch

2006, 3rd quarter:

  • Release Firebird 3.0 Final

2006, 4th quarter:

  • Release Firebird 3.0+ Beta

What's lurking in the trees?

As you perhaps know, there are some improvements done by various Firebird developers that didn't go into the 2.0 release due to time constraints. Some of them are included and being tested in Fyracle, others are still in private trees. Also, we still have a few features in Yaffil that requires backporting into Firebird. Everything mentioned above is exactly the new stuff you'll see in version 3.0. Let's see what has been already done:

  • Common tables expressions and recursive queries (SQL-99 compliant)
    Developed by: Paul Ruizendaal
    Current state: Completed
  • Global temporary tables (SQL-99 compliant)
    Developed by: Vlad Horsun
    Current state: Completed
  • External procedures / functions (SQL-99 compliant)
    Developed by: Eugene Putilin, Roman Rokytskyy, Vlad Horsun
    Current state: Partially done, requires the callback API discussion
  • New built-in functions (string, math, binary, date/time)
    Developed by: Oleg Loa
    Current state: Completed, although requires some changes

Firebird 3.0

As both Firebird 2.0 and Firebird Vulcan releases are going to co-exist in the next year, you may ask why version 3.0 is numbered as a major release and what else (except of features already done in both codebases) it will contain. A good question. As we want to shorten the 3.0 release cycle as much as possible, no completely new development is expected to happen in that version. But we need to keep our users interested, so something new should be introduced.

The solution is simple: the 3.0 release is going to incorporate all the work done inside independent branches.

What features to expect?

PRIORITY

       shows how much our users want the feature. It's set up after looking at various polls and forum/newsgroup discussions;  also influenced by our analysis of the competitors' feature offerings
COMPLEXITY, based on time/effort estimates made by the core team
NONE
All required work is already done (in some code branch) and just needs to be backported and tested
Implementation
Feature has already been discussed and agreed on, but still requires some minor discussions and actual coding
Design/Implementation

<-Between these two->
Design
We have basic agreement and some vision of the thing, but the work hasn't been discussed in depth yet, hence we don't have an implementation plan for it
Research
The work requires serious analysis before discussing its design and implementation specifications

Admin/Tracing/Monitoring


Asynchronous statement cancellation / timeouts
Force any of the following actions: cancel a running statement, rollback a transaction, kill an attachment. Allow setting timeouts for SQL statements. Should be available at least at the API level.

Monitoring via API and/or special tables
Perform a "snapshot" monitoring (i.e. at the given moment) of the internal activities inside the engine. Obvious objects of such a monitoring are: databases, attachments, transactions, active requests, resource (memory, CPU) usage, etc..

Detailed SQL tracing/profiling
Show detailed access path (at the RSB tree level) for every retrieval, count rows (profile CPU time, etc) per every node. Available runtime statistics should be extended.

Detailed logging/audit
Allow to log some events happening on the server. These could be: successful/rejected authentification attempts (containing client host info), prepared/executed SQL statements, committed/rolled back transactions, etc. We need API to set up the required events and to retrieve the audit log.

DDL level and global triggers
Allow triggers ON CREATE/ALTER/DROP. Implement triggers attachment and transaction level triggers running in autonomous transactions.

PSQL debugging extensions/hooks
Allow PSQL debugging via introducing: looper breakpoints, handler callbacks, retrieval of context data, etc. priority = low
complexity = research

Maintenance / Recovery


Reliable logical backup
The only case of unrestorable backup should be a physically corrupted backup. Primitive objects (generators, UDFs, etc) must be restored in the beginning. Computed columns and validation constraints must be restored at the end. The engine should reject inconsistent data instead of transform them when reading (e.g. no value -> NULL). GBAK should allow partial restore, driven by switches or interactively.

Point-in-time recovery
The engine must have an optional ability to maintain a redo log in order to roll it over the last logical backup. No data loss is acceptable.

Security


Embedded users / SQL users management
Allow in-database users management. (Done in Vulcan)

User permissions for metadata
Protect all metadata with security classes. Implement metadata-level permissions. Add database-level permissions like BACKUP, DROP, etc.

Pluggable authentication modules
Enable using of custom authentication mechanisms (e.g. native OS ones).

Security groups
Design group-based security as an alternative to the existing role-based one.

Database encryption
Allow optional encryption of database files. Keys management is an open question here.

Performance / Optimizer


Faster outer joins
Implement the merge algorithm for outer joins.

Optimizer improvements
Fix known bugs/limitations, better optimizer decisions, more data statistics

More effective sorting
Implement partial sorting to spead up FIRST-limited retrievals. Consider sorting recno's instead of entire rows.

Optimized network protocol
Avoid sending a lot of unneeded data (buffer tails). Consider implementing protocol batches (e.g. prepare + info). Compress spaces more effectively.

More access paths
Consider implementation of hash join / hash aggregate and other retrieval algorithms used in the competitor RDBMS.

Language Extensions


Temporary tables / transient datasets
Implement temporary schema objects and/or datasets. SQL-99 compliance is required, extensions are welcome. (Partly done in Fyracle.)

More built-in functions
The SQL-99 (or later) ones (those of major importance for us) must be implemented first. Then we need users feedback about other ones.

Schemas/namespaces
First, it significantly reduces a cost of the issue with short metadata names. Second, it simplifies administration as a number of different databases could be united into a single file. Third, it finally allows us to be fully SQL-92 (entry level) compliant.

Native long numeric data type
Implement long exact numeric data type (with precision longer than 30 decimal digits) and appropriate BCD arithmetics.

Recursive queries
Implement SQL for recursive retrievals. Make it consistent with the SQL specification.

Regular expressions
Allow usage of regular expressions in search conditions. Add some special syntax (a new predicate) for this purpose.

TEXT BLOB compatible with [VAR]CHAR
Allow BLOB SUB_TYPE TEXT to be compatible/interchangable with string data types. Allow text blobs in all built-in functions.

Domains everywhere
Allow usage of domains in PSQL parameters and variables, as well as in the CAST function.

Longer metadata names
Up to 128 unicode characters.

SQL functions
CREATE/ALTER/DROP FUNCTION as per SQL-99.

Deferred constraints
Implement commit-time constraint checking as per SQL specification.

Generic / Architecture


SMP support in SS
Support effective fine grained multi-threading in the SS architecture. (Done in Vulcan.)

Compiled statements cache
Support caching/reusing of compiled statements. (Partly done in Vulcan)

External functions/procedures
Allow to create procedures/functions written in non-PSQL languages. Provide a few drivers (cdecl, Java, .NET) in the distros. (Partly done in Fyracle.)

External data sources / database links / cross-database SQL
Allow retrievals from external data sources. Provide a few drivers (native FB, JDBC, ODBC) in the distros. Add DDL to declare and DML to use such sources. Implement optimization of retrievals for native data sources.

Statement/transaction consistency
Solve the known inconsistencies in verbs/transactions. Mostly, this covers blr_for behaviour in INSERT/UPDATE/DELETE statements. Make read-committed transactions compliant with the SQL specification.

Bi-directional indices
Allow reversed index navigation to use ASC-indices for DESC sorting and vice versa.

Bulk load/import
Implement effective mass data load ability. Provide utility/syntax to use different input formats (csv, xml schema, etc) for import.

Referential integrity without indices
Implement (optionally) foreign keys that are not enforced by indices. Also provide an ability to reuse the existing index for a constraint.

Full-text search
Implement FTS features inside the engine or add API to plug the external FTS engines in. FB seems to be the only RDBMS that don't have this feature yet.

Clustering
MySQL supports clusters (AFAIK, only shared-memory so far), PostgreSQL shared-disk implementation is in the way. More and more buzz about this feature around the world.

Bi-directional cursors
Consider implementation of scrollable cursors inside the engine or provide a thin layer at the top of the RSB hierarchy to implement the feature via the caching.

XML integration
Provide at least fetch to XML and insert from XML abilities. Consider having BLOB SUB_TYPE XML and implementing XPath queries.

Obviously, the list is not complete:  it includes only the changes that we consider most important.
If this list misses your favourite wish, speak now!

Now let's envisage a matrix where the most preferred and easiest-to-do features are placed in the top-left corner and the harder and/or less wanted ones are toward the bottom-right corner. Moving from one of these corners to another reveals the most likely feature implementation roadmap. Recalling what has been said before, we could imagine a more detailed roadmap:

Firebird 3.0 (the merged version)

  • Monitoring
  • Asynchronous statement cancellation
  • Embedded users / SQL users management
  • More built-in functions
  • Temporary tables
  • SQL functions
  • Recursive queries
  • Faster outer joins
  • SMP support in SS
  • Compiled statements cache
  • External functions/procedures

As you can see (and as it has been stated earlier), version 3.0 is expected to include the work already done and a few features that are highly wanted and relatively easy to implement. Everything else tends to slow the development down and hence is excluded from the v.3.0 list.

Firebird 3.0+ (the next major version)

  • Detailed logging/audit
  • SQL tracing/profilingUser permissions for metadata
  • Pluggable authentication modules
  • Security groups
  • Long exact numeric implementation
  • Domains everywhere
  • Regular expressions
  • TEXT BLOB compatible with [VAR]CHAR
  • Reliable logical backup
  • Optimizer improvements
  • Statement consistency/atomicy, read committed compliance
  • Optimized network protocol
  • Bi-directional indices
  • Referential integrity without indices
  • Bulk load/import

It would also be excellent to design schemas/namespaces and longer metadata names for 3.0+, but no promises here. The same for external data sources and deferred constraints.

Of course, some intermediate or minor releases may happen in the meantime, as we'll try to make the release cycles shorter.

This overview is not the last word! A detailed "feature by feature" roadmap for version 3.0 will be published after the New Year.- D.Y.

Firebird 3.0++ (something we don't have a schedule for yet)

  • PSQL debugging extensions/hooks
  • Database encryption
  • More access paths
  • Full-text search
  • Clustering
  • Bi-directional cursors
  • XML integration
To Firebird Home Page   To Engine Development page
This site and the pages contained within are Copyright 2000-2006, Firebird Project.
Firebird® is a registered trademark of Firebird Foundation Incorporated.