Ok. We may all (maybe not really "all") agree that XML in a relational DB is a cool thing. But which kind of approach should we use to store non tabular data in a typical tabular infrastructure?
Personally I don't like hybrids. The core of a RDBMS is designed to be efficient and with a relatively small footprint. Inflating the core with some features that would probably be helpful to not more than the 30% of the projects (to be optimistic, but this percentage is a good reason to think about XML in a DB), does not justify the extra overhead.
Unfortunately, an XML storage engine. On the contrary, it would make things worse, because we would translate queries twice - Xquery to SQL then to XQuery again (the big fans of MySQL internals will forgive me for this simplification).
My interest for a XOR (XML Over Relational) approach goes back to several years ago, mainly because in my past life I have designed a couple of products that used XML quite heavily: a CMR system (Customer Managed Relationship - a permission-based CRM engine with multichannel capabilities) for a company called Bluparc and a CRM Solution Suite for Athoa Ltd ( http://www.athoa.com/...). Both environments heavily relied on Java and used MySQL as RDBMS, but not as XML repository. The adoption of two data managers instead of one has always created issues for our customers. Some of our customers used Oracle, but the XML repository was not in the DB. Why? Because (A) the MySQL at that time did not provide any XML feature at all and (B) because despite the widespread marketing message, the Oracle implementation of XOR did not fit the product at all.
Back to the original topic, there are three basic approaches to XOR (I made up the names, so forgive me if I may somewhat confuse you): XML Shredding, XML Object mapping and XML Indexing
XML Shredding
With XML Shredding every set of XML elements is converted into a table. Depending on the implementation, the columns of the RDBMS table can be associated to XML attibutes or to simple elements. This approach has been adopted by Oracle to implement XML (although the fact of being an object-relational database has helped a bit in defining a specific XML type and in handling complex elements within a set of basic elements).
XML shredding is good for query performance when the XML model is not too complex, ie when a query against the RDBMS is not translated into an endless set of joins with tens of tables. The shredding approach can also facilitate the indexing of the elements and the conversion of XML queries into SQL queries and vice versa. On the other hand, the generation of the SQL queries and the ERD can be very complex.

XML Object Mapping
This is the approach currently adopted by MySQL. Each document or element within a document is stored into a cell of a database column table. The granularity of the element usually depends on the application.
The object mapping approach is very efficient to retrieve deep hierarchies, but on the other hand it provides limited indexing capabilities and document manipulation. For example, in order to retrieve a single element in a XML document that is stored in a large TEXT column, the server has to retrieve and analyse the whole document.
XML Indexing
This approach is based on a set of predefined tables that are used to index and store the XML basic elements. Complex elements are split into several rows, named and indexed.
This approach is based on a fairly simple ER schema (although some complexity may be added to improve performance and capabilities) and it is used for "general purpose XOR". Being generic, this approach works pretty well for basic XML document (ie with no deep hierarchies). In case of complex documents, the generation of the SQL used to retrieve the document becomes a real bottleneck and increases in complexity pretty quickly.
Which Approach?
Unfortunately, a single approach does not fit all the needs. A good XOR strategy should provide all the approaches and user should decide which one would fit their projects. What's important is that the XML connectors and the XOR layer will make the physical storage and the ER implementation completely transparent to the application.
Is MySQL ready for one of these approaches?
MySQL can easily implement XML Object Mapping. In fact, object mapping is already available (in its basic form) in 5.1. What we need now is a mapping layer and a connector.
Anyone? :)

0 comments:
Post a Comment