On 29 November last year, SkySQL and Monty Program jointly announced the release of the so-called "MariaDB Client Library for C Applications" and "MariaDB Client Library for Java Applications", which I will call the C and JDBC connectors here. You can follow this link to read the press release from SkySQL.

Last week Baron Schwartz posted some findings about the C connector on his blog. Today, another post from Robert Hodges added a few details and asked for answers. I think I can help a bit in understanding what we have released and how the connectors can be used. I will not comment on the accusations of plagiarism mentioned in Baron's post, for two reasons. First of all, I think plagiarism is a serious accusation that refers to illicit actions (but I am not a native English speaker and I may be wrong) and I do not have any legal expertise. Second, I think that any discussion around the ethical or unethical use of somebody else's code, especially when it is based on the amount or percentage of code changed, is pretty slippery ground and is too prone to individual interpretation.

The C connector in [very] short terms

You can find information regarding the C connector here and here. The connector is based on the MySQL Client Library v3.23.58. If you look at the source code of both libraries, you can immediately spot significant differences. Take the libmysql.c file, for example: you will find significant changes within the C functions and some new functions, such as cli_report_progress, as well as changes to support the 5.X protocol, connection/close/reconnect handling, etc. Obviously, libmysql.c is not the only module to show changes - take the prepared statements in my_stmt.c and my_stmt_codec.c, for example, where Monty's team worked on the port from the mysqlnd extensions.

A bit more history and extra info for the JDBC connector

I can certainly add more history and info for the JDBC connector. Mark Riddoch (our head of Engineering) and his team have actively participated in the improvement of the connector, together with Monty Program's team. What you can read below is a summary of what happened and the areas that they have improved.

The JDBC connector is directly derived from the Drizzle connector. When we looked for a Java connector that could work as an alternative to the MySQL Connector/J from Oracle, we were very happy with the great job that Markus Eriksson did. The only problem was that the Drizzle connector, although it conformed to the basic JDBC interface, had been stripped down to the bare essentials. This is good for the objectives of the Drizzle project, but we wanted to offer something that was more generally useful for the MySQL and MariaDB™ community at large. The problem was, of all the vast array of missing features, which should we tackle first? We embarked on a lengthy evaluation program to determine which features to add ourselves and which to fund Monty Program to add to the driver.

Our first approach was to take a few popular Java applications and test these against the connector. One of the issues with this, of course, is identifying applications that are a reasonable representation of what an end user might use. We also had to test these applications thoroughly to investigate the features that each application uses with the driver. Given the Java linkage model, this meant we needed to do as complete a test as possible of each application in order to fully exercise the JDBC API.

Using this approach we quickly found some of the more major areas of incompatibility, namely that the connection URL used by the original Drizzle connector had an incompatible syntax and that compression was missing. Our aim was to provide a drop-in compatible JDBC driver, therefore having a compatible syntax for the URL was an obvious first fix to make, as was the addition of compression.
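To make the URL compatibility point concrete, here is a minimal sketch of how a Connector/J-style connection URL with options can be assembled. It does not use the driver itself. The class and method names (JdbcUrlSketch, buildUrl) are hypothetical, and the exact spelling and case of the option names accepted by the MariaDB connector (useCompression, socketTimeout) are assumptions based on the MySQL Connector/J convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of any driver: it only assembles a URL string.
final class JdbcUrlSketch {

    /**
     * Builds a URL of the form jdbc:mysql://host:port/database?key=value&key=value.
     * Values are appended as-is; no URL encoding is attempted in this sketch.
     */
    static String buildUrl(String host, int port, String database, Map<String, String> options) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(host).append(':').append(port)
                .append('/').append(database);
        if (!options.isEmpty()) {
            // "?" prefixes the option list, "&" separates the individual options.
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((key, value) -> query.add(key + "=" + value));
            url.append(query);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("useCompression", "true"); // assumed spelling of the compression option
        options.put("socketTimeout", "5000");  // assumed to be in milliseconds
        System.out.println(buildUrl("localhost", 3306, "test", options));
    }
}
```

With a driver on the classpath, the resulting string would typically be passed to java.sql.DriverManager.getConnection together with a user name and password.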

Although this approach seemed reasonable at first sight, setting up various applications, creating the data sets needed by each application and then manually testing the application was a time-consuming operation. As an enhancement to this approach we decided to look around for an application or a framework that had some form of automated test procedures that we could use to test the interface to the database. We chose the Hibernate framework as our vehicle for this since it contains a set of automated tests to verify the SQL dialect it is using. This enabled us to automatically run a large set of tests across a diverse set of API methods of the JDBC driver. Crucially, this enabled us to run regression tests very easily each time we fixed a bug or added new functionality to the JDBC driver. The first runs of the Hibernate test suite yielded several hundred failures and very quickly led us to a number of fixes that we needed to apply to the driver that had not been picked up by the tests we had previously run. The lack of support for BLOB and CLOB datatypes caused a vast number of failures, so these were an obvious first target for our development of the driver. This was quickly followed by a number of other features such as better support for the manipulation of the metadata and schema aspects of the JDBC API.

The lack of support for streaming data sets and the way the Drizzle driver handled large result sets required a number of changes to the driver. Callable statement support and stored procedure support also needed to be added to support these features of MySQL.

The third phase of the development and testing of the connector was to engage with some external application developers who required a JDBC driver with more relaxed licence requirements. This added a whole new set of issues that had not come to light with the Hibernate test suite. A number of these issues were related to concurrent usage, which the Hibernate test suite had not exercised: synchronisation issues that had not been solved, connection handling for high numbers of concurrent connections, connection leaks and general resource leaks were all highlighted by this test route. Some of the more interesting issues that this approach raised were related to the MySQL specifics that the more general Hibernate approach didn't find; these included the various parameters that could be set as connection properties and reliance on particular semantics of the MySQL Connector/J.

Datatype support also became an issue as we tested with more and more application developers; temporal datatypes needed to be updated to support microseconds, and numerous cases were found where the Drizzle driver had a different mapping between the database types and the native Java data types. In particular, the getObject API call required some work to give true compatibility with the MySQL Connector/J.
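As an illustration of what "mapping between the database types and the native Java data types" means for getObject, here is a small sketch of a lookup from SQL type to the Java class returned. This is not the code of either driver. The class and method names are hypothetical, the mapping covers only a handful of types, it ignores signedness, and the tinyInt1IsBit flag is modelled on the Connector/J property of a similar name.

```java
import java.sql.Types;

// Hypothetical lookup table; a real driver handles many more types and flags.
final class GetObjectMappingSketch {

    /**
     * Returns the Java class that getObject would hand back for a column,
     * under the simplified rules of this sketch.
     *
     * @param sqlType       a constant from java.sql.Types
     * @param displayWidth  the declared width, e.g. 1 for TINYINT(1)
     * @param tinyInt1IsBit when true, TINYINT(1) is treated as a boolean column
     */
    static Class<?> javaClassFor(int sqlType, int displayWidth, boolean tinyInt1IsBit) {
        switch (sqlType) {
            case Types.TINYINT:
                // TINYINT(1) is commonly used as a boolean in MySQL schemas.
                return (displayWidth == 1 && tinyInt1IsBit) ? Boolean.class : Integer.class;
            case Types.SMALLINT:
            case Types.INTEGER:
                return Integer.class;
            case Types.BIGINT:
                return Long.class;
            case Types.CHAR:
            case Types.VARCHAR:
                return String.class;
            default:
                // Anything not covered by the sketch.
                return Object.class;
        }
    }

    public static void main(String[] args) {
        System.out.println(javaClassFor(Types.INTEGER, 11, true).getSimpleName());
        System.out.println(javaClassFor(Types.TINYINT, 1, true).getSimpleName());
        System.out.println(javaClassFor(Types.TINYINT, 4, true).getSimpleName());
    }
}
```

An application that casts the result of getObject to a specific class breaks as soon as two drivers disagree on a single row of such a table, which is why these differences mattered for drop-in compatibility.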

One particularly interesting case was the getTables() and getColumns() API, used to get table information from the database schema. The implementation of getTables() in the Drizzle driver was 100% compatible with the JDBC specification of this routine; however, MySQL Connector/J did not behave as the JDBC specification requires. A particular application we were working with depended upon this behaviour in MySQL Connector/J, which was technically incorrect, so this gave us a dilemma: we had a driver that was correct but different from the one we wanted to emulate. So should we deliberately put what could be considered a bug into our JDBC driver? The solution we arrived at was to add a connection parameter to switch between the strict JDBC behaviour for getTables() and the MySQL Connector/J behaviour; so we added the NoSchemaPattern connection option for this switch.
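The idea of a switch between strict and compatible behaviour can be sketched without a database. The following is only an illustration of the design choice, not the driver's implementation: the class, the Table type and the in-memory list are all hypothetical, and the schema pattern is matched by simple equality rather than with the SQL wildcards that a real metadata lookup accepts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of a metadata lookup with a compatibility switch.
final class SchemaPatternSketch {

    /** A table identified by the schema it lives in and its name (illustrative only). */
    static final class Table {
        final String schema;
        final String name;

        Table(String schema, String name) {
            this.schema = schema;
            this.name = name;
        }
    }

    /**
     * Returns the names of the matching tables.
     * noSchemaPattern == false: strict mode, the schema pattern is applied.
     * noSchemaPattern == true: compatibility mode, the schema pattern is ignored.
     * A null schema pattern matches every schema in both modes.
     */
    static List<String> getTables(List<Table> all, String schemaPattern, boolean noSchemaPattern) {
        return all.stream()
                .filter(t -> noSchemaPattern || schemaPattern == null || t.schema.equals(schemaPattern))
                .map(t -> t.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Table> tables = Arrays.asList(new Table("sales", "orders"), new Table("hr", "staff"));
        System.out.println(getTables(tables, "sales", false)); // strict: pattern applied
        System.out.println(getTables(tables, "sales", true));  // compatible: pattern ignored
    }
}
```

Keeping the strict behaviour as the default and making the deviation opt-in means an application has to ask explicitly for the technically incorrect result.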

At this point we believed we had something that was fairly usable, and started to widen the availability of the JDBC driver to more application developers in an attempt to ascertain if the driver was complete enough to be interesting to the general user base. There were still major areas of functionality that we knew to be missing, in particular the support for connection pooling and connection failover. In parallel to the widening of the testing we also added these features and support for introspection.

The improvements

In all we spent just over a year taking the Drizzle core and slowly testing and enhancing it before handing the driver over to Monty Program for release to the community.

Here is a summary of the improvements that we made in the MariaDB connectors since the initial fork, in chronological order:
- UseCompression option
- Connection string compatibility with MySQL Connector/J
- Implemented getWarnings/clearWarnings
- Added support for CLOB
- Added callable statements
- Allowed IP addresses in connection URLs
- Added large result set support
- Fixed synchronisation issue when multiple connections are used by threads in a single application
- Added an option to control the output of tracing messages from the MySQLProtocol class, using a connection option MySQLProtocolLogLevel=FINEST
- Fixed prepared statements so that the parameters are not automatically cleared between invocations of the prepared statement
- Added support for the MaxRows API
- Added connect URL options: socketTimeout, interactiveClient, localSocketAddress, createDatabaseIfNotExist
- Fix for IGNORE_BACKSLASH_ESCAPES
- Implementation of the IGNORE_BACKSLASH_ESCAPES option in line with the MySQL Connector/J treatment
- Update error handling for zero dates to correctly set SQLState
- Corrected object type returned by getObject for INTEGER columns
- Bring other object types returned by getObject into line with the MySQL Connector/J
- Added support for setCatalog and getCatalog. These follow the model of the MySQL driver and treat databases as catalogues
- Resolved quoting issue with setCatalog
- Added the NoSchemaPattern option to the connection string to enable compatibility with MySQL Connector/J. The schema pattern argument to getColumns and getTables is ignored, as it is by the MySQL driver, if NoSchemaPattern is set to true.
- Update the remainder of the meta data functions to honour the NoSchemaPattern connection options
- Fix issue with naming of auto increment columns
- Clears the cached result set on closing a statement
- Removes the prepared statement cache
- Made the methods getDatabaseMajorVersion and getDatabaseMinorVersion work
- Support for the javax.sql.PooledConnection interface
- A fix for the pooled connection to tidy up transactions in the event of close() being called
- Extra methods on the MySQLDataSource class to allow the data source to be created via reflection:
  - zero argument constructor
  - public void setDatabaseName(String dbName)
  - public String getDatabaseName()
  - public void setUserName(String userName)
  - public String getUserName()
  - public void setPassword(String pass)
  - public void setPort(int p)
  - public int getPort()
  - public void setPortNumber(int p)
  - public int getPortNumber()
  - public void setServerName(String serverName)
  - public String getServerName()
- Addition of the setURL and setUrl methods on MySQLDataSource to aid creation of data sources using reflection
- Addition of setUser and getUser as well as the less used setUserName and getUserName methods
- Update to setURL to avoid overwrite of items set with individual methods if the corresponding value is not set in the URL itself
- Corrects a problem with final attributes in the JDBCUrl class preventing modification of connection parameters
- Addition of connection properties
  - socketFactory
  - tcpKeepAlive
  - tcpNoDelay
  - tcpRcvBuf
  - tcpSndBuf
- Added in the ability to return generated keys in executeUpdate methods of a statement
- Addition of the dumpQueryOnException connection property
- Further fixes for the generated key support
- Corrected behaviour of getObject on a ResultSet with a TINYINT(1) column. If the TINYINT1_IS_BIT property was set, this previously always returned false.
- Fixed issue that prevented '-' character in hostnames in a URL
- Fixed issue related to BufferUnderflow exceptions
- Fixed issue with the mapping of CHAR columns into a Java type
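
The zero argument constructor and the setters listed above for MySQLDataSource matter because containers and frameworks usually create data sources by class name and configure them through reflection. Here is a minimal sketch of that pattern. DemoDataSource is a hypothetical stand-in written for this example, not the real MySQLDataSource, and the configure helper is likewise an assumption about how such a caller might look.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReflectiveDataSourceSketch {

    /** Hypothetical stand-in for a driver data source; NOT the real MySQLDataSource. */
    static class DemoDataSource {
        private String serverName;
        private int portNumber;
        private String databaseName;

        public DemoDataSource() { }  // zero argument constructor, needed by newInstance()

        public void setServerName(String serverName) { this.serverName = serverName; }
        public String getServerName() { return serverName; }
        public void setPortNumber(int p) { this.portNumber = p; }
        public int getPortNumber() { return portNumber; }
        public void setDatabaseName(String dbName) { this.databaseName = dbName; }
        public String getDatabaseName() { return databaseName; }
    }

    /** Instantiates className and calls setXxx(value) for every entry "xxx" -> value. */
    static Object configure(String className, Map<String, String> props) {
        try {
            Object bean = Class.forName(className).getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> e : props.entrySet()) {
                String key = e.getKey();
                String setter = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
                Method m = findSetter(bean.getClass(), setter);
                // Convert the string value when the setter expects an int.
                if (m.getParameterTypes()[0] == int.class) {
                    m.invoke(bean, Integer.valueOf(e.getValue()));
                } else {
                    m.invoke(bean, e.getValue());
                }
            }
            return bean;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot configure " + className, ex);
        }
    }

    /** Finds a public one-argument method with the given name. */
    private static Method findSetter(Class<?> type, String name) throws NoSuchMethodException {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 1) {
                return m;
            }
        }
        throw new NoSuchMethodException(type.getName() + "." + name);
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("serverName", "localhost");
        props.put("portNumber", "3306");
        props.put("databaseName", "test");
        DemoDataSource ds = (DemoDataSource) configure(DemoDataSource.class.getName(), props);
        System.out.println(ds.getServerName() + ":" + ds.getPortNumber() + "/" + ds.getDatabaseName());
    }
}
```

Without a zero argument constructor the newInstance() call fails, and without a matching setter the property cannot be applied, which is presumably why both were added to the data source class.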


Naturally, we want to improve both connectors. We are also looking for contributors to help with testing, to provide feedback and ideas, or even more if possible. The fact that Baron, Robert and others are blogging about the connectors is really encouraging. Suggestions and comments are more than welcome!



16 comments:

Ivan,
The major question that has not been answered is: why fork the Drizzle project, with months (if not years) of private development, instead of contributing to it publicly?
And that leads to the second burning question: why was the JDBC connector re-licensed as LGPL instead of the original BSD?

January 5, 2013 at 3:11 AM  

This is a great description. It would help to publish this when software is released so that you can set the message.

January 5, 2013 at 5:09 AM  

Hi Mark,

Thanks! Yes, I agree.

We should have added more details - and we still need to work on these details, add proper documentation. Our resources are limited and busy on other projects, but this is not a justification. We want to repair and fix this.

January 5, 2013 at 7:27 AM  

Hi Giuseppe,

Re your question on licensing - We asked our lawyer, who investigated how the connectors could be used and he recommended LGPL. As I already said, I am far from being an expert in software law, I can only say that LGPL reflects the values at the core of our work.

Re the "private development", we are talking about a year of work (not full time) of a project that was part of other development - something certainly similar to the work that other companies in the MySQL ecosystem do.

We always intended to release the code publicly and seek contributions, but we were simply not ready. As you can see from other posts, it looks like we are barely ready now, and more work is needed.


January 5, 2013 at 7:43 AM  

Ivan,
You did not answer the question of why MP did not contribute to the Drizzle driver, rather than forking.
The MariaDB driver is derived from the Drizzle driver, but the Drizzle team cannot get any improvement back in the original project because of the (seemingly arbitrary) choice of licensing.
This is why I keep asking this question, and not getting a satisfactory answer.

About private coding, there is nothing bad in it. What I find contradictory is the claim that all the MP development is public, and then we see evidence of the contrary. People can code privately as much as they want and I will not complain, provided they don't go around claiming that all their code development is done in public.

January 5, 2013 at 10:10 AM  

Ivan,

This is very helpful, thanks. With time, and with Monty Program and SkySQL's efforts to spread the word, hopefully the conversation will change. Currently a lot of people are unaware of alternatives to the GPL/dual connectors from Oracle, although I was surprised to get multiple inquiries about MariaDB's connectors.

The major hurdle I would see going forward is likely to be whether it is legally safe to use these client connectors. You probably know as well as I do that everyone wants you to give them the answer and make the decision for them, because they are afraid of the GPL. This is why I made the offhand remark in my blog that I don't give GPL advice anymore. People read the GPL (and LGPL) and decide that it's too scary to rely on their own knowledge and ability to decide. I predict that this will remain the biggest problem for adoption, and some portion of users will always decide that they just feel less afraid to pay Oracle the licensing fee than to worry about whether they're liable to some subtlety they think they might be missing.

Making the origin and licensing of the code as crystal-clear as possible is probably the only way to combat this. But regardless, the advantage will always be in Oracle's court, and the burden of proof that it's OK to use an alternative will always create gravity towards Oracle's salespeople. We will probably have to accept that the uphill battle against Fear, Uncertainty, and Doubt is not ours to fight past a certain point.

Still, as I said, it is heartening that so many people are already aware of the alternatives.

- Baron Schwartz

January 6, 2013 at 1:36 AM  

Hi Ivan,

Great work!
It is my personal view that when you fork/extend an open source project, you should keep to the same license under which it was released. This shows respect to the owner of the project, who can, if they wish so, best integrate your changes.

Otherwise it's best to move from "more restrictive" to "less restrictive", such as the case in Google Patches or Facebook Patches, who extend the MySQL GPL version via BSD, which is less restrictive.

In your case, you have taken an almost non-restrictive licensed (BSD) project, and have licensed your fork as more restrictive (LGPL). This does not give the Drizzle folks the chance to integrate your changes. But you were relying on their work, so there's some unappreciative note to it.

January 6, 2013 at 9:01 AM  

Hi Ivan,
I would like to echo Mark that this is a great description. The work you describe will be helpful to many applications. The lack of strict compatibility with Connector/J is an issue with drizzle JDBC for users who cannot easily update their data access code as we have on Tungsten. Thank you for posting in such detail.

I think you should consider reverting to BSD licensing. This would enable those of us already on the drizzle driver to collaborate more easily, as I indicated in my blog article. Doing so seems more consistent with the stated objective of creating a viable alternative to Connector/J. That's a very worthy goal for a variety of reasons.

January 6, 2013 at 5:14 PM  

Hi Giuseppe,

I am sorry if the answer is not satisfactory, I can only try to rephrase:
- The JDBC connector project was mainly led by SkySQL, not by Monty Program, so it is us who deviated from the original Drizzle connector. We were focused on MySQL, not on Drizzle; we made changes to make the connector palatable for MySQL connections, not for Drizzle. It made sense, in our mind, to do a fork to work on a MySQL connector more than a Drizzle connector. Looking back, somebody from the Drizzle project may strip some features that are not really interesting for them.
- Re the licensing, we thought LGPL was more in line with what we were doing. Should we revert to BSD? I do not know, I will ask our lawyer again what the implications are, since the recommendation came specifically from their office.
- No private coding in MP, as I already said this has been primarily a SkySQL project. We have so much work to do, we were always wondering if the software was really ready for general distribution.

I hope these answers can satisfy you a bit more.

January 6, 2013 at 7:56 PM  

Hi Baron,

I hear what you are saying about the licensing and the origin of the code.

We will do our best to not repeat the misunderstanding in the future.

January 6, 2013 at 8:05 PM  

Hi Shlomi,

I understand your point and it makes sense. As I already mentioned, we thought that the features that we added would have been more interesting for MySQL than for Drizzle. That said, yours is a fair point. I will check again what we can do with the licensing.

January 6, 2013 at 8:21 PM  

Hi Robert,

I will check what the implications of reverting to BSD are. Monty Program will have the final answer on this topic.

January 6, 2013 at 8:27 PM  

When you add substantial work to a software project, you also need to ensure that the end license of the project matches the goals you have. If the original license doesn't match your goals, you unfortunately have no other option than to fork.

For SkySQL, when the project was released, LGPL was a better license for them as it guarantees more freedom of the code while still allowing closed source applications to use it.

As the Drizzle JDBC driver has not been actively maintained, it was originally a better business decision for SkySQL to do a full fork than to work on the Drizzle JDBC project. This was the only way to ensure that the new driver would surely get in all the needed code, including code that the Drizzle developers may not have approved of. It also ensured that SkySQL could guarantee to their customers that new code or a new version would not break their applications.

That said, no one as far as I know has asked us to get some or all of the created patches included in the Drizzle project under BSD. If someone would be interested in integrating those patches back into the Drizzle JDBC code base, that could probably be arranged.

January 7, 2013 at 9:46 PM  

I am confused why Monty Program should have the authority to determine the licensing, if it was SkySQL who did the development during the last year or so? Does Monty Program now own the rights to the code SkySQL developed? I think this was one of the things kind of nagging me in the back of my mind. (If so, I don't object to this, it's just one of those points that makes me feel like I am missing some of the details.)

- Baron Schwartz

January 8, 2013 at 2:13 AM  

Baron,

SkySQL developed the JDBC Connector (well, some people from Monty's team helped, but it was led by us), but the connector is now owned by Monty Program. We thought it was more consistent with the rest of the work done at Monty Program, i.e. the C Connector and the MariaDB Server.

The licensing was initially decided together, and we thought that LGPL would be consistent with the other projects. As you can see from Monty's comment here, they are open to helping with the integration of the patches back into the Drizzle code.

-ivan

January 14, 2013 at 12:03 PM  

@Monty, I completely understand the rationale for forking based on wanting to have a different feature set. However, like others I don't understand the licensing decision. How would we port changes back to the BSD drizzle driver? I would have asked this question some time ago but was unaware of the SkySQL effort before their announcement. It then took a while to figure out how the code had branched.

Meanwhile, the drizzle JDBC project *is* maintained and we use it for Tungsten as I have said publicly a number of times including in comments on your blog (http://monty-says.blogspot.com/2010/12/in-search-of-bsdlgplapache-licensed.html). It seems we could as a community communicate a bit better on these types of efforts. It would save a lot of confusion and also make those of us depending on drizzle JDBC feel a bit more comfortable about migrating to the MariaDB driver.

January 14, 2013 at 11:33 PM  
