[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Reply to copyright concerns for libzdb in Seafile



Hi all,

I'm Jonathan Xu, main developer of Seafile. I personally wrote the replacement code to libzdb.

Sorry for not replying to the original thread since I'm late to the discussion and was not in the mailing list before. The original discussion can be found here: https://lists.debian.org/debian-legal/2019/05/msg00002.html

I'll reply to the concerns and evidence below.

Concern:
------------

We later discovered that the code that replaced libzdb is mostly a copy of libzdb's code and structures. This stand in contrast to the statement “we completely replaced libzdb with our own code.”  [4]

Libzdb is licensed under GPLv3. Copying and modifying GPL code is perfectly fine as long as the original copyright notice and license are kept. Unfortunately, this is not what the Seafile team did. Instead they copied code from libzdb, removed the copyright notice, claimed the code as their own and re-license it under another license. 
Evidence: 
-------------

To do a side by side comparison I’m going to use Seafile’s version of libzdb which they forked on GitHub [6] at version 2.11.1 and based their new code on and claimed as their own [5]. The comparison is going to be against our same version on Bitbucket. 

Libzdb is a database connection pool library which consists of 4 major components: ConnectionPool, Connection, ResultSet and PreparedStatement. Forward declared data structures are used to abstract the concrete database implementation. These components together with their method association are quite unique for a _C_ database connection pool library as far as I know. The Seafile "rewrite" uses the exact same components and method association between components with modest renaming. Mostly by going from camel-case to snake-case.

**** Seafile's reply

When we rewrote the database access layer, we did learn some ideas from libzdb. But we did a lot of redesign of the interface and architecture, to make it better suit our needs and make it easier for us to understand.

To be honest I didn't fully understand the code of libzdb at the time of writing our own access layer. The code libzdb uses some  unusual coding techniques in C languaage. For example it implements its own Exception to handle errors. And they widely use object-oriented design patterns like delegation paradigms in the code.

So I admit that I borrow some ideas from libzdb, but all the code are written by myself so that I can understand it and maintain it in the future. It would be hard to believe someone can copy some code that he/she doesn't fully understand, make modifications and make it work flawlessly.

I also want to say that the interfaces of libzdb are not so unique, as they're very natural and common choices to provide a "middle ground" for all the APIs of different databases. Connection pools, connection, prepared statement, result set, are all basic concepts in DBMS APIs. What libzdb and us do are just to implement some "wrapper" to these APIs, to hide the differences among DBMS.

Let's first compare the interfaces of our db access layer and libzdb's. The source code are referenced below.

- Seafile: https://github.com/haiwen/seafile-server/blob/master/common/db-wrapper/db-wrapper.h
- libzdb: https://github.com/haiwen/libzdb/blob/master/src/db/PreparedStatement.h , https://github.com/haiwen/libzdb/blob/master/src/db/Connection.h , https://github.com/haiwen/libzdb/blob/master/src/db/ResultSet.h , https://github.com/haiwen/libzdb/blob/master/src/db/ConnectionPool.h

We'll go into details of comparing each interfaces and their implementation later. But as you can find by taking a quick look at the source code, you'll find that the interfaces are already very different except from the conceptual level.

First of all, since we don't use Exceptions to report errors, we explicitly include a GError **error parameter in each function. E.g. 

ResultSet *
db_connection_execute_query (DBConnection *conn, const char *sql, GError **error);

Secondly, the data structures we define doesn't use "delegate", compared to libzdb's.

--- Seafile
struct DBStmt {
    ResultSet *result_set;
};

--- Libzdb
#define T PreparedStatement_T
struct PreparedStatement_S {
    Pop_T op;
    ResultSet_T resultSet;
    PreparedStatementDelegate_T D;
};

Our DBStmt structure is defined to be the "parent" of all concrete DB statement structures, like 

typedef struct MySQLDBStmt {
    DBStmt parent;
    int param_count;
    MYSQL_STMT *stmt;
    MYSQL_BIND *bind;
} MySQLDBStmt;

Compared to libzdb's

#define T PreparedStatementDelegate_T
struct T {
    int maxRows;
    int lastError; 
    int paramCount;
    param_t params;
    MYSQL_STMT *stmt;
    MYSQL_BIND *bind;
};

I don't understand how the delegate works here :-(

You may find some similarity in the fields of both structures. But I think they're reasonable similarity due to the fact that we both need to call the same set of APIs provided by the databases.

Below I'll reply inline to the evidence provided by libzdb authors.

I’m going to limit the comparison somewhat for brevity, but it should be enough to demonstrate copyright concern. The full comparison can be done by comparing [5] and [7]. 


1. The Connection Pool has two significant methods:

- Get a connection from the pool

a:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/ConnectionPool.c#lines-314

a:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L180

- And return a connection to the pool

b:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/ConnectionPool.c#lines-345

b:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L220

Apart from Seafile using glib array and libzdb using its own vector module the above demonstrate copy of code with the same logic, method and variable names. Libzdb’s Connection_setAvailable is equal to their conn->is_available = FALSE; And our LOCK macro is just pthread_mutex_lock. I.e. the same code and logic, just expanded and inlined. 


***** Seafile's reply

Connection pools are very common programming constructs seen in many programming languages. The implementations are more or less the same. They all use some data structure to store the connections, and having a lock. It's unnecessary to use another data structure (say hash table) just to be different.

And if you compare the interfaces of connection pool from libzdb and Seafile, you'll find the interfaces are quite different.

We have functions like 'db_conn_pool_new_mysql' and much less other functions. While libzdb has more functions.

- libzdb: https://github.com/haiwen/libzdb/blob/master/src/db/ConnectionPool.h
- Seafile: https://github.com/haiwen/seafile-server/blob/master/common/db-wrapper/db-wrapper.h

What we have in Seafile:

- db_conn_pool_new_mysql
- db_conn_pool_new_pgsql
- db_conn_pool_new_sqlite

While there are many more functions in libzdb, they don't have the above 3 functions.

2. Connection

In libzdb a Connection has three significant  methods, Connection_execute, Connection_executeQuery and Connection_prepareStatement. Seafile has the same methods implemented in the same way

a:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-308

a:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L258

b:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-323

b:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L350

Seafile has not copied all methods from libzdb’s Connection, but Connection_ping is is there as well as Connection_beginTransaction, Connection_rollback and Connection_commit

c:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-228

c:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L241


**** Seafile's reply

As I said in the beginning, connection, execute sql, prepare statement, ping are just basic concepts borrowed from DBMS client libraries like MySQL. If you look at MySQL's C API manual, you'll find all of them. They're natural choice for the function names.
What is special about our transaction code in libzdb is that we keep a counter called “isInTransaction” which Seafile has as “in_transaction”. 

d:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-252

d:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L424


**** Seafile's reply

We also need this field since we have to know whether to rollback a transaction when we close a connection. It seem to me there are no other good choice to implement the same goal.
3. ResultSet and PreparedStatment are also clearly copied from libzdb. We see ResultSet_next, ResultSet_getString, ResultSet_getInt etc and PreparedStatement_setString, PreparedStatement_setInt etc. Also PreparedStatement_executeQuery is faithfully copied:

a:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/PreparedStatement.c#lines-122

a:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L392


**** Seafile's reply

Again the concepts of prepared statement and result set are from MySQL, PostgreSQL etc.

About the code similarity, I think it arises from the fact that there are only very few ways to implement a db wrapper library. We all need a function pointer table to abstract the details of various DBMS. And given the similarity of the concepts of different DBMS, we all need to define some functions: execute, query, prepare_statement, etc.
4. Concrete Database implementations. 

When it comes to the concrete database implementation for SQLite, MySQL and PostgreSQL the same copy of code is repeated. For example, MysqlResultSet_new.

a:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/mysql/MysqlResultSet.c#lines-102

a:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/mysql-db-ops.c#L189

and the special way we ensure column field capacity in MySQL where they very telling even has copied our comment: 

b:libzdb: https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/mysql/MysqlResultSet.c#lines-84

b:seafile: https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/mysql-db-ops.c#L277

**** Seafile's reply

The code similarity is demanded by the MySQL client API. If we don't construct the data structures in this way, we cannot correctly call the MySQL library.

Summary:
==================
The code similarity is largely due to below facts:
1. Various DBMS have some similar interface concepts. So it's natural for anyone who wants to implement db wrapper libraries to abstract the interfaces this way. And we made our own changes to the interface and architecture. It's not a direct copy of the interfaces and architecture. (BTW, in my humble opinion, interfaces and architecture is not covered by GPL license.)
2. Another factor is that in C language there are only limited ways to architect the code of db wrapper libraries.
3. Many code are required to be written as it is since we have to conform to DBMS's client API.

We're happy to attribute to the ideas we learn from libzdb. And we'll try to change the code that look too similar to libzdb's.

--
Regards
Jonathan Xu



Reply to: