Sep 18, 2011

XML to structure and back again.

Hi again.

After a year long pause, I have started with this again.
And so I now have my first test of the XML-to-SQL objects-to-XML.
And it works to both parse and generate it back from the object structure and getting a file identical to the first.

The XML looks like this:

<?xml version="1.0" ?>
<sql:Statement xmlns:sql="http://www.unifiedbpm.se/XMLschema/DAL/SQL"
xmlns:xsi="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.unifiedbpm.se/XMLschema/DAL/SQL ../SQL.xsd">
<sql:Verb_CREATE_TABLE>
<sql:Name>Table1</sql:Name>
<sql:Constraints>
<sql:Parameter_Constraint>
<sql:CheckConditions />
<sql:Constraint_Type>PRIMARY KEY</sql:Constraint_Type>
<sql:References>
<sql:str>Table1ID</sql:str>
</sql:References>
<sql:Name>PK_Table1_Table1ID</sql:Name>
</sql:Parameter_Constraint>
<sql:Parameter_Constraint>
<sql:CheckConditions />
<sql:Constraint_Type>FOREIGN KEY</sql:Constraint_Type>
<sql:References>
<sql:str>Table1ID</sql:str>
<sql:str>Table2</sql:str>
<sql:str>Table2ID</sql:str>
</sql:References>
<sql:Name>FK_Table1_Table1ID_Table2_Table2ID</sql:Name>
</sql:Parameter_Constraint>
<sql:Parameter_Constraint>
<sql:CheckConditions>
<sql:Parameter_Conditions>
<sql:Conditions>
<sql:Parameter_Condition>
<sql:Operator>></sql:Operator>
<sql:ANDOR>AND</sql:ANDOR>
<sql:Right>
<sql:Parameter_Numeric>
<sql:Numeric_Value>2.4</sql:Numeric_Value>
</sql:Parameter_Numeric>
</sql:Right>
<sql:Left>
<sql:Parameter_Numeric>
<sql:Numeric_Value>1.3</sql:Numeric_Value>
</sql:Parameter_Numeric>
</sql:Left>
</sql:Parameter_Condition>
<sql:Parameter_Condition>
<sql:Operator>LIKE</sql:Operator>
<sql:ANDOR>AND</sql:ANDOR>
<sql:Right>
<sql:Parameter_String>
<sql:String_Value>%</sql:String_Value>
</sql:Parameter_String>
</sql:Right>
<sql:Left>
<sql:Parameter_Identifier>
<sql:Identifier>T1.firstname</sql:Identifier>
</sql:Parameter_Identifier>
</sql:Left>
</sql:Parameter_Condition>
</sql:Conditions>
</sql:Parameter_Conditions>
</sql:CheckConditions>
<sql:Constraint_Type>CHECK</sql:Constraint_Type>
<sql:References>
<sql:str>Table1ID</sql:str>
</sql:References>
<sql:Name>CK_Table1_Name</sql:Name>
</sql:Parameter_Constraint>
<sql:Parameter_Constraint>
<sql:CheckConditions />
<sql:Constraint_Type>UNIQUE</sql:Constraint_Type>
<sql:References>
<sql:str>Table1ID</sql:str>
</sql:References>
<sql:Name>UQ_Table1_Name</sql:Name>
</sql:Parameter_Constraint>
</sql:Constraints>
<sql:Columns>
<sql:Parameter_ColumnDefinition>
<sql:Datatype>serial</sql:Datatype>
<sql:Default />
<sql:NOTNULL>True</sql:NOTNULL>
<sql:Name>Table1ID</sql:Name>
</sql:Parameter_ColumnDefinition>
<sql:Parameter_ColumnDefinition>
<sql:Datatype>string(400)</sql:Datatype>
<sql:Default />
<sql:NOTNULL>False</sql:NOTNULL>
<sql:Name>Table1Name</sql:Name>
</sql:Parameter_ColumnDefinition>
<sql:Parameter_ColumnDefinition>
<sql:Datatype>timestamp</sql:Datatype>
<sql:Default />
<sql:NOTNULL>False</sql:NOTNULL>
<sql:Name>Table1Changed</sql:Name>
</sql:Parameter_ColumnDefinition>
</sql:Columns>
<sql:postSQL />
</sql:Verb_CREATE_TABLE>
</sql:Statement>



As you see, it is basically an abstraction of the SQL language.
It should be pretty easy to write a GUI for generating the XML that would be similar to actually writing SQL.
With that as input, the following SQLs are generated when calling any of the SQL objects' AsSQL-method(note the oracle syntax):


SQL for the MySQL database:

CREATE TABLE Table1 (
`Table1ID` INTEGER AUTO_INCREMENT NOT NULL,
`Table1Name` VARCHAR(400),
`Table1Changed` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT `PK_Table1_Table1ID` PRIMARY KEY (Table1ID),
CONSTRAINT `FK_Table1_Table1ID_Table2_Table2ID` FOREIGN KEY (Table1ID) REFERENCES Table2(Table2ID),
CONSTRAINT `CK_Table1_Name` CHECK ((1.3 > 2.4) AND (T1.firstname LIKE '%ercedes')),
CONSTRAINT `UQ_Table1_Name` UNIQUE (Table1ID)
)


SQL for the PostgreSQL database:



CREATE TABLE Table1 (
"Table1ID" serial NOT NULL,
"Table1Name" VARCHAR(400),
"Table1Changed" DATETIME DEFAULT(NOW()),
CONSTRAINT "PK_Table1_Table1ID" PRIMARY KEY (Table1ID),
CONSTRAINT "FK_Table1_Table1ID_Table2_Table2ID" FOREIGN KEY (Table1ID) REFERENCES Table2(Table2ID),
CONSTRAINT "CK_Table1_Name" CHECK ((1.3 > 2.4) AND (T1.firstname ILIKE '%
ercedes')),
CONSTRAINT "UQ_Table1_Name" UNIQUE (Table1ID)
)


SQL for the Oracle database:



CREATE TABLE Table1 (
"Table1ID" integer NOT NULL,
"Table1Name" VARCHAR2(400),
"Table1Changed" DATETIME DEFAULT(NOW()),
CONSTRAINT "PK_Table1_Table1ID" PRIMARY KEY (Table1ID),
CONSTRAINT "FK_Table1_Table1ID_Table2_Table2ID" FOREIGN KEY (Table1ID) REFERENCES Table2(Table2ID),
CONSTRAINT "CK_Table1_Name" CHECK ((1.3 > 2.4) AND (T1.firstname LIKE '%
ercedes' escape 'C')),
CONSTRAINT "UQ_Table1_Name" UNIQUE (Table1ID)
)
CREATE SEQUENCE seq_Table1_Table1ID_DAL_auto_increment
start with 1
increment by 1
nomaxvalue;
CREATE TRIGGER tr_Table1_Table1ID_DAL_auto_increment
BEFORE INSERT ON Table1 FOR EACH ROW BEGIN
SELECT seq_Table1_Table1ID_DAL_auto_increment.nextval INTO :new.id FROM dual;
end;


SQL for the DB2 database:



CREATE TABLE Table1 (
"Table1ID" INT GENERATED ALWAYS AS IDENTITY NOT NULL,
"Table1Name" VARCHAR(400),
"Table1Changed" TIMESTAMP,
CONSTRAINT "PK_Table1_Table1ID" PRIMARY KEY (Table1ID),
CONSTRAINT "FK_Table1_Table1ID_Table2_Table2ID" FOREIGN KEY (Table1ID) REFERENCES Table2(Table2ID),
CONSTRAINT "CK_Table1_Name" CHECK ((1.3 > 2.4) AND (T1.firstname LIKE '%
ercedes')),
CONSTRAINT "UQ_Table1_Name" UNIQUE (Table1ID)
)


SQL for the SQLserver database:



CREATE TABLE Table1 (
[Table1ID] int IDENTITY(1,1) NOT NULL,
[Table1Name] varchar(400),
[Table1Changed] DATETIME DEFAULT(NOW()),
CONSTRAINT [PK_Table1_Table1ID] PRIMARY KEY (Table1ID),
CONSTRAINT [FK_Table1_Table1ID_Table2_Table2ID] FOREIGN KEY (Table1ID) REFERENCES Table2(Table2ID),
CONSTRAINT [CK_Table1_Name] CHECK ((1.3 > 2.4) AND (T1.firstname LIKE '%
ercedes')),
CONSTRAINT [UQ_Table1_Name] UNIQUE (Table1ID)
)


Small disclaimer: I haven't run these scripts on all database architectures, am currently rebuilding the testing environment.

Oct 26, 2010

XML, back and forth and here again.

Hi,
Lately i've busy creating, validating and parsing XML.

There are now two schemas used when defining upgrade files:
http://www.unifiedbpm.se/XMLschema/DAL/SQL.xsd
http://www.unifiedbpm.se/XMLschema/DBUpgrader/DBUpgrader.xsd

The first defines the, so far supported, SQL, and the second defines the DBUpgraders upgrade file.

As always, I had less time and it took more effort than I thought, but now I am closing in on having a working upgrader.

It is pretty cool to watch integration tests run that creates the same structure on five different database platforms, inserts data and selects back and actually works.
A bit left to do, but after that, this might actually be a useful functionality.

That's all folks, bye!

Sep 29, 2010

Road map.

Hi,

SO FAR

I've gotten to a point where I have proven and implemented the two most important technical concepts needed for the development of Unified BPM :
1. Database abstraction layer
2. Database upgrade management including an (almost finished) XML schema for upgrade files. The XML schema is to allow for easy upgrade client development, one should not have to dabble with XML file editing. Interesting point is that the code generates the schema mostly by itself, based on its functionality, allowing for simpler maintenance.

This functionality extends to the MySQL, PostgreSQL, DB2, Oracle and SQL server platforms, except for upgrade management, where Oracle is partially unsupported due to its aforementioned lack of auto increment shortcuts. Maybe someone knows a way around that problem?

ROAD MAP


At this stage, then, I am in the really exciting position of being able to present a short road map. It is a seriously pragmatic one, one that is geared at getting something basic working as soon as possible:

1. Basic database abstraction/upgrade management. Creating tables, selecting and inserting data. I won't do anything more advanced with this unless other parts of UBPM needs more functionality. In my experience and according to my expectation, needing advanced SQL statements in the beginning of a project is mostly due to poor database design. Also, I am consciously keeping the database as simple as possible at this stage. I am nearly done with this stage. I have some structural stuff left on the database upgrader, but then I will move on.

2. The next step will be the messaging system of UBPM. I will begin with deciding the basic structure of communication within UBPM. I have already figured most of it out, but now I need to scale that down to its bare necessities in order to start somewhere.
Note that no logging of communications will be implemented in this stage.
The test for this will be to send a file from a listener on one host to another. One listener will run on a Linux host, and the other on a Windows host.

3. Not only files are to be sent, so next up will be to SELECT a dataset on one database server of one type, and INSERT the same data on an other of a different type and compare. This will put the DAL through its paces and probably expose any problems there.

4. By now, the DAL has probably been tested enough to enable me to start working on the Unified BPM database structure. Its primary purpose will be to log whatever happens on the system and that will be what I'll be implementing first. I will test this, and the database upgrade process by installing and upgrading the database and running the tests of points 2 and 3.
At this stage, I will be very happy, since I would now know that all the important concepts of UBPM works. As I see it, the largest technical hurdles are now negotiated and I will have a working integration test environment.


5. GUI. I start to develop a simple web based GUI that enables the generation of the XML database upgrade files. Because, from now on, there will be significantly more database development.
In general, it will be more straightforward development, and perhaps the perfect moment to invite others to join the project. It will certainly still be small enough to be easily influenced by new ideas and feedback.

Sep 14, 2010

Connection(s) established. Initiating execution.

Hi y'all.

So now the next phase is finished.
I now implemented and tested connections to all supported database platforms.
I only used multi-platform python-components.
These are:
  • PostgreSQL(psycopg2, native)
  • MySQL(MySQLdb, native)
  • SQL Server(pyodbc using FreeTDS)
  • DB2(pyodbc, needs libdb2.so)
  • Oracle(cx_Oracle, native, needs oracle instant client)
Getting all the databases running on my network was a real chore, to be honest.
Especially DB2 and Oracle were quite cumbersome to connect to, and needed lots of proprietary stuff installed on the client. Also, they don't really support debian/ubuntu.
However, now that I know how everything works I will try to automate client installations, or at least document them well, so Unified BPM users won't have to go through all the hassle I had to.

Anyway, next now is to define a small XML schema for generalizing of database upgrade management.
My thought is that the same XML should be able to manage database upgrades for multiple platforms, through the functionality of the database abstraction layer.
I don't want some Unified BPM components to end up having half-a$$ed support for specific databases, like it is for the Drupal project, for example.
Also, this makes it possible to make really kewl database tests.

Aug 17, 2010

Done with first phase of SQL generation

So there.

Now I have, in python, created a class which does only the things I need to at begin with the first database part of Unified BPM, creating and querying simple database structures.
The class can currently create these kinds of SQL statements(and then some):
  • SELECT - with where, expressions, conditions, CASE, function calls, joins, subselects.
  • CREATE TABLE - with defaults, primary keys, foreign keys and autoincrement
  • CREATE INDEX - Unique, Clustered, Non-Clustered
Supported databases for this functionality is:
  • Full
    • MySQL
    • PostgreSQL
    • DB2
    • SQL Server
  • Partial DDL
    • Oracle(due to identifier limitations and auto-increment hassle)
Now I will use this generator to make test for the DAL:s connection handling.
I now have working installations of all five databaseservers and will use them to set up integration tests for all flavours.

In practice, Partial DDL means that Unified BPM won't run on that platform but will be able to communicate with it. I simply won't commit to restricting myself to 30 bytes long identifiers. Somebody please think of the FK-names!

Aug 15, 2010

Adding DB2 support

Well, I though that while I'm at it, I might as well add DB2 support as well.
So now I will, it seems to have a pretty much straightforward SQL implementation.

By the way, It is kind of interesting to see what free(as in beer) offerings the different database vendors have, it is a bit revealing with regards to their attitude towards developers.

Aug 14, 2010

Dropping oracle support(at least DDL)

Hi,
I have decided to drop Oracle support.
Last time I used Oracle was many years ago and back then I had none of the experience I do today.
The reason? The extremely silly 30-character length restriction on identifiers.

Oracle restricts the length of identifiers to 30 characters, which is extremely strange, this being 2010 and all.
Large systems with lots of complex and specific data needs long identifier names. I once belonged to the school of abbreviation, where everything got abbreviated down to a length of about 20 characters. I have since met THE REAL WORLD of huge databases and realised that the only reason to abbreviate identifiers is that is looks better when you list them.
However, mapping them to their real-world counterpart data(rw_ctrprt_data?!?) will, for the next developer, be a error-prone and boring chore.

Recently, a collegue of mine came to me with an identifier which unabbreviated would become something like 120 characters long. I had to tell him to go with it, since there was no way around it.

The typical reason for an identifier to become this long is that it describes an exception of some sort, like: "add taxes if the counterparty's custodian has the same currency than the traders custodian".
Usually, it is some real-word legal rule that has to be followed.
And believe me, we use all the tricks, like dynamic structures and so forth, to avoid having to name columns like this, so don't give me any crap of "with the right design", because reality is different from school.
But many times, one have to do it that way. Yeah, one could use column description, but they aren't as useful as one might think.

Anyway, I will now drop Oracle support. Too bad I've already spent hours on making the auto-increment support.
Luckily, it was really straightforward to uninstall the Oracle software, kudos to them. :-)

EDIT:
Upon closer thought I will probably not drop Oracle support totally, but rather I will only have none or partial DDL support. It would be silly to exclude a very popular database from the databases supported by UnifiedBPM.