Oct 26, 2010

XML, back and forth and here again.

Lately I've been busy creating, validating and parsing XML.

There are now two schemas used when defining upgrade files:

The first defines the SQL supported so far, and the second defines the DBUpgrader's upgrade files.

As always, I had less time and it took more effort than I thought, but now I am closing in on having a working upgrader.

It is pretty cool to watch integration tests run that create the same structure on five different database platforms, insert data, select it back and actually work.
A bit is left to do, but after that, this might actually be useful functionality.
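For flavour, here is a minimal sketch of how such an upgrade file could be parsed with Python's standard library. The element names below are invented for illustration; they are not the actual DBUpgrader schema:

```python
# Hypothetical sketch: the element and attribute names are invented
# for illustration and are not the actual DBUpgrader schema.
import xml.etree.ElementTree as ET

UPGRADE_XML = """
<upgrade version="2">
  <step type="create_table" name="nodes">
    <column name="nodeid" type="integer" autoincrement="true"/>
    <column name="parentnodeid" type="integer"/>
  </step>
</upgrade>
"""

def parse_steps(xml_text):
    """Return a list of (step_type, table_name, columns) tuples."""
    root = ET.fromstring(xml_text)
    steps = []
    for step in root.findall("step"):
        cols = [c.get("name") for c in step.findall("column")]
        steps.append((step.get("type"), step.get("name"), cols))
    return steps

print(parse_steps(UPGRADE_XML))
# [('create_table', 'nodes', ['nodeid', 'parentnodeid'])]
```

The point being: once the steps are data, the same file can drive any of the backends through the abstraction layer.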

That's all folks, bye!

Sep 29, 2010

Road map.



I've gotten to a point where I have proven and implemented the two most important technical concepts needed for the development of Unified BPM:
1. Database abstraction layer
2. Database upgrade management, including an (almost finished) XML schema for upgrade files. The XML schema is there to allow for easy upgrade client development; one should not have to dabble with XML file editing. An interesting point is that the code generates the schema mostly by itself, based on its own functionality, allowing for simpler maintenance.

This functionality extends to the MySQL, PostgreSQL, DB2, Oracle and SQL Server platforms, except for upgrade management, where Oracle is partially unsupported due to its aforementioned lack of auto-increment shortcuts. Maybe someone knows a way around that problem?


At this stage, then, I am in the really exciting position of being able to present a short road map. It is a seriously pragmatic one, geared at getting something basic working as soon as possible:

1. Basic database abstraction/upgrade management. Creating tables, selecting and inserting data. I won't do anything more advanced with this unless other parts of UBPM need more functionality. In my experience, needing advanced SQL statements at the beginning of a project is mostly a sign of poor database design. Also, I am consciously keeping the database as simple as possible at this stage. I am nearly done with this step; I have some structural stuff left on the database upgrader, but then I will move on.

2. The next step will be the messaging system of UBPM. I will begin by deciding the basic structure of communication within UBPM. I have already figured most of it out, but now I need to scale that down to its bare necessities in order to start somewhere.
Note that no logging of communications will be implemented in this stage.
The test for this will be to send a file from a listener on one host to another. One listener will run on a Linux host, and the other on a Windows host.

3. Not only files are to be sent, so next up will be to SELECT a dataset on one database server of one type, INSERT the same data on another of a different type, and compare. This will put the DAL through its paces and probably expose any problems there.

4. By now, the DAL has probably been tested enough to let me start working on the Unified BPM database structure. Its primary purpose will be to log whatever happens on the system, and that will be what I implement first. I will test this, and the database upgrade process, by installing and upgrading the database and running the tests of points 2 and 3.
At this stage, I will be very happy, since I would then know that all the important concepts of UBPM work. As I see it, the largest technical hurdles will have been negotiated and I will have a working integration test environment.

5. GUI. I will start developing a simple web-based GUI that enables generation of the XML database upgrade files, because from now on there will be significantly more database development.
In general, it will be more straightforward development, and perhaps the perfect moment to invite others to join the project. It will certainly still be small enough to be easily influenced by new ideas and feedback.

Sep 14, 2010

Connection(s) established. Initiating execution.

Hi y'all.

So now the next phase is finished.
I have now implemented and tested connections to all supported database platforms.
I used only multi-platform Python components.
These are:
  • PostgreSQL (psycopg2, native)
  • MySQL (MySQLdb, native)
  • SQL Server (pyodbc using FreeTDS)
  • DB2 (pyodbc, needs libdb2.so)
  • Oracle (cx_Oracle, native, needs Oracle Instant Client)
Getting all the databases running on my network was a real chore, to be honest.
DB2 and Oracle especially were quite cumbersome to connect to, and needed lots of proprietary stuff installed on the client. Also, they don't really support Debian/Ubuntu.
However, now that I know how everything works, I will try to automate the client installations, or at least document them well, so Unified BPM users won't have to go through all the hassle I did.
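As a sketch of what a uniform front-end over these drivers might look like: the function below just describes how each platform would be connected to. The connection-string formats are simplified placeholders, not the real thing, and the driver modules themselves are only named, not imported, since each needs its own client libraries installed.

```python
# Sketch of a uniform connection front-end over the drivers listed
# above. The DSN formats are simplified placeholders; pyodbc setups
# in particular (FreeTDS for SQL Server, libdb2.so for DB2) need
# driver configuration beyond what is shown here.

def connect_args(platform, host, db, user, pwd):
    """Return (driver_module_name, connect_argument) per platform."""
    if platform == "postgresql":
        return ("psycopg2",
                "host=%s dbname=%s user=%s password=%s" % (host, db, user, pwd))
    if platform == "mysql":
        return ("MySQLdb", dict(host=host, db=db, user=user, passwd=pwd))
    if platform in ("sqlserver", "db2"):
        # Both go through pyodbc, with different underlying drivers
        return ("pyodbc",
                "DSN=%s;DATABASE=%s;UID=%s;PWD=%s" % (host, db, user, pwd))
    if platform == "oracle":
        return ("cx_Oracle", "%s/%s@%s/%s" % (user, pwd, host, db))
    raise ValueError("unsupported platform: " + platform)
```

The idea is that the rest of the DAL never sees the driver differences, only this one dispatch point.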

Anyway, next up is to define a small XML schema for generalizing database upgrade management.
My thought is that the same XML should be able to manage database upgrades for multiple platforms, through the functionality of the database abstraction layer.
I don't want some Unified BPM components to end up having half-a$$ed support for specific databases, like it is for the Drupal project, for example.
Also, this makes it possible to make really kewl database tests.

Aug 17, 2010

Done with first phase of SQL generation

So there.

Now I have, in Python, created a class which does only the things I need to begin with for the first database part of Unified BPM: creating and querying simple database structures.
The class can currently create these kinds of SQL statements (and then some):
  • SELECT - with where, expressions, conditions, CASE, function calls, joins, subselects.
  • CREATE TABLE - with defaults, primary keys, foreign keys and autoincrement
  • CREATE INDEX - Unique, Clustered, Non-Clustered
Supported databases for this functionality are:
  • Full
    • MySQL
    • PostgreSQL
    • DB2
    • SQL Server
  • Partial DDL
    • Oracle (due to identifier limitations and auto-increment hassle)
Now I will use this generator to make tests for the DAL's connection handling.
I now have working installations of all five database servers and will use them to set up integration tests for all flavours.

In practice, Partial DDL means that Unified BPM won't run on that platform but will be able to communicate with it. I simply won't commit to restricting myself to 30-byte-long identifiers. Somebody please think of the FK names!
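To give an idea of what the per-platform split looks like, here is a toy version of the CREATE TABLE generation. The real class handles far more (keys, defaults, indexes); this only shows the autoincrement divergence, and the function name is illustrative:

```python
# Toy version of the generator idea: one table definition, several
# SQL flavours. Only the autoincrement primary key is shown.

def create_table_sql(platform, table, pk_name):
    if platform == "mysql":
        col = "%s INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY" % pk_name
    elif platform == "postgresql":
        col = "%s SERIAL PRIMARY KEY" % pk_name
    elif platform == "db2":
        col = "%s INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY" % pk_name
    elif platform == "sqlserver":
        col = "%s INTEGER IDENTITY(1,1) PRIMARY KEY" % pk_name
    else:
        # Oracle has no shortcut - it needs a separate sequence + trigger
        raise NotImplementedError("partial DDL support: " + platform)
    return "CREATE TABLE %s (%s)" % (table, col)

print(create_table_sql("mysql", "nodes", "nodeid"))
# CREATE TABLE nodes (nodeid INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY)
```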

Aug 15, 2010

Adding DB2 support

Well, I thought that while I'm at it, I might as well add DB2 support.
So now I will; DB2 seems to have a pretty much straightforward SQL implementation.

By the way, it is kind of interesting to see what free (as in beer) offerings the different database vendors have; it is a bit revealing with regards to their attitude towards developers.

Aug 14, 2010

Dropping Oracle support (at least DDL)

I have decided to drop Oracle support.
Last time I used Oracle was many years ago and back then I had none of the experience I do today.
The reason? The extremely silly 30-character length restriction on identifiers.

Oracle restricts the length of identifiers to 30 characters, which is extremely strange, this being 2010 and all.
Large systems with lots of complex and specific data need long identifier names. I once belonged to the school of abbreviation, where everything got abbreviated down to a length of about 20 characters. I have since met THE REAL WORLD of huge databases and realised that the only reason to abbreviate identifiers is that it looks better when you list them.
However, mapping them to their real-world counterpart data (rw_ctrprt_data?!?) will, for the next developer, be an error-prone and boring chore.

Recently, a colleague of mine came to me with an identifier which, unabbreviated, would become something like 120 characters long. I had to tell him to go with it, since there was no way around it.

The typical reason for an identifier to become this long is that it describes an exception of some sort, like: "add taxes if the counterparty's custodian has the same currency as the trader's custodian".
Usually, it is some real-world legal rule that has to be followed.
And believe me, we use all the tricks, like dynamic structures and so forth, to avoid having to name columns like this, so don't give me any crap about "with the right design", because reality is different from school.
But many times, one has to do it that way. Yeah, one could use column descriptions, but they aren't as useful as one might think.

Anyway, I will now drop Oracle support. Too bad I've already spent hours on making the auto-increment support.
Luckily, it was really straightforward to uninstall the Oracle software, kudos to them. :-)

Upon closer thought, I will probably not drop Oracle support totally; rather, I will have only partial or no DDL support. It would be silly to exclude a very popular database from those supported by Unified BPM.

Jul 25, 2010

More frustration and extremism.

Ok. In my last post I ranted about limitations in a specific database.
But sometimes, developers don't give a sh*t about how the users actually want to use the system they make.
Rather, they go to extreme lengths explaining why the user should do it another way instead.
Even though the users obviously don't want to do it that way.

For example, I am implementing column defaults in the DAL, and have finally gotten it to work for MySQL.
What is the most usual default value? To me, and in all databases I have seen so far, it is the current date and time.

I haven't thought about it much until now, since I had seen the TIMESTAMP datatype being used for that in MySQL.

I did not, however, consider that DEFAULT (Now()) isn't even supported, since functions aren't allowed as default values!
So I will have to use another datatype, the aforementioned TIMESTAMP, which I will transform into DATETIME DEFAULT (GETDATE()) when needed.
However, TIMESTAMP is completely inflexible, and has a different start date than the normal datetime. So, it sucks and will probably have to be handled separately later on.
But, screw everything, I'll use it for now.

See http://bugs.mysql.com/bug.php?id=27645 for the reeaally long discussion.
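The workaround, sketched as code (the function name is hypothetical; it just illustrates the platform split described above — MySQL gets the TIMESTAMP type, while the others can take a real function default):

```python
# Sketch of the "default is current date/time" workaround: MySQL's
# DATETIME won't accept a function as a default value, so it gets
# the TIMESTAMP type instead; the other platforms render normally.

def created_column_sql(platform, name):
    if platform == "mysql":
        return "%s TIMESTAMP DEFAULT CURRENT_TIMESTAMP" % name
    if platform == "sqlserver":
        return "%s DATETIME DEFAULT (GETDATE())" % name
    if platform == "postgresql":
        return "%s TIMESTAMP DEFAULT now()" % name
    raise NotImplementedError(platform)
```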

Since many FOSS systems, like mine, put huge effort into being database-agnostic, it is extremely annoying that the databases don't try to meet them halfway.
Like the stupid reasoning in the Oracle camp for not shortcutting auto-incrementing, for the condescending reason of teaching users how to do things properly.

The only database that seems to be fairly balanced, and doesn't create any significant problems for me, is PostgreSQL.

Now it seems that I am letting MSSQL off easy. But of course, MSSQL has the dubious advantage of almost not being designed at all, thus allowing for practically everything thrown at it at this most basic level of database development.
It's when things get advanced that MSSQL comes back and bites you like you've never been bitten, with its partial implementations. The insane PARTITION BY limitations almost killed me, for example.
That said, on this level it works well.


Jul 20, 2010

Hacking along, realising stuff.

I am doing a little more than is necessary on that DAL (database abstraction layer), because I have found it pretty rewarding.

I look at myself as something of a pretty accomplished SQL dude, knowing more than perhaps is sometimes healthy about databases, but this has made me reflect on the structure of the SQL language. A refreshing angle for me, to say the least. Sometimes you do something so much that you just end up muttering about its shortcomings. Especially different implementations' shortcomings.

It's interesting to realise what really belongs where; it was way harder than I thought to break SQL down into its logical pieces.
It's small things, like that a join is the same as a complete query, just a source of data treated by the query:

SELECT N.Name, A.Role
FROM Names N -- Source 1 data
JOIN Actors A ON A.PersonID = N.PersonID -- Source 2 data+conditions
WHERE N.Name = 'Clint Eastwood' -- Source 1 conditions
Being able to treat these sources in exactly the same way makes things way easier. That's just one example. Another would be that conditions are made of expressions that can themselves have conditions in them. Definitions, definitions.
There is more stuff like that in SQL DML, but it is a bit harder to explain, so I won't.

Anyway, I solved the auto-increment issue with Oracle by having a postSQL property, simply generating the code for creating the sequence and the trigger and then concatenating that to the end of the CREATE TABLE SQL. I haven't tested that code very thoroughly yet, though.
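Roughly, the postSQL idea looks like this. The naming scheme and formatting are illustrative, not the actual implementation:

```python
# Rough sketch of the postSQL idea for Oracle: generate the sequence
# and trigger, then tack them onto the CREATE TABLE statement. The
# seq_/trg_ naming convention here is invented for the example.

def oracle_autoincrement_postsql(table, pk_name):
    seq = "seq_%s" % table
    trg = "trg_%s" % table
    return (
        "CREATE SEQUENCE %s;\n"
        "CREATE TRIGGER %s BEFORE INSERT ON %s FOR EACH ROW\n"
        "BEGIN SELECT %s.nextval INTO :new.%s FROM dual; END;"
        % (seq, trg, table, seq, pk_name)
    )

sql = "CREATE TABLE nodes (nodeid INTEGER PRIMARY KEY);\n"
sql += oracle_autoincrement_postsql("nodes", "nodeid")
```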

Another reason to stay with the DAL a bit longer is that I am thinking of perhaps creating a transformation toolchain for use in Unified BPM. Or at least one that can be used for more than database upgrades.

Well. See ye later.

Jul 3, 2010

Please someone tell me I am wrong.

Here I was, happily coding along, when I got into something that possibly will force me to drop Oracle support: Oracle's implementation of sequences.

By not having built-in insert support for auto-incrementing sequences like the others do (PostgreSQL, MySQL, SQL Server et al.), they leave the incrementing of the primary key to the insert statement, using seqname.nextval.

Yes, this could be done using a trigger. But to me, triggers are the ugly shit, in general.
Having one in almost every table really stinks. It is bad form. Added a column? Tough luck, you're likely to have some trigger that needs that column added too.

I have seen the rationale for this, that it is better to have sequences table-independent.
But that's missing the point. It's not about that; everybody else has that too.
It's about control, and it should be the OTHER WAY.

The sequencing should always happen automatically in those tables using them when records are inserted.
It should not be possible for an application to insert a custom value in a sequenced column. It should not be possible to, by accident, choose the wrong sequence.

This aspect even makes it dangerous.
Actually, most developers just use sequences to get a new value for each insert, not caring very much if there is a gap. The only time they get in trouble for using the wrong sequence is when they try to insert a value that already exists. Usually, that problem is not even a problem, and it is almost immediately visible.

But in financial applications, for example, having gaps in the sequence could have serious legal ramifications, possibly not seen until some reconciliation event occurs years later. And by then it will have created situations that are almost impossible to rectify. And, even more importantly, to explain to the SEC or IRS when they come knocking. Where did you put those missing futures contracts now again? Where is that coupon? Hopefully someone will notice. But likely they won't, since many are working with the same sequence.

So to provide sequencing support for Oracle, the INSERT commands need to know lots of things about the table. And since I want to abstract the creation, I need to keep track of this weirdness.

I am thinking about automatically creating a very specifically named trigger, but it seems so stupid and out of place. It really does not fit well with how the DAL is supposed to work.

Aww crap.

Jun 13, 2010

Database Abstraction Layer - DAL


The last month has been somewhat difficult with regards to time available.
But I've still managed to churn out an early version of what will be the core of the database abstraction layer: the classes that generate the SQL for each database.
The code is currently supporting MySQL, PostgreSQL, Oracle and SQL Server.
Since they will be sent as parameters into the DAL, I call them parameters.
They do, however, have more functionality than mere parameters.
There is a base class, Parameter_Base, that all parameters inherit from, which has four methods:
  • AsMySQL()
  • AsPostGreSQL()
  • AsOracle()
  • AsSQLServer()
Each of these methods returns a database-specific SQL representation of the parameter's contents.
There are classes like Parameter_Field, Parameter_Expression, Parameter_CASE, Parameter_Function and Parameter_SELECT and so on.

They use each other to generate, for example, a SELECT-statement.
Consequently, Parameter_SELECT (currently) has two properties: fields (a list of Parameter_Field) and sources (a list of Parameter_Source), which in turn use other classes.
So when one calls the Parameter_SELECT.AsMySQL function, it loops through the objects in its fields and sources lists and calls their .AsMySQL functions. This way, a MySQL-specific SQL representation of the Parameter_SELECT object is generated.
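A stripped-down sketch of that idea (the real parameter.py is much richer; this only shows the delegation pattern, and Parameter_SELECT is simplified to a single table-name source):

```python
# Stripped-down sketch of the parameter class idea: each object
# renders itself per platform, and composites delegate to children.

class Parameter_Base(object):
    def AsMySQL(self):      raise NotImplementedError
    def AsPostGreSQL(self): raise NotImplementedError
    def AsOracle(self):     raise NotImplementedError
    def AsSQLServer(self):  raise NotImplementedError

class Parameter_Field(Parameter_Base):
    def __init__(self, name):
        self.name = name
    def AsMySQL(self):      return "`%s`" % self.name   # backtick quoting
    def AsPostGreSQL(self): return '"%s"' % self.name
    def AsOracle(self):     return '"%s"' % self.name
    def AsSQLServer(self):  return "[%s]" % self.name   # bracket quoting

class Parameter_SELECT(Parameter_Base):
    def __init__(self, fields, source):
        self.fields = fields   # list of Parameter_Field
        self.source = source   # plain table name, for brevity
    def AsMySQL(self):
        # Delegate to each child's AsMySQL, then assemble
        cols = ", ".join(f.AsMySQL() for f in self.fields)
        return "SELECT %s FROM %s" % (cols, self.source)

sel = Parameter_SELECT([Parameter_Field("Name")], "Names")
print(sel.AsMySQL())  # SELECT `Name` FROM Names
```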

The parameters: parameter.py
Some tests: parameter_tests.py
Crude documentation: Doc_20100613.pdf

So far, most elements of typical SELECT statements are covered, like expressions and joins.
Stuff like ORDER BY, GROUP BY and HAVING isn't, but from now on it should be much easier, since they merely reuse the concepts I have already defined.
I have to admit, maybe the structure looks pretty natural when you look at it, but it was really quite hard to generalise and simplify it while maintaining all the functionality of the language.

Well. I suppose that was it.

May 16, 2010

DB issues and their solutions


  • What should the database look like?
  • How do you handle upgrades of the database structure?
  • How do you handle upgrades when MySQL, PostgreSQL, SQL Server and Oracle are all supposed to be able to serve as the database backend? (I'd say DB2 as well, only I haven't got any experience with it, or any test installations)
  • How do you test upgrades of the database structure?
  • How do you test upgrades of the database structure against different backends?


For Unified BPM, the solution to the upgrade problem is two-fold:

  1. A generalizing layer. A Database Abstraction Layer(DAL).

  2. A step-based upgrade solution that uses that layer to apply upgrades.

In my experience, and as long as you stay away from stored procedures, functions and stuff like that, and only have a simple database structure (which Unified BPM is supposed to have), most updates only need standard SQL syntax, like CREATE TABLE, ALTER TABLE, INSERT and such. Plus a "custom" step which takes whatever needs to be done, one SQL statement for each backend. Hopefully, "custom" will rarely be used.
So it should not be very difficult to write a layer that takes parameters and transforms them into the mentioned flavours of SQL. Yeah, I know. Famous last words. :-)
I also know that there have been some attempts made; however, I will have a somewhat narrower approach. No bells and whistles whatsoever.
I have also made an XML schema that contains steps, parameters and such, so now I am writing a utility that uses the database abstraction layer to apply those steps to the database. No GUI yet, but I know that one will be needed later on.


A *full* Unified BPM integration test run will start with "CREATE database", each time building and populating the database from scratch using the update utility and many of the access objects' unit tests. Possibly, in a few years, one can start with a later version.


The database structure will be an old favourite, a *really* basic tree: nodeid and parentnodeid.
  1. All objects will have a node in the tree. This makes the security layer easy to implement.
  2. All other data will be in sub tables.
  3. For auditing, the first, and hopefully last (fuzzy, see "custom" above, stuff to maintain on different backends), triggers of the database will be created.
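The node tree in miniature, using sqlite3 purely to keep the sketch self-contained (SQLite is not one of the target backends, and the table/column names beyond nodeid/parentnodeid are invented):

```python
# Sketch of the basic node tree. sqlite3 is used only so the example
# runs anywhere; the real thing goes through the DAL to the big five.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE node (
    nodeid INTEGER PRIMARY KEY,
    parentnodeid INTEGER REFERENCES node(nodeid),
    name TEXT)""")
conn.execute("INSERT INTO node VALUES (1, NULL, 'root')")
conn.execute("INSERT INTO node VALUES (2, 1, 'processes')")
conn.execute("INSERT INTO node VALUES (3, 1, 'users')")

# Every object hangs off the tree, so "children of X" is one query -
# which is what makes the security layer easy to implement.
children = [r[0] for r in conn.execute(
    "SELECT name FROM node WHERE parentnodeid = 1 ORDER BY nodeid")]
print(children)  # ['processes', 'users']
```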

Oh, well. Bye then!

May 12, 2010

At it again.


I have now started implementing the most basic functionality, the interface between the agent and the broker.

Some stuff, not anything fancy but hopefully someone gets a bit helped:
  1. I use Eclipse and the following builder script to update the development server:
    rsync -vtr -e "ssh -i /path/to/client/certificate/id_rsa" /source/code/dir/ user@server.domain.se:/destinationdir
  2. Found out how to import relatively within a package. This is from the server unit test importing the other modules.
    from .server import *
    from .session import *
  3. I use the PyDev Eclipse plugin when doing Python.
  4. I use mod_wsgi for the web server scripting. Very versatile indeed. Example here.
  5. The broker server itself is only a class declaration. The access code for SOAP, JSON or whatever is in a separate layer, allowing for unit testing of high-level functionality without involving serialization. Many forget this separation even though I feel it is an important one. I believe that soaplib, for example, should be unit tested by its developers and not by me. I will have integration tests, but unit testing stubs rarely gives anything in return.
I am considering buying the Clean Code book, which some really geeky friends of mine recommend, nay urge me, to buy, but I am a bit scared of it.
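The separation in point 5, sketched out (all names here are illustrative, not the actual broker code):

```python
# Sketch of the service/serialization separation: the broker logic is
# a plain class, unit-testable with no SOAP anywhere in sight. The
# transport wrapper is added on top, in a separate layer. All names
# are invented for illustration.

class Broker(object):
    """Pure logic - no serialization concerns."""
    def __init__(self):
        self.sessions = {}

    def register_agent(self, agent_id):
        self.sessions[agent_id] = {"status": "connected"}
        return True

# A unit test touches only the class, never the wire format:
broker = Broker()
assert broker.register_agent("agent-1")
assert broker.sessions["agent-1"]["status"] == "connected"
```

The SOAP (or JSON) layer then becomes a thin shell around calls like register_agent, and integration tests cover that shell.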

I am also thinking about deployment; the distutils stuff will possibly do the trick, but I have to learn how.

May 1, 2010

Labor pains

I have now set a preliminary structure for at least the lower levels of the system.
I want the structure of the actual, physical underlying system to be really simple, while not having limitations that would make it unable to handle all the concepts of, for example, BPMN. At some point, it should be possible for script generators/designers to use XPDL, for example.

So I decided to design Unified BPM in levels, the names of which I have not decided on yet, not that it matters that much.
The lowest level defines these entities and mechanisms:

It defines the script, by defining a parameter passing format and mechanism, making it possible for a script execution to start on one computer and continue on the next.
As a consequence, it also makes it possible for a script to spawn child processes on other computers.
Unified BPM scripts will have a standardised look to allow for both script generation and customisation.


It defines the broker, the central entity through which all IPC within Unified BPM passes. The broker is responsible for logging everything that happens and for handling IPC security.
The broker is also connected to the Unified BPM database, which should be a deliberately simply designed SQL database. It should be possible to use any of the large SQL-compliant database servers as backend without any (significant) problems.
The broker is also aware of other brokers, making it possible to forward messages and progress messages across networks and organisations.

It defines the agent, the client "listener" and "doer", which is responsible for receiving IPC and carrying out those instructions on a client computer.
To reduce complexity and increase safety, the agent actually doesn't have any open ports; instead, it keeps requests open to the broker which, when timed out, are immediately made again. This makes the agents as responsive as if they had open, listening ports. Also, it can use SOAP and HTTPS for encryption and verification, and doesn't have to have much internal security of its own.
It also has some other functions, like managing and debugging client scripts, monitoring client performance counters and many other things.
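The "no open ports" idea above is basically long polling, and the loop can be sketched like this. Everything here is hypothetical; the transport is injected so the loop itself can be shown without a network:

```python
# Sketch of the agent's long-poll loop: keep a request open against
# the broker and immediately re-issue it on timeout. The transport
# function is injected, standing in for the real HTTPS/SOAP call.

def agent_poll_loop(fetch_instruction, handle, max_polls):
    """fetch_instruction() blocks until the broker answers or times
    out (returning None); handle() carries out an instruction."""
    for _ in range(max_polls):
        instruction = fetch_instruction()
        if instruction is None:
            continue          # timeout: just ask again
        handle(instruction)

# Fake transport standing in for the broker connection:
queue = [None, "run_script", None, "send_file"]
done = []
agent_poll_loop(lambda: queue.pop(0) if queue else None,
                done.append, max_polls=4)
print(done)  # ['run_script', 'send_file']
```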

Ok, that was the lowest level.
Above that level, there can be different kinds of controlling mechanisms, script generators and BPM designers.
So regarding the design of this system: it will be bottom-up when designing the lowest level, and top-down when designing the upper levels.
The reason being that the lower level should be really open, to allow for different BPM paradigms on top. Or maybe it will be more of a complex/simple thing.

I guess we'll have to see how all that works out. :-)

Apr 18, 2010

UnifiedBPM blog. First post!

This blog will document the development of Unified BPM.
So what... is UBPM? Other than a future Business Process Management system that's gonna take BizTalk down (yeah, right).
Some time ago, I realised that so much in terms of parsers and connectivity was already done and available as free and open source software that just connecting the dots could not be that hard.

Well, it was a little bit harder than I thought, since I decided to combine it with learning a new programming language.
So what did I need from that language?

I wanted it to be loosely typed, quick, and great for both web (Apache) and utility development.
I needed it to be able to parse and generate itself; that seemed to be the most uncommon thing.
I narrowed it down a bit, and what remained was Python, PERL and Ruby.
I felt attracted by the freedom of PERL and gave it a try.

But after clearing many a hurdle and testing almost all the technologies involved, I came to the conclusion that, amongst some other things, I didn't like the OOP paradigm of PERL.
Me, a pretty well-seasoned developer, should not have to struggle with something that basic. Also, the myriad of ways to do the same thing would become a problem if the project grew and people contributed.

So I looked at Ruby and found that the documentation and the community, while very active, lacked much in maturity and quality.
Remaining was Python, so Python it was. Also, I grew to like many things about the pythonesque approach. It was almost as clean as Ruby while staying a bit more traditional.

I have now, again, started to work through the things I need to get a system like this off the ground:
  1. Debugging programatically. I need to be able to do this to GUI:fy script creation.
  2. SOAP/WSDL functionality. This will be used for most of the internal and external communications. I want this system to be extremely standardized, safe and extensible, and I don't give a rat's ass about 20% less performance than RPC if it saves me hassle with datatypes and pointless XML traversal. See below for a working wsgi example.
  3. Code generation and parsing. I need this as well to GUI:fy. One should not have to be a Python hacker to use the system. But anything should be possible.
  4. Persistence. I will use mod_wsgi for most scripts, so I had to check that out. Found it to be quite flexible.
So why blog about it?

Well, while exploring the above technologies I have encountered numerous hurdles, for which the solutions presented while searching the web have either horrible explanations, bad examples or, worse, are completely undocumented. Some of those I will present here.
Also, some people might get interested in the UnifiedBPM project.
For PERL, I actually have example code for all the above areas if anyone wants it. Personally, I can't stand it. :-)

Below is a Python example of how to use soaplib under Apache's mod_wsgi. *Without* any frameworks like Django involved. For some reason, this seems to be the only such example on the web:

#! /usr/bin/env python

from soaplib.wsgi_soap import SimpleWSGISoapApp
from soaplib.service import soapmethod
from soaplib.serializers.primitive import String, Integer, Array

class HelloWorldService(SimpleWSGISoapApp):

    @soapmethod(String, Integer, _returns=String)
    def say_hello(self, name, times):
        results = ''
        for i in range(0, times):
            results += 'Hello, ' + name
        return results

def application(environ, start_response):
    # SimpleWSGISoapApp instances are themselves WSGI callables
    service = HelloWorldService()
    return service(environ, start_response)

Actually, it was pretty straightforward: the __call__() function takes the environment and the start_response function. It's just that nobody seemed to have done it before.
Also, the other examples (http://www.djangosnippets.org/snippets/979/) don't work for most people because of this bug: http://github.com/jkp/soaplib/issues#issue/12 .
So I simplified it a little bit until the bug fix is properly delivered. :-)