Aug 17, 2010

Done with first phase of SQL generation

So there.

I have now, in Python, created a class that does only the things I need to begin with for the first database part of Unified BPM: creating and querying simple database structures.
The class can currently create these kinds of SQL statements (and then some):
  • SELECT - with where, expressions, conditions, CASE, function calls, joins, subselects.
  • CREATE TABLE - with defaults, primary keys, foreign keys and autoincrement
  • CREATE INDEX - Unique, Clustered, Non-Clustered
Supported databases for this functionality are:
  • Full
    • MySQL
    • PostgreSQL
    • DB2
    • SQL Server
  • Partial DDL
    • Oracle (due to identifier limitations and auto-increment hassle)
Now I will use this generator to write tests for the DAL's connection handling.
I now have working installations of all five database servers and will use them to set up integration tests for all flavours.
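
Roughly what I have in mind is sketched below. It is just a sketch: the connection details are made-up placeholders, the literal SELECT 1 stands in for whatever SQL the generator will produce, and only MySQL and PostgreSQL are shown here:

# Minimal sketch of a connection smoke test per flavour.
import unittest
import MySQLdb
import psycopg2

class ConnectionSmokeTest(unittest.TestCase):
    def _run(self, connection, sql):
        # Execute one statement and close, just to prove the plumbing works.
        cursor = connection.cursor()
        cursor.execute(sql)
        cursor.close()
        connection.close()

    def test_mysql(self):
        # Hypothetical credentials - replace with the real test server's.
        conn = MySQLdb.connect(host="localhost", user="bpm",
                               passwd="secret", db="bpm_test")
        self._run(conn, "SELECT 1")

    def test_postgresql(self):
        conn = psycopg2.connect("host=localhost dbname=bpm_test "
                                "user=bpm password=secret")
        self._run(conn, "SELECT 1")

if __name__ == "__main__":
    unittest.main()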

In practice, Partial DDL means that Unified BPM won't run on that platform but will be able to communicate with it. I simply won't commit to restricting myself to 30-byte identifiers. Somebody please think of the FK names!

Aug 15, 2010

Adding DB2 support

Well, I thought that while I'm at it, I might as well add DB2 support.
So now I will; it seems to have a pretty straightforward SQL implementation.
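
I expect it to mostly be a matter of one more As*() method per parameter class. A tiny sketch of the idea follows; the class below is a simplified stand-in, not one of the real parameter classes, and it only renders the auto-increment part of a column:

# Simplified stand-in showing how a parameter class might render an
# auto-incremented integer column for two of the backends.
class Parameter_AutoIncrementColumn(object):
    def __init__(self, name):
        self.name = name

    def AsMySQL(self):
        return self.name + " INTEGER NOT NULL AUTO_INCREMENT"

    def AsDB2(self):
        # DB2 uses identity columns instead of MySQL's AUTO_INCREMENT.
        return self.name + " INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY"

print(Parameter_AutoIncrementColumn("PersonID").AsDB2())
# PersonID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY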

By the way, it is kind of interesting to see what free (as in beer) offerings the different database vendors have; it is a bit revealing with regards to their attitude towards developers.

Aug 14, 2010

Dropping Oracle support (at least DDL)

Hi,
I have decided to drop Oracle support.
Last time I used Oracle was many years ago and back then I had none of the experience I do today.
The reason? The extremely silly 30-character length restriction on identifiers.

Oracle restricts the length of identifiers to 30 characters, which is extremely strange, this being 2010 and all.
Large systems with lots of complex and specific data need long identifier names. I once belonged to the school of abbreviation, where everything got abbreviated down to a length of about 20 characters. I have since met THE REAL WORLD of huge databases and realised that the only reason to abbreviate identifiers is that it looks better when you list them.
However, mapping them to their real-world counterpart data (rw_ctrprt_data?!?) will, for the next developer, be an error-prone and boring chore.

Recently, a colleague of mine came to me with an identifier which, unabbreviated, would become something like 120 characters long. I had to tell him to go with it, since there was no way around it.

The typical reason for an identifier to become this long is that it describes an exception of some sort, like: "add taxes if the counterparty's custodian has the same currency as the trader's custodian".
Usually, it is some real-world legal rule that has to be followed.
And believe me, we use all the tricks, like dynamic structures and so forth, to avoid having to name columns like this, so don't give me any crap about "with the right design", because reality is different from school.
But many times, one has to do it that way. Yeah, one could use column descriptions, but they aren't as useful as one might think.
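
Just to make it concrete, here is a made-up but not unrealistic name for that tax column:

# A hypothetical column name of the kind described above.
column_name = "add_tax_if_counterparty_custodian_currency_equals_trader_custodian_currency"
assert len(column_name) == 75   # well past Oracle's 30-character limit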

Anyway, I will now drop Oracle support. Too bad I've already spent hours making the auto-increment support work.
Luckily, it was really straightforward to uninstall the Oracle software, kudos to them. :-)

EDIT:
Upon closer thought, I will probably not drop Oracle support totally; rather, I will have only partial or no DDL support. It would be silly to exclude a very popular database from the databases supported by Unified BPM.

Jul 25, 2010

More frustration and extremism.

Ok. In my last post I ranted about limitations in a specific database.
But sometimes, developers don't give a sh*t about how the users actually want to use the system they make.
Rather, they go to extreme lengths explaining why the user should do it another way instead.
Even though the users obviously don't want to do it that way.

For example, I am implementing column defaults in the DAL, and have finally gotten it to work for MySQL.
What is the most usual default value? To me, and in all the databases I have seen so far, it is the current date and time.

I hadn't thought about it much until now, since I had seen the TIMESTAMP datatype being used for that in MySQL.

I did not, however, consider that DEFAULT (Now()) wasn't even supported, since functions aren't allowed as default values!
So I will have to use another datatype, the aforementioned TIMESTAMP, which I will transform into DATETIME DEFAULT (GETDATE()) when needed.
However, TIMESTAMP is completely inflexible and has a different starting date than a normal datetime. So it sucks and will probably have to be handled separately later on.
But, screw everything, I'll use it for now.
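
For reference, this is roughly how I picture the "default to the current date and time" column per backend. It is a simplified sketch, not the actual parameter classes, and the column name is made up:

# Per-backend DDL for a column defaulting to the current date and time.
# MySQL is the odd one out: no function defaults, so it gets the
# TIMESTAMP datatype instead of DATETIME.
def current_timestamp_column(name):
    return {
        "mysql":      name + " TIMESTAMP DEFAULT CURRENT_TIMESTAMP",
        "postgresql": name + " TIMESTAMP DEFAULT now()",
        "sqlserver":  name + " DATETIME DEFAULT (GETDATE())",
        "oracle":     name + " DATE DEFAULT SYSDATE",
    }

print(current_timestamp_column("CreatedWhen")["mysql"])
# CreatedWhen TIMESTAMP DEFAULT CURRENT_TIMESTAMP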

See http://bugs.mysql.com/bug.php?id=27645 for the reeaally long discussion.

Since many FOSS systems, like mine, put huge effort into being database-agnostic, it is extremely annoying that the databases don't try to meet them halfway.
Like the stupid reasoning in the Oracle camp for not shortcutting auto-incrementation, for the condescending reason of teaching users how to do things properly.

The only database that seems to be fairly balanced, and doesn't create any significant problems for me, is PostgreSQL.

Now it seems that I am letting MSSQL off easy. But of course, MSSQL has the dubious advantage of almost not being designed at all, thus allowing practically anything thrown at it at this, the most basic level of database development.
It's when things get advanced that MSSQL comes back and bites you like you've never been bitten, with its partial implementations. The insane PARTITION BY limitations almost killed me, for example.
That said, on this level it works well.

Cy'all.

Jul 20, 2010

Hacking along, realising stuff.

I am doing a little more than is necessary on that DAL (database abstraction layer), because I have discovered it to be pretty rewarding.

I look at myself as something of a pretty accomplished SQL dude, knowing perhaps more about databases than is sometimes healthy, but this has made me reflect on the structure of the SQL language. A refreshing angle for me, to say the least. Sometimes you do something so much that you just end up muttering about its shortcomings. Especially different implementations' shortcomings.

It's interesting to realise what really belongs where; it was way harder than I thought to break it down into its logical pieces.
It's small things, like the fact that a join is the same as a complete query, just a source of data treated by the query:

SELECT N.Name, A.Role
FROM Names N -- Source 1 data
JOIN Actors A ON A.PersonID = N.PersonID -- Source 2 data+conditions
WHERE N.Name = 'Clint Eastwood' -- Source 1 conditions
Being able to treat these sources in exactly the same way makes things way easier. That is just one example. That conditions are made of expressions which can have conditions in them would be another. Definitions, definitions.
There is more stuff like that in SQL DML, but it is a bit harder to explain, so I won't.

Anyway, I solved the auto-increment issue with Oracle by having a postSQL property, simply generating code for creating the sequence and the trigger and then concatenating that to the end of the CREATE TABLE SQL. I haven't tested that code very thoroughly yet, though.
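
Roughly what that generated postSQL could end up containing, for a table with an auto-incremented id column. The table, sequence and trigger names below are invented; the real ones are derived from the table definition:

# Sketch of the Oracle workaround: the CREATE TABLE statement itself, plus a
# postSQL part with the sequence and the trigger that fakes auto-increment,
# concatenated at the end.
create_table_sql = "CREATE TABLE mytable (id NUMBER NOT NULL, name VARCHAR2(100))"

post_sql = """
CREATE SEQUENCE seq_mytable;

CREATE OR REPLACE TRIGGER tr_mytable_autoinc
BEFORE INSERT ON mytable
FOR EACH ROW
BEGIN
  SELECT seq_mytable.NEXTVAL INTO :NEW.id FROM dual;
END;
"""

full_sql = create_table_sql + post_sql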

Another reason to stay with the DAL a bit longer is that I am thinking of perhaps creating a transformation toolchain for use in Unified BPM. Or at least something that can be used for more than database upgrades.

Well. See ye later.

Jul 3, 2010

Please someone tell me I am wrong.

Here I was, happily coding along, when I got into something that possibly will force me to drop Oracle support: Oracle's implementation of sequences.

By not having built-in insert support for auto-incrementing columns like the others do (PostgreSQL, MySQL, SQL Server et al.), Oracle leaves the incrementing of the primary key to the INSERT statement, using seqname.nextval.
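
The contrast, with made-up tables, is roughly this: the others keep the sequencing inside the column definition, while Oracle pushes it out into every single INSERT:

# How the others declare an auto-incrementing primary key...
postgresql_ddl = "CREATE TABLE person (id SERIAL PRIMARY KEY, name TEXT)"
mysql_ddl = "CREATE TABLE person (id INTEGER AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100))"
sqlserver_ddl = "CREATE TABLE person (id INT IDENTITY(1,1) PRIMARY KEY, name VARCHAR(100))"

# ...and how Oracle makes every INSERT carry that knowledge instead:
# CREATE SEQUENCE person_seq;
oracle_insert = "INSERT INTO person (id, name) VALUES (person_seq.NEXTVAL, 'Clint')"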

Yes, this could be done using a trigger. But to me, triggers are teh ugly shit, in general.
Having one in almost every table really stinks. It is bad form. Added a column? Tough luck, you're likely to have some trigger that needs that column added too.

I have seen the rationale for this, that it is better to have sequences be table-independent.
But that misses the point. It's not about that; everybody else has that too.
It's about control, and it should be the OTHER WAY around.

The sequencing should always happen automatically, in the tables that use them, when records are inserted.
It should not be possible for an application to insert a custom value in a sequenced column. It should not be possible to, by accident, choose the wrong sequence.

This aspect even makes it dangerous.
Actually, most developers just use sequences to get a new value for each insert, not caring very much if there is a gap. The only time they get in trouble for using the wrong sequence is when they try to insert a value that already exists. Usually, that problem is not even a real problem, and it is almost immediately visible.

But in financial applications, for example, having gaps in the sequence could have serious legal ramifications, possibly not seen until some reconciliation event occurs years later. And by then it has created situations that are almost impossible to rectify. And, even more importantly, to explain to the SEC or IRS when they come knocking. Where did you put those missing futures contracts now again? Where is that coupon? Hopefully someone will notice. But likely they won't, since there are many working with the same sequence.

So to provide sequencing support for Oracle, the INSERT commands need to know lots of things about the table. And since I want to abstract the creation, I need to keep track of this weirdness.

I am thinking about automatically creating a very specifically named trigger, but it seems so stupid and out of place. It really does not fit well with how the DAL is supposed to work.

Aww crap.

Jun 13, 2010

Database Abstraction Layer - DAL

Hi!

The last month has been somewhat difficult with regards to time available.
But I've still managed to churn out an early version of what will be the core of the database abstraction layer: the classes that generate the SQL for each database.
The code is currently supporting MySQL, PostgreSQL, Oracle and SQL Server.
Since they will be sent as parameters into DAL, I call them parameters.
They do, however, have more functionality than being mere parameters.
There is a base class, Parameter_Base, that all parameters inherit from, which has four methods:
  • AsMySQL()
  • AsPostGreSQL()
  • AsOracle()
  • AsSQLServer()
Each of these methods returns a database-specific SQL representation of the parameter's contents.
There are classes like Parameter_Field, Parameter_Expression, Parameter_CASE, Parameter_Function, Parameter_SELECT and so on.

They use each other to generate, for example, a SELECT-statement.
Consequently, Parameter_SELECT currently has two properties, fields (a list of Parameter_Field) and sources (a list of Parameter_Source), which in turn use other classes.
So when one calls the Parameter_SELECT.AsMySQL function, it loops through the objects in its fields and sources lists and calls their .AsMySQL functions. This way, a MySQL-specific SQL representation of the Parameter_SELECT object is generated.
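
A stripped-down sketch of the pattern, for the MySQL method only. The real classes in parameter.py have far more to them, and the field and table names at the bottom are invented:

# Minimal illustration: every parameter renders itself, so a SELECT
# only has to glue its children together.
class Parameter_Base(object):
    def AsMySQL(self):
        raise NotImplementedError
    # ...AsPostGreSQL(), AsOracle() and AsSQLServer() follow the same pattern...

class Parameter_Field(Parameter_Base):
    def __init__(self, name):
        self.name = name

    def AsMySQL(self):
        return "`" + self.name + "`"

class Parameter_Source(Parameter_Base):
    def __init__(self, table_name):
        self.table_name = table_name

    def AsMySQL(self):
        return "`" + self.table_name + "`"

class Parameter_SELECT(Parameter_Base):
    def __init__(self, fields, sources):
        self.fields = fields
        self.sources = sources

    def AsMySQL(self):
        return ("SELECT " + ", ".join(f.AsMySQL() for f in self.fields) +
                " FROM " + ", ".join(s.AsMySQL() for s in self.sources))

select = Parameter_SELECT([Parameter_Field("Name")], [Parameter_Source("Names")])
print(select.AsMySQL())   # SELECT `Name` FROM `Names`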

The parameters: parameter.py
Some tests: parameter_tests.py
Crude documentation: Doc_20100613.pdf

So far, most elements of typical SELECT statements are covered, like expressions and joins.
Stuff like ORDER BY, GROUP BY and HAVING isn't yet, but from now on it should be much easier, since they merely reuse the concepts I have already defined.
I have to admit, maybe the structure looks pretty natural when you look at it, but it was really quite hard to generalise and simplify it while maintaining all the functionality of the language.

Well. I suppose that was it.