Dynamically generating data types in Python

For a project I was working on recently, I needed to define a bunch of data types in Python. As manually defining 10 different data types with roughly the same functionality would’ve been a pain, I decided to try something else.

I already had a bunch of nested dictionaries defining the fields and types of all my data types, as I was generating SQLite statements from them. The general format looked something like this:

datatypes = {
    "type1": {
        "field1": {
            "type": "INTEGER",
            "notNull": True,
            "primaryKey": False,
            "autoIncrement": False,
            "default": None,
            "foreignKey": {
                "table": "type2",
                "field": "field1",
                "onDel": "RESTRICT",
                "onUpd": "RESTRICT"
            }
        },
        "field2": {
            # ...
        },
        # ...
    },
    # ...
}

Each entry in the outer dictionary (“type1”) defines a data type, and each entry inside it (“field1”) defines one field of that data type, including information like “can this be empty?”. So, I already had everything I needed to define my data types.

What should the data types be able to do? I needed their values to be immutable (meaning that I needed only “getters”, and no “setters” outside the constructor). So, how can I create class definitions from this block of definitions?

Easy. I generate a long string containing all the definitions I need and run it through exec, a function I usually avoid like the plague. Note that this assumes the definitions are trusted and cannot be changed by others; only then can you stop worrying about the security ramifications of using exec.
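To make the idea concrete, here is a rough sketch of generating a class from such a definition. It is not the code from my gist: the helper name build_class_source, the underscore-prefixed attributes, and the get_<field> naming are assumptions made up for this illustration, and the definition is trimmed down to two fields.

```python
# Hypothetical, trimmed-down definition in the same nested-dict format.
datatypes = {
    "type1": {
        "field1": {"type": "INTEGER", "notNull": True},
        "field2": {"type": "TEXT", "notNull": False},
    },
}


def build_class_source(name, fields):
    """Return Python source for a class with a constructor and getters only.

    Assumes every data type has at least one field.
    """
    args = ", ".join(fields)
    lines = [f"class {name}:"]
    lines.append(f"    def __init__(self, {args}):")
    for field in fields:
        lines.append(f"        self._{field} = {field}")
    for field in fields:
        lines.append(f"    def get_{field}(self):")
        lines.append(f"        return self._{field}")
    return "\n".join(lines) + "\n"


# Run the generated source in its own namespace instead of globals().
namespace = {}
for type_name, type_fields in datatypes.items():
    exec(build_class_source(type_name, type_fields), namespace)

Type1 = namespace["type1"]
record = Type1(42, "hello")
print(record.get_field1(), record.get_field2())
```

Executing into a separate dictionary keeps the generated classes from silently overwriting names in the module. Keep in mind that the leading underscore only signals “don’t touch” by convention; Python does not actually prevent anyone from assigning to these attributes.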

The generation code itself clocks in at only 55 lines, including comments. For easier reading, I put it in a gist.

As you can see, the code generates a bunch of definitions for each data type, including getters and a checkRep function that can be used to verify the consistency of the data. So far, it does not check foreign key constraints, as they are quite annoying to check and are enforced by the database backend anyway.
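A generated consistency check could look roughly like the sketch below. Again, this is an illustration under assumptions rather than the gist’s code: the PY_TYPES mapping, the helper build_checkrep_source, the _field attribute names, and the hand-written Record class are all invented for the example, and only the “type” and “notNull” keys are checked.

```python
# Assumed mapping from SQLite column types to Python type names.
PY_TYPES = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}

# Hypothetical field definitions in the nested-dict format.
fields = {
    "field1": {"type": "INTEGER", "notNull": True},
    "field2": {"type": "TEXT", "notNull": False},
}


def build_checkrep_source(fields):
    """Return source for a checkRep method that validates every field."""
    lines = ["def checkRep(self):"]
    for name, spec in fields.items():
        py_type = PY_TYPES[spec["type"]]
        if spec.get("notNull"):
            # A NOT NULL field must have a value.
            lines.append(f"    if self._{name} is None:")
            lines.append("        return False")
        # A value that is present must have the expected Python type.
        lines.append(
            f"    if self._{name} is not None "
            f"and not isinstance(self._{name}, {py_type}):"
        )
        lines.append("        return False")
    lines.append("    return True")
    return "\n".join(lines) + "\n"


namespace = {}
exec(build_checkrep_source(fields), namespace)


class Record:
    """Hand-written stand-in for a generated data type."""

    def __init__(self, field1, field2):
        self._field1 = field1
        self._field2 = field2

    # Attach the generated function as a method.
    checkRep = namespace["checkRep"]


print(Record(1, None).checkRep())
print(Record(None, "x").checkRep())
```

Foreign keys are deliberately left out here as well, since checking them would require access to the other tables.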

So, why is this awesome?

  • You don’t have to manually write out all your datatypes. Instead, you define them once and then generate them automatically.
  • Even large changes to datatypes are quick and easy, by just changing the definition in one place.
  • You can generate other things, like database interfaces and database definitions, from the same source definition.
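As an example of reusing the same definition, here is a minimal sketch of turning one data type into a SQLite CREATE TABLE statement. The function name build_create_table is hypothetical, using None for “no foreign key” is an assumption, and the “default” key is ignored to keep the example short.

```python
import sqlite3

# Hypothetical definition of one data type in the nested-dict format.
fields = {
    "field1": {
        "type": "INTEGER", "notNull": True, "primaryKey": True,
        "autoIncrement": False, "default": None,
        "foreignKey": None,  # assumption: None means "no foreign key"
    },
    "field2": {
        "type": "TEXT", "notNull": False, "primaryKey": False,
        "autoIncrement": False, "default": None,
        "foreignKey": {"table": "type2", "field": "field1",
                       "onDel": "RESTRICT", "onUpd": "RESTRICT"},
    },
}


def build_create_table(table, fields):
    """Build a CREATE TABLE statement from one data type definition."""
    columns, constraints = [], []
    for name, spec in fields.items():
        parts = [name, spec["type"]]
        if spec.get("primaryKey"):
            parts.append("PRIMARY KEY")
            if spec.get("autoIncrement"):
                parts.append("AUTOINCREMENT")
        if spec.get("notNull"):
            parts.append("NOT NULL")
        columns.append(" ".join(parts))
        fk = spec.get("foreignKey")
        if fk:
            constraints.append(
                f"FOREIGN KEY ({name}) REFERENCES {fk['table']}({fk['field']}) "
                f"ON DELETE {fk['onDel']} ON UPDATE {fk['onUpd']}"
            )
    body = ",\n    ".join(columns + constraints)
    return f"CREATE TABLE {table} (\n    {body}\n);"


sql = build_create_table("type1", fields)
print(sql)

# Sanity check: SQLite accepts the generated statement.
connection = sqlite3.connect(":memory:")
connection.execute(sql)
```

The table and column names are pasted into the statement as-is, so this carries the same caveat as exec: it is only acceptable because the definitions are trusted.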

And what are the problems with this approach?

  • You are using exec, meaning that if someone untrusted gains access to your definitions, they can do evil things to your code.
  • Code coverage tools don’t play well with it.

For a full example of the code in use, you can check out my (abandoned) InvoiceManager project on GitHub. There, I also generate database definitions, database validation code, database interfaces, and unit tests for all of these things, all from the same source definition.

Let me know what you think.