Dynamically generating data types in Python

For a project I was working on recently, I needed to define a bunch of data types in Python. Since manually defining 10 different data types with roughly the same functionality would’ve been a pain, I decided to try something else.

I already had a bunch of nested dictionaries defining the fields and types of all data types, as I was generating SQLite statements from them. The general format looked something like this:

datatypes = {
    "type1": {
        "field1": {
            "type": "INTEGER",
            "notNull": True,
            "primaryKey": False,
            "autoIncrement": False,
            "default": None,
            "foreignKey": {
                "table": "type2",
                "field": "field1",
                "onDel": "RESTRICT",
                "onUpd": "RESTRICT"
            }
        },
        "field2": {
            # ...
        },
        # ...
    },
    # ...
}

Each outer dictionary (“type1”) defines a data type, and each inner dictionary (“field1”) defines one key-value pair of this data type, including some information like “can this be empty?”. So, I already had everything I needed to define my data types.
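To give an idea of how SQLite statements can be derived from such a definition, here is a minimal sketch. It is not the code from the project: the helper names (sql_literal, column_sql, create_table_sql) are made up for illustration, and it assumes that a field without a foreign key simply has no "foreignKey" entry or sets it to None.

```python
import sqlite3

# Example definitions in the format shown above (contents are made up).
datatypes = {
    "type2": {
        "field1": {"type": "INTEGER", "notNull": True, "primaryKey": True,
                   "autoIncrement": True, "default": None, "foreignKey": None},
    },
    "type1": {
        "field1": {"type": "INTEGER", "notNull": True, "primaryKey": False,
                   "autoIncrement": False, "default": None,
                   "foreignKey": {"table": "type2", "field": "field1",
                                  "onDel": "RESTRICT", "onUpd": "RESTRICT"}},
        "field2": {"type": "TEXT", "notNull": False, "primaryKey": False,
                   "autoIncrement": False, "default": "n/a", "foreignKey": None},
    },
}


def sql_literal(value):
    """Render a Python value as an SQL literal (strings and numbers only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def column_sql(name, spec):
    """Build one column definition from a field dictionary."""
    parts = [name, spec["type"]]
    if spec.get("primaryKey"):
        parts.append("PRIMARY KEY")
        if spec.get("autoIncrement"):
            parts.append("AUTOINCREMENT")
    if spec.get("notNull"):
        parts.append("NOT NULL")
    if spec.get("default") is not None:
        parts.append("DEFAULT " + sql_literal(spec["default"]))
    return " ".join(parts)


def create_table_sql(table, fields):
    """Build a CREATE TABLE statement, foreign key constraints included."""
    entries = [column_sql(name, spec) for name, spec in fields.items()]
    for name, spec in fields.items():
        fk = spec.get("foreignKey")
        if fk:
            entries.append(
                "FOREIGN KEY ({}) REFERENCES {} ({}) ON DELETE {} ON UPDATE {}"
                .format(name, fk["table"], fk["field"], fk["onDel"], fk["onUpd"]))
    return "CREATE TABLE {} (\n    {}\n);".format(table, ",\n    ".join(entries))


# Try the generated statements against an in-memory database.
connection = sqlite3.connect(":memory:")
for table in ("type2", "type1"):
    connection.execute(create_table_sql(table, datatypes[table]))
```

Printing create_table_sql("type1", datatypes["type1"]) shows the generated statement. The table and field names are inserted into the SQL unescaped, which is only acceptable because the definitions are trusted.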

What are the data types supposed to be able to do? I needed them to have unchangeable values (meaning that I needed only “getters”, no “setters” outside the constructor). So, how can I create class definitions from this block of definitions?

Easy. I generate a long string containing all definitions I need and run it through exec, a function I usually tend to avoid like the plague (note that this assumes that the class definition is safe and cannot be changed by others, meaning that you don’t have to worry about the security ramifications of using exec).

The code for the generation itself only clocks in at 55 lines, including comments. For easier reading, I put it in a gist.

As you can see, the code will generate a bunch of definitions for each data type, including getters and a checkRep function that can be used to verify the consistency of the data. So far, it does not check foreign key constraints, as they are quite annoying to check and are enforced in the database backend anyway.
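In case the gist is not at hand, the general idea can be sketched roughly like this. This is a simplified illustration, not the original 55 lines: the mapping PYTHON_TYPES, the function names, the get_ prefix for getters, and the capitalized class names are all assumptions made for this example.

```python
# Assumed mapping from SQLite column types to Python type names.
PYTHON_TYPES = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}


def generate_class_source(name, fields):
    """Return Python source for a class with getters and a checkRep method."""
    lines = ["class {}(object):".format(name.capitalize())]
    lines.append("    def __init__(self, {}):".format(", ".join(fields)))
    for field in fields:
        lines.append("        self._{0} = {0}".format(field))
    lines.append("        self.checkRep()")
    # One getter per field; no setters are generated.
    for field in fields:
        lines.append("    def get_{}(self):".format(field))
        lines.append("        return self._{}".format(field))
    # checkRep verifies the NOT NULL and type constraints of every field.
    lines.append("    def checkRep(self):")
    for field, spec in fields.items():
        if spec.get("notNull"):
            lines.append("        if self._{} is None:".format(field))
            lines.append("            raise ValueError('{} must not be None')"
                         .format(field))
        lines.append("        if self._{0} is not None and "
                     "not isinstance(self._{0}, {1}):"
                     .format(field, PYTHON_TYPES[spec["type"]]))
        lines.append("            raise TypeError('{} has the wrong type')"
                     .format(field))
    lines.append("        return True")
    return "\n".join(lines) + "\n"


def build_classes(datatypes):
    """exec the generated source and return a dict: type name -> class."""
    classes = {}
    for name, fields in datatypes.items():
        namespace = {}
        # Only safe because the definitions are trusted (see the caveat above).
        exec(generate_class_source(name, fields), namespace)
        classes[name] = namespace[name.capitalize()]
    return classes


definitions = {
    "type1": {
        "field1": {"type": "INTEGER", "notNull": True},
        "field2": {"type": "TEXT", "notNull": False},
    },
}
Type1 = build_classes(definitions)["type1"]
record = Type1(42, "hello")
```

Here record.get_field1() returns 42, while Type1(None, "hello") raises a ValueError from checkRep. The constructor arguments follow the order of the fields in the dictionary, which relies on dictionaries keeping insertion order (Python 3.7 and later).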

So, why is this awesome?

  • You don’t have to manually write out all your data types. Instead, you define them once and then generate them automatically.
  • Even large changes to data types are quick and easy, as you only change the definition in one place.
  • You can generate other things like database interfaces and definitions as well, all from the same source definition.

And what are the problems with this approach?

  • You are using exec, meaning that if someone untrusted gains access to your definitions, they can do evil things to your code.
  • Code coverage tools don’t play well with it, as the generated code does not live in a source file they could report on.

For a full example of the code in use, you can check out my (abandoned) InvoiceManager project on GitHub. There, I also generate database definitions, database validation code, database interfaces, and unit tests for all of these things, all from the same source definition.

Let me know what you think.

Introducing IssueBot, a Jabber MUC notification bot for GitHub

As part of a development team working on improving Enigmail, I recently found myself in need of a bot that sends notifications to a Jabber multi-user chat (MUC) whenever the issues of a GitHub project receive an update (we already had a similar program for new commits in place, called commitbot). I searched for a while and did not find anything that fulfilled our requirements.

So, being a CS student and all that, I decided to write one.

Introducing IssueBot

IssueBot is a Python / Twisted bot that uses the GitHub API to fetch information about the issues of a GitHub project and sends it to a Jabber MUC. It can monitor multiple repositories at the same time, authenticate itself using an OAuth token (if you generate one manually) to increase the rate limit on the GitHub API, and will generate notifications under the following conditions:

  • New issue
  • Issue closed
  • One of the following has changed about the issue:
    • Title
    • State (open -> closed or vice versa)
    • Assignee
    • New comments

Development is still actively going on, with new features being planned and bugs being fixed, so keep an eye on the GitHub repository.

Configuration is done using a .tac-file (an example file is provided with the program). Just update the variables in it and you should be good to go. Instructions on how to use it can be found in the README.

As parts of the code are derived from the aforementioned commitbot, the code is licensed under the GNU GPLv3. Feel free to fork, improve and send pull requests on the project page on GitHub.

The making of IssueBot

As it turns out, it is actually really easy to query the GitHub API. You send a request to a specific URL (for example, https://api.github.com/repos/octocat/hello-world/issues) and get a response with some JSON and some headers telling you how many requests you have remaining for the current hour (the API limit for unauthenticated users is 60 requests per hour).
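A request like that can be sketched using nothing but the standard library. This is an illustration and not IssueBot's actual code: the function names and the User-Agent string are made up, and the state=all parameter is added on the assumption that closed issues should be returned as well (the endpoint only lists open issues by default).

```python
import json
import urllib.request

API_URL = "https://api.github.com/repos/{owner}/{repo}/issues?state=all"


def parse_response(body, headers):
    """Decode the JSON body and read the remaining request quota."""
    issues = json.loads(body.decode("utf-8"))
    # GitHub reports the remaining quota in the X-RateLimit-Remaining header.
    remaining = int(headers.get("X-RateLimit-Remaining", -1))
    return issues, remaining


def fetch_issues(owner, repo, token=None):
    """Fetch the issues of a repository; returns (issues, remaining quota)."""
    request = urllib.request.Request(API_URL.format(owner=owner, repo=repo))
    # The GitHub API rejects requests that lack a User-Agent header.
    request.add_header("User-Agent", "issuebot-sketch")
    if token:
        # Authenticated requests get a higher rate limit.
        request.add_header("Authorization", "token " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_response(response.read(), response.headers)
```

Calling fetch_issues("octocat", "hello-world") returns the list of issue dictionaries together with the number of requests left. Keeping parse_response separate makes it possible to try the parsing with canned data, without touching the network.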

Since Python makes such requests easy and ships with JSON support in the standard json module, it was straightforward to query the API, parse the result into Python dictionaries, and store those in the local database. Then, the changes could be determined and notification messages generated. The only thing missing was the interface to Jabber.
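The change detection boils down to comparing each fetched issue with what was stored the last time around. Here is a rough sketch, with a plain dictionary standing in for the local database; the function names and message texts are invented for this example, while the keys number, title, state, assignee, and comments are the ones found in the GitHub issue JSON.

```python
def snapshot(issue):
    """Keep only the parts of an issue that are compared between polls."""
    assignee = issue.get("assignee") or {}
    return {
        "title": issue["title"],
        "state": issue["state"],
        "assignee": assignee.get("login"),
        "comments": issue["comments"],  # number of comments on the issue
    }


def detect_changes(known, issues):
    """Compare issues against the known snapshots and return messages.

    known maps issue numbers to snapshots and is updated in place.
    """
    messages = []
    for issue in issues:
        number = issue["number"]
        new = snapshot(issue)
        old = known.get(number)
        if old is None:
            messages.append("New issue #{}: {}".format(number, new["title"]))
        else:
            if old["state"] != new["state"]:
                messages.append("Issue #{} is now {}".format(number, new["state"]))
            if old["title"] != new["title"]:
                messages.append("Issue #{} renamed to: {}".format(number, new["title"]))
            if old["assignee"] != new["assignee"]:
                messages.append("Issue #{} assigned to {}".format(number, new["assignee"]))
            if new["comments"] > old["comments"]:
                messages.append("Issue #{} has new comments".format(number))
        known[number] = new
    return messages


known = {}
first = detect_changes(known, [
    {"number": 1, "title": "Bug", "state": "open", "assignee": None, "comments": 0},
])
second = detect_changes(known, [
    {"number": 1, "title": "Bug", "state": "closed", "assignee": None, "comments": 2},
])
```

After the first poll, first contains a single "New issue" message; the second poll reports the state change and the new comments.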

For that, I decided to reuse some code from the commitbot project. This blew up the list of dependencies, but made it possible to work with Jabber MUCs somewhat painlessly. As an added bonus, one of the dependencies, Twisted, daemonizes the process automatically, saving me the trouble.

In the end, it took me about three hours to hack together the current version of the program. Most of the time was spent trying to figure out how to get Twisted to work with my main loop, which was actually non-trivial until I stumbled upon the LoopingCall class provided by twisted.internet.task. As Twisted has some rather… interesting views on how it should be used, and my use case did not quite fit into that pattern, documentation on how to do this was hard to find.

The program is now happily running on my server, spitting out the occasional notification into our chat, and works like a charm.