Deprecation notice

Deprecation Notice: This blog is no longer updated. Please visit my current blog at subfocal.net.

Sunday, October 30, 2011

Database file storage for Django

Django provides a good mechanism for handling file attachments as model fields, using the FileField and ImageField classes. These field types store the path to a file in the database, while facilitating the actual file storage on the filesystem through the Storage API. They even come with the interface widgets to handle uploads in the model admin, so it's a simple feature to activate.

Motivation

File storage adds a new concern that your application did not previously have. Now, in addition to supporting database content (migrations, backups, and all the best practices that come with a database), you have user content living on the filesystem. This means a new area where you need to consider security (permissions that allow uploading without compromising the system), managing the uploaded files across redeployments, distributing them across application instances, etc.

When file storage is central to your application's purpose, these concerns are worth spending engineering time on. However, file uploads are often guaranteed to be small and won't be used heavily throughout the app. In such cases, storing file content in the database is possible and can make your life a bit easier.

Database Storage

Database storage is an alternative to using the filesystem for uploaded file content: the whole thing can be stored in the database. Django doesn't ship with this capability — probably because in many cases it is a very bad idea.

Some searching turned up a snippet that implements this basic idea. Unfortunately, it doesn't use the Django database layer, and as written it only supports Microsoft SQL Server. It could be made to work with other databases given the correct connection string, but I'd really prefer to let Django abstract the database layer for me where possible.

Taking inspiration from the snippet, I've implemented a new database storage class that does use the Django database API. It's extremely easy to get going, and should work with any database supported by Django (I've used it with SQLite and SQL Server so far).

Check it out:

Example Usage

Here's a quick how-to on using this library. First, install it (ideally in a virtualenv):

    $ pip install django-database-storage

Add a FileField to your model object (in models.py):

    from django.db import models
    from database_storage import DatabaseStorage
    ...
    DBS_OPTIONS = {
        'table': 'blog_attachments',
        'base_url': '/blog/attach/',
    }

    class BlogEntry(models.Model):
        attachment = models.FileField(
            upload_to='blog_attachments/',
            storage=DatabaseStorage(DBS_OPTIONS),
            null=True, blank=True)

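Create a table in the database for storing the attachments. Django's initial-SQL hook runs your_app/sql/<modelname>.sql (model name in lowercase) when syncdb creates that model's table, so for the BlogEntry model above, put the statement in your_app/sql/blogentry.sql: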

    CREATE TABLE blog_attachments (
        filename VARCHAR(256) NOT NULL PRIMARY KEY,
        data TEXT NOT NULL,
        size INTEGER NOT NULL);

(The column names may be different from the above, but these are the defaults. If you use different column names, specify them in DBS_OPTIONS as 'name_column', 'data_column', and 'size_column'.)

Now any uploaded files will be saved into the database instead of the filesystem. Note that the data field is text; this is because Django doesn't presently support blobs or non-unicode data returned from queries. DatabaseStorage transparently uses base64 encoding to store your data, so you can store arbitrary binary files, but they will use some extra space from the encoding.
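To make the base64 trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module against the same table layout as above. The helpers save_blob and load_blob are hypothetical and are not part of django-database-storage; they only illustrate storing binary content as base64 text alongside its original size.

```python
import base64
import sqlite3

# Same layout as the blog_attachments table above (default column names).
SCHEMA = """
CREATE TABLE blog_attachments (
    filename VARCHAR(256) NOT NULL PRIMARY KEY,
    data TEXT NOT NULL,
    size INTEGER NOT NULL)
"""


def save_blob(conn, filename, content):
    """Hypothetical helper: store raw bytes as base64 text plus the original size."""
    encoded = base64.b64encode(content).decode('ascii')
    conn.execute(
        'INSERT INTO blog_attachments (filename, data, size) VALUES (?, ?, ?)',
        (filename, encoded, len(content)))
    return len(encoded)  # number of characters actually stored


def load_blob(conn, filename):
    """Hypothetical helper: return the decoded bytes, or None if the file is missing."""
    row = conn.execute(
        'SELECT data FROM blog_attachments WHERE filename = ?',
        (filename,)).fetchone()
    if row is None:
        return None
    return base64.b64decode(row[0])


conn = sqlite3.connect(':memory:')
conn.execute(SCHEMA)
payload = bytes(range(256)) * 12   # 3072 bytes of arbitrary binary data
stored = save_blob(conn, 'blog_attachments/demo.bin', payload)
print(len(payload), stored)        # 3072 4096
```

Base64 emits 4 characters for every 3 input bytes, so the stored text is about a third larger than the original file, which is one more reason to reserve this approach for small attachments.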

The last thing you need is a view to serve these files upon request. Add a url handler to urls.py:

    (r'^blog/attach/(?P<filename>.+)$', 'blog_attach'),

...and the view to handle these requests (in views.py):

    import mimetypes

    from django.http import Http404, HttpResponse
    from database_storage import DatabaseStorage

    from .models import DBS_OPTIONS  # defined in models.py above

    def blog_attach(request, filename):
        # Read file from database
        storage = DatabaseStorage(DBS_OPTIONS)
        image_file = storage.open(filename, 'rb')
        if not image_file:
            raise Http404
        file_content = image_file.read()

        # Prepare response; the type is guessed from the filename only
        content_type, content_encoding = mimetypes.guess_type(filename)
        response = HttpResponse(content=file_content, content_type=content_type)
        response['Content-Disposition'] = 'inline; filename=%s' % filename
        if content_encoding:
            response['Content-Encoding'] = content_encoding
        return response
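The Content-Type and Content-Encoding headers in the view come straight from mimetypes.guess_type, which returns a (type, encoding) pair. A quick check with a few hypothetical filenames shows when the Content-Encoding branch fires:

```python
import mimetypes

# guess_type() inspects only the filename (its extensions), never the file contents.
for name in ('report.pdf', 'backup.tar.gz', 'notes'):
    content_type, content_encoding = mimetypes.guess_type(name)
    print(name, content_type, content_encoding)
```

For backup.tar.gz it reports the type application/x-tar with encoding gzip, so the view would also send Content-Encoding: gzip. A name with no extension yields (None, None), in which case HttpResponse falls back to Django's default content type.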

That's it! Now you have the ability to save user-uploaded files in the database, without having to worry about file management on the server. Remember, this is intended for simple use cases and small files. An application that relies heavily on file storage, or wishes to store files greater than 1MB or so, should absolutely use a more robust solution.

Finally, to see the full documentation, pull up the help in your Python shell:

    $ python
    ...
    >>> from database_storage import DatabaseStorage
    >>> help(DatabaseStorage)

Feel free to comment here if you have any issues or questions!

Wednesday, October 19, 2011

Share your development server with a reverse ssh tunnel

Sometimes you want to give someone access to your development server (e.g. a Django or Rails dev server) running on port 8000 on your laptop. Unless the other person is on the same subnet as you, it's very likely there's a firewall between you (whether you're at home, on a company LAN, or at Starbucks).

Assuming you have access to a Linux host that is publicly accessible, this is easy to work around. I personally have a tiny virtual host that gives me remote ssh access and runs a few little services for me, so I use this host.

This is a quick and dirty way to open up access to your dev server by using ssh and a publicly accessible remote server as your proxy. Here's the entirety of my webtunnel.sh script:

    #!/bin/bash
    REMOTE="myvirtualhost.com"
    echo "Opening tunnel to $REMOTE..."
    ssh -nNT -o ServerAliveInterval=30 \
        -R $REMOTE:8000:localhost:8000 $REMOTE

Here's what it looks like in action:

    $ webtunnel.sh
    Opening tunnel to myvirtualhost.com...

As long as this ssh process is around, the tunnel will continue to exist. It's essentially a one-liner, but a little complicated, so I'll explain the options:

  • -n: Redirect stdin from /dev/null, mainly useful if you plan on putting the ssh session in the background (with -f).
  • -N: Do not actually execute a remote shell, just connect and establish the requested port forwards.
  • -T: Don't allocate a pseudo-TTY, since this is not intended to be an interactive shell.
  • -o ServerAliveInterval=30: If nothing has been received from the server for 30 seconds, send a keepalive message over the encrypted channel. This keeps firewalls and NAT routers from dropping the connection as idle when the tunnel goes unused for several minutes.
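  • -R $REMOTE:8000:localhost:8000: The key bit of magic: establish a reverse tunnel from port 8000 on the remote host to port 8000 on your local host. Any incoming connection to the remote server on port 8000 will be transparently routed to your local development server. (This requires GatewayPorts yes or GatewayPorts clientspecified in the remote host's sshd_config; with the default of no, sshd binds the forwarded port to the loopback interface only, so only connections made from the remote host itself get through.)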

As I hinted above, you can optionally pass -f to tell ssh to fork into the background after successfully connecting. I like to have it in the foreground, occupying a screen terminal, so that I don't forget it's open and I can kill it with CTRL-C whenever I want.
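To confirm the tunnel is reachable from outside, a plain TCP connect to port 8000 on the remote host is enough. Here is a minimal sketch using Python's socket module; the function name port_is_open is hypothetical, and myvirtualhost.com below stands in for your own host, as in the script above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Call port_is_open('myvirtualhost.com', 8000) from a machine outside your network while webtunnel.sh and the dev server are running. A True result only means something accepted the connection on that port; loading the page in a browser is still the real test.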

Saturday, August 6, 2011

supybot-git: An IRCbot plugin for Git notifications

I was looking for a way to get git commit notifications in IRC, and there are several examples of bots that do this. However, being a Pythonista, if I'm going to run an IRC bot it's going to be one written in Python. My reasoning was that I'd probably want to extend it, and in that case it ought to be in a language I love.

After some investigation, I found Supybot, a robust, extensible IRC bot written in (you guessed it) Python. It has a pretty slick installation wizard to get you up and running without fiddling with any configuration files. It also ships with a large collection of plugins, and plenty of third-party plugins are available as well.

Since no git notification plugin existed, I went ahead and wrote one. It's been running on my IRC server for over a year now, and I recently decided to clean it up and open source it. I present supybot-git!

It's pretty straightforward to get up and running, and has the following features:

  • Can monitor any number of repositories
  • Repositories are associated with a single IRC channel
    • repolist command lists repositories associated with the current channel
    • Notifications will appear on this channel
    • Users can display recent commit log on this channel (with shortlog command)
    • People on other channels will have no indication the repository exists
  • Asynchronous commit notification
  • Configurable polling frequency (default: every 2 minutes)
  • Configurable notification format (you can use the commit author, branch, message, provide a link to the commit, and more)
  • Reload configuration with gitrehash command

It's built with the assumption that you may want to retain some privacy, i.e. monitor a closed-source git repository. This means that people in one channel are allowed to see commit information, but other channels will have no idea that the repository even exists. I currently have my bot monitoring six repositories across three IRC channels and it has been perfectly stable.
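The channel-scoping rule boils down to a simple lookup: each repository belongs to exactly one channel, and every query is filtered by the channel it came from. The sketch below illustrates that idea with a hypothetical class and made-up repository names; it is not supybot-git's actual code or configuration format.

```python
class RepositoryRegistry:
    """Hypothetical sketch: map each monitored repository to a single IRC channel."""

    def __init__(self):
        self._channel_for_repo = {}

    def add(self, repo_name, channel):
        # A repository is associated with exactly one channel.
        # IRC channel names are case-insensitive, so normalize them.
        self._channel_for_repo[repo_name] = channel.lower()

    def repolist(self, channel):
        """Repositories visible from `channel`; others are never mentioned."""
        channel = channel.lower()
        return sorted(name for name, chan in self._channel_for_repo.items()
                      if chan == channel)

    def channel_for(self, repo_name):
        """Channel that should receive notifications for repo_name (or None)."""
        return self._channel_for_repo.get(repo_name)


registry = RepositoryRegistry()
registry.add('website', '#web')
registry.add('backend', '#dev')
registry.add('tools', '#dev')
print(registry.repolist('#dev'))  # ['backend', 'tools']
print(registry.repolist('#web'))  # ['website']
```

Because every query goes through the caller's channel, a user in #web has no way to learn that backend or tools exist.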

Grab supybot-git and try it out! Let me know how it's working for you.

Friday, June 3, 2011

Create notification bubbles in Python

Preface

Note: You can skip the background discussion here if you already understand what freedesktop.org and D-Bus are.

freedesktop.org

You could live your whole life as a user of Linux-based desktops and never know that freedesktop.org exists. And yet its existence makes your life better in a lot of ways.

freedesktop.org essentially specifies a bunch of standard behaviors that a desktop built on the X Window System should implement. It was created to address the problem of various desktop environments (Gnome, KDE, XFCE, etc.) each having different ways of solving the same problems.

Ever wonder how Pidgin is able to put an icon in the notification area (often called the "tray") whether you're running Gnome or KDE? It would have been a nuisance to have to write the notification applet logic twice, once for Gnome and once for KDE. Even worse, less commonly used desktops like XFCE might be neglected entirely, meaning those users wouldn't see any Pidgin notification applet.

freedesktop solves problems like this by defining a common way for programs to interact with desktop environments. This way, application authors only have to implement the applet behavior once, and it will work on every desktop environment that adheres to the freedesktop standard.

D-Bus

Another key system needs to be mentioned briefly here: D-Bus. It's another one of those things the average user may never hear about in a lifetime of Linux use. In fact, you typically don't hear about it unless something has gone wrong. Yet once again, it's sitting there quietly making life in Linux very cool.

D-Bus is a message-passing system that allows different applications to communicate in well-defined ways. The "well-defined" part is key here: an application can provide a service (and document exactly how it works and where to find it), and other applications can take advantage of the service without caring about the specifics of the system (desktop environment, programming language, etc.). Used this way, D-Bus is very similar to CORBA or RPC.

freedesktop takes advantage of D-Bus to provide a lot of the cross-desktop standards for behavior, and the result is that Linux applications can behave consistently no matter what desktop you use.

Fun with Python

I'm sure you find the above discussion enthralling and would love nothing more than to spend more time reading about open specifications and standards committees. But let's set that aside and do something fun with Python instead.

While working on a large C++ project, I realized that I often kick off a make command and then pop over to a web browser or an IRC session to distract myself while the test suite runs (knowing it will take a few minutes). The problem with this is that I sometimes find myself reading something fascinating and don't realize that the test suite finished five or ten minutes ago. What I needed was some kind of notification...

It turns out freedesktop.org has an app for that. There's a notification service available via D-Bus, and it's pretty straightforward to call from a Python program:

    import dbus
    import sys

    def notify(summary, body='', app_name='', app_icon='',
            timeout=5000, actions=None, hints=None, replaces_id=0):
        # Names defined by the freedesktop.org Desktop Notifications spec
        _bus_name = 'org.freedesktop.Notifications'
        _object_path = '/org/freedesktop/Notifications'
        _interface_name = _bus_name

        session_bus = dbus.SessionBus()
        obj = session_bus.get_object(_bus_name, _object_path)
        interface = dbus.Interface(obj, _interface_name)
        # actions is a list of strings, hints a dict (D-Bus type a{sv});
        # Notify returns the ID the server assigned to the notification.
        return interface.Notify(app_name, replaces_id, app_icon,
                summary, body, actions or [], hints or {}, timeout)

    # If run as a script, just display the argv as summary
    if __name__ == '__main__':
        notify(summary=' '.join(sys.argv[1:]))

I created this little script (see note 1 below) and dropped it in my ~/bin directory so I can pop up notifications whenever I feel like it. (The script wraps the D-Bus logic in a native Python function, then calls it.) Next, I added a new function in my .bash_profile to wrap the make command:

    # Run make and show freedesktop notifications on success/fail
    function make
    {
        if command make "$@"; then
            notify.py "Make succeeded."
        else
            # $? is still make's exit status here; pass it on to the caller
            local status=$?
            notify.py "Make failed."
            return $status
        fi
    }

Now whenever a make command finishes, I get a little notification bubble in Gnome to tell me to stop wasting time and get back to work!

Figure 1: Screenshot of notification.

Achieving and maintaining a focused mental state is critical for writing software (among other things). When I'm trying to be productive, it's nice to get a little nudge in the right direction before I start to drift out of the zone.

1. Note that the timeout parameter will probably be ignored on Ubuntu. See this long-standing, contentious bug report.

Thursday, April 28, 2011

Generate bit.ly links from the command line

I'm a command line junkie. Whether that's a good or bad thing I'll leave up to the reader. But since the reader is here, I'll assume (s)he finds command line utilities helpful!

Lately, I've found myself generating short URLs for things often enough that I thought it would be nice to have a little script that generates them for me. Then I found python-bitly, an elegant little Python module that wraps the bit.ly API.

Over the next couple minutes, I unceremoniously hacked that module into a script that you can kick off whenever you want a bit.ly URL: bitly.py.

Prerequisites: python-bitly requires simplejson (or json), which you can install with: pip install simplejson. After this, register an account at bit.ly, grab your username and API key from the account page, and paste them into the script (API_USERNAME and API_KEY respectively).

Example session:

    (crono:~)$ bitly.py 'http://github.com/'
    Short URL: http://bit.ly/k7lifz

Get in the habit of quoting the URL, because a URL that contains ampersands (&) or semicolons (;) will cause problems with bash. Grab the script!
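The quoting matters because bash treats & as "run in the background" and ; as a command separator, so an unquoted URL gets cut off at the first one. If you ever build the bitly.py command line from another Python program, shlex.quote from the standard library adds the quotes for you; the URL below is just an example.

```python
import shlex

url = 'http://example.com/search?q=django&page=2'

# shlex.quote wraps the argument in single quotes when it contains shell metacharacters.
command = 'bitly.py ' + shlex.quote(url)
print(command)  # bitly.py 'http://example.com/search?q=django&page=2'
```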

Saturday, April 23, 2011

Solution: X fails to start on Ubuntu after kernel update

Noted here for posterity: I recently had Ubuntu 10.10 (maverick) upgrade my kernel to 2.6.35-28, and on the next boot the X server failed to start. It just hung indefinitely with a purple screen, and the X server log didn't contain anything terribly informative.

It took a lot of googling, but I eventually came across this post, which identifies the root cause: several weeks earlier I had installed the new (experimental) gold linker on my system. It seemed to work well with all my C++ projects, and is a good deal faster than the standard ld.

It turns out there is a well-known limitation with gold right now: it can't link a working kernel. This also seems to apply to building the nvidia kernel module against an upgraded kernel. I resolved it by booting in rescue mode to a root shell and running:

    # aptitude remove binutils-gold
    # dpkg-reconfigure nvidia-current

Thanks to Jesse Vogt for posting his description; it would've taken me a while to connect the dots between binutils-gold and that error!

Sunday, April 10, 2011

Saving time with SSH: ~/.ssh/config

If you're a web developer these days, you probably end up SSHing to a lot of different hosts. It used to be simple: ssh hostname, type your password, get in. Now it's much more common to use identity files (perhaps not your personal identity file, either), custom port numbers, and other custom SSH settings.

Often, people solve problems like this with an alias:

    alias sshweb='ssh -p 3022 -X webuser@webhost.com'

It turns out there's a better way: use your local ssh config file! This is a file that lives in your ~/.ssh directory, simply named config. It uses all of the same conventions as the /etc/ssh/ssh_config file, so you may already be comfortable with the syntax and usage.

In this file, you can simply write Host foo, and every setting that follows it (up to the next Host line) will apply only to ssh sessions with that host. Even better, it works with scp and other tools that invoke OpenSSH (scripts, libraries, etc.). The shell alias approach doesn't help you there!

Obligatory example:

    Host home
        HostName ssh.myhomedomain.com
        Port 4022
        ForwardX11 yes

    Host webstage staging.webserver.com
        HostName staging.webserver.com
        User webuser
        IdentityFile ~/.ssh/id-webserver
        ForwardAgent yes

Pretty much any SSH configuration variable can be set on a per-host basis this way. You can also see that hosts can be given multiple names (separated by spaces), as in the "webstage" example. This host can be reached by SSHing to either "webstage" or "staging.webserver.com". I use this because, on the command line, I prefer to type a short name, but our deployment scripts generally use the entire hostname. Listing them both means the same SSH configuration will always be used.
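To see why listing both names keeps the settings consistent, here is a deliberately simplified sketch of how ssh resolves options: every Host block whose patterns match the name you typed contributes settings, and for each keyword the first value found wins. It uses Python's fnmatch for the * and ? wildcards and ignores everything else OpenSSH supports (negated patterns, Match blocks, Include, Keyword=value syntax), so treat it as an illustration rather than a real parser; parse_ssh_config and lookup are hypothetical names.

```python
from fnmatch import fnmatch


def parse_ssh_config(text):
    """Split config text into (patterns, options) blocks, one per Host line.

    Simplified: options before the first Host line are ignored.
    """
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()  # keywords are case-insensitive
        value = parts[1] if len(parts) > 1 else ''
        if key == 'host':
            blocks.append((value.split(), []))
        elif blocks:
            blocks[-1][1].append((key, value))
    return blocks


def lookup(blocks, host):
    """Collect options for `host`; the first value seen for each keyword wins."""
    result = {}
    for patterns, options in blocks:
        if any(fnmatch(host, pattern) for pattern in patterns):
            for key, value in options:
                result.setdefault(key, value)
    return result


CONFIG = """
Host home
    HostName ssh.myhomedomain.com
    Port 4022
    ForwardX11 yes

Host webstage staging.webserver.com
    HostName staging.webserver.com
    User webuser
    IdentityFile ~/.ssh/id-webserver
    ForwardAgent yes
"""

blocks = parse_ssh_config(CONFIG)
# Both names match the same Host block, so they get identical settings.
print(lookup(blocks, 'webstage') == lookup(blocks, 'staging.webserver.com'))  # True
print(lookup(blocks, 'home')['port'])  # 4022
```

The first-value-wins rule is also why the ssh_config man page recommends putting host-specific declarations near the beginning of the file and general defaults (such as a Host * block) at the end.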

Now go get rid of those aliases (or stop typing long-ass command lines all the time) and enjoy this newfound power!

More Resources