Deprecation Notice: This blog is no longer updated. Please visit my current blog at subfocal.net.

Sunday, October 30, 2011

Database file storage for Django

Django provides a good mechanism for handling file attachments as model fields, using the FileField and ImageField classes. These field types store the path to a file in the database, while facilitating the actual file storage on the filesystem through the Storage API. They even come with the interface widgets to handle uploads in the model admin, so it's a simple feature to activate.

Motivation

File storage adds a new concern that your application did not previously have. Now, in addition to supporting database content (migrations, backups, and all the best practices that come with a database), you have user content living on the filesystem. This means a new area where you need to consider security (permissions that allow uploading without compromising the system), managing the uploaded files across redeployments, distributing them across application instances, etc.

When file storage is central to your application's purpose, then these concerns are worth spending engineering time on. However, there are many times where file uploads are guaranteed to be small and won't be heavily used throughout the app. In such cases, storing file content in the database is possible and can make your life a bit easier.

Database Storage

Database storage is an alternative to using the filesystem for uploaded file content: the whole thing can be stored in the database. Django doesn't ship with this capability — probably because in many cases it is a very bad idea.

Some searching turned up a snippet that implements this basic idea. Unfortunately, it doesn't use the Django database layer and, as written, supports only Microsoft SQL Server. It would be possible to get it working with other databases, given the correct connection string, but I'd really prefer to let Django abstract the database layer for me where possible.

Taking inspiration from the snippet, I've implemented a new database storage class that does use the Django database API. It's extremely easy to get going, and should work with any database supported by Django (I've used it with SQLite and SQL Server so far).

Check it out:

Example Usage

Here's a quick how-to on using this library. First, install it (ideally in a virtualenv):

    $ pip install django-database-storage

Add a FileField to your model object (in models.py):

    from django.db import models
    from database_storage import DatabaseStorage
    ...
    DBS_OPTIONS = {
        'table': 'blog_attachments',
        'base_url': '/blog/attach/',
    }

    class BlogEntry(models.Model):
        attachment = models.FileField(
            upload_to='blog_attachments/',
            storage=DatabaseStorage(DBS_OPTIONS),
            null=True, blank=True)

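Create a table in the database for storing the attachments. One way is Django's initial-SQL hook: when syncdb creates a model's table, it also executes your_app/sql/<modelname>.sql, where <modelname> is the model's name in lowercase (blogentry.sql for the model above). Put the following in that file: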

    CREATE TABLE blog_attachments (
        filename VARCHAR(256) NOT NULL PRIMARY KEY,
        data TEXT NOT NULL,
        size INTEGER NOT NULL);

(The column names may be different from the above, but these are the defaults. If you use different column names, specify them in DBS_OPTIONS as 'name_column', 'data_column', and 'size_column'.)

Now any uploaded files will be saved into the database instead of the filesystem. Note that the data column is TEXT; this is because Django doesn't presently support blobs or non-unicode data returned from queries. DatabaseStorage transparently base64-encodes your data, so you can store arbitrary binary files, but the encoding makes them take up about a third more space.
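
To make the base64 trade-off concrete, here is a minimal standalone sketch using only Python's standard library (sqlite3 and base64). It is not DatabaseStorage's actual implementation: the helpers save_file, load_file, and stored_length are hypothetical names, and it assumes the size column holds the original byte count. It reuses the default table layout from above.

```python
import base64
import sqlite3

# In-memory database using the default column names from the article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog_attachments ("
    " filename VARCHAR(256) NOT NULL PRIMARY KEY,"
    " data TEXT NOT NULL,"
    " size INTEGER NOT NULL)"
)


def save_file(name, content):
    """Store raw bytes as base64 text (hypothetical helper, not the library API)."""
    encoded = base64.b64encode(content).decode("ascii")
    # Assumption: the size column records the original (decoded) byte count.
    conn.execute(
        "INSERT INTO blog_attachments (filename, data, size) VALUES (?, ?, ?)",
        (name, encoded, len(content)),
    )


def load_file(name):
    """Return the original bytes, or None if no such row exists."""
    row = conn.execute(
        "SELECT data FROM blog_attachments WHERE filename = ?", (name,)
    ).fetchone()
    return base64.b64decode(row[0]) if row else None


def stored_length(name):
    """Return the length of the encoded text held in the data column."""
    row = conn.execute(
        "SELECT LENGTH(data) FROM blog_attachments WHERE filename = ?", (name,)
    ).fetchone()
    return row[0] if row else None


payload = bytes(range(256)) * 12  # 3072 bytes of arbitrary binary data
save_file("blog_attachments/sample.bin", payload)
```

In this sketch the 3072-byte payload occupies 4096 characters in the data column, which is where the one-third overhead comes from: base64 emits four characters for every three input bytes.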

The last thing you need is a view to serve these files upon request. Add a url handler to urls.py:

    (r'^blog/attach/(?P<filename>.+)$', 'blog_attach'),

...and the view to handle these requests (in views.py):

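    import mimetypes

    from django.http import Http404, HttpResponse

    from database_storage import DatabaseStorage
    from your_app.models import DBS_OPTIONS  # wherever you defined DBS_OPTIONS

    def blog_attach(request, filename):
        # Read file from database
        storage = DatabaseStorage(DBS_OPTIONS)
        image_file = storage.open(filename, 'rb')
        if not image_file:
            raise Http404
        file_content = image_file.read()

        # Prepare response; guess_type returns (type, encoding), and either
        # value may be None when the extension is not recognized
        content_type, content_encoding = mimetypes.guess_type(filename)
        # content_type= also works on Django versions that dropped mimetype=
        response = HttpResponse(content=file_content, content_type=content_type)
        response['Content-Disposition'] = 'inline; filename="%s"' % filename
        if content_encoding:
            response['Content-Encoding'] = content_encoding
        return response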
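
The view relies on mimetypes.guess_type from Python's standard library, which returns a (type, encoding) pair and uses None for anything it can't work out from the file extension. The following sketch isolates that header logic so it can be tried without Django. The helper name guess_headers and the fallback to application/octet-stream are my own assumptions, not part of the library or the view above.

```python
import mimetypes


def guess_headers(filename):
    """Build response headers for a stored file (hypothetical helper)."""
    content_type, content_encoding = mimetypes.guess_type(filename)
    headers = {
        # Fall back to a generic binary type when the extension is unknown.
        "Content-Type": content_type or "application/octet-stream",
        "Content-Disposition": 'inline; filename="%s"' % filename,
    }
    # Only compressed files such as .tar.gz report an encoding.
    if content_encoding:
        headers["Content-Encoding"] = content_encoding
    return headers
```

For example, guess_headers("photo.png") yields a Content-Type of image/png and no Content-Encoding, while a name ending in .tar.gz also gets a Content-Encoding of gzip.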

That's it! Now you have the ability to save user-uploaded files in the database, without having to worry about file management on the server. Remember, this is intended for simple use cases and small files. An application that relies heavily on file storage, or wishes to store files greater than 1MB or so, should absolutely use a more robust solution.
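
If you want to enforce the "small files only" rule rather than just remember it, a size check before saving is one option. This is a rough sketch in plain Python: the function name check_upload_size and the exact 1 MB limit are illustrative assumptions, not something DatabaseStorage provides.

```python
MAX_UPLOAD_BYTES = 1024 * 1024  # 1 MB, the rough ceiling suggested above


def check_upload_size(size_in_bytes, limit=MAX_UPLOAD_BYTES):
    """Raise ValueError if an upload is too large for database storage."""
    if size_in_bytes > limit:
        raise ValueError(
            "File is %d bytes; the limit for database storage is %d bytes"
            % (size_in_bytes, limit)
        )
    return size_in_bytes
```

In a Django project, something like this could be called with the uploaded file's size attribute during form validation, translating the ValueError into a validation error for the user.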

Finally, to see the full documentation, pull up the help in your Python shell:

    $ python
    ...
    >>> from database_storage import DatabaseStorage
    >>> help(DatabaseStorage)

Feel free to comment here if you have any issues or questions!

2 comments:

  1. Database storage is an alternative to using the filesystem for uploaded file content... Django doesn't ship with this capability — probably because in many cases it is a very bad idea...

    Not if you use the nonrel fork of Django and store your user content in MongoDB's GridFS :)

  2. Hello Mike,

    thanks for your article. I have a little problem. It seems u missed the line where u declare 'mimetypes'. At this line there is an error:

    content_type, content_encoding = mimetypes.guess_type(filename)

    Eclipse says: "Undefined variable: mimetypes"

    Whats the matter?

    Thanks
    Peter