Technobabble

Experiment: DuckDuckGo as default search engine for a week

Have people become irrationally used to Google search? I must admit I haven't really considered any alternative for many years, especially since Google has always been the default search provider in my primary web browsers (Firefox, Chrome).

Inspired by a Reddit post, I will try to make up for my search engine ignorance by using DuckDuckGo for (at least) a week, and I dare you to do the same.

Some of the highlights of DuckDuckGo include:

Here are instructions on how to change the default search provider in your favorite web browser.

MIPS machines giveaway

Instead of leaving these machines in storage where no one gets to play with them, I am now offering them to anyone who wants to come and pick them up.

http://www.technobabble.dk/attachments/mips1.jpg http://www.technobabble.dk/attachments/mips2.jpg http://www.technobabble.dk/attachments/mips3.jpg

They are SGI Indy machines, and you should be able to get Linux to boot on them. Get in touch with me if you want them; they will be scrapped within a month or so. You must be able to pick them up in Nørrebro, Copenhagen.

Fun fact: They make a drum rimshot at boot time and have a port for 3D goggles.

Update

The machines have been donated to the folks at labitat.dk.

Blogging software change

Having neglected my weblog a bit for the last couple of years, I felt something had to change. The server hosting this site was going down and I had to move everything to a new server anyway, so I took the opportunity to change the blogging software as well.

I'm now using blohg, a Mercurial-backed blogging engine: all content is read from a Mercurial repository and fed through a reStructuredText parser. I needed a platform I would feel comfortable with, and being a programmer, what's more natural than using my favorite editor and VCS for the purpose?

To make the move, I had to deal with blohg's URL structure being different from my old system's. I couldn't come up with any good argument for why the old scheme was better than the new one, so I decided to go with the new, simpler version. That left me needing a way to redirect readers from the old URLs to the new ones:

/<year>/<month>/<date>/<slug> => /post/<slug>/

To solve this problem, I wrote a patch for blohg that was quickly accepted, and blohg now supports URL aliases through a reStructuredText comment in your posts.
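The rewrite rule itself is simple. As an illustration only (this is not blohg's alias mechanism, which works per post through the rst comment), here is a minimal sketch of the mapping. It assumes old URLs look exactly like /<year>/<month>/<date>/<slug> with numeric date parts; `old_to_new` is a hypothetical helper name:

```python
import re

# Hypothetical helper: assumes old URLs look like /<year>/<month>/<date>/<slug>
# with numeric date parts and a slug made of word characters and hyphens.
OLD_URL = re.compile(r"^/(\d{4})/(\d{1,2})/(\d{1,2})/([\w-]+)/?$")


def old_to_new(path):
    """Map an old-style path to /post/<slug>/, or return None if it doesn't match."""
    match = OLD_URL.match(path)
    if match is None:
        return None
    slug = match.group(4)
    return "/post/%s/" % slug


if __name__ == "__main__":
    print(old_to_new("/2008/09/15/hacking-django-forms"))
```

Only the slug survives the move, so two old posts with the same slug on different dates would collide under this scheme.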

Time is an illusion, lunchtime doubly so

Do you have some spare CPU cycles and a little bandwidth on your server?

Then you should consider taking part in the NTP Pool project and become part of the global effort to provide accurate time to millions of users. The number of pool users keeps growing, and the number of servers doesn't seem to be keeping up.

It will only take you about 15 minutes to set up. Read more here: How do I join pool.ntp.org?

Hacking Django forms for CSS flexibility

The default output of the Django forms module (formerly newforms) is not very CSS-friendly. With a few simple adjustments, you can make your web designer colleague happy.

This patch adds three classes to the parent HTML element of each rendered form field (the tr, li or p tag, depending on your rendering mode):

  1. The type of the form field (examples: CharField, ModelChoiceField).
  2. The type of the widget (examples: TextInput, Select).
  3. Whether the form field is optional or required: Optional or Required.

Now a required DateField will render, using the as_table rendering, as:
<tr class="DateField TextInput Required">
  <th>
    <label for="id_date">Date</label>
  </th>
  <td>
    <input type="text" name="date" id="id_date" />
  </td>
</tr>
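The patch itself isn't reproduced here. The following is only a minimal sketch of how such a class string can be derived. It assumes a Django-style field object that exposes `widget` and `required`, as Django form fields do, and uses the Python class names as CSS classes. `field_css_classes` is a hypothetical name:

```python
def field_css_classes(field):
    """Build the class string, e.g. "DateField TextInput Required".

    Sketch only: assumes a Django-style field object exposing `widget`
    and `required`, and uses the Python class names as CSS classes.
    """
    field_type = type(field).__name__          # e.g. "DateField"
    widget_type = type(field.widget).__name__  # e.g. "TextInput"
    requirement = "Required" if field.required else "Optional"
    return " ".join((field_type, widget_type, requirement))
```

In Django's rendering code the underlying field of a bound field is available as `bound_field.field`, so a string like this could be written into the class attribute of the tr, li or p tag.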

Example uses

A couple of example use cases where my patch will help you out:

  • Special styling of required fields.
  • Easier to add a date picker with JavaScript.
  • Special styling of checkboxes (setting input elements to width: 100% also affects those).

Download the patch

Patch against forms/forms.py in Django 1.0: Download - View

How to patch your newly downloaded Django-1.0.tar.gz

For those of you not quite familiar with working with patches:

$ wget http://www.hacktheplanet.dk/export/HEAD/misc/forms.py.patch
$ wget -O Django-1.0.tar.gz http://www.djangoproject.com/download/1.0/tarball/
$ tar xvfz Django-1.0.tar.gz
$ patch -d Django-1.0/django/forms/ < forms.py.patch

Django and mod_wsgi: A perfect match!

mod_wsgi is an Apache module for serving WSGI-based Python web applications from the Apache HTTP server. Django, like almost every other Python web framework today, ships with a handler that lets a project run as a WSGI application.

A couple of months ago I decided to try it out instead of mod_python. mod_wsgi really surprised me: it can take a massive beating, and it outperformed mod_python in every respect that mattered in my setup.

The setup

You will need a short Python "bootstrap" script that creates a WSGI handler for your Django project. Here is an example (call it wsgi_handler.py and place it in the root directory of your Django project, the one with manage.py and settings.py):

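import os
import sys

# Make the directory *above* this project importable, so that
# "projectname.settings" can be found.
sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
os.environ['DJANGO_SETTINGS_MODULE'] = 'projectname.settings'

import django.core.handlers.wsgi

# mod_wsgi looks for a callable named "application" by default.
application = django.core.handlers.wsgi.WSGIHandler()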
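Before pointing Apache at the handler, it can be handy to exercise it outside Apache. Below is a minimal sketch that uses only the standard library's wsgiref.util. `call_wsgi` is a hypothetical helper, not part of Django or mod_wsgi. It calls a WSGI application the way a WSGI server would, using a fake request environment:

```python
from wsgiref.util import setup_testing_defaults


def call_wsgi(app, path="/"):
    """Call a WSGI application directly and return (status, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        # The WSGI spec requires calling close() if the iterable has one.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], body
```

With the handler above, `from wsgi_handler import application` followed by `call_wsgi(application, "/")` should return the status line and body of your front page. This assumes your project's settings import cleanly outside Apache.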

Finally, set up your Apache virtual host to use mod_wsgi:

<VirtualHost *>

  ServerName www.projectname.org
  ServerAlias *projectname.org

  Alias /admin_media /usr/lib/python2.4/site-packages/django/contrib/admin/media

  <Location /admin_media>
    Order allow,deny
    Allow from all
  </Location>

  Alias /media /home/user/projectname/media

  <Location /media>
    Order allow,deny
    Allow from all
  </Location>

  WSGIScriptAlias / /home/user/projectname/wsgi_handler.py

  WSGIDaemonProcess projectname user=user group=user processes=1 threads=10
  WSGIProcessGroup projectname

</VirtualHost>

In the WSGIDaemonProcess line, you can easily control how many system resources (measured in processes and threads) mod_wsgi should use. In my experience, a single process with 10 threads will cover most websites with small to medium load.

Why?

These are some of the reasons why you should ditch mod_python for mod_wsgi when hosting Django projects:

Faster

The load times of the websites now served with mod_wsgi really surprised me. Previously, a page would typically be served in 150-300 ms; with mod_wsgi, load times dropped to the 40-80 ms range.

I also found that running mod_wsgi in embedded mode (as opposed to daemon mode) was not worth the effort: I didn't see any difference in load times with Django.

Less memory usage

Everyone hosting more than a couple of Django projects on a single Apache instance knows that Django projects are fairly memory hungry, and every single Apache child process can easily end up using 50 MB of RAM.

mod_wsgi dedicates a process (or multiple processes) with its own interpreter to a single Django project, and keeps memory usage low in the "normal" Apache child processes. On a server with 8 small Django projects, I went from ~1500 MB of RAM used by Apache child processes to 150 MB.

Secure

When using mod_python, your Python interpreter runs as the same user as the Apache web server itself (on Debian systems, the user is called www-data). Typically this allows peeking around in places where you do not want your users peeking, because www-data must have read access to every file you use in your application (including settings, configuration and media files).

mod_wsgi addresses this problem by switching to a user ID specified in the configuration file and running your Python interpreter as a user other than www-data. This lets you lock down every project on your server to a separate user account.

These points apply to mod_wsgi running in daemon mode.

Conclusion

mod_wsgi rocks!

So if you are thinking about moving your systems to, or just curious about, mod_wsgi, you should really get to it. I, for one, welcome our new mod_wsgi overlords! (sorry)

Further reading

Exploiting the Python object system for server monitoring

I have tried a lot of server/network monitoring systems over the years; Nagios, Zabbix and Zenoss are some of them. However, as a programmer, none of them really felt right to me. Then I got this crazy idea of exploiting the Python object system for server monitoring. In this blog post I will try to explain the idea and provide a proof-of-concept implementation, and hopefully you will see how great an idea this is. ;-)

The system is designed to have a very small footprint (currently less than 500 lines of Python) and also designed with inheritance in mind.

In my system every server is a class. Servers can inherit from other servers, including from multiple server "types" at once. A class can be marked as a stub, meaning it only serves as a base and must be subclassed.

The checks to be performed are just methods on the server class. To distinguish checks from other methods, their names must start with check_. If a check method raises a MonitoringError exception, the exception is handled by the notification system. All settings, such as which port your httpd runs on, are class variables. Every class variable is defined with a default value, which makes configuration easy.

Whenever a check fails (or returns to a successful state) a notification is to be sent. This is done by invoking all defined methods starting with notify_ on the server class.
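The post doesn't include the scheduler's code, so the following is only a minimal sketch of the dispatch idea, built on my own assumptions. `Server`, `run_checks` and `_methods` are hypothetical names, and the previous check state is kept per instance. The sketch finds methods by name prefix with dir() and only notifies when a check flips between failing and passing:

```python
class MonitoringError(Exception):
    """Raised by a check_ method when the check fails."""


class Server(object):
    hostname = "localhost"  # settings are class variables with defaults

    def __init__(self):
        self._failing = {}  # check name -> did it fail on the previous run?

    def _methods(self, prefix):
        """Yield (name, bound method) for each method whose name starts with prefix."""
        for name in sorted(dir(self)):
            if name.startswith(prefix):
                attr = getattr(self, name)
                if callable(attr):
                    yield name, attr

    def run_checks(self):
        """Run every check_* method once, notifying on state changes."""
        for name, check in self._methods("check_"):
            try:
                check()
                error = None
            except MonitoringError as exc:
                error = exc
            failing = error is not None
            # Notify on a new failure or a recovery, not on every run.
            if failing != self._failing.get(name, False):
                for _, notify in self._methods("notify_"):
                    notify(name, error)
            self._failing[name] = failing
```

A subclass only has to define check_* and notify_* methods; a notify_ method receives the check name and the exception, or None when the check has recovered.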

To put things into perspective, here is an example of an HTTP/HTTPS server:

This file should then be saved in a directory (along with an __init__.py file; this is Python... ;-)). The monitoring client is then invoked as monitoring-client.py /path/to/hosts/directory. All subclasses of the base server class are then discovered and entered into a scheduler.
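The discovery step could look something like the following. It is a sketch based on my own assumptions, not the client's actual code. `concrete_servers` is a hypothetical name. `stub` is read from each class's own `__dict__`, so it is not inherited: a subclass of a stub counts as a real server unless it sets `stub = True` itself. Classes only appear in `__subclasses__()` once the module defining them has been imported, which is why the hosts directory has to be imported first:

```python
def concrete_servers(base):
    """Return every non-stub class below base in the inheritance tree.

    Walks __subclasses__() breadth-first. A class reachable through several
    bases (multiple inheritance) is only returned once.
    """
    seen = []
    queue = list(base.__subclasses__())
    while queue:
        cls = queue.pop(0)
        if cls in seen:
            continue
        seen.append(cls)
        queue.extend(cls.__subclasses__())
    # Only a class's own "stub" attribute counts; it is not inherited.
    return [cls for cls in seen if not cls.__dict__.get("stub", False)]
```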

Server abstractions

The experienced Python programmer will probably have realized by now that if you have a lot of web servers, you would make a web server abstraction:

from monitoring.hosts.bundled.web import WebServer, SSLWebServer

class MyWebServerType(WebServer, SSLWebServer):
    admins = ("someone@example.com",)
    stub = True

class WebServer001(MyWebServerType):
    hostname = "server001.example.com"
    address = "192.0.2.12"

class WebServer002(MyWebServerType):
    hostname = "server002.example.com"
    address = "192.0.2.13"

Obviously, this makes more sense when your servers share many more settings, but hopefully you get the point. If you also wanted to check a plain HTTP server running on port 81, you would just add it to the generic class:

from monitoring.hosts.bundled.web import WebServer, SSLWebServer

class MyWebServerType(WebServer, SSLWebServer):
    admins = ("someone@example.com",)
    stub = True
    http_ports = (80, 81)

Class relationships can also span multiple files. For easy importing, the directory with your hosts is automagically exposed to Python as monitoring.hosts.

Remote checks

I have also made a simple mechanism for carrying out remote checks, that is, checks that are run locally on the remote servers. These could, for example, check the available disk space or check that a given process is running.

I use an XML-RPC server with HMAC-SHA-1 for authentication. However, this still needs some work and is not very well tested at this point. If you are into security, please do take a look and comment. One thing that comes to mind is that I should probably also validate the "calling" IP address.
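For readers curious about the HMAC part: the post doesn't show the wire format, so here is only a generic sketch of HMAC-SHA-1 request signing with Python's standard hmac module. The `sign_request`/`verify_request` names, the "timestamp:body" message layout and the 30-second window are my own assumptions. The timestamp limits replays of captured requests, and hmac.compare_digest avoids a timing side channel when comparing signatures:

```python
import hashlib
import hmac
import time


def sign_request(secret, body, timestamp=None):
    """Return (timestamp, hex HMAC-SHA-1) for a request body (bytes)."""
    if timestamp is None:
        timestamp = int(time.time())
    message = ("%d:" % timestamp).encode("ascii") + body
    return timestamp, hmac.new(secret, message, hashlib.sha1).hexdigest()


def verify_request(secret, body, timestamp, signature, max_age=30, now=None):
    """Check the signature and reject requests more than max_age seconds off."""
    if now is None:
        now = time.time()
    if abs(now - timestamp) > max_age:
        return False  # stale or future-dated: possibly a replayed request
    _, expected = sign_request(secret, body, timestamp)
    return hmac.compare_digest(expected, signature)
```

The body here could be the serialized XML-RPC request. The shared secret still has to be distributed to each monitored host out of band.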

Adding remote checks to our server checks is as simple as adding another base class for our template:

from monitoring.hosts.bundled.web import WebServer, SSLWebServer
from monitoring.hosts.bundled.remote import RemoteChecks

class MyWebServerType(WebServer, SSLWebServer, RemoteChecks):
    admins = ("someone@example.com",)
    stub = True
    http_ports = (80, 81)

This will automagically check disk usage, system load, RAM usage and swap usage. The limits can be configured using class variables (if the default ones don't cut it for you):

from monitoring.hosts.bundled.web import WebServer, SSLWebServer
from monitoring.hosts.bundled.remote import RemoteChecks

class MyWebServerType(WebServer, SSLWebServer, RemoteChecks):
    admins = ("someone@example.com",)
    stub = True
    http_ports = (80, 81)
    disk_limit = 50 # warn when more than 50% of the disk is full
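The bundled RemoteChecks code isn't shown here, so the following is only a rough illustration of what a disk_limit check could amount to when run locally on the monitored host. It uses shutil.disk_usage (Python 3.3+). The function name, the path argument and the stand-in exception class are my own assumptions:

```python
import shutil


class MonitoringError(Exception):
    """Stand-in for the monitoring system's exception class."""


def check_disk_usage(path="/", disk_limit=50):
    """Raise MonitoringError if more than disk_limit percent of path is used.

    Returns the used percentage otherwise.
    """
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    if percent_used > disk_limit:
        raise MonitoringError("%.1f%% of %s is used (limit: %d%%)"
                              % (percent_used, path, disk_limit))
    return percent_used
```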

This also exposes a method called remote_call on the server. If we wanted to check for a specific running process, we could just define a method:

from monitoring.hosts.bundled.web import WebServer, SSLWebServer
from monitoring.hosts.bundled.remote import RemoteChecks
# MonitoringError must also be imported; its module path isn't shown in this post.

class MyWebServerType(WebServer, SSLWebServer, RemoteChecks):
    admins = ("someone@example.com",)
    stub = True
    http_ports = (80, 81)
    disk_limit = 50 # warn when more than 50% of the disk is full

    def check_someproc_running(self):

        """ Someproc running """

        procs = self.remote_call("pgrep -f someproc").strip()

        if not procs:
            raise MonitoringError("someproc not running on '%s'" % self.hostname)

Remember that any method prefixed with check_ will be treated as a service check, and any MonitoringError exceptions will go to the notification system. Also, when defining custom checks, remember to add a docstring, as it is used to identify the problem in the notifications.

More

I haven't made a release yet, but I will try the system out in production in the next week or so. If it seems to work, I might put out a release for you to try. Until then, you can check out the code in my Trac browser or get a snapshot to play with.

About Django and the importance of releases

My favorite Python web framework, Django, has not been updated for a long time. The most current release, 0.96, came out in March 2007. That is a very long time in the world of web frameworks.

This doesn't seem to bother a lot of people, as the common answer in the Django community seems to be simply to run the trunk (development) version.

I, for one, don't like that solution, and here are some of the reasons why.

Some of the problems with running a development version

  • When a security release is made, I cannot just update; I need to merge the fix into all of my installations, since updating to the latest trunk could break my existing code with backwards-incompatible changes.

  • It's easier to tell my co-workers that our projects will run 0.96 than r6389 for one project and r7291 for another (plus a couple of security patches). That's okay if you are a one-person team working on a single project, but not when you have several people and projects.

  • Developers are afraid to commit new things to trunk, because a lot of users will be disappointed when they eagerly update their checkouts each morning only to find that backwards compatibility has been broken. A good example of this is ticket 3639:

    This patch will be committed to trunk eventually, don't worry. But I, personally,
    haven't done it yet because of the massive backwards-incompatibility it introduces,
    making timing important.
    

No one should ever be afraid to break backwards compatibility in the development version. Holding changes back only adds complexity to the job of making new releases, as a lot of uncommitted patches queue up.

Release early, release often

Many great people have preached this paradigm, and for good reason: it is the best way for an open source project to succeed. If you don't know what your users want, you will most likely fail.

Getting releases out encourages users to tell you about their experiences and whether development is heading in the right direction with regard to what they actually need.

Having more frequent releases would probably also spur more developers to contribute. Developers like active projects and getting their contributions released.

A suggestion

One of my biggest problems with the current (0.96) release is how buggy newforms is. This could easily be solved by making django.newforms (and maybe django.contrib.admin) into a separate project with a separate release schedule.

Then not everyone would have to rewrite a lot of form-handling code when 1.0 is released, because people would already be using cleaned_data and ModelForms instead of their current incarnations.

Conclusion

Despite my points in this post, I still think the Django framework is the best thing that has happened to the craft of building web applications since the invention of HTTP ;-)

Translate strings using Google Translate

Someone dared "me" to write a Python interface for Google Translate. Here it is:

"""
translate.py

Translates strings using Google Translate

All input and output is in unicode.
"""

__all__ = ('source_languages', 'target_languages', 'translate')

import sys
import urllib2
import urllib

from BeautifulSoup import BeautifulSoup

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'translate.py/0.1')]

# lookup supported languages

translate_page = opener.open("http://translate.google.com/translate_t")
translate_soup = BeautifulSoup(translate_page)

source_languages = {}
target_languages = {}

# Iterate over the <option> tags only; childGenerator() would also
# yield whitespace text nodes, which have no 'value' attribute.
for language in translate_soup("select", id="old_sl")[0].findAll("option"):
    if language['value'] != 'auto':
        source_languages[language['value']] = language.string

for language in translate_soup("select", id="old_tl")[0].findAll("option"):
    if language['value'] != 'auto':
        target_languages[language['value']] = language.string

def translate(sl, tl, text):

    """ Translates a given text from source language (sl) to
        target language (tl) """

    assert sl in source_languages, "Unknown source language."
    assert tl in target_languages, "Unknown target language."

    assert isinstance(text, unicode), "Expects input to be unicode."

    # Do a POST to google

    # I suspect "ie" to be Input Encoding.
    # I have no idea what "hl" is.

    translated_page = opener.open(
        "http://translate.google.com/translate_t?" +
        urllib.urlencode({'sl': sl, 'tl': tl}),
        data=urllib.urlencode({'hl': 'en',
                               'ie': 'UTF8',
                               'text': text.encode('utf-8'),
                               'sl': sl, 'tl': tl})
    )

    translated_soup = BeautifulSoup(translated_page)

    return translated_soup('div', id='result_box')[0].string

Usage:

>>> import translate
>>> translate.translate('da', 'en', u'Goddag')
u'Good day'

Bash history aggregation

So, I wanted to join the fun:

$ history|awk '{a[$2]++} END{for(i in a){printf "%5d\t%s\n",a[i],i}}'|\
sort -rn|head
   87   python
   56   svn
   42   ssh
   38   cd
   33   rdesktop
   22   touch
   21   mplayer
   20   ls
   16   whois
   15   host

I guess it's obvious that I currently mostly work on python projects hosted on svn ;-)
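For comparison, here is the same aggregation as a small Python script. Like the awk version, it assumes bash's default history output format: an entry number followed by the command line. The top_commands name is just for illustration:

```python
import sys
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the first word of each command, like awk's a[$2]++."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:  # fields[0] is the history number
            counts[fields[1]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Usage: history | python top_commands.py
    for command, count in top_commands(sys.stdin):
        print("%5d\t%s" % (count, command))
```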

What does yours look like?