EuroPython talk: Python and the IMAP protocol

July 29, 2010 Tags: python, email, imap

The talk I gave at EuroPython, as a full article/tutorial. Long story short? Use IMAPClient, but try to learn how IMAP works first.

Introduction

This is a write-up of the notes from my EuroPython 2010 talk about Python and the IMAP protocol. I had an interested audience and some very good feedback after the talk, and I thought it would be nice to have the content available as a full tutorial.

The reason I gave this talk is that I'm playing a little bit with IMAP while working on a Python/Django-based webmail with a friend of mine. Nothing is really usable at this point but I'll post some more details on this website once there's something people can actually play with.

The talk was divided into four parts. First, I explained how the protocol works using a telnet connection to an IMAP server. Then I showed how the same things can be achieved using imaplib from Python's standard library. I then gave an introduction to IMAPClient and finished by showing how you can extend the functionality of imaplib to take advantage of some interesting extensions such as XLIST, IDLE or AUTH=XOAUTH.

IMAP

The description of IMAP4rev1, the latest version of the protocol (2003), is available in RFC 3501. IMAP is a message access protocol, meaning it is supposed to be used alongside a message transport protocol, typically (but not necessarily) SMTP.

The key difference between POP and IMAP is that with IMAP the server and the different clients are kept synchronized: if you read your email using a desktop client, you'll see the account as you left it if you switch to a web-based client.

IMAP is available for instance with Gmail and MobileMe, and there are plenty of open-source servers which you can install on your own boxes. Microsoft Exchange also supports IMAP4.

Let's start an IMAP session by logging in to Gmail's IMAP server using an ssl-enabled version of telnet:

$ telnet-ssl -z ssl imap.gmail.com imaps
Trying 209.85.227.109...
Connected to gmail-imap.l.google.com.
Escape character is '^]'.
* OK Gimap ready for requests from XX.XX.XX.XX t16if7810984wbs.33

Commands

The way we send an IMAP command is always the same: the command starts with a "tag", the command name follows and there also may be some arguments. The tag can be any alphanumerical string and is used to identify the response: the tag is sent back by the server at the end of the response. A convention is to use an auto-incrementing number.

At the end of the response, the tag is sent back and a status is sent by the server with some more verbose output. The response code can be:

  • OK: the server was able to fulfill the request and the response has been sent back.
  • NO: the server understood what the client needs but for some reason it is not able to answer. That happens for instance if you try to log in using the wrong credentials or if you want to fetch something that doesn't exist.
  • BAD: the request is not correct and the server didn't understand what the client is trying to do.
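The tag-and-status exchange described above can be sketched in a few lines of Python. This is only an illustration: tag_generator, format_command and parse_completion are hypothetical helper names (not part of any library), and the parsing ignores details such as bracketed response codes.

```python
import itertools
import re

# Tagged completion line: "<tag> <OK|NO|BAD> <human-readable text>"
_COMPLETION = re.compile(r"(?P<tag>\S+) (?P<status>OK|NO|BAD)(?: (?P<text>.*))?\Z")


def tag_generator():
    """Yield "1", "2", "3", ... following the auto-incrementing convention."""
    for number in itertools.count(1):
        yield str(number)


def format_command(tag, name, *args):
    """Build the line sent to the server, terminated by CRLF."""
    return " ".join((tag, name) + args) + "\r\n"


def parse_completion(line, expected_tag):
    """Return (status, text) if `line` completes the command `expected_tag`.

    Returns None for any other line, for instance untagged "* ..." data.
    """
    match = _COMPLETION.match(line.rstrip("\r\n"))
    if match is None or match.group("tag") != expected_tag:
        return None
    return match.group("status"), match.group("text") or ""


tags = tag_generator()
tag = next(tags)
command = format_command(tag, "LOGIN", "username", '"secret"')
result = parse_completion("1 OK username@gmail.com authenticated (Success)", tag)
```

A client would keep reading lines until parse_completion() returns something other than None for the tag it sent.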

As an example, let's send the LOGIN command to authenticate ourselves:

1 LOGIN username "*******"
* CAPABILITY IMAP4rev1 UNSELECT LITERAL+ IDLE NAMESPACE QUOTA ID
             XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE
1 OK username@gmail.com authenticated (Success)

We're now authenticated. We can see some more output, which is the perfect occasion to mention the concept of "capabilities". The server can declare what it is able to do, and this is what happens here. This behaviour is also available as a CAPABILITY command:

2 CAPABILITY
* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA XLIST
             CHILDREN XYZZY SASL-IR AUTH=XOAUTH
2 OK Thats all she wrote! n15if5996010wed.81

The output is actually different from the LOGIN command, so it's probably best to stick with the actual CAPABILITY command. Ideally, before issuing a non-standard command, you should check that the server has the corresponding capability.
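Checking for a capability before using an extension could look like the sketch below. The function names are made up for the example, and the input is the CAPABILITY response shown above, joined back into a single line.

```python
def parse_capabilities(line):
    """Turn an untagged CAPABILITY line into a set of upper-cased names."""
    words = line.split()
    if words[:2] != ["*", "CAPABILITY"]:
        raise ValueError("not a CAPABILITY response: %r" % line)
    return {word.upper() for word in words[2:]}


def supports(capabilities, name):
    """Capability names are case-insensitive, so compare in upper case."""
    return name.upper() in capabilities


capabilities = parse_capabilities(
    "* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA XLIST "
    "CHILDREN XYZZY SASL-IR AUTH=XOAUTH"
)

if supports(capabilities, "XLIST"):
    list_command = "XLIST"  # the server advertises the extension
else:
    list_command = "LIST"   # fall back to the standard command
```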

States

The next concept to understand is the state: when a client sends a command, it may move from one state to another. The state determines which commands are available: some commands can be used at any time, others only in a specific state. The main states are:

  • Not authenticated: the client has just arrived; the next thing it should do is log in and check the different capabilities.
  • Authenticated: login successful, the client may list the different mailboxes and select them.
  • Selected: the client is working inside a specific mailbox to fetch messages, flag them...
  • Logout: the client is leaving.

There is a beautiful chart on the RFC's page, illustrating how clients can switch from a state to another.

Now that we're authenticated, let's see what directories we have. This is done with the LIST command. The first argument is the directory you want to list, the second one is a pattern to restrict the list. For example, to list everything:

3 LIST "" "*"
* LIST (\HasNoChildren) "/" "INBOX"
* LIST (\Noselect \HasChildren) "/" "[Gmail]"
* LIST (\HasNoChildren) "/" "[Gmail]/All Mail"
* LIST (\HasNoChildren) "/" "[Gmail]/Drafts"
* LIST (\HasNoChildren) "/" "[Gmail]/Sent Mail"
* LIST (\HasNoChildren) "/" "[Gmail]/Starred"
* LIST (\HasNoChildren) "/" "[Gmail]/Trash"
3 OK Success

What we get back is a list of directories and some properties: \Noselect means that the directory can't be selected, so it can't be used to store messages (only subdirectories), and \HasNoChildren is self-explanatory.

If we want to list only the directories under the [Gmail] folder, we can do:

4 LIST "[Gmail]" "*"
* ...
4 OK Success

Or:

5 LIST "" "[Gmail]/*"
* ...
5 OK Success

Both commands list the directories under [Gmail]. The % sign can be used to limit the results to only top-level directories:

6 LIST "" "%"
* LIST (\HasNoChildren) "/" "INBOX"
* LIST (\Noselect \HasChildren) "/" "[Gmail]"
6 OK Success
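To make the difference between the two wildcards concrete, here is a small sketch that translates a LIST pattern into a regular expression: * matches anything including the hierarchy delimiter, while % stops at the delimiter. The helper names are hypothetical, and the sketch ignores the special case-insensitivity of INBOX.

```python
import re


def list_pattern_to_regex(pattern, delimiter="/"):
    """Translate an IMAP LIST pattern into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")  # crosses hierarchy levels
        elif char == "%":
            parts.append("[^" + re.escape(delimiter) + "]*")  # one level only
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts) + r"\Z")


def pattern_matches(pattern, name, delimiter="/"):
    """Tell whether a mailbox name is matched by a LIST pattern."""
    return list_pattern_to_regex(pattern, delimiter).match(name) is not None


folders = ["INBOX", "[Gmail]", "[Gmail]/Drafts", "[Gmail]/Trash"]
top_level = [name for name in folders if pattern_matches("%", name)]
under_gmail = [name for name in folders if pattern_matches("[Gmail]/*", name)]
```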

Now that we have a list of mailboxes, we can fetch their properties. The STATUS command can return different properties of a mailbox:

7 STATUS "INBOX" (MESSAGES UIDNEXT UIDVALIDITY UNSEEN)
* STATUS "INBOX" (MESSAGES 14 UIDNEXT 20200 UIDVALIDITY 2 UNSEEN 1)
7 OK Success

The available "status items" are:

  • MESSAGES: the number of messages stored in this mailbox.
  • RECENT: the number of messages that have arrived since the last connection to the server.
  • UIDNEXT: an indication of the UID the next message could have. This is just an indication, but it can be used to check if a new message has arrived.
  • UIDVALIDITY: the "validity value" of the mailbox. If it changes between two sessions, the UIDs you have cached for this mailbox are no longer valid. Look at the RFC for the details.
  • UNSEEN: the number of unread messages in the mailbox.

So, we have an unread message in the inbox, let's fetch it and read it. We need to "select" the inbox before fetching anything:

8 SELECT "INBOX"
* FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
* OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen \*)]
* OK [UIDVALIDITY 2]
* 13 EXISTS
* 0 RECENT
* OK [UIDNEXT 20202]
8 OK [READ-WRITE] INBOX selected. (Success)

We've now switched to the "Selected" state. We can fetch and store the different messages of this mailbox, and when we're done we need to switch back to the "Authenticated" state with the CLOSE command:

999 CLOSE
999 OK Returned to authenticated state. (Success)

The command used to fetch information about the messages is the FETCH command. The first argument is the sequence to fetch (a comma-separated list of message IDs or UIDs or a sequence: 2:5 means messages from 2 to 5), the second one is a list of fields to fetch.
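As an illustration of the sequence syntax, here is a minimal sketch that turns a list of message numbers into the compact form FETCH expects. The name to_sequence_set is made up for this example.

```python
def to_sequence_set(numbers):
    """Collapse message numbers into an IMAP sequence set such as "1:3,5"."""
    ordered = sorted(set(numbers))
    if not ordered:
        raise ValueError("a sequence set needs at least one message number")
    ranges = []
    start = previous = ordered[0]
    for number in ordered[1:]:
        if number == previous + 1:
            previous = number  # still inside the current run
            continue
        ranges.append((start, previous))
        start = previous = number
    ranges.append((start, previous))
    return ",".join(
        str(first) if first == last else "%d:%d" % (first, last)
        for first, last in ranges
    )


sequence = to_sequence_set([2, 3, 4, 5])
mixed = to_sequence_set([8, 1, 2, 3, 5, 7])
```

Here sequence is "2:5" and mixed is "1:3,5,7:8".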

When using the FETCH command, it is important to request only the fields you really need: if you're just displaying the textual content of the message, it is a waste of time and resources to also request the attachments. When you display a list of messages, you don't even want to fetch the body.

For example, if you only want to display a list of messages with only the common fields:

9 FETCH 1:* ALL
...
* 14 FETCH (ENVELOPE (
    "Sun, 11 Jul 2010 17:33:45 +0100"
    "Live from Europython"
    (("=?ISO-8859-1?Q?Bruno_Reni=E9?=" NIL "bruno" "renie.fr"))
    ((NIL NIL "username" "gmail.com"))
    NIL NIL NIL
    "<AANLkTil93pqk-E6Xh4PUQkt@mail.gmail.com>")
  FLAGS ()
  INTERNALDATE "11-Jul-2010 16:34:05 +0000"
  RFC822.SIZE 1422)
9 OK Success

This is the response for a single message, obviously you will get this kind of output for each message in the mailbox.

ALL is a shortcut for (FLAGS INTERNALDATE RFC822.SIZE ENVELOPE). See the RFC for a complete list of the available fields.

When you want to fetch a whole message, a nice way to do it is by fetching the RFC822 content item:

10 FETCH 14 RFC822
* 14 FETCH (RFC822 {1422}
Delivered-To: username@gmail.com
MIME-Version: 1.0
Received: by 10.204.50.206 with HTTP; Sun, 11 Jul 2010 09:33:45 -0700 (PDT)
From: =?UTF-8?B?QnJ1bm8gUmVuacOp?= <bruno@renie.fr>
Date: Sun, 11 Jul 2010 17:33:45 +0100
Message-ID: <AANLkTil93pqk-E6Xh4PUQktJyWqemC5MXE_0G_eLGmab@mail.gmail.com>
Subject: Live from Europython
To: username@gmail.com
Content-Type: text/plain; charset=UTF-8

Testing testing!
 FLAGS (\Seen))
10 OK Success

The output is the source of the message, which we can parse and display properly. What is interesting here is that the FLAGS for this message have also been returned and the \Seen flag has been implicitly set. This behaviour can be desirable, but sometimes you want to leave your unread messages as they are until you explicitly mark them as seen. This can be done by using the EXAMINE command instead of SELECT. EXAMINE switches you to the "Selected" state, but in read-only mode, so nothing is altered:

11 EXAMINE "INBOX"
* FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
* OK [PERMANENTFLAGS ()]
* OK [UIDVALIDITY 2]
* 15 EXISTS
* 0 RECENT
* OK [UIDNEXT 20211]
11 OK [READ-ONLY] INBOX selected. (Success)

Just like with SELECT, issuing a CLOSE command will switch you back to the "Authenticated" state:

12 CLOSE
12 OK Returned to authenticated state. (Success)

This is the basis of the protocol. There are a few other interesting commands, to manage the mailboxes (CREATE, DELETE, RENAME), search for messages (SEARCH, with tons of criteria), store/delete messages and manage their flags (STORE, EXPUNGE), but at this point you should be able to figure out by yourself how to use them. Let's move to the next step and see how to do all this in Python.

imaplib

Imaplib is a client library for the IMAP protocol available in the standard library. It is a mapping at the protocol level and not an object-level layer. That means that you haven't read the first part for nothing, you will actually need it.

Imaplib does very little response parsing: its main job is to map common IMAP commands to Python methods. Here is how to use it:

import imaplib
imaplib.Debug = 4  # Default: 0.
                   # 1 = little output
                   # 5 = VERBOSE

m = imaplib.IMAP4_SSL('imap.gmail.com')

status, response = m.login('username', 'password')

  • A global Debug flag can be set for development, 4 is usually fine.
  • A client is instantiated with the hostname of your IMAP server.
  • Each time you send a command, you get the status and the response. The pattern is to write status, response = m.command(...).

You get back the IMAP status of the command and the slightly parsed content of the response. Imaplib may also raise a few exceptions, so you have to do status checking and exception handling. Best of both worlds.
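One way to cope with both failure modes is a small wrapper like the sketch below. CommandFailed, checked() and list_mailboxes() are hypothetical names introduced for the example; only imaplib.IMAP4.error comes from the standard library, and client is assumed to be an already connected imaplib client.

```python
import imaplib


class CommandFailed(Exception):
    """Hypothetical exception for a NO/BAD status (not part of imaplib)."""


def checked(result):
    """Return the response part of an imaplib result, raising unless OK."""
    status, response = result
    if status != "OK":
        raise CommandFailed("%s: %r" % (status, response))
    return response


def list_mailboxes(client):
    """Run LIST and turn both kinds of failure into CommandFailed."""
    try:
        return checked(client.list("", "*"))
    except imaplib.IMAP4.error as exc:
        raise CommandFailed(str(exc))
```

With this in place, the calling code only has one exception type to deal with.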

To list your mailboxes, send the LIST command:

status, response = m.list("", "*")

# response:
['(\\HasNoChildren) "/" "INBOX"',
 '(\\Noselect \\HasChildren) "/" "[Gmail]"',
 '(\\HasNoChildren) "/" "[Gmail]/All Mail"',
 '(\\HasChildren \\HasNoChildren) "/" "[Gmail]/Drafts"',
 '(\\HasNoChildren) "/" "[Gmail]/Sent Mail"',
 '(\\HasNoChildren) "/" "[Gmail]/Starred"',
 '(\\HasNoChildren) "/" "[Gmail]/Trash"']

Each line is a Python string that you have to parse to extract the information about each directory.
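A minimal sketch of such a parser, using a regular expression, could look like this. It is an assumption-laden example: it does not handle literals, escaped quotes inside names or non-ASCII (modified UTF-7) folder names, and parse_list_line is a made-up name. On Python 3, imaplib returns bytes rather than strings, so the helper accepts both.

```python
import re

# (flags) "delimiter" name   e.g.  (\HasNoChildren) "/" "INBOX"
_LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\) (?P<delimiter>"[^"]*"|NIL) (?P<name>.*)\Z'
)


def parse_list_line(line):
    """Split one LIST response line into (flags, delimiter, name)."""
    if isinstance(line, bytes):
        line = line.decode("ascii")
    match = _LIST_LINE.match(line)
    if match is None:
        raise ValueError("unexpected LIST line: %r" % line)
    delimiter = match.group("delimiter")
    delimiter = None if delimiter == "NIL" else delimiter.strip('"')
    name = match.group("name")
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        name = name[1:-1]  # drop the surrounding quotes
    return match.group("flags").split(), delimiter, name


flags, delimiter, name = parse_list_line('(\\Noselect \\HasChildren) "/" "[Gmail]"')
```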

To see the status of a mailbox:

status, response = m.status('INBOX', '(MESSAGES UNSEEN)')

# response:
['"INBOX" (MESSAGES 16 UNSEEN 0)']
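The STATUS line is easy enough to turn into a dictionary. Here is a rough sketch (parse_status is a hypothetical helper, and it assumes every status item has an integer value, which is the case for the items listed earlier):

```python
import re

# "mailbox" (ITEM value ITEM value ...)
_STATUS_LINE = re.compile(r'"?(?P<name>[^"]+?)"? \((?P<items>[^)]*)\)\s*\Z')


def parse_status(line):
    """Return (mailbox name, {item: integer value}) for a STATUS line."""
    if isinstance(line, bytes):  # imaplib returns bytes on Python 3
        line = line.decode("ascii")
    match = _STATUS_LINE.match(line)
    if match is None:
        raise ValueError("unexpected STATUS line: %r" % line)
    tokens = match.group("items").split()
    values = {key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])}
    return match.group("name"), values


mailbox, counters = parse_status('"INBOX" (MESSAGES 16 UNSEEN 0)')
```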

The distinction between SELECT and EXAMINE is made by passing a readonly=True keyword argument to the select() method:

# SELECT
status, response = m.select('INBOX')

# EXAMINE
status, response = m.select('INBOX', readonly=True)

# response:
['16']

Yes, the response is an integer, as a string. Finally, the FETCH command:

status, response = m.fetch('1:*', 'INTERNALDATE')

# response:
['...',
 '14 (INTERNALDATE "12-Jul-2010 20:38:05 +0000")',
 '15 (INTERNALDATE "13-Jul-2010 13:01:34 +0000")',
 '16 (INTERNALDATE "14-Jul-2010 20:00:50 +0000")']

Again, each line is a single string. In this case the parsing would be pretty simple but if you request several fields at the same time you will need quite a robust parser.
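For the single-field case, the parsing could be sketched as follows. parse_internaldate is a made-up helper; it only understands lines shaped exactly like the ones above and relies on English month abbreviations, so it assumes the default C locale.

```python
import re
from datetime import datetime

# <sequence number> (INTERNALDATE "<date>")
_FETCH_DATE = re.compile(r'(?P<seq>\d+) \(INTERNALDATE "(?P<date>[^"]+)"\)\Z')


def parse_internaldate(line):
    """Return (sequence number, timezone-aware datetime) for a FETCH line."""
    if isinstance(line, bytes):  # imaplib returns bytes on Python 3
        line = line.decode("ascii")
    match = _FETCH_DATE.match(line)
    if match is None:
        raise ValueError("unexpected FETCH line: %r" % line)
    when = datetime.strptime(match.group("date").strip(), "%d-%b-%Y %H:%M:%S %z")
    return int(match.group("seq")), when


number, received = parse_internaldate('14 (INTERNALDATE "12-Jul-2010 20:38:05 +0000")')
```

As soon as several fields, nested lists or literals are involved, a regular expression like this one is no longer enough.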

Finally, here is how to fetch the full content of a message:

status, response = m.fetch('14', 'RFC822')

# response:
[('14 (RFC822 {1422}',
  'Delivered-To: etc, full message content'),
 ')']

While the format of the response looks a bit weird, this behaviour is completely consistent and the RFC822 content is fully parseable using the email.parser module.
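Here is a sketch of that last step using only the standard library. The response is hand-built to mimic the shape shown above (a shortened version of the test message), and message_from_fetch and decode_words are hypothetical helper names.

```python
import email
from email.header import decode_header
from email.utils import parseaddr


def message_from_fetch(response):
    """Extract the message from a response shaped like [(header, body), ')']."""
    raw = response[0][1]
    if isinstance(raw, bytes):  # imaplib returns bytes on Python 3
        return email.message_from_bytes(raw)
    return email.message_from_string(raw)


def decode_words(value):
    """Decode MIME encoded-words such as =?UTF-8?B?...?= into text."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


raw_message = (
    b"From: =?UTF-8?B?QnJ1bm8gUmVuacOp?= <bruno@renie.fr>\r\n"
    b"Subject: Live from Europython\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"\r\n"
    b"Testing testing!\r\n"
)
response = [(b"14 (RFC822 {%d}" % len(raw_message), raw_message), b")"]

message = message_from_fetch(response)
encoded_name, address = parseaddr(message["From"])
sender = decode_words(encoded_name)
body = message.get_payload(decode=True).decode("utf-8")
```

For multipart messages, get_payload() returns a list of parts instead, and message.walk() is the usual way to iterate over them.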

IMAPClient

So, what do you think of the imaplib API? If it doesn't appeal to you that much, you will probably like this part. IMAPClient is a third-party module written by Menno Smits. It's BSD-licensed and it provides a higher-level interface than imaplib, trying to return native Python types and parsed responses whenever possible. It also removes the IMAP status from the responses, doing the response check for you and raising a proper exception if something bad happens.

IMAPClient is available on the cheeseshop, so you can install it easily:

pip install IMAPClient

Now profit. Here's how to create a client:

import imapclient

c = imapclient.IMAPClient('imap.gmail.com', ssl=True)
c.login('username', 'password')

Let's move on and list our IMAP folders:

response = c.list_folders()

# response:
[([u'\\HasNoChildren'], '/', u'INBOX'),
 ([u'\\Noselect', u'\\HasChildren'], '/', u'[Gmail]'),
 ([u'\\HasNoChildren'], '/', u'[Gmail]/All Mail'),
 ([u'\\HasChildren', u'\\HasNoChildren'], '/', u'[Gmail]/Drafts'),
 ([u'\\HasNoChildren'], '/', u'[Gmail]/Sent Mail'),
 ([u'\\HasNoChildren'], '/', u'[Gmail]/Starred'),
 ([u'\\HasNoChildren'], '/', u'[Gmail]/Trash')]

Some interesting differences with imaplib:

  • Each line is properly split into a tuple with all different logical elements.
  • Unicode strings. This is extremely valuable, especially if you're writing in one of those crazy languages that need more than ASCII.

Next, the status of a mailbox:

response = c.folder_status('INBOX')

# response:
{'MESSAGES': 16L,
 'RECENT': 0L,
 'UIDNEXT': 20277L,
 'UIDVALIDITY': 2L,
 'UNSEEN': 0L}

Second surprise, the response is parsed and we get a nice Python dictionary.

The distinction between SELECT and EXAMINE is also made using the readonly keyword argument:

response = c.select_folder('INBOX', readonly=True)
response = c.select_folder('INBOX')

# response:
{'EXISTS': 16,
 'FLAGS': ('\\Answered', '\\Flagged', '\\Draft', '\\Deleted', '\\Seen'),
 'PERMANENTFLAGS': ('\\Answered', '\\Flagged', '\\Draft',
                    '\\Deleted', '\\Seen', '\\*'),
 'READ-WRITE': True,
 'RECENT': 0,
 'UIDNEXT': 20277,
 'UIDVALIDITY': 2}

The SELECT response is also more complete than with imaplib.

Next, let's fetch a list of the date and the size of our messages:

response = c.fetch('1:*', ['INTERNALDATE', 'RFC822.SIZE'])

# response:
{ ...
 20236: {'INTERNALDATE': datetime.datetime(2010, 7, 13, 14, 1, 34),
         'RFC822.SIZE': 7045,
         'SEQ': 15},
 20274: {'INTERNALDATE': datetime.datetime(2010, 7, 14, 21, 0, 50),
         'RFC822.SIZE': 7197,
         'SEQ': 16}}

Another ability of IMAPClient is to deal with native datetime objects that even handle timezones. It also works with UIDs instead of message sequence numbers, and returns the sequence numbers in case you still need them.

Managing flags is also made very simple. Here is how to mark a message as unread:

response = c.get_flags(20274)

# response:
{20274: ('\\Seen',)}

response = c.remove_flags(20274, imapclient.SEEN)

# response:
{20274: ()}

Finally, fetching a full message:

response = c.fetch(20202, ['RFC822'])

# response:
{20202: {'RFC822': 'Delivered-To: ...',
         'SEQ': 13}}

To summarize IMAPClient, I think it has a very clean API that is much nicer than imaplib. Some understanding of the protocol is still required but in a sense it is a good thing. Native types and unicode are key features and the only remaining job is to parse the content of the emails.

Extending

We saw at the beginning that the server is able to declare its capabilities. Some of them are widely available but not implemented on the client side, so we may want to have custom versions of imaplib to add support for additional commands.

XLIST

XLIST is an extension by Apple and Google that adds localization capabilities for folder names like Inbox, Spam, Drafts and additional folder flags to mark the properties of a folder. The usage is exactly the same as LIST:

14 XLIST "" "*"
* XLIST (\HasNoChildren \Inbox) "/" "Inbox"
* XLIST (\Noselect \HasChildren) "/" "[Gmail]"
* XLIST (\HasNoChildren \AllMail) "/" "[Gmail]/All Mail"
* XLIST (\HasChildren \HasNoChildren \Drafts) "/" "[Gmail]/Drafts"
* XLIST (\HasNoChildren \Sent) "/" "[Gmail]/Sent Mail"
* XLIST (\HasNoChildren \Starred) "/" "[Gmail]/Starred"
* XLIST (\HasNoChildren \Trash) "/" "[Gmail]/Trash"
14 OK Success

To implement it with imaplib, we need to introduce the concept of tagged and untagged responses. Here, the tag I sent is "14", and at the end of the response the server sends the tag back. The line starting with the tag (the last line) is the tagged response and all the rest is the untagged response.
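In code, the split between the two kinds of lines could be sketched like this (split_response is a hypothetical helper working on lines that have already been read from the socket):

```python
def split_response(lines, tag):
    """Separate the untagged lines from the tagged line that ends a response."""
    untagged = []
    for line in lines:
        if line.startswith(tag + " "):
            return untagged, line  # the tagged line closes the response
        untagged.append(line)
    raise ValueError("no tagged response for tag %r" % tag)


transcript = [
    '* XLIST (\\HasNoChildren \\Inbox) "/" "Inbox"',
    '* XLIST (\\Noselect \\HasChildren) "/" "[Gmail]"',
    "14 OK Success",
]
untagged, tagged = split_response(transcript, "14")
```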

The way imaplib works is that it sends a command, waits for the tagged response and returns the untagged response. Here's how we do it:

import imaplib

imaplib.Commands['XLIST'] = imaplib.Commands['LIST']


class Imap(imaplib.IMAP4_SSL):

    def xlist(self, directory, pattern):
        name = 'XLIST'
        typ, data = self._simple_command(name, directory, pattern)
        return self._untagged_response(typ, data, name)

imaplib maintains a dictionary of the commands available in the different states, so we need to declare that XLIST works the same way as LIST. Then we create a subclass of IMAP4_SSL that implements the xlist() method the way we want: wait for a tagged response and return the untagged response. Then we can use it this way:

m = Imap('imap.gmail.com')
m.login('username', 'password')

status, response = m.xlist("", "*")
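The command table mentioned above can be inspected directly, which makes the link with the protocol states more visible. A quick sketch, where allowed_in is a made-up helper and the state names are the ones imaplib uses internally:

```python
import imaplib

# Each entry maps a command name to the states in which it may be sent.
login_states = imaplib.Commands["LOGIN"]
list_states = imaplib.Commands["LIST"]
fetch_states = imaplib.Commands["FETCH"]

# Register XLIST with the same states as LIST, as done earlier.
imaplib.Commands["XLIST"] = imaplib.Commands["LIST"]


def allowed_in(command, state):
    """Tell whether imaplib lets `command` be sent in `state`."""
    return state in imaplib.Commands[command]
```

For instance, allowed_in('FETCH', 'AUTH') is False because a mailbox has to be selected first.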

IDLE

Another interesting extension to IMAP is the IDLE command. The idea is to allow the client to be notified in real time when something happens in the selected mailbox. Here's how it works:

15 SELECT "INBOX"
...
* 12 EXISTS
15 OK [READ-WRITE] INBOX selected. (Success)

16 IDLE
+ idling
... time passes
* 13 EXISTS
DONE
16 OK IDLE terminated (Success)

We select the inbox and we notice that there are 12 messages in it. Then we send the IDLE command. The server starts idling, time passes and eventually something new comes. The server sends an untagged response saying that there are currently 13 messages in the mailbox, but does not terminate the command.

The client has to send the DONE message to actually terminate the command.
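While idling, the client mostly has to recognise the untagged notifications pushed by the server. A minimal sketch of that part, assuming the lines have already been read from the connection (parse_notification and first_change are hypothetical names, and only three kinds of notification are handled):

```python
import re

# Untagged notification: "* <number> <EXISTS|RECENT|EXPUNGE>"
_NOTIFICATION = re.compile(r"\* (?P<number>\d+) (?P<kind>EXISTS|RECENT|EXPUNGE)\Z")


def parse_notification(line):
    """Return (kind, number) for a mailbox notification, else None."""
    match = _NOTIFICATION.match(line.strip())
    if match is None:
        return None
    return match.group("kind"), int(match.group("number"))


def first_change(lines):
    """Return the first notification found in `lines`, or None."""
    for line in lines:
        event = parse_notification(line)
        if event is not None:
            return event  # this is the point where a client would send DONE
    return None


event = first_change(["+ idling", "* 13 EXISTS"])
```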

Implementation-wise, it is a bit tricky because the client has to step in to terminate the command. If we simply start idling and ask imaplib for the untagged response, we get None back immediately.

Fortunately, someone (Piers Lauder, the original author of imaplib) has done it for us. imaplib2 is a complete, backward-compatible reimplementation of imaplib. It relies on threading to allow parallel execution of some commands and also implements the IDLE mechanism. Here is how it works:

import imaplib2

m = imaplib2.IMAP4_SSL('imap.gmail.com')
m.login('username', 'password')
m.select('INBOX')

status, response = m.idle()

status, response = m.status('INBOX', '(MESSAGES UNSEEN)')

When you call idle(), the client blocks until something happens or a timeout is reached (something like 30 minutes). When something happens, it immediately sends the DONE message to terminate the command. When the command finishes, you know that something has happened but you don't know what yet, so you need to check the status of the mailbox.

OAuth

An interesting feature launched by Gmail in March 2010 is the ability to use OAuth to authenticate to their IMAP and SMTP servers. If you're developing an application that needs to access your users' Gmail accounts, this is a great way to do so without asking for (and storing) their raw password. Instead, the users give you a token which you can use to authenticate. The token can be revoked at any time and your users still control what can be done with their account.

So, OAuth is available with Gmail's IMAP and SMTP servers (remember the AUTH=XOAUTH in the server's capabilities?), and the good news is that you don't have much work to do: the python-oauth2 library provides an IMAP subclass supporting OAuth authentication. To install python-oauth2:

pip install oauth2

And the usage is very simple:

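import oauth2 as oauth
import oauth2.clients.imap as imaplib

consumer = oauth.Consumer('anonymous', 'anonymous')
token = oauth.Token('your token', 'your token secret')

# XOAUTH request URL for the account to access, following Gmail's
# XOAUTH URL format (replace the address with the user's own).
url = 'https://mail.google.com/mail/b/username@gmail.com/imap/'

m = imaplib.IMAP4_SSL('imap.gmail.com')
m.authenticate(url, consumer, token)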

There is a Python script here that you can use to get a token for your account. For the consumer part, you can keep it as an "anonymous" consumer, but Google's OAuth pages will show a warning to the user, so if you have an app deployed on a real domain it's best to register a proper consumer with Google.

Summary

The conclusion here is that if you're considering doing client-side IMAP in Python, you should really use IMAPClient unless you need custom features. Even then, you may want to subclass IMAPClient to add them. Oh, and it already implements the XLIST command.

I had some very good feedback from the audience and some interesting questions. Tim Golden asked if there was any object-level library for accessing an IMAP server. I told him some are mentioned on PyPI but the links are broken or the source is not available, so those projects are probably dead. There was another question about using an IMAP server to store files and backups. As long as it is a valid email, you can store anything you want on an IMAP server and that includes attachments. So your 7GB of Gmail storage could probably be used as a backup disk, and this is what seems to be done by the GmailFS (Python) project. But anyway, I think real filesystems are better at storing files...

Congratulations if you're still reading, have fun scripting your mailboxes!

Comments

July 29, 2010 daks

Really interesting tutorial. Thanks for writing down your notes after the Europython presentation.

And I hope wombat will be usable soon, as I really miss a good and user-friendly open source webmail: Roundcube is an option but a Django one is better! ;)

Bye.

August 4, 2010 Bruno

daks: I'm sprinting quite intensively on Wombat, it's getting more and more interesting. I hope to have the minimal feature set soon :-)

August 17, 2010 Mathieu Agopian

Excellent! very syntetic post, and very useful ;)

Is there a video of your talk somewhere?

What is this Wombat thing you're talking about in the comment? Is that the name of the webmail you're working on? Is there a page/repository accessible somewhere?

That's a lot of questions ;)

Oh, and get going on the "provide a french RSS link for the django-fr.org planet" already, we need all this precious knowledge available and accessible to the masses !

August 17, 2010 Bruno

Mathieu: Video recording was done only for about 30% of the talks, and not mine. There was a microphone though and audio records should be published somewhere at some point...

Wombat, indeed, is the name of the webmail I'm working on with a friend of mine. There is a git repository at http://gitorious.org/wombat, we're not doing much publicity because it's just not ready yet ;-)

And yeah, I need (and want) to blog in French. It's just a matter of time!

March 26, 2011 rrsguru

Thanks a lot, I am a newbie to Python and getting bit addicted knowing its simplicity and ease of implementation.

As a note for all the newbies out there, who wish to implement ones own email update tool

1. download imaplib2.py
2. Add it to the system path of imaplib
3. import imaplib2 in the commandline
4. And check the updates of ur email by the "idle" command
