Email Analytics
During the past six months I've been drowning in email. I spend a large part of my day responding to email messages and filing incoming messages I consider important. Yet I'm falling behind, and this affects the quality of my work: I sometimes delay responding to important messages. Following Peter Drucker's dictum "If you can't measure it, you can't manage it", I decided to write a tool to analyze my incoming and outgoing email messages.
Thankfully, I've resisted the temptation of using an online service for managing my email, and therefore I have all my email messages stored on my hard disk. They are stored in the relatively simple mbox format. Yet, the complications of parsing email headers are so great that I decided to use Mark Overmeer's excellent Mail-Box email processing Perl package. Through experiments and code reviews I performed last year I found that this package was the most correct and comprehensive among the libraries available for any language.
For reporting the results I initially planned to use Perl's built-in reporting mechanism. However, I then thought that tables would be easier to create and more readable if they were in HTML, so I opted for that approach. For the first time in my life I used an HTML generation library rather than printing HTML tags by hand. For this I adopted Pete Krawczyk's HTML::AsSubs module. I found it very easy to use, and it greatly improved my code's readability. I also parameterized my functions heavily, which considerably reduced code duplication. For instance, all tables are created by a single subroutine.
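To illustrate the idea of a single, parameterized table routine, here is a minimal sketch in Python rather than Perl. It is not the code from my script and does not use HTML::AsSubs; the names `html_table` and `ordered_by_volume` are made up for this example. It only shows how one function can serve every report when the caption, headers, and rows are passed in as parameters.

```python
from html import escape


def html_table(caption, headers, rows):
    """Return an HTML table as a string; every piece of text is escaped."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        f"<table><caption>{escape(str(caption))}</caption>"
        f"<tr>{head}</tr>{body}</table>"
    )


def ordered_by_volume(counts):
    """Turn a {key: count} mapping into rows sorted by descending count."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

With these two helpers, a report such as "Emails by hour ordered by volume" becomes one call: `html_table("Emails by hour ordered by volume", ["Hour", "Messages"], ordered_by_volume(by_hour))`, where `by_hour` is whatever mapping of hour to message count the analysis produced.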
You can find the source code of the Perl script I wrote here. If you plan to use it on your own email you'll need to customize the script in the places I've marked. Sadly I lack the time to make it configurable through user options. If you add support for a different mailbox format, please post your changes as a comment to this blog entry, so that others can benefit from them.
The script creates a summary of the following measures:
- Number of messages
- Number of recipients
- Number of senders
- Number of active days
- Average messages per day
- Average messages per month
- Average messages per folder
- Average messages per recipient
- Average messages per sender
- Emails by month
- Emails by month ordered by volume
- Emails by day of week
- Emails by day of week ordered by volume
- Emails by hour
- Emails by hour ordered by volume
- Top 10 folders
- Top 10 email addresses
- Emails by folder
- Emails by folder ordered by volume
- Emails by address
- Emails by address ordered by volume
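For readers who want to try something similar without Perl, the following sketch computes a handful of the measures above from a single mbox file, using only Python's standard `mailbox` and `email.utils` modules. It is not my script, and it rests on a few assumptions: the function name `summarize` and the keys of the returned dictionary are invented for this example, "average per day" is taken to mean per active day, and the hour is read in whatever time zone the message's `Date` header carries.

```python
import mailbox
from collections import Counter
from email.utils import getaddresses, parsedate_to_datetime


def summarize(mbox_path):
    """Count the messages in one mbox file by month, weekday, hour, and sender."""
    by_month, by_weekday, by_hour = Counter(), Counter(), Counter()
    by_sender = Counter()
    active_days = set()
    total = dated = 0

    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            total += 1
            # A From header may hold several addresses; count each one.
            senders = [str(v) for v in msg.get_all("From", [])]
            for _name, addr in getaddresses(senders):
                if addr:
                    by_sender[addr.lower()] += 1
            try:
                when = parsedate_to_datetime(str(msg.get("Date", "")))
            except (TypeError, ValueError):
                continue  # missing or malformed Date header: skip date stats
            dated += 1
            by_month[when.strftime("%Y-%m")] += 1
            by_weekday[when.strftime("%A")] += 1
            by_hour[when.hour] += 1  # hour as written in the Date header
            active_days.add(when.date())
    finally:
        box.close()

    return {
        "messages": total,
        "senders": len(by_sender),
        "active_days": len(active_days),
        # Assumption: the average is over days that saw at least one message.
        "per_active_day": dated / len(active_days) if active_days else 0.0,
        "by_month": by_month,
        "by_weekday": by_weekday,
        "by_hour": by_hour,
        "by_sender": by_sender,
    }
```

Calling `summarize("Inbox.mbox")["by_sender"].most_common(10)` would then give a rough "Top 10 email addresses" list for that one folder (the file name is just a placeholder). Per-folder and per-recipient measures would need a loop over several mbox files and over the `To` and `Cc` headers, which this sketch leaves out.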
Through the analysis I found a number of interesting facts:
- I must process about 80 messages every day to keep my email under control,
- I send more messages than I receive and file,
- the number of emails I process each month has been increasing,
- midday and the days in the middle of the week are the busiest times (a well-known observation),
- most emails are related to human-resource issues, e-government, and a few tough projects, and
- most emails come from three people.