So I had this idea for a project: if you have a little server (from DigitalOcean), what cool things can you do with it?

The idea I ended up with (suggested by my dad) was to make a bot that emails you a haiku – email it with a few keywords, and it’ll randomly generate a haiku using those words and send it back.

I decided that there were 5 key sections needed to make the idea work:

  • A program to send and receive emails, and scrape relevant information
  • A function to search Wikipedia for the keywords
  • A function to make sentences from Wikipedia articles
  • A function to form sentences into haiku
  • A way to run the program every couple of minutes

I decided to use Python for this.

Note – the code is, at this point, still quite buggy, and changing constantly – so I won’t be putting the full thing up and analysing it yet, although I might do that later.

The email section

The first thing I did was set up an email address using Gmail for the haiku bot. As it turned out, using Gmail might have made things a little more difficult, but it worked out in the end. I then set to work finding a way to pull the emails down, grab the bits I wanted, and reply to the sender. Since I had no idea what I was doing, I mostly scrabbled this together from online sample code.
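
To give a rough idea of the "grab the bits I wanted" and "reply" steps, here is a minimal sketch using only Python's standard `email` package. It is not the bot's actual code. The address `haikubot@example.com`, the function names, and the choice to read keywords from the subject line are all assumptions made for illustration. Fetching and sending are left out because they need real credentials; the standard library's `imaplib` and `smtplib` modules are one way to do those parts.

```python
import re
from email import message_from_string
from email.message import EmailMessage
from email.utils import parseaddr

BOT_ADDRESS = "haikubot@example.com"  # hypothetical address, not the real bot


def extract_request(raw_email):
    """Return (sender_address, keywords) from a raw email given as a string."""
    msg = message_from_string(raw_email)
    # parseaddr splits "Alice <alice@example.com>" into (name, address).
    sender = parseaddr(msg.get("From", ""))[1]
    # Assumption: the keywords are simply the words of the subject line.
    keywords = re.findall(r"[A-Za-z']+", msg.get("Subject", ""))
    return sender, keywords


def build_reply(sender, keywords, haiku):
    """Build (but do not send) a reply containing the haiku."""
    reply = EmailMessage()
    reply["From"] = BOT_ADDRESS
    reply["To"] = sender
    reply["Subject"] = "Your haiku: " + " ".join(keywords)
    reply.set_content(haiku)
    return reply
```

Keeping the parsing separate from the network code means it can be tried out on a saved email without touching the mailbox.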

Writing the actual Haiku

This bit is still very much subject to change, because the current haiku are pretty bad. Here are a few examples:

Prompted word: Rabbi
Haiku:
of Toanot is not
study sections of leaders
In addition to

Prompted word: Socks
Haiku:
socks usually top
Sports Most sports will sort of sock
In Kingdom a socks

Prompted word: (I don't even know what prompted this one)
Haiku:
This is a cancer
Expenses controversy
was to Deputy

It’s pretty much random, and not really recognisable as relevant to the keyword. I’ve got a bunch of suggestions for how to improve it, although I haven’t started implementing them yet.

So, how does this work? It starts by using the Python wikipedia library to search for the keyword, and then downloads the text of the matching Wikipedia page. Next, it uses a library called Markovify to generate random sentences from that text using Markov chains. Then there’s a function that keeps generating sentences until it finds one whose first few words add up to exactly 5 or 7 syllables, and takes those words to be a line in the haiku. It’s not particularly sophisticated, and it’s quite buggy. The syllable counting function is surprisingly accurate considering how simple it is, but it’s still not great, because syllable counting is a difficult problem for computers. There are a bunch of ways to improve it.
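
The "take the first few words that add up to 5 or 7 syllables" step can be sketched roughly like this. This is an illustration under stated assumptions, not the bot's real code: the vowel-group syllable rule is a common rule of thumb that may differ from the one actually used, and the function names are made up. The `sentences` argument stands in for whatever produces the random sentences (in the bot, Markovify), so the sketch runs without any downloads.

```python
import re


def count_syllables(word):
    """Rough estimate: one syllable per group of consecutive vowels."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing "e" is usually silent ("are", "made"), except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def take_line(sentence, target):
    """Return the leading words totalling exactly `target` syllables, else None."""
    words = sentence.split()
    total = 0
    for i, word in enumerate(words):
        total += count_syllables(word)
        if total == target:
            return " ".join(words[: i + 1])
        if total > target:
            return None  # overshot: this sentence cannot start a valid line
    return None


def make_haiku(sentences):
    """Consume sentences until lines of 5, 7 and 5 syllables have been found."""
    targets = [5, 7, 5]
    lines = []
    for sentence in sentences:
        if not sentence:
            continue  # a generator may fail to produce a sentence
        line = take_line(sentence, targets[len(lines)])
        if line is not None:
            lines.append(line)
            if len(lines) == len(targets):
                return "\n".join(lines)
    return None  # ran out of sentences before finishing
```

Because each line is cut from the start of a different random sentence, nothing ties the three lines together, which goes some way to explaining why the results read as pretty much random.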

Possible Improvement #1 – Improve the syllable counter

There are multiple ways to do this – make lists of common exceptions, try and find more and more rules of thumb, and so on. Another method to look into is training a neural network to count syllables; this might be a little overboard, but it would be a neat project.
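
As a sketch of the "list of common exceptions" approach: check a hand-maintained dictionary first, and only fall back to the rule of thumb when the word isn't listed. The entries below are illustrative examples chosen because the simple vowel-group rule gets them wrong. They are not taken from the bot, and the fallback rule itself is an assumption.

```python
import re

# Hand-maintained overrides for words the rule of thumb miscounts.
# Illustrative entries only; a real list would grow as mistakes turn up.
EXCEPTIONS = {
    "poem": 2,   # "oe" looks like one vowel group but is two syllables
    "area": 3,   # "ea" is split here: a-re-a
    "idea": 3,
    "being": 2,
}


def heuristic_syllables(word):
    """Rule of thumb: one syllable per group of consecutive vowels."""
    return max(len(re.findall(r"[aeiouy]+", word)), 1)


def count_syllables(word):
    """Look the word up in EXCEPTIONS first, then fall back to the heuristic."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return heuristic_syllables(word)
```

The nice thing about this layout is that fixing a miscounted word is a one-line change to the dictionary rather than a change to the rules.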

Possible Improvement #2 – Make the Haiku include the keyword

This one makes it more relevant – if there are multiple keywords, try and include at least one keyword. Fairly straightforward.
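
One simple way to do this, sketched below, is to keep generating candidate haiku and throw away any that don't contain a keyword. The function names are hypothetical, and matching on whole words is an assumption: it means the keyword "sock" would not count as appearing in "socks".

```python
import re


def contains_keyword(haiku, keywords):
    """True if at least one keyword appears as a whole word in the haiku."""
    words = set(re.findall(r"[a-z']+", haiku.lower()))
    return any(keyword.lower() in words for keyword in keywords)


def first_relevant(candidates, keywords):
    """Return the first candidate haiku that mentions a keyword, or None."""
    for haiku in candidates:
        if contains_keyword(haiku, keywords):
            return haiku
    return None
```

For example, `first_relevant(candidates, ["Socks"])` skips over any candidate that never says "socks" and returns the first one that does.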

Possible Improvement #3 – Prefer longer words

The longer the words are, the more interesting, and the less pressure there is to form a grammatically cohesive sentence – if you have “aesthetic capitulate”, not only is it more interesting than “as of it apple right not”, but it’s much easier to make it seem like real English, because of the lower word count. Obviously, you can’t only have long words, but making them preferred might improve the Haiku.
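
A rough way to express that preference is to score each candidate line by its average word length and keep the highest scorer. This is just one possible scoring rule, assumed for illustration; the function names are made up.

```python
import string


def average_word_length(text):
    """Mean number of characters per word, ignoring surrounding punctuation."""
    words = [word.strip(string.punctuation) for word in text.split()]
    words = [word for word in words if word]
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def pick_wordiest(candidates):
    """Return the candidate with the longest average word, or None if empty."""
    return max(candidates, key=average_word_length, default=None)
```

With the two examples from the paragraph above, `pick_wordiest(["as of it apple right not", "aesthetic capitulate"])` picks "aesthetic capitulate". Since this only prefers long words rather than requiring them, short words still get through whenever no better candidate turns up.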

Possible Improvement #4 – Train a neural network to pick good Haiku

Ok, I know this is total overkill, but it is interesting, and not necessarily as hard as you might think. Neural networks are amazing things, and I’d love to learn more about them by doing a project. If you could train such a network to “like” haiku that closely resemble actual human-written haiku, you could make a really neat filter for the poems.

Making it run

Right, now you’ve got your script. You’ve tried it a few times and it works, but you need to make it run on autopilot. I used cron, the scheduler built into Unix/Linux-type operating systems, which you set up through a file called the crontab. Go to the terminal, type “crontab -e”, and it will let you edit the crontab file; all you need to do is add a line saying how often you want to run your program and where to find it, and it’ll run it for you. For example, mine has the line

*/3 * * * * python /root/haikumail.py >> /root/crontab.log

This tells cron to run the file “haikumail.py” every 3 minutes and append whatever it prints to “crontab.log”.
