A dictionary and 79 epubs walk into a bar…
Word Suggestion Engine
This was the major theme of the first half of the work so far. After spending a few hours crawling the interwebs for someone who has done this in an open-source, information should be free, hack-the-planet kind of way, I gave up a decided I needed to do it myself. I took a hint from an old friend, Markov chains. I needed an engine that when given a word, it would spit out a list of words that would most-likely come next. In this scenario, Markov chains basically say ignore everything but the word that comes before it and that works pretty well.
So! I grabbed a pile of almost 80 fiction books in epub format and wrote an analysis tool that calculates for any given word, what are the most likely words to come after it. Filter all those words with actual words from a fully resolved Open Office dictionary (which was a bit of a project on its own)…and done!
Here’s how we are looking so far:
Look good? I think it looks pretty good. The word recommendations are normally pretty close if not spot-on for this passage. There’s no tab-completion yet, which is really the linchpin to see if this little experiment is actually useful. Better get back to work!
The task list as it stands today, for posterity: