Weekly 87 — Ten-key keyboard: Analysis and obstacles 21 Mar 2016 tiptaptypin

(continued from Weekly 86 — Ten-key keyboard: A pitch)

The benefit of typing this way has to outweigh normal typing at every step, so all of this work has to be done invisibly on the developer’s side if anyone is to find it useful. If a solution ever requires the user to retrain their muscle memory, the purpose is lost anyway.

An ideal end scenario would look like this: the user types (prose, essays, notes, stenography, etc.) using their preexisting touch-typing muscle memory, quietly, in places where typing at a regular keyboard or handheld device is inconvenient. The user can read over the text later and change incorrectly guessed words to what was intended (which hopefully won’t happen too often).

Let’s put aside the engineering aspect of the ten-key keyboard and focus on the programming side. Using ten keys, can someone type readable, logical, capitalized, punctuated prose?

For the sake of testing, finger presses are represented by the home row keys your fingers normally rest on (left pinky on “A”, left ring finger on “S”, etc.). To guess what word the user was typing, we’re basically working with a hash table where the hash function converts every word to its “home row” equivalent (“HELLO” → “JDLLL”), the value is the intended English word, and collisions are handled by guessing which value was intended based on the words beforehand.
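The hash function above is simple enough to sketch in a few lines of Python. This is my reconstruction, not the original scripts: each letter collapses to the home row key of the QWERTY finger that types it.

```python
# A sketch of the "home row" hash described above (my reconstruction; the
# original scripts aren't shown). Each letter maps to the home row key of
# the QWERTY finger that normally types it.
HOME_ROW = {}
for home_key, column in {
    "A": "QAZ",      # left pinky
    "S": "WSX",      # left ring
    "D": "EDC",      # left middle
    "F": "RTFGVB",   # left index
    "J": "YUHJNM",   # right index
    "K": "IK",       # right middle
    "L": "OL",       # right ring
    ";": "P",        # right pinky
}.items():
    for letter in column:
        HOME_ROW[letter] = home_key

def home_row_hash(word):
    """Collapse a word to its home row equivalent, e.g. "HELLO" -> "JDLLL"."""
    return "".join(HOME_ROW[c] for c in word.upper())

print(home_row_hash("HELLO"))  # JDLLL
```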

Here is an example of a collision the algorithm would have to deal with: “dog” and “elf” both hash to “DLF”, so the program has to guess which one the typist meant from context alone.
Let’s assume you want to use this software to sit down and write The Old Man and the Sea. How many words hashed to the same “home row” string?

Words with 1 collision: 1958
Words with 2 collisions: 143
Words with 3 collisions: 35
Words with 4 collisions: 12
Words with 5 collisions: 3
Words with 6 collisions: 1

91.0% of the words in The Old Man and the Sea were completely unique and didn’t need to use the predictive algorithm at all. Not bad! But The Old Man and the Sea is a book with a famously minimal writing style. How do these ratios look with a more complex book, like Ulysses?

Words with 1 collision: 22112
Words with 2 collisions: 1339
Words with 3 collisions: 380
Words with 4 collisions: 144
Words with 5 collisions: 79
Words with 6 collisions: 54
Words with 7 collisions: 30
Words with 8 collisions: 13
Words with 9 collisions: 5
Words with 10 collisions: 6
Words with 11 collisions: 5
Words with 12 collisions: 4
Words with 13 collisions: 4
Words with 14 collisions: 1
Words with 15 collisions: 1

In spite of Ulysses’ famous density and complexity (or maybe because of this), even more words (91.4%) have unique mappings in the hash table. However, there are far more words that all collide to the same key, with one home row key even matching fifteen different English words.
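Collision histograms like the two above come from a short script. A minimal reconstruction (my own code, not the originals) might look like:

```python
from collections import Counter, defaultdict
import re

# Sketch of a collision counter (my reconstruction, not the original script).
HOME_ROW = {letter: key for key, column in {
    "A": "QAZ", "S": "WSX", "D": "EDC", "F": "RTFGVB",
    "J": "YUHJNM", "K": "IK", "L": "OL", ";": "P",
}.items() for letter in column}

def home_row_hash(word):
    return "".join(HOME_ROW[c] for c in word)

def collision_histogram(text):
    """Group a text's vocabulary by home row hash and count bucket sizes."""
    words = set(re.findall(r"[A-Z]+", text.upper()))
    buckets = defaultdict(set)
    for w in words:
        buckets[home_row_hash(w)].add(w)
    # histogram: bucket size -> how many hashes hold that many distinct words
    return Counter(len(ws) for ws in buckets.values())

sentence = "He bit a big rib over a bib at a gig which fit the fib"
hist = collision_histogram(sentence)
for size, count in sorted(hist.items()):
    print(f"Hashes holding {size} word(s): {count}")
```

Run on a whole novel instead of one sentence, this produces tables like the ones above.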

If trained well enough, an algorithm should be able to sort out which word you want based on its part of speech (for example, something ending in “s” probably won’t come right after “I”). However, problems arise when two colliding words are grammatically interchangeable. When writing your first sentence, how would the algorithm know if you’re writing about a dog or an elf? What if the words are synonyms, like roaring/blaring? (“The music was…”) It’s necessary to show options for incorrectly guessed words at a later point…but will the mistaken word throw off the rest of the sentence? Can any algorithm be expected to take “JD FKF A FKF FKF LFDF A FKF AF A FKF SJKDJ FKF FJD FKF” and interpret it as “He bit a big rib over a bib at a gig which fit the fib”?
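To show the shape of the idea (and only the shape: nothing this naive would survive the “FKF” sentence above), here is a toy bigram disambiguator. All the names here are made up for illustration, not part of any real predictive algorithm.

```python
from collections import defaultdict

# Toy bigram disambiguator (an illustrative sketch with made-up names).
# bigram_counts[prev][word] counts how often `word` followed `prev`.
bigram_counts = defaultdict(lambda: defaultdict(int))

def train(text):
    words = text.upper().split()
    for prev, word in zip(words, words[1:]):
        bigram_counts[prev][word] += 1

def guess(prev_word, candidates):
    """Among colliding candidates, pick the one seen most often after prev_word."""
    return max(candidates, key=lambda w: bigram_counts[prev_word.upper()][w])

# "ROARING" and "BLARING" collide (both hash to "FLAFKJF"), so only the
# preceding words can break the tie.
train("THE MUSIC WAS BLARING ALL NIGHT")
print(guess("was", ["ROARING", "BLARING"]))  # BLARING
```

A real system would need far more than one word of context, which is exactly where the synonym problem bites: no amount of preceding text distinguishes two words that fit the same slot equally well.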

Here are other possible scenarios the algorithm would have to handle:

– Punctuation
– Deletion
– Invented names/words
– Numbers
– Capitalization
– Short, contextless sentences
– Storing all words + context

A free thumb can fill in for some of these, as well as the underused right pinky. (Why does a spacebar take up as much keyboard space as it does, anyway? Do people regularly alternate which thumb they space with?) Holding one pinky down while pressing another key could easily replicate capitalization, although it’s probably not too hard to capitalize words correctly algorithmically anyway. Numbers could be written out, since technical writing (or any kind of coding, oof) wouldn’t fit this approach at all. A finger (the left thumb?) could serve as a general punctuation key, but this could get messy. Some method of deletion is also necessary: I frequently hit the “wrong key” when typing on an armrest and muscle-memory my way back with my right pinky, and a wrong word can ruin whatever context the algorithm had built up and throw off future words.

The corpus of English words should cover all common proper nouns, but writers who frequently use invented words and names might have to manually add these to the program’s corpus, similar to telling Word to learn the spelling of your last name so it stops underlining it in red. Finally, is storing all reasonably well-used English words (which is, an unfortunate aspect of this language, quite a huge number) along with their contexts going to fit in a reasonable amount of space? Or would a service like Amazon/Google/Facebook have to host the database and allow use of it through the browser?
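The invented-word problem, at least, could reduce to appending to the same hash table the predictive algorithm already searches. A sketch, where `lexicon` and `learn` are hypothetical names of my own:

```python
from collections import defaultdict

# Hypothetical user dictionary: invented words get hashed and dropped into
# the same buckets the guesser searches (names here are placeholders).
HOME_ROW = {letter: key for key, column in {
    "A": "QAZ", "S": "WSX", "D": "EDC", "F": "RTFGVB",
    "J": "YUHJNM", "K": "IK", "L": "OL", ";": "P",
}.items() for letter in column}

def home_row_hash(word):
    return "".join(HOME_ROW[c] for c in word.upper())

lexicon = defaultdict(set)   # home row hash -> known words

def learn(word):
    """Teach the program a new word (an invented name, say), like teaching
    Word the spelling of your last name."""
    lexicon[home_row_hash(word)].add(word.upper())

learn("Frodo")
print(lexicon[home_row_hash("Frodo")])  # {'FRODO'}
```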

While this seems like a long list of obstacles, I’ve learned to be optimistic about the capabilities of computers when fed massive amounts of data to train from. The feasibility of this project will become clearer as early tests are built into something more elaborate than the short Python scripts I wrote to analyze word collisions in text files.