Humanized > Weblog: Visual Feedback: Why Modes Kill
What our eyes are looking at and what our attention is on are two different things. It's the second that matters.

7 Dec 2006

Visual Feedback: Why Modes Kill

UI Design Fundamentals

Let me set the scene. It’s the comedy film “Airplane”. The flight crew is violently ill and Striker, a shell-shocked former pilot, is forced to land a jet full of passengers in dire need of medical attention. The air is heavy with fog, and rain pounds on the cockpit windows. Over the static-filled radio comes the voice of ground control, desperately talking Striker through the landing.

Ground Control: [The radio is off.] Our one hope is to build this man up. I’ve got to give him all the confidence I can. [Turns radio on.] Striker… Have you ever flown a multi-engine plane before?

Striker: No, never.

Ground Control: [Thinking the radio is off.] #@&*#! This is a waste of time… there’s no way he can land that plane. [Striker starts to tremble.]

How did Ground Control make this mistake? The answer is simple. Mode error.

Don Norman defines mode errors as occurring when a user misclassifies a situation, resulting in actions that are appropriate for the user’s conception of the situation but inappropriate for the true situation. In Airplane, the action could not have been more inappropriate for the situation.

Mode errors are a ubiquitous bane: they cause us to lose our work, and they have killed hundreds of people. Despite the millions of dollars spent on the design of airplane cockpits, between 1988 and 1996 five fatal airline crashes were the direct result of mode errors. Many more crashes were probably indirectly caused by mode errors. What’s the lesson? If a system contains modes, people will make mode errors; if we design systems that are not humane (that is, responsive to human needs and considerate of human frailties), we can be guaranteed that people will make mistakes, sometimes with cataclysmic consequences.

Luckily, there are methods for combating mode errors, generally by employing sensory feedback to indicate the current system mode to the user. Let’s return to the radio in Airplane. It could have used visual feedback to indicate mode by employing a light that glows during transmission. Or, the radio could have used kinesthetic feedback by allowing the radio to transmit only while a button was pushed and held. With either of these two methods, ground control might have avoided the costly (albeit humorous) blunder.

Both methods are clearly better than nothing, but which one is more effective at reducing mode error? The answer is kinesthetic feedback. Why? Because kinesthetic feedback is much harder to ignore than visual feedback and, perhaps more importantly, kinesthetic feedback is actively maintained. For critical modes, visual feedback cannot be relied upon to keep the user from making errors—what the user’s eyes are looking at and what their attention is on are two different things. This is a lesson that many designers have yet to truly learn, at the cost of hundreds of lives.

It’s not always possible to use kinesthetic feedback, however, so when I’m designing interfaces and facing modes, I use the following three questions to guide my solutions.

1. Is the mode determinable before the user takes action?
2. Is it hard for the user to ignore feedback about the mode?
3. Does the user actively maintain the mode?

Is the mode determinable?

System designers are in the habit of hiding system state from the user in the name of “clean design”. The Start Menu, for instance, has a dizzying array of preferences which change the Start Menu’s behavior. Did you know that drag-and-drop support in the Start Menu is a preference? It’s a mode that you can’t discover until you actually try dragging things around, and even then you wouldn’t know why it had mysteriously stopped working. But non-determinable modes aren’t confined to computers: the living room is a great example.

The state of the lights in a room constitutes a mode: on or off. If your living room is like mine, there are multiple light switches that independently control the light. As a result, the up/down position of any particular switch does not indicate whether the lights are on or off.

As is inevitable when I’m expecting guests, a light burns out. By the time I get around to replacing it, the state of the light is indeterminate and I don’t know which way to flick the switch to guarantee that the light is off. Changing the light bulb is a game of Russian Roulette with a 50% chance of losing. Not so good. This uncertainty is the result of a mode that cannot be determined before action is taken.
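The light-switch problem can be sketched in a few lines of Python. This is only an illustration: it assumes the wiring behaves like an exclusive-or of the switch positions (the light is on when an odd number of switches are up), and `light_is_on`, `flip`, and the two example rooms are made-up names rather than anything from a real system.

```python
from functools import reduce
from operator import xor


def light_is_on(switch_positions):
    """Assumed multi-way wiring: on when an odd number of switches are up."""
    return reduce(xor, switch_positions, False)


def flip(switch_positions, index):
    """Return the switch positions after flipping one switch."""
    flipped = list(switch_positions)
    flipped[index] = not flipped[index]
    return flipped


# The switch by the door (index 0) is "up" in both rooms,
# yet the light is in a different state in each.
room_a = [True, False]
room_b = [True, True]

print(light_is_on(room_a))  # True
print(light_is_on(room_b))  # False

# Flipping the door switch always changes the light, but without seeing
# the other switch (or a working bulb) you cannot tell which way it changes.
print(light_is_on(flip(room_a, 0)))  # False
print(light_is_on(flip(room_b, 0)))  # True
```

With a burnt-out bulb there is no output to observe, so the position of the switch in front of you carries no information about the mode.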

Analogous situations frequently occur when using computers. There’s a nice example from Microsoft Excel here.

Traditional phones also have no mode indication: if you get distracted halfway through dialing, you’ll generally hang up and start from the beginning because there is no way to figure out where you were. Why is dialing a mode? Because the first 6 key presses mean “queue this number” and the 7th key press means “queue the number and dial”. Interestingly enough, the behavior of resetting the system to a known state (even if it’s not necessary) is quite common. Those readers who are familiar with the profoundly modal text editor vi know that trained users hit escape before issuing any command, just to make sure that the editor is in a sane state before continuing. Escape simply becomes part of the command gesture, a habit. Trained users escape their commands even though there is visual feedback for the current mode. Why? Because it’s less of a mental burden than diagnosing the current state of the system, and because the feedback is easy to ignore. This brings us to question number two:

Is it hard for the user to ignore feedback about the mode?

I usually discover that I’m in the Caps Lock mode by noticing that MY LETTERS ARE ALL CAPS. Then, if Humanized’s upcoming product Enso isn’t installed, I have to go back and retype the whole thing.

Caps Lock is a prime example of a mode that gives barely-worthwhile feedback: a small light on the keyboard glows when Caps Lock is engaged. It’s the analog of the hypothetical visual feedback for the radio in Airplane. Of course, the keyboard is exactly where no touch-typist ever looks and, to make matters worse, the Caps Lock light can be unhelpfully unlabeled:

Caps Lock Light

Can you figure out whether Caps Lock is on?

The Caps Lock feedback is so easy to ignore that it just doesn’t work. As often happens in computing, we get band-aid fixes instead of true fixes. For instance, Microsoft added a nicely non-modal message to the Windows login screen that reminds the user of Caps Lock’s state. Even that doesn’t work to prevent mode errors, as I always type my password before noticing the warning.

Windows Login Caps Lock Indication

Some nice visual feedback for Caps Lock that unfortunately doesn’t work reliably to prevent mode errors.

More generally, visual warnings, even when placed in prominent view, are not guaranteed to work: visual feedback is ignorable and overused. But, you don’t have to take my word for it.

Renowned HCI researcher Bill Buxton investigated the mode error reduction ability of visual versus kinesthetic feedback back in 1992. Buxton experimented on 12 novice and 12 expert users whom he asked to perform simple editing tasks in a pared-down vi editor. He found that “feedback using the visual channel is generally avoidable: one can easily choose not to monitor visual information. Kinesthetic and audio feedback, however, are more demanding and inescapable by their very nature.” He continues, “Information delivered through the visual channel is simply not as salient as information delivered kinesthetically… even though the visual cues in the first experiment involved changing the entire screen area pink. This has important implications for systems which rely on more subtle visual cues such as changing the shape of the cursor or the color of the menu bar.”

Go ahead and look at his data. It’s persuasive. It shows that people have the amazing ability to ignore visual cues, even when they are looking directly at them. This illustrates the important distinction between what your eyes are looking at and what your attention is on. The two are not the same, and it’s your attention that matters. And if your attention is on the mode of the system, you won’t make mode errors–but then your attention won’t be on the work you are trying to get done. Visual indication is one method for trying to bring your attention to the system state, but as Buxton showed it has a decent chance of failure. Using sound-based feedback is more likely to succeed, because while you can avoid looking at a visual indication, you cannot avoid listening to an audio indication (as every player who has been forced to listen to Baby Mario’s cry in Yoshi’s Island 2 knows). But even that is inferior to kinesthetic feedback. Which brings us to the last question:

Does the user actively maintain the mode?

When the user actively maintains the mode it is known as a quasimode. When the system maintains the mode it is a proper mode. Caps Lock is a mode; Shift is a quasimode. Drag-and-drop employs a quasimode; Click-move-click employs a standard mode. As Buxton showed, quasimodes not only drastically decrease the number of mode errors users make but they also decrease the cognitive burden that is associated with keeping the system state in mind: “Even though many of the expert subjects commented that they were used to keeping track of the mode ‘in their head’, feedback of both kinds significantly reduced their mode errors.” That is, even though the experts thought that they could keep both their work and the system mode as their locus of attention, they could not. This reaffirms the notion that people are fundamentally bad at multitasking, when the multiple tasks require active thought. Interestingly, “experts made more mode errors on average than novices.” In other words, experts were worse than novices at actually keeping the system state in mind.
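To make the distinction concrete, here is a toy Python model of the two keys. It is a sketch under simplifying assumptions: the `Keyboard` class and its method names are invented for illustration, and it treats Caps Lock and Shift as simply "either one produces capitals", which ignores how real keyboards combine the two.

```python
class Keyboard:
    """Toy model: Caps Lock is a mode, Shift is a quasimode."""

    def __init__(self):
        self.caps_lock_on = False  # kept by the system after the key is released
        self.shift_held = False    # true only while the user holds the key down

    def press(self, key):
        if key == "capslock":
            self.caps_lock_on = not self.caps_lock_on
        elif key == "shift":
            self.shift_held = True

    def release(self, key):
        if key == "shift":
            self.shift_held = False
        # Releasing Caps Lock changes nothing: the system maintains the mode.

    def type_letter(self, letter):
        upper = self.caps_lock_on or self.shift_held
        return letter.upper() if upper else letter.lower()


kb = Keyboard()

# Quasimode: the capital lasts exactly as long as the finger is on Shift.
kb.press("shift")
print(kb.type_letter("a"))  # A
kb.release("shift")
print(kb.type_letter("a"))  # a

# Mode: one tap, and the state outlives the user's attention.
kb.press("capslock")
kb.release("capslock")
print(kb.type_letter("a"))  # A
```

The only place the quasimode's state lives is in the user's own muscles, which is why it cannot be forgotten in the way the Caps Lock flag can.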

How does one measure cognitive burden? By distracting a user with an ancillary task and, once that task is finished, measuring how long it takes them to resume the main task. This gives a measure of the effort required to assess the state of the system and task. “Any difference in resume time among conditions must reflect a difference in cognitive operations since there [is] no [difference] in the physical actions required to return to the editing task after servicing the distractor.” Buxton found that not only does the quasimodal method lead to “significantly faster resume times” but also that the result was independent of skill: everyone’s resume time benefited from kinesthetic feedback and not at all from visual feedback. This corroborates the claim that quasimodes reduce cognitive burden, and that they do it by keeping your locus of attention (what you’re thinking about) on your work and not on the modal state of the system. Using kinesthetic feedback helps treat your train of thought as sacred.
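As a rough illustration of what a resume-time measurement could look like, the sketch below computes it from a timestamped event log. This is not Buxton’s actual procedure or data: the event names, the log format, and the timestamps are all invented for the example.

```python
def resume_times(events):
    """Delay between each finished distractor and the next edit.

    `events` is a time-ordered list of (seconds, kind) pairs, where kind is
    either "distractor_done" or "edit" (a hypothetical log format).
    """
    delays = []
    distractor_finished_at = None
    for timestamp, kind in events:
        if kind == "distractor_done":
            distractor_finished_at = timestamp
        elif kind == "edit" and distractor_finished_at is not None:
            delays.append(timestamp - distractor_finished_at)
            distractor_finished_at = None  # only the first edit counts
    return delays


# Invented log for one session.
session = [
    (0.0, "edit"),
    (4.5, "distractor_done"),
    (6.0, "edit"),
    (7.0, "edit"),
    (9.0, "distractor_done"),
    (9.5, "edit"),
]

print(resume_times(session))  # [1.5, 0.5]
```

Comparing the average of these delays across feedback conditions is the kind of comparison the quoted passage describes.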


Actively maintained modes, or quasimodes, are a versatile tool. While feedback of any kind is helpful in reducing mode errors, kinesthetic feedback dominates over the congested visual feedback channel. Not only is kinesthetic feedback more potent and less cognitively effortful, but it’s also more enjoyable: one need look no further than the exquisite key edges of real keyboards versus the doubt-inducing flatness of touch-screen keyboards for verification. So, the next time you battle with design, try the power of kinesthetic feedback by adding quasimodes to your arsenal. Enso uses a quasimode at its very core. The quasimode keeps your mind on your work instead of on the system state.

by Aza Raskin


41 Voices

This idea has a VERY hard time gaining traction. It is the single most difficult point to get across in interface discussions. So many people embrace modes because they are a simple way for the developer to do much more with limited resources and don’t require fundamentally changing underlying designs. Even people from the Raskin Center have a hard time.

To gain the upper hand I adopt a strategy of loving “modes.” Start off by listing their good attributes: they are very powerful, easy to implement, taste better than cheesecake, whatever. But traditional modes cause a lot of errors, and the solution is to upgrade to a Quasi Mode.

It’s a bait and switch. Don’t start off the conversation with an oppositional stance. Pull the developer in close to you and expound upon how great their underlying work is, keeping the person in a receptive mood and you their fan. Now in the position as a friend you can disclose to them why modes (or whatever other interface problems they have) hurt people and how to make what they are already doing (which we may actually think is total crap) “better.”

Good work with this Aza, modes cause errors, and can damage our work, and ourselves. Kinesthetic feedback is a good solution to changing modes.

You should start your presentation with the conclusion, and state the solution in the title.

PS. Your dad is amazing, reading his book opened my eyes. It’s great to see you’re continuing the vision! Keep up the good work. The vision will soon become a reality, it’s unavoidable.

Caps Lock and other modes come from situations where a quasimode burdens the user too much and interrupts their train of thought. Caps Lock comes from times where you had to write in all caps. While those times are far fewer today, you can imagine screenwriters, military personnel, or professional trolls wanting to use Caps Lock rather than hold down the Shift key the whole time (although some software I’ve used eliminates the need for Caps Lock altogether).

Sometimes modes are good - in fact, some things would be useless without them. Video cameras, tape players, and VCRs all use modes because it’s the best tool for the job - you have play mode and record mode. I believe it’s best to take a clue from those devices and see what we have to do when we use a mode:

- When something is recording, you have a big red light on there that not only tells you what mode you are in, but also informs you that you are doing something potentially dangerous with your data. All other modes are self-evident (play, fast-forward, etc.). So, like you said in the article, if you have a mode, you should have a pretty good reason for it, and have a nice obvious sign.

- Record buttons are also protected in many cases. You have to do a lot of work to switch modes. This is a form of kinesthetic feedback I don’t think you mentioned in your article. I know on my keyboard at work, the insert key is right next to the delete key - sending my editor into a really annoying mode. But at home, my keyboard has the useless insert button moved to a remote part of the keyboard. It’s not a complete fix, but it’s definitely harder to get the keyboard into that mode if I don’t want to. Similarly, my Caps Lock at work requires that I be holding the Shift key down to turn it on.

So I guess I would have to say that modes aren’t bad, it’s just that poorly implemented modes somehow became “traditional”. Modes are good when the context switch of the user’s thought costs less than the effort required to maintain a quasimode.

Great post. It encapsulates what I’ve always said about “modes.” The problem has never been about the mode switch, it’s always been about realizing what mode you’re in that’s the problem. I’ll be referring to this post quite a bit in the future.

(click name for a recent mode problem I wrote about)

Great article. My current ‘mode’ pet peeve is electronic devices that have a little light to tell you the device is off. So, when the light is on, the unit is off. Silly.

This sort of works when the unit has another display of info that would be lit up when the device is on (like the track and time listing on a CD player), but why do I need a light to indicate ‘off’? Doesn’t the lack of a light do that?

On one device I own (a cheap DVD player), there is no other screen on the control panel. Additionally, the light is on constantly when the device is on and flashing when the device is off. The only way to determine which is which is to actually turn on the TV and tune to the line input and check.

If modes are appropriate (and on and off modes certainly are), not only should an indicator be present, it should clearly indicate which mode is currently in use.

Just curious - are you related to Jef Raskin - the book is liberally quoted, sometimes almost word-for-word…

Great article. I think people’s ability to miss visual cues is one of the most overlooked challenges in web design today.

I’m late to the party, but I’ll play devil’s advocate here.

The best case for modes is the continued popularity of the Vi editor among programmers. Each year, hundreds of thousands of programmers (I’d guess 1 in 10 of the programmers I’ve met) willingly choose Vi (or, more likely, Vim) over other editors. If modes are inherently evil, why would these wayward souls choose a modal editor?

I propose that Vim’s mode errors are infrequent and inconsequential enough that the beneficial side effects are enough to overcome the drawbacks.

Take the Buxton study cited above. It represents a worst case scenario. The subjects are tasked with making 75 edits (150 mode switches if you’re doing it the naïve way*) interrupted once every 4.5 seconds and the expert subjects still only average 8 mode errors with no visual feedback and 4.5 mode errors with visual feedback (novices were lower on both counts but slower). The foot pedal was clearly superior at around 2.5 mode errors (I want a foot pedal for working with Vim), but the overall amount of time wasted was probably in the order of a couple seconds.

Given that people WILL make mode errors while using Vi, what’s the tradeoff? In exchange for the occasional mode error, you get an enormous number of mnemonic sentence-like commands for operating on text**.

For example, take the common (programming) task of replacing the contents of a string. In something like MS Visual Studio, you reach for the mouse, target the space between the opening quote and first character, click, find the ending quote, drag to that quote, switch your mouse hand back to the keyboard and type the replacement. Admittedly, everything up to typing the replacement text is chunked mentally as one task: select text, but you still have to perform all the subtasks and that does take you out of work flow temporarily. In vim, you hit {esc}, navigate to the string (t” or /”{enter}), get the cursor in the string (l), and replace to the closing quote (ct”). This takes fewer steps and is much, much faster.

By introducing modes, Vi reduces the complexity of the command structure and allows you to compose complex actions out of relatively simple pieces. The only other editor that provides a similar number of raw commands is Emacs. I haven’t seen formal usability studies done, but I expect the expert Emacs user maintains a much larger number of information chunks mapping key command sequences to actions. Simple things like replacing a word (^w, in the Unix keystroke convention) and replacing to the end of the current line (^k) or moving to the beginning (^a) or end (^e) of the current line are completely unrelated and have to be memorized by rote. The vi equivalents (cw/c$ and ^/$, this is a literal ^) maintain consistent commands and motions. I’ll admit that I don’t know the Emacs equivalent for vim’s w movement or how to replace to the beginning of the line (it’s easily guessable if you know the vim grammar). Other word processors don’t deal with having large numbers of commands and force the user to input a large number of simple commands, which trades off speed/power for a shallow learning curve.

Modes are bad but they’re only as bad as their cost. In the case of a tradeoff between a foot pedal (holding down caps lock with my pinkie does NOT count) quasi-mode versus standard Vim, I’ll take the foot pedal. If the choice is between the modes and consistency of Vi versus the M-x quasi-mode mess of keystrokes that is Emacs, I’ll take the Vi. Others choose Emacs and I support that choice. I do believe that all professional text wranglers should know one or the other; the investment in learning is worth the lifetime of faster edits.

This doesn’t just apply to text editors. I’ll note that Mac OS X menus follow the Windows 9X pattern and are modal on click and quasi-modal on drag instead of the exclusively quasi-modal behavior of Mac Classic. The point is that the cost of mode errors has to be balanced with the benefits of modal behavior (in this case, supporting user expectations).

Incidentally, I can make an argument for why Enso should be modal rather than quasi-modal, but I’d rather do that in a place where it can be discussed.

Thanks for reading.

* Incidentally, the smart way to do the task in vim is “{esc}:%s/\[A-Z]\+\z/errorerror/g{enter}” or in Vi “{esc}/[A-Z]\+{enter}eaerrorerror{esc}” and then “ne.” repeatedly to jump to the next match and repeat the insert.

** The grammar for Vi commands is count|command|motion (I’m not using count here). Notice the movement command (t”, move to ”) is the same for moving and for the movement of the replace text (c) command. I can swap out the command (dt”, delete to ”, this clears the string) or swap out the motion (cw, replace word).
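The count|command|motion grammar described above can be sketched as a tiny interpreter. This is a loose illustration, not how Vi is implemented: `COMMAND`, `motion_end`, and `apply` are hypothetical names, only the `d` and `c` operators and the `t`, `f`, `w`, and `$` motions are handled, bare motions without an operator are left out, and the word motion is simplified to "skip to just past the next run of whitespace".

```python
import re

# count | command | motion, for example: dt"  2dw  c$
COMMAND = re.compile(r'(?P<count>\d*)(?P<op>[dc])(?P<motion>[tf].|w|\$)')


def motion_end(line, cursor, motion, count):
    """Index just past the text the motion covers (exclusive end)."""
    if motion == "$":
        return len(line)
    if motion == "w":
        position = cursor
        for _ in range(count):
            gap = re.compile(r"\s+").search(line, position)
            if gap is None:
                return len(line)
            position = gap.end()
        return position
    kind, char = motion  # "t" (till) or "f" (find), plus the target character
    found = cursor
    for _ in range(count):
        found = line.find(char, found + 1)
        if found == -1:
            return cursor  # motion fails, so nothing is covered
    return found if kind == "t" else found + 1


def apply(line, cursor, keys):
    """Return (new_line, enters_insert_mode) for an operator plus motion."""
    parsed = COMMAND.fullmatch(keys)
    if parsed is None:
        raise ValueError(f"unsupported command: {keys!r}")
    count = int(parsed["count"] or 1)
    end = motion_end(line, cursor, parsed["motion"], count)
    return line[:cursor] + line[end:], parsed["op"] == "c"


line = 'name = "old value";'
inside_string = 8  # cursor on the "o" of "old"

print(apply(line, inside_string, 'ct"'))  # ('name = "";', True)
print(apply(line, inside_string, 'dt"'))  # ('name = "";', False)
print(apply(line, inside_string, 'dw'))   # ('name = "value";', False)
```

The operator table and the motion table never refer to each other, which is the composability being described: adding one motion makes it available to every operator.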

Welcome Karl, and thanks for your detailed analysis.

I use Emacs in my daily work, but I also have familiarity with vi; in fact, I learned vi first, but I switched to Emacs precisely because of vi’s modality. I’m not going to argue that Emacs is superior, because I recognize that it’s a matter of personal preference and I have no interest in reopening The Internet’s Oldest Flame War ;-) But more importantly, the world of software design possibilities is not limited to just Emacs and vi. Can we perhaps find a way to combine the strengths of both and the weaknesses of neither?

It seems to me that the heart of your argument is this: “By introducing modes, Vi reduces the complexity of the command structure and allows you to compose complex actions out of relatively simple pieces.” You’re not just saying that vi’s modality is tolerable, you’re saying that modality makes a positive contribution to the design of the interface by allowing commands to be simpler (i.e. commands can be single keystrokes), and a library of composable single-keystroke commands gives a highly information-theoretic-efficient way to accomplish a complex task with very few keystrokes. Is that right? Let me know if I am misinterpreting you.

It seems like other benefits of vi that you mention, such as its logical consistency across different classes of commands, could be replicated in a non-modal or quasimodal environment, given sufficient thought. But the question is, can we replicate the composable simplicity of single-keystroke commands in a non-modal or quasimodal environment? And if we can, would we want to, if we had to sacrifice other virtues? These are interesting questions, and I don’t necessarily have a quick answer for them. But I can start to sketch out a solution I prefer.

A flaw shared by both vi and Emacs is that they require the user to memorize a largely arbitrary mapping of keys to commands. This is what typically scares off newcomers and makes people think of Emacs and vi as programs geared towards “experts”. But every expert was a beginner once, and the learning process of both Emacs and vi is more painful than it needs to be.

An approach which associates words, instead of single keystrokes, with commands has the benefit of being easier to learn and to remember, both for beginners and experts, because it taps into your brain’s preexisting expertise in your native language. It also gives a much larger command “space”, since there can be a command for each word in the language rather than one for each key on the keyboard. This neatly avoids the need for “compound commands” like Emacs’ infamously cryptic “Ctrl-X-Ctrl-C” — something which Emacs had to introduce to get around the limitation of the number of keys on the keyboard.

Word-based commands may never approach the information-theoretic efficiency of atomic, single-keystroke commands like those in vi, but there are ways to reduce the number of keystrokes needed until the efficiency is almost as good. And a sufficiently well-designed word-command system could give users the ability to combine commands using an already-familiar natural-language syntax rather than the cryptic sort of “{esc}/[A-Z]\+{enter}eaerrorerror{esc}” combinations found in vi.

Finally, word-based commands can be made either modal or quasimodal; we prefer the latter approach for the reasons we’ve already explained.

While there are trade-offs and counterarguments to be made, I think this ought to explain why our design for Enso is neither an Emacs clone nor a vi clone, but rather a quasimodal word-based system.

Thanks for your considered response. I’ll address word-based interfaces via the scenic route so you can understand where I’m coming from.

Both Vi and Emacs are products of the environment in which they were written, which is notably different than our current desktop environment. Vi is the evolution of a printed line terminal editor and Emacs was designed for a keyboard with more modifiers, so the C-x/M-x is a hack to make it work on standard keyboards. Super-c (instead of C-x C-c; I don’t know if C-x -> Super is actually the mapping or not) is a reasonable mnemonic if Super is the buffer-level modifier key.

I believe these programs persist because they are extremely well designed for their target audience. I’ve run into a number of articles authored by the HCI community making derisive comments about the design of these programs and for the general population, they are terrible.

Programmers, however, are a very specialized user group. The programmer archetype is a mental gymnast, able to retain and recall large amounts of abstract information because that’s basically what programming is. The programmer spends his professional life translating these ideas to something executable using their text editor. During this process, the programmer will perform the same tasks tens or hundreds of thousands of times.

Given the frequency of use and an expected lifetime of use, speed and expressiveness assert themselves as higher priorities, the large number of symbols becomes less troublesome (abstract thinking ability of target user) and the longer learning curve becomes less of a barrier because it’s expected that you’ll earn the time back later. Other systems written during that period (e.g. Engelbart’s NLS) exhibit similar qualities because they’re written for the same audience of expert users who are also programmers.

Historic considerations aside, I believe (but cannot prove) the information-theoretic efficiency of these programs has a much larger impact than the savings in time and typing would indicate. The reason is that I don’t work at a constant rate. I’m a passable programmer most days, but when I’m in the zone I implement algorithms at about 5 times my normal rate. I say implement algorithms because I type at about the same rate, but I get a lot more done. The best explanation I’ve found is Csikszentmihalyi’s concept of Flow, which is basically the mental state of being fully engaged in the activity you’re doing. I only get into Flow for 10-20% of my work week, but the majority of useful code I write gets written during these periods. As such, I value my Flow time tremendously and am willing to invest a considerable amount of time into optimizing my speed during this state. I’ve discussed this with a number of programmers and almost all acknowledge an “in the zone” productive state but say that they aren’t as dependent on it as I am. By allowing people to get more done when they’re in Flow, dense (in the information-theory sense) apps pay off in multiples and the happiness associated with Flow is transferred/associated, which explains why Vi and Emacs fans are so fanatic. I have an extensive set of theories and ideas based on the concept of Flow optimization as it relates to interfaces. Unfortunately, I do not know of any HCI research that backs me up, so I’ll leave it at that.

Word-based interfaces have a different set of advantages and constraints, but the two systems share a number of flaws. Both require the user to operate through a limited command set. Natural language is often context-dependent and has tendencies toward both ambiguity and imprecision, not to mention it’s not fully grokkable by today’s computer systems (and may not ever be). Once you toss out full natural language, the user has to remember what your limited command set is and express their desires using your syntax. A word based interface is constrained to a subset much smaller than the full natural language. When you get too many words with limited context and natural language processing capability, you run into the tyranny of choice/regret theory/ how many spellchecks class of problems.

So the challenge is to design a word-based system (for ease of learning) that assists users in learning the command set while requiring the minimal amount of input (for speed). It’s not perfect, but my vote for the best app exploring this space is Quicksilver. It is word-based, uses the first (subject) pane to contextually filter the available verbs (which are then displayed in a list), and does match-ordering based on some sort of machine-learning algorithm. In practice, I usually type less than 2-3 characters per command invocation to do stuff like mark the current web page for reviewing during the weekend while my non-geek sister happily uses it as an app/document launcher. It’d be well worth your while to fully investigate QS, especially a full-on geek setup (start from the user’s guide and lifehacker), to see how they solve the same problems Enso does. The promise of Enso being a (hopefully better) Windows QS is why I’m here. I’m happy to compare the two but that’d probably be better in a feedback email rather than a blog comment.

Great article. Enso’s definitely an interesting product.

I’d love to see some research on RSI-related problems, comparing quasimode with mode. Using mode, say Caps, enter command, Enter, is two quick movements, while quasimode is one prolonged period of static work (with a finger not really designed for that kind of work). I suffer from some kind of light RSI, and one big problem for me these days is the quasimode usage of shift with my right pinkie. However, I also use my right pinkie for hammering Enter and the arrow keys (I’m a developer), so it’s really quite hard to say what the cause of my problems is.

Brian Vallelunga
January 28th, 2007 10:41 pm

I’d have to say that the task of launching a program is something where the quasimode used in Enso is detrimental to the program’s usability. Many other program launchers require a quick two-key combination to open them. Then all of the fingers are free to type the command and Enter launches it. Simple. Enso is more stressful on the fingers and requires the user to type in “open” before the program name.

In fact, I would say the act of launching a program is one of such brevity that modes don’t hurt the user interaction at all. It’s not as if I’m going to leave the launcher window open for an hour and then forget about it. Even if I did, the consequences and irritation would be trivial. For the launcher scenario, I’d like to see two keys quickly open the launcher, and it would stay open. If you’re worried about modes, have it automatically close after five seconds of no-input. Can I call this a quasi-quasimode? Anyway, there are enough launchers out there to please everybody, but I’d require the option in Enso before I’d consider buying it. Otherwise, I think the program is really well designed.
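The “quasi-quasimode” idea, a launcher that opens as a mode but closes itself after five seconds without input, could be sketched roughly like this. The `Launcher` class is hypothetical and is not how Enso or any existing launcher works; the clock is passed in as a function so the timeout can be exercised without actually waiting.

```python
class Launcher:
    """A modal launcher that expires after a period with no input."""

    IDLE_TIMEOUT = 5.0  # seconds, the figure suggested in the comment above

    def __init__(self, clock):
        self._clock = clock  # any zero-argument function returning seconds
        self._opened = False
        self._last_input = 0.0
        self.text = ""

    def open(self):
        self._opened = True
        self.text = ""
        self._last_input = self._clock()

    def is_open(self):
        idle = self._clock() - self._last_input
        if self._opened and idle >= self.IDLE_TIMEOUT:
            self._opened = False  # the mode quietly expires
            self.text = ""
        return self._opened

    def type(self, chars):
        """Add typed characters; returns False if the launcher has closed."""
        if not self.is_open():
            return False
        self.text += chars
        self._last_input = self._clock()
        return True


now = [0.0]  # fake clock that the example advances by hand
launcher = Launcher(lambda: now[0])

launcher.open()
now[0] = 3.0
print(launcher.type("ex"))  # True: typed within five seconds of opening
now[0] = 12.5
print(launcher.type("cel"))  # False: 9.5 idle seconds, the launcher has closed
```

In a real program `time.monotonic` would be a natural choice for the clock. The trade-off is that keystrokes typed after the timeout go to whatever application is underneath, which is itself a small mode error.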

Just read the above set of comments about Vi and Emacs. I agree with you on the quasimodal thing, but I have to say I much prefer Vi’s single-keystroke commands to a word-based solution, simply because of four commands that I use extensively in my programming work (given with possible spelled out equivalents):

* dd - Delete Line
* dtx - Delete till character x
* dfx - Delete through character x
* And the quick start editing at (end of line, next line, previous line) commands that Vi uses.

The very reason that I use Vi is because it allows me to do this insanely fast and without leaving the keyboard. Word-based commands are okay, and have low mnemonic load, but the longer I have to spend typing in an action, the more my attention is drawn away from my task and towards the action itself. Three keystrokes allow me to do what I want without thinking, but ten (what I would think would be the absolute minimum for word-based commands, even with autocomplete) make me lose my train of thought, that ever important thing.

My personal favorite solution would be a quasimodal system that allowed both spelled out and abbreviated commands, and showed the abbreviated command while I typed in the spelled-out one.

Just my $0.02.

Comment from the guy who was two months late to the game:

Most of the posts here look like they’re discussing a theoretical quasimodal system vs. typical modal systems. Having downloaded and tried Enso, here’s another viewpoint.

You may have a good point about the quasimodal system, but its implementation via Enso using the Caps Lock key is problematic. By forcing users to hold down a key with their left pinky finger you make typing difficult for commands that use keys that finger would normally type, e.g.,
“open excel”

I am forced to switch fingers to type a key like “x”. Being a fairly good touch typist, this context change of fingers slows me down. Also, holding down Caps Lock while stretching my index finger to type “g” or “b” feels like it would lead to finger strain rather quickly.

In the Enso preferences you “strongly recommend” sticking with quasimodal input. I strongly recommend that you remind your users of the ergonomic issues with repetitively stretching your fingers like that.

Perhaps a foot pedal would be ideal.

Aza and all of you are incredible! It’s like the Canon Cat on steroids. Is there a way to Learn keystrokes like on the Cat? I used, and still use today, the likes of “Learn: Leap to ‘one’, change to ‘two’, Caps Lock, make bold, and release.”

Once you do this I wanna buy stock! I know your Dad is proud, and happy.

Adam Trotter
July 9th, 2007 2:01 am

I just discovered Enso via a post in Edward Tufte’s forum, and a link to a lecture at Google campus.

I once was a vi user (don’t have much call for it anymore), and I appreciate the discussion of the relative advantages of modal and quasi-modal interface…I had my fair share of vi accidents, and quickly learned to always hit ESC after any pause.

But my impression is that Enso is (ultimately) aimed at the average PC user, and that’s the spirit in which I’ve been trying it out. And I have to agree with those who complained of the awkwardness of touch-typing while holding down the Caps Lock key. I’m sure I’ll get better at it, but it’s extremely awkward at times, especially if you need to type a Shifted character. And it’s annoying to type a longer command only to lose it by a slip of the left pinkie.

I like the suggestion in a previous post of being able to switch to sticky mode by double-clicking the caps lock key. In the meantime, I will have great fun exploring this, but in sticky-key mode only. I’m only so acrobatic with one finger tied down.

Fantastic vision, though!

Adam Trotter
July 9th, 2007 2:27 am

Also, I think the average computer user has become used to hitting ‘Enter’ to execute a command (as in Google search), and it doesn’t add much extra time to executing a command via the CLI. Certainly, it adds less time than inadvertently releasing the fingers from the keyboard, only to find that you’ve accidentally told your computer to open the wrong application, then having to wait for that application to load, and then having to close it. I don’t doubt that there are advantages to quasi-modal input…but the difference between accidental use of the Shift key and accidentally opening Microsoft Word is significant, and may be a hurdle for new, intuitive users.

That said, I’d be very curious to know how very young, new computer users respond to this. Perhaps what’s frustrating for me will be entirely intuitive, and more efficient for them. And now, I’m going to spell-check this post.


I’m finding it difficult to type commands while my pinkie is holding the caps key. I’m pretty good at typing and this feels unnatural.
You make an interesting point about modes and state a problem that you may have resolved without realizing.

“The Caps Lock feedback is so easy to ignore that it just doesn’t work.”

With Enso running, Caps Lock activates the command bar at the top. You can’t miss that. You’ve solved the problem.

I set my preferences to be modal and I’m sticking with it. No, my brain will not be occupied trying to remember the mode - since it’s right in front of me, in the top left corner of the keyboard.

You’re doing great work - keep it up. I look forward to more work changing applications from Humanized.


First of all, GREAT IDEA :)
I will definitely download the trial of the launcher. The Words one is also interesting; the video is self-explanatory.
I prefer vi (not Vim) because it lets you work over a slow connection. The short one-letter commands were invented so that you don’t have to wait for the terminal’s response to know that you typed everything correctly. When you wait a full second for visual feedback, that is slow. So dd is better than any other option, and dw is much better than anything that forces you to “press something” and then “write something” (I put “write” in quotes because in vi you just “stroke” :) the keys that represent the words or actions). The movement keys and all the other modal stuff come from the times when keyboards had fewer keys. A good, simple example is the save-and-quit and quit-without-saving commands :)) (:wq vs. :q!). Does it need any more comment?
My second note on all the excellent posts ;) (thanks, I spent two episodes of my favorite series reading almost all of them)… okay, the note is:
Dvorak vs. QWERTY :) If people are thinking about ergonomics and fast typing, I don’t understand why there is only QWERTY (okay, QWERTZ and the weird French one too, but take those as one modified layout).
QWERTY is just old, from the times of mechanical typewriters, so it was designed to keep the type bars from jamming while writing, not to make typing easier and faster for the typist. Dvorak is better, but it was never massively adopted. Try teaching an old dog new tricks. I am used to QWERTY and still use it, so ;-)

Sorry, no URLs, because I wrote this from scratch; if you are used to Google you will find all the necessary documents.
PS: Sorry for my English; I hope you will understand it.

Very nicely argued. But in the case of Enso’s use of capslock for a quasimode kinesthetic mechanism, I have to give up a lot, namely my entire left hand. Sorry, I just don’t see the point in training myself to love one-handed typing while most of my left hand goes to waste. It will never, for me anyway, beat two-handed typing.

Besides, (a) the huge transparent message is a pretty strong mode marker, even if “only” visual; (b) I have such a strong habit of using to change modes, it’s really fine; besides, the key is nice and large.

Another way to look at it is that, even if Enso’s modal mechanism is superior, it clashes with the modal mechanism now in nearly universal use, and that imposes a large mental and physical burden. So, if you could get everybody to adopt Enso’s technique, maybe. But that’s like covering the world in leather rather than wearing shoes.

Bottom line: your abstract argument is well and good. But I’d like to see an actual study specifically on Enso’s two (or three) choices for modal mechanisms. My money says traditional modality will win in this case.

Actually, Enso’s sticky mode is a pretty nice innovation as it is. Hijacking a problematic key to initiate a mode with a quick hit is really brilliant. Also, why not give me the option of the key — now there’s a problematic key if I ever saw one!


Leonardo Ramirez
November 28th, 2007 3:54 pm

I think the whole reflection on modes and so on is really interesting from a theoretical standpoint. But you don’t seem to see a quite obvious problem with Enso, which is the acknowledgment of a command by releasing a key. That is, by leaving the quasimode, you are issuing a command. That sounds quite counterintuitive to me. That, and the fact that after a while typing with a finger on Caps Lock, my left hand started to hurt really bad.

Enso is amazing. Great work, guys.
The only thing I’d like to comment on (as a couple of people or more have said earlier here) is the Caps Lock issue.
I totally agree with this article, and I’d go with quasimodal controls for anything, except in Enso’s case.
Holding down Caps Lock with the pinky finger is hard, straining, and difficult to get used to. It forces users to type the keys assigned to the pinky (Q, A, Z) with the ring finger instead, which causes typing errors and wastes time while typing commands that are supposed to save time in the first place.
Moreover, the sticky mode isn’t a big problem in itself. Maybe some people would prefer pressing Caps Lock instead of Enter to run commands, but it’s nice as it is, given the habit of pressing Enter for “Do it” in CMD, the Run command, and address bars.
So, going with the quasimodal setting, the other optional keys on the list are not worth using. Caps Lock is easily the best of them, but it isn’t good in itself, so we need another key that can be pressed with the unused finger, the left thumb. What key can be held with the left thumb?
Left Alt? Not comfortable.
Space? Not possible!
Better ideas?

Update: I wrote the above post after spending hours playing with Enso and reading about it. I missed two things:
1. the Enso 2.0 Prototype, which eliminates my complaints
2. the ability to use sticky mode via the Alt key (this piece of info wasn’t in the tutorial video, or on any page other than the FAQ page)

Guys: do you think that the Colemak keyboard layout is more humane than QWERTY?
Also, Colemak replaces Caps Lock with Backspace. Now what??

“Quasimode” — not to be confused with “quasimodo”… sorry, couldn’t resist!

Antun Ivanković
May 11th, 2008 12:41 pm

This is good software.

I own both macs and pcs.

On my macs I run an enso-like program called Quicksilver. It’s modal. I far, far prefer it to Enso’s quasi-modal approach for reasons that have been nicely outlined above.

Enso’s caps-lock quasimode approach has stopped me using it. Perhaps that puts me in the minority, but perhaps not.

Frankly, I find quasi-mode to be awkward and counterintuitive. Theoretically humane or not, I don’t like it.

Justin Barrett
October 5th, 2008 3:05 pm

I’ll echo the comments above re: my dislike of the “hold Caps Lock while you try to type normally” methodology. I’m also coming to Enso after having used Quicksilver on my Mac laptop, and greatly prefer the Quicksilver method. While quasimodes may work well for some things, I don’t feel they’re the best choice for a tool like Enso. CTRL and SHIFT usage can’t be evenly compared with what’s required in Enso. Most software that uses CTRL shortcuts doesn’t require you to hold CTRL while typing several other keys in succession. Similarly, most people don’t hold SHIFT while typing entire words or phrases.

I also don’t agree with the strong recommendation to just keep at it and get used to it. While there is some adjustment involved when becoming accustomed to any new form of working, I don’t feel that persistence through and beyond the adjustment period leads to beneficial results in every case, and I feel that the hold-Caps-Lock situation is one of those cases where long-term use could be detrimental, as others have stated.

Sorry guys. I actually use the caps lock key. If I were willing to try to reverse years of touch typing training on the universal keyboard interface just to use this software, it would break me for all computers on which this software were not installed. I really really want to like this software, but I can’t give up my left pinkie for it. I tried once before but couldn’t get there.

I’m a big fan of the Windows “WINKEY-R” command. If you have Python or Cygwin installed, and a couple of batch files, you’re just as capable as this software without giving up any keys or retraining yourself. Much more efficient.

I am unfortunately going to uninstall this. Despite the speedy interface and great idea, the unforgiving drive for design purity destroys this implementation.

I finally decided to install a couple of launchers for a test run: I started with Launchy, because it seemed basic, reliable and functional, along with Enso, because Enso’s fan base was so enthusiastic and because the theory was intriguing.

After a week of trying both, I’m uninstalling Enso. The discussion on UI nuances is fascinating and worth bookmarking, and I think Enso is a great experiment, but it’s just too slow and inconvenient to hold down capslock. I was aware of the advice that this would improve with enough use, and it did (slightly), but I did not get the sense it would ever improve enough to make it my first choice to use, and the RSI complaints listed in the comments here were what convinced me to abandon the effort.

When I use Launchy, I press two keys and the un-ignorable visual indicator is there in the middle of my screen, until I hit enter to launch the command or escape to cancel it (which is faster, easier, and more familiar than typing a handful of random characters to cancel the command, as Enso requires).

About modes and user error: there could be confusion in the overly-congested visual channel if Launchy was popping up unexpectedly, so that I had to parse the screen contents to figure out what was going on. But it doesn’t work that way; I press two keys, and the window pops up exactly when and where I expect it, with no uncertainty about whether or not it’s active. It does not remain open while I go about other activities; it stays open only long enough for me to type a command.

Everything else is tuned out for me visually at that moment–it’s not like I’m scanning a 747 dashboard for many dozens of signals at once. The visual congestion argument for Enso seems off target. I loved the Airplane! reference, but there is no danger that I’m going to accidentally and unwittingly leave the Launchy window open while I’m typing an email and have my input misdirected there. I’m unlikely to hit CTL-Space by mistake, and if I did, I would see the window and immediately cancel it.

Launchy’s method of operation–activate a modal window, type into the window, press enter or escape–is extraordinarily well-trained for most users after years of daily experience with modern PC interfaces. Like the QWERTY keyboard, it’s thereby more efficient in practice if not by design (although, I’m sorry to say, holding down capslock and giving up one of the ten keys you use for touch typing is simply not a good design–despite the compelling theoretical argument for quasimodes).

While I appreciate that Dvorak may have a vastly improved design (or was that an urban legend?), you can imagine the non-fun of learning to use Dvorak for one application while still using Qwerty with all the rest. Would it be particularly efficient when used in that manner, for perhaps 1% of your daily use of the machine?

But aside from the obstacle of learning a novel UI approach for this one limited area, the specific implementation is too great a hindrance. You have to plead with people to tolerate it from the start, which is usually a warning sign in interface design. And even when they do, RSI problems crop up. Describing the kinesthetic feedback of holding capslock down as more enjoyable, of all things, than a visual indicator, indicates that people are falling in love with the theory to the point of completely ignoring the less flattering reality.

That particular choice of how to implement quasimodality in Enso is what kills the product for me–it doesn’t damage the theory at all, but it nukes the application’s usability.

I’d love to see another more practical implementation of Enso’s UI approach, though offhand, there doesn’t seem to be any really good answer that stops short of adding new hardware for the purpose. I agree with the guys who said a foot pedal would work great for this, even if it’s hard to keep a straight face while picturing that option.

Hello! (I hope this comment is read by the humanized staff).

I installed Enso today on my work PC, it’s so humane!!! ;) But I found the quasi-modal feature uncomfortable; it conflicted with touch-typing. I was happy to find the Sticky option.

I agree that modes open the door to errors. But I think this “not even a single mode” stance is too extreme. If you think that huge font size isn’t enough, blur and darken the areas that do not belong to the “Enso mode”. After that the user has no excuses.

I have used GnomeDo, and now Kupfer (Gnome/Linux versions of QuickSilver), for many months. Despite their modality I love them, and I have no complaints.

* I have just tried to type the text above holding my thumbs below the keyboard (as if there were quasimodal keys there). It felt awkward and unnatural.

** I learned Dvorak a couple of months ago, and it feels much more natural than QWERTY. I’m not going back.

Nature is modal. The universe is modal. This article is almost pointless.

“trained users hit escape before issuing any command, just to make sure that the editor is in a sane state before continuing. “

It appears the author has never actually seen anyone use Vi, much less a “trained user”. When using Vi you’re pretty much always in command mode, and drop into insert mode briefly to enter text, then you immediately pop back into command mode.

Interesting concepts. I think a lot of my designs use far more modes than they should. Probably why they are sometimes difficult to use.

@EricT, I think it’s a pointless distinction–either the user habitually hits ESC prior to a command, or after inserting a batch of characters. Either way, I turn off “ESC does something destructive” whenever I have the option (typically after finding out it does so by destroying something important.)

@Aza, edge-triggered sound actually can be ignored effectively. I used to have to occasionally change my “new IM” sound, and my alarm clock, or else I’d learn to ignore them. (Nowadays I don’t IM much, and I get enough sleep that I can wake up on the first ring.) Baby Mario is level-triggered and much harder to tune out.

Finally, even if someone is “looking at” a visual feedback indication (assuming it’s present–I have a wireless keyboard with no CapsLock indicator whatsoever), it may simply not capture their attention as important. This can be dangerous when driving.

This whole discussion on modes, what is a mode? Caps-lock on or off is a mode so you say, but what about holding the shift key? That puts me in caps mode as well. There is no light on the keyboard for shift.

I think the key is to not go overboard with the quantity of modes. Having a few _big_ modes in a programme/tool isn’t particularly bad.

Saying modes are bad as a blanket statement is just silly.
