Making Screen Readers Work More Effectively on the Web

Gregg Vanderheiden
Wendy Chisholm
Neal Ewers

This document consists of two sections: List of Critical Needs, and Problems and Possible Strategies.

This is a draft document. Your comments and suggestions are welcome.

The growing list of critical needs comprises urgently needed solutions for making the WEB accessible to persons using screen readers. At this time, it is just a list, with no discussion of the problems or of the possible strategies which might be implemented. For a more in-depth discussion of these and other WEB access problems which may be solved by screen reader strategies, please refer to Problems and Possible Strategies.

List of Critical Needs

The ability to search for hypertext links.
The ability to let the user know when a link has been crossed as the document is being read.
The ability to easily navigate between screen windows such as the URL location window and the main text window.
The ability to read each cell of a table along with both the row and column title of that cell.
The ability to notify the reader of the positioning of horizontal and vertical information such as the existence of different levels of white space in an outlined document and the occurrence of a new line, paragraph, or title.
The presence of a font alert which would alert the reader to a new font or a change in font size.
The presence of an attribute alert which would separately announce the existence of bold face, italics, etc.
The ability to navigate by grammatical portions of a document such as by title, paragraph, sentence, line, word, and character.
The capability for a full document search.
The existence of a word pronunciation dictionary.
The existence of a graphics dictionary.

Problems and Possible Strategies

This section of the document is currently divided into four sections:

Document Formatting Problems and Strategies
Document Navigation Problems and Strategies
Other Issues: Problems and Strategies
Conclusions

Section 1. Document Formatting Problems and Strategies

PROBLEM. The inability to easily obtain page layout information which often contains the real key to a full understanding of the text presented on the page.

Many of the problems encountered by persons using screen readers to read electronic text center around the inability to quickly determine the layout of the page or screen. Because screen readers can only present material one word at a time, using them to read is rather like looking through a soda straw. The straw is only large enough to see one character or one word at a time.

If the screen reader user continuously presses the right arrow key, thus moving the straw along the line one character at a time, he or she will be able to discover, for example, that there might be 14 blank spaces before the first word on the line being read. That might suggest that the particular line being read is centered. But as there is likely a hard carriage return immediately after the last word on the line, rather than a similar number of blank spaces (blank spaces at the right margin are not needed by the computer in order to center the line), there is no real easy way to know if the line is really centered or simply indented x spaces from the left margin.

Similarly, if the screen reader user continuously presses the down arrow key, thus moving the straw down one line at a time after the line has been read, he or she may ultimately find a line on which there is no text. This might suggest that a paragraph boundary has been reached, but it could just as easily be the next title, the beginning of a list of items, etc. And as the reader reads with the press of the down arrow key, he or she will know nothing about the existence of blank horizontal spaces at any point in any of the lines being read.

Few people would choose to read more than a few documents one character, one word, or one line at a time with no clue about the way these lines are positioned on the page.

A much quicker way for a screen reader user to have a document read is to ask the screen reader to read the entire document until a key is pressed to cause the text to stop being read. In this scenario, the reader is still using a soda straw to read one word at a time. The difference is that he or she can cover more material in a shorter time. Another difference, however, is that the reader doesn't even know in which direction the soda straw is pointing. The straw is crudely allowing the user to hear a string of words presented one after another with no knowledge of where a line ends, where a paragraph begins, where text is indented, etc. Whether a word is italicized, in bold print, in reverse video, in a different font size, etc., is just as impossible to know.

Thus, although the screen reader user can read the words on the screen, he or she can rarely totally understand or easily converse about the most important points in the material because these points may only be obvious from the way the words have been physically presented and placed on the page or screen.

POSSIBLE STRATEGIES.

* AUDIBLE HORIZONTAL AND VERTICAL MARKERS. The ability of the computer to generate a tone of a particular pitch and duration or some other sound when certain patterns of horizontal and vertical spaces occur in the document. These tones or sounds could be user defined through screen reader macros. Some specific examples are:

Horizontal spaces. One or more horizontal spaces (x inches). This would enable the user to attach a particular tone or sound to different levels of indentation of, say, 5 spaces each. Thus, the user might hear, for example, 1 tone of a selected frequency for level 1, the same tone plus an additional tone of a selected frequency for level 2, and so forth through all the levels needed to represent the current cursor location.

Because this would be tonal and not verbal, it would give the user the needed information without detracting from the verbal content of the material. Because the tones could be presented faster than the words used to present this information, the material being read could be covered more quickly.

Vertical spaces. One or more carriage returns in fixed font material and a given measurable space in a proportional font. In this manner the user could hear paragraph boundaries of 2 returns (x inches) with one tone and title boundaries of 3 returns (x inches) with another. In case the actual space that separates a paragraph is the same as that which separates a block of text from the next title, title boundaries may have to be detected by looking at a combination of horizontal and vertical space, as well as attribute changes in the text. A tone for each carriage return or line end could also be toggled on and off, thus giving the user the ability to distinguish between lines of regular length and shorter lines which may be items in a list.
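The horizontal and vertical markers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 440 Hz base tone, the 110 Hz step between levels, and the 5-space indentation unit are all hypothetical choices, not features of any existing screen reader. The sketch computes which tones would be played; it does not produce sound.

```python
from typing import List


def indent_level(line: str, spaces_per_level: int = 5) -> int:
    """Return the indentation level implied by the leading spaces of a line."""
    leading = len(line) - len(line.lstrip(" "))
    return leading // spaces_per_level


def indent_tones(line: str, base_hz: float = 440.0, step_hz: float = 110.0,
                 spaces_per_level: int = 5) -> List[float]:
    """Return one tone per indentation level.

    Level 1 yields [base], level 2 yields [base, base + step], and so on,
    so each deeper level adds one more tone to the sequence.
    The frequencies are assumed values chosen for illustration.
    """
    level = indent_level(line, spaces_per_level)
    return [base_hz + i * step_hz for i in range(level)]


def boundary_kind(returns: int) -> str:
    """Classify a run of consecutive carriage returns.

    Follows the counts used in the text: 2 returns suggest a paragraph
    boundary, 3 or more suggest a title boundary.
    """
    if returns >= 3:
        return "title"
    if returns == 2:
        return "paragraph"
    return "line"
```

A screen reader macro could call indent_tones for each new line and boundary_kind for each run of returns, then hand the result to whatever sound device is available.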

* FONT ALERT. An ability to relate font size to vocal pitch or an audible tone. For example, the bigger the font, the lower the pitch of the voice or the lower the tone used to represent that font.
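One simple way to relate font size to pitch is an inverse proportion, sketched below. The function name, the 12-point reference size, and the 220 Hz reference pitch are assumptions made for illustration; a real implementation would let the user choose them.

```python
def font_pitch(size_pt: float, reference_pt: float = 12.0,
               reference_hz: float = 220.0) -> float:
    """Map a font size to a pitch: the bigger the font, the lower the pitch.

    Text at the reference size is spoken at the reference pitch; text at
    twice that size is spoken at half the pitch (one octave lower).
    """
    if size_pt <= 0:
        raise ValueError("font size must be positive")
    return reference_hz * reference_pt / size_pt
```

With these assumed defaults, 24-point text would be presented one octave below 12-point text.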

* ATTRIBUTE ALERT. Different voices or tones for particular attributes such as bold face, italics, etc. Unfortunately, the more we use tonal strategies for different kinds of information, the more one has to remember which tone relates to which. Verbal announcements, such as "begin bold face," are much more specific, but they greatly interfere with the content of the material. It may be easier to have the change spoken in words but on a very different pitch than is currently being used for the remainder of the text. It would be interesting, however, to research whether users could become familiar with different tonal patterns to represent different attribute changes. One possible tonal configuration follows:

The "Begin Tag" for bold face, italics, etc. could be represented by 3 rapid ascending tones. Specified tones could be prescribed which would allow significant pitch difference between the set of tones used for each attribute.
The "End Tag" for bold face, italics, etc. could be represented by 3 rapid descending tones. Once again, specified tones could be prescribed which would allow significant pitch difference between the set of tones used for each attribute.

Another option is to have a set tonal sequence for the beginning and ending tag of all attributes and to couple these tones with the verbal announcement of the specific attribute change which would be presented in a voice substantially different from that of the text voice. This bi-level alert may prove to be both simple and effective.
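The per-attribute configuration of ascending and descending tones might be sketched as follows. The attribute names, base frequencies, and the ratios between the three tones are hypothetical; the only points taken from the text are that a begin tag is 3 rapid ascending tones, an end tag is 3 rapid descending tones, and each attribute has its own clearly separated pitch range.

```python
from typing import Dict, List

# Assumed base pitches, spaced an octave apart so attributes are easy to tell apart.
ATTRIBUTE_BASE_HZ: Dict[str, float] = {
    "bold": 300.0,
    "italic": 600.0,
    "underline": 1200.0,
}

# Assumed ratios for the three tones in a tag signal.
TONE_RATIOS = (1.0, 1.25, 1.5)


def begin_tag_tones(attribute: str) -> List[float]:
    """Return 3 ascending tones announcing the start of an attribute."""
    base = ATTRIBUTE_BASE_HZ[attribute]
    return [base * ratio for ratio in TONE_RATIOS]


def end_tag_tones(attribute: str) -> List[float]:
    """Return the same 3 tones in descending order for the end of an attribute."""
    return list(reversed(begin_tag_tones(attribute)))
```

For the bi-level alert, a single fixed pair of begin and end sequences could be used for all attributes, with the attribute name then spoken in a separate voice.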

(Note. These sounds could be generated by a MIDI sound box which could be inserted between the computer and the speech synthesizer. Such a device could generate a variety of digital effects such as reverb, phase shift, delay, attack changes, envelope effects, etc. even though the synthesizer being used may not be capable of these alterations. In this way, selected portions of the text could be spoken in distinct ways which would give the user the feeling of bold face, italics, etc. without cluttering up the text with the verbal labels.)

Section 2. Document Navigation Problems and Strategies

PROBLEM. Difficulty navigating by grammatical portions of a document in Windows applications.

Most of the DOS screen readers support a wide variety of strategies which allow the reader to quickly move to and listen to information found in a number of text blocks. Next and previous character, word, line, sentence, paragraph, screen, and total document can be easily searched for and read. To date, the Windows 3.1 screen readers have made only some of these navigation strategies available. Most support next and previous character, word, line, and screen.

Few screen readers allow the reader to hear the entire document. In addition, the screen readers which support this full document reading capability do not always support indexing (the ability to press the stop command and have the cursor placed on the word which has just been spoken by the synthesizer). Without this ability, the user who desires to perform some action with regard to the last word is forever looking around for that portion of the text which was just spoken.

None of the Windows screen readers that I am aware of allows the user to search the document for the next sentence, paragraph or title. As a result, unless the user knows the document well enough to search for a particular word in any of these blocks of text, he or she is forced to either listen to the entire document until the text in question is found or to attempt, with quick multiple presses of the down arrow key, to skim the text in search of the desired point. In this scenario, as soon as the screen reader user presses the down arrow key, the text currently being read is interrupted and replaced with the text on the next line. As a result of this incomplete scan, the desired text is often missed altogether.

POSSIBLE STRATEGIES.

Make available navigation commands which would include the following:

Move the cursor to the next, current, and previous paragraph and read a user defined portion of that paragraph; the entire paragraph, one line, one sentence, etc. The option to have this portion of the text read without moving the cursor would also be available.
Move the cursor to the next, current, and previous title or to the next occurrence of a particular level of title and read the title. The option to have the next title read without moving the cursor would also be possible. With the advent of ICADD, HTML, etc. it would be possible to standardize the search for a particular format boundary by searching for the particular title level called for in the tagged document. If ICADD were to be used, the document would obviously have to be prepared using the proper tags. If HTML were used, the screen reader would need to be able to easily and quickly access the HTML source in order to carry out this search.
Move the cursor to and read the next, current, and previous text attribute specified by the user: bold, italics, change in font, change in font size, inverse video, change in text color, etc. The option to have the attribute in question read without moving the cursor would also be possible.
Move the cursor to the next, current, and previous link and read the link. The option to have the link read without moving the cursor would also be available.
Read the entire document from the point of the cursor until the stop command is issued. This includes the ability to support indexing or the ability to have the cursor stop on the last word read when the user presses the "stop" command.
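As a rough illustration of how title and link navigation could draw on the HTML source, the sketch below uses Python's standard html.parser module to build an index of titles and links and then looks up the next title of a given level. The class and function names are hypothetical, and the sketch assumes the document marks its titles with the standard h1 through h6 tags.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

HEADING_TAGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class OutlineParser(HTMLParser):
    """Collect titles as (level, text) and links as (href, text), in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[int, str]] = []
        self.links: List[Tuple[str, str]] = []
        self._heading_level: Optional[int] = None
        self._heading_text: List[str] = []
        self._link_href: Optional[str] = None
        self._link_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._heading_level = HEADING_TAGS[tag]
            self._heading_text = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._link_href = href
                self._link_text = []

    def handle_data(self, data):
        # Text may belong to a title, a link, or both (a link inside a title).
        if self._heading_level is not None:
            self._heading_text.append(data)
        if self._link_href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._heading_level is not None:
            text = "".join(self._heading_text).strip()
            self.headings.append((self._heading_level, text))
            self._heading_level = None
        elif tag == "a" and self._link_href is not None:
            text = "".join(self._link_text).strip()
            self.links.append((self._link_href, text))
            self._link_href = None


def next_heading(headings: List[Tuple[int, str]], current: int,
                 level: Optional[int] = None) -> Optional[int]:
    """Return the index of the next title after `current`, optionally of one level.

    Pass current=-1 to search from the top. Returns None when no title is found.
    """
    for index in range(current + 1, len(headings)):
        if level is None or headings[index][0] == level:
            return index
    return None
```

Because the lookup returns an index rather than moving anything, the same function could serve both the "move the cursor" and the "read without moving the cursor" forms of the command.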

(Note. One possible strategy for carrying out these navigation commands might be to use the number pad in conjunction with option keys. In one possible scenario, the user could locate specific portions of the text in a document much as he or she would on an audio CD player which indexes selections or song titles on a disk. In this way, the user might go to the 5th title in the document by pressing the modifier key and the 5 key. Or, the user could scan from one title or paragraph to another by pressing the option key and the X key which would be mapped to a "Next" key. It would also be possible to hold down a "Quick Scan" key and have the text be read in a very rapid manner much as it would be presented by holding down a similar key on a CD player or quickly winding through an audio tape which possesses an ability to hear the sound while in "fast forward" or "rewind.")

Section 3. Other Issues: Problems and Strategies

PROBLEM. Synthesizer inaccuracies. In certain cases, the speech synthesizer is unable to correctly pronounce certain words or symbols found in a document. Most DOS screen readers do give the user the ability to change the pronunciation of a word or other groups of characters. This utility was added to screen readers when speech synthesizers were not very good at pronouncing all words accurately. Some of these inaccuracies still exist. Many Windows screen readers, however, do not contain pronunciation dictionaries. If the user is reading in full document mode and hears a word which he or she does not understand, he or she must press the stop command and spend numerous seconds or even minutes trying to re-locate the word in question in order to have it spelled.

Many DOS users of screen readers which contain word pronunciation dictionaries have taken these dictionaries far beyond word pronunciation to such things as causing HTML or ICADD tags to be pronounced as the symbol they represent, converting Roman numerals to Arabic numerals, etc. The ability to do this in the Windows environment is crucial to a full and speedy understanding of the information being read.

POSSIBLE STRATEGY. Make available a word pronunciation dictionary.
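A pronunciation dictionary of this kind can be thought of as a table of text substitutions applied before the text reaches the synthesizer. The following is a minimal sketch, assuming a user-supplied dictionary of exact, case-sensitive entries; the function name and the sample entries are hypothetical.

```python
import re
from typing import Dict


def _entry_pattern(key: str) -> str:
    """Build a pattern for one entry.

    Entries that begin or end with a letter or digit must match whole words,
    so that an entry such as "Dr" does not fire inside "Drive". Entries that
    begin or end with a symbol, such as an HTML tag, match anywhere.
    """
    prefix = r"(?<!\w)" if key[0].isalnum() else ""
    suffix = r"(?!\w)" if key[-1].isalnum() else ""
    return prefix + re.escape(key) + suffix


def apply_pronunciations(text: str, dictionary: Dict[str, str]) -> str:
    """Replace every dictionary entry found in the text with its spoken form."""
    keys = [key for key in dictionary if key]
    if not keys:
        return text
    # Try longer entries first so "</h1>" is not shadowed by a shorter entry.
    keys.sort(key=len, reverse=True)
    pattern = re.compile("|".join(_entry_pattern(key) for key in keys))
    return pattern.sub(lambda match: dictionary[match.group(0)], text)
```

For example, a user might map a heading tag to a short spoken label and a Roman numeral to its Arabic form, which is the kind of extended use described above.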

PROBLEM. Trouble with the period pause rule supported by most screen readers. Currently, most screen readers pause briefly at a period in order to enhance grammatical clarity. There are, however, conditions where these pauses are not desirable.

In a table of contents where the period is used to separate the title from the page number. Because there is a pause after each period, it often takes several seconds for the screen reader to get from the title to the page number.
In a number containing one or more periods such as a dollar amount or a number designating a discrete document section such as 2.5.1. Some screen readers put a pause after each period, thus making the reading of these numbers much longer than is desirable.

POSSIBLE STRATEGY. Make the necessary exceptions to the period pause rule.
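One way to express these exceptions is to pause only at a period that ends a word and is followed by white space or the end of the text. The sketch below is an assumption about how such a rule could be written, not a description of any existing screen reader; the function name is hypothetical. Periods inside numbers are skipped because a digit follows them, and runs of leader dots are skipped because each dot is next to another dot or a space.

```python
import re
from typing import List

# A period that is not preceded by another period or by white space,
# and that is followed by white space or the end of the text.
PAUSE_PERIOD = re.compile(r"(?<![.\s])\.(?=\s|$)")


def pause_positions(text: str) -> List[int]:
    """Return the character positions of the periods that deserve a pause."""
    return [match.start() for match in PAUSE_PERIOD.finditer(text)]
```

This simple rule still pauses after abbreviations such as "etc."; handling those would need a further exception list.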

PROBLEM. The inability to accurately read tabular material. The biggest block to reading tabular material results from the fact that many Windows screen readers do not allow the user to read columns of data. It is only possible to read each row of data as one would a line of text. Furthermore, when one reads a row of numeric entries, it is often difficult to know when one number stops and another begins. Depending on the screen reader's ability to deal with reading digits or full numbers, a row of 5 numbers such as 14 125 82 2254 695 might be read by the screen reader as 14125822254695.

Using a screen reader which allows the user to read by columns as well as by row is certainly a step in the right direction, but using the soda straw approach to reading a table, even when the straw bends in two directions, is extremely difficult. The sighted user can easily work his or her way through the data presented in this format because it is reasonably easy to always know what row or column one is currently in. But if the screen reader user, using the soda straw, is currently positioned in row 5, column 7 and forgets what the current row or column heading is, he or she has to move the straw or cursor to the particular row or column header in order to remind himself or herself. Then, he or she has to quickly try to find the place which was just vacated in order to refresh his or her memory of the data in the table. This is a very time consuming, taxing task which many screen reader users just don't bother to attempt.

What makes it even more difficult is the fact that column reading, for almost all Windows screen readers which support it, is not entirely free of navigation problems. If one is moving the cursor up a column, for example, it is often impossible to get to the column titles if there is a border which separates the columnar data from the titles. One has to exit column reading mode, skip over the border, then arrow across the column titles until one comes to the one they want, if they happen to remember what that one is. On the other hand, if there is no border which separates the data from the columnar titles, the user can easily abort the column reading mode by crossing over the boundary between the table and the text above. At that point, the current line of text is read and the cursor is likely positioned at the end of the line containing the text. Once again, the user has to re-orient himself or herself to the material. As many Windows screen readers have just recently made the reading of columnar data possible, more research is needed to determine the nature of the problems experienced in table mode.

POSSIBLE STRATEGIES.

Make column mode available and develop workable strategies to prevent the user from aborting the mode.
Develop a positional location system which is coupled with the column reading option.
- Holding down an Option key and pressing the up arrow would read up the column.
- Pressing the down arrow with the same option key held would read down the column.
- The right and left arrow keys would take the user across the current row, one cell at a time.
- Holding down 2 option keys and pressing the left arrow key would result in the left most cell in the current row being read without moving the cursor from its current location.
- Holding down the same 2 option keys and pressing the right arrow key would result in the right most cell in the current row being read without moving the cursor from its current location.
- Holding down the same 2 option keys and pressing the up or down arrow keys would result in the top most or bottom most cell in the current column being read without moving the cursor from its current location.

In this way, the user could easily link the data in the table titles to the data in the table itself.
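The positional location system above might be modeled as a cursor over a grid of cells, with "peek" commands that read an edge cell without moving. The sketch below is illustrative only: the class and method names are hypothetical, it assumes the first row holds the column titles and the first column holds the row titles, and it assumes the screen reader can already obtain true table cells. Clamping the cursor at the table edges is one assumed way to keep the user from accidentally leaving table mode.

```python
from typing import List


class TableCursor:
    """A cursor over a table given as a list of rows of cell text."""

    def __init__(self, cells: List[List[str]], row: int = 1, col: int = 1) -> None:
        self.cells = cells
        self.row = row
        self.col = col

    def move(self, d_row: int, d_col: int) -> str:
        """Move by the given offsets, stopping at the table edges, and read the cell."""
        self.row = min(max(self.row + d_row, 0), len(self.cells) - 1)
        self.col = min(max(self.col + d_col, 0), len(self.cells[self.row]) - 1)
        return self.cells[self.row][self.col]

    # The peek commands read an edge cell but leave the cursor where it is.
    def peek_row_start(self) -> str:
        return self.cells[self.row][0]

    def peek_row_end(self) -> str:
        return self.cells[self.row][-1]

    def peek_column_top(self) -> str:
        return self.cells[0][self.col]

    def peek_column_bottom(self) -> str:
        return self.cells[-1][self.col]

    def announce(self) -> str:
        """Read the current cell together with its row and column titles."""
        value = self.cells[self.row][self.col]
        return f"{self.peek_row_start()}, {self.peek_column_top()}: {value}"
```

Each arrow key combination described above would map to one of these methods; for example, the two-option-key left arrow command corresponds to peek_row_start.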

(Note. This strategy will not be completely effective until screen readers are able to read cells of a table and not just portions of the text which happen to be separated by tabs or spaces. A simple "tab/space" column reading mode will not solve the problems associated with text on a row which wraps to the next line.)

PROBLEM. Limited full document search capability. Most Windows screen readers allow the user to search the current screen for text and graphics. They do not, however, allow the user to search the full document. Although many Windows applications allow full document search capabilities, they are often extremely difficult to use with screen readers. This is because it is almost impossible to read the document text to see if the user is at the correct place in the document while the "search" dialog box is still present. One has to exit this dialog in order to read the text and then go back into the dialog box to continue the search if needed.

In addition, as there are currently no strategies to allow the user to search for the next title, paragraph, etc., one often wants to search for carriage returns which may indicate such boundaries. This cannot be done in those applications which do not allow the user to search for format controls or which use the "return" itself as a search delimiter.

Section 4. Conclusions

The growing realization is that the strategies needed to bring about equal access to information will have to be acted upon jointly by screen reader manufacturers, application developers, and operating system designers. For example, in Windows 95, you can call up a great number of individual sounds to indicate that an application has been opened or closed, an error has occurred, etc. These are sounds which the user can either record with his or her own voice or call up from a variety of system sounds. The ability to link this sound library to other actions, such as tones used to announce format changes in a document, would be a real breakthrough. But the individual pieces to this and other parts of the puzzle may be held by one or more of the groups mentioned above. Setting up a mechanism for communication among these groups is crucial.

We will, of course, need to come up with compelling reasons for manufacturers to build in this extra audible flexibility, but it is certainly possible to come up with some rather good reasons why "able-bodied" persons might applaud some of these concepts.

Finally, no set of recommendations will be worth very much if they aren't read, thought about, torn apart, and re-assembled by the persons who will benefit from them the most. We sincerely hope that you will send us your comments and suggestions no matter how involved or simple they may seem to be. Your input is crucial to this project, and this project is crucial to the ability to equally access information on the WEB as well as in other formats. Thank you for taking the time to read and respond to this cooperative attempt to improve the way all people receive and process information.