Universal Access Project NII Infrastructure

Universal Access Project
NII Infrastructure
Initial Draft
January 1995

Components of the NII

In order to make the NII accessible, it's important to understand the different components of it. Although there are many ways of breaking it down, for this discussion we basically break it into four categories:

Sources of information;
Transmission mechanisms (pipeline);
Translation and other services during the transmission process;
Viewers.

Sources of Information

The first component of the NII is basically the information providers. These are the people who create the information or data which is sent over the NII to others. Information must either be produced in accessible formats, or in formats which can be easily translated into accessible formats. Examples of information sources include:

Libraries

Government services (information on employment, financial aid, taxes, hours of service, services available, etc.)

Most commercial companies (information on products, prices, deliveries, stock, hours, etc.)

Companies whose products can be sent "over the wire" (movies, advice, newsletters, information on any topic)

Local schools (homework assignments, homework aids, schedules, meetings, school lunch menus, etc.)

Universities (course schedules, financial aid, program descriptions, research opportunities, jobs)

Clubs (announcements, newsletters, meetings)

On-line information services (e.g., CompuServe, Prodigy, Genie, eWorld, etc.)

Your family (plans, schedules, coordination of emergencies, group letters/updates, gift lists at holidays)

You (things for sale, resume, services, newsletters, advice, information on any topic)

Transmission Mechanisms

Once you are connected to the information highway, you will have no idea exactly what channels the information will take, either coming to or going from you. In most cases, the information will travel over many different transmission mechanisms along the way.

Some examples of different transmission mechanisms include:

The Internet

Cable television wiring

Special optic fiber links

Microwave

High-speed phone/data lines (ISDN)

Satellite

Cellular phone

Radio carrier or subcarrier

In-Transmission Services

Although it is all up for discussion at the present time, regional Bell operating companies have not in the past been able to alter the signal in any substantive way between origin and destination. However, as more general NII services unfold, there may be many different ways that information is translated between the sender and the receiver. In many cases, these mechanisms will increase accessibility options. Some examples of translations include:

Translation of voice to e-mail or fax

Translation of fax into e-mail

Translation of e-mail into fax

Translation from one language to another

Translation of TDD to voice, or voice to TDD (providing more direct, secure, and confidential communication)

Frequency shifting (to better match the hearing profile of the receiver)

Speech filtering (to increase the intelligibility of some types of speech)

With these translators, information can be made available in the form most convenient at any particular time (e.g., via voice for someone who is driving a car, but who might want the information in printed form if they were at home or at the office). It is also possible to convert information from a form which is inaccessible to some people into other forms which are accessible (e.g., a fax might be converted into electronic mail or voice for someone who is blind). In such a case, it is not clear whether this would be considered a translation of the materials or simply what Frank Bowe calls a "protocol conversion" which does not alter the message but simply converts its modality.

Viewer

This category includes all systems or devices used to receive and display information. (If you are sending information, you would be a source, as described above.) In order to be accessible, the viewer must both be able to display the information in a form compatible with the person receiving it and have controls which are compatible with the individual's physical, sensory and cognitive capabilities.

Viewers can take a wide variety of forms, including:

Television sets (with special "set-top" adapter boxes)

Standard telephones

Telephones with video or touchscreens

Kiosks (public information systems which look like a touch-sensitive television screen mounted in a cabinet of some type)

Ordinary fax machines

Cellular phones with built-in display screens

Special information appliances

How it feels when you use the NII

The look and feel of these information systems will vary greatly, depending upon their design, intended use, and the target audience. Some Internet-based systems are quite sophisticated, and provide powerful search and retrieval software and techniques. They are intended for use by people with more experience and knowledge who require more exact or powerful tools.

At the other end are systems which are available that require no training and are easier to use than your VCR or your microwave oven. In some cases, operation is rather like changing channels on your television and then making selections from choices presented on screen. Some systems have as few as two or three buttons, while others let you just touch the particular items or topics on screen that are of interest to you. Other systems under development will allow you to talk to them and explain what you are interested in.

1) Access Issues Regarding Source and Pipeline

Issues regarding creating accessible source information

Information which was purely auditory would be inaccessible to people who are deaf, and information which was purely visual in nature would be inaccessible to people with visual impairments. Making this type of information accessible has to occur at the source, since it is not possible to render it accessible later. For discussion of these issues, see the separate paper on "Access to Content."

Information which is purely auditory or visual in nature, however, needs to be separated from information which is simply rendered in auditory or visual form. For example, speech consists of words and prosodics (intonation, etc.) The words or text portion of the speech would not be considered pure auditory information, but rather information rendered in auditory form. Similarly, a page of text which was rendered as a graphic image contains at its heart text which has been rendered graphically.

When information is not inherently visual or auditory, but rather just rendered in that form, there is the potential for having automatic translation of the information from one form to another. For example, software already exists which will convert faxes (which are pictures of text on a page) into ASCII text. Some software will in fact convert the fax into a file which contains not only the text but also text characteristics and formatting such as bold, italic, indent, etc.

Similarly, speech recognition software is advancing so that spoken words can be converted into ASCII text. It is important to note in passing that, depending on the way the material is spoken (for example, in an angry tone), converting it into text may not convert all of the information (e.g., anger) in the original utterance.

Today, both of the these types of software are improving but still have limitations. The optical character recognition (OCR) software can recognize printed text and convert it into ASCII. However, the text must be clear, and it must use fairly straightforward fonts (fancy fonts do not work). It will not recognize stylized logos, and the text must be very easily separable from the background (e.g., text written over the top of graphic information may not be discernible). Similarly, speech recognition software requires fairly predictable pronunciations of words and a separation of the speech from other background sounds.

Building translators into the pipeline

Automatic translation algorithms such as fax-to-text are already good enough that they are beginning to be built into information channels (e.g., the translator would be at the phone company). For instance, it is already possible to fax someone information and have the information sent through a translator on the way to the recipient. The recipient can then pick up their information via e-mail or have the fax read to them aloud. Although still in the demonstration stage, these types of services may eventually allow individuals to choose the presentation mode for information sent to them. Their preference of presentation mode may be due to a disability, or it may be due just to their current circumstances. For example, an individual who is blind might like to have a fax read to them, but so might someone who was driving their car and accessing their information over the car phone.

Such services might even be used by an individual who was blind to send information to themselves through one of the translators. For example, an individual who received a letter but did not possess a computer, scanner, OCR software, etc., might use a common fax machine to send a fax of the letter to themselves. Since they would already have stated a preference for receiving faxed information in voice form, the translator would be invoked and they could then check their voice mail for a spoken version of the letter they faxed to themselves. When the optical character recognition software is good enough, they may also be able to fax themselves a copy of the label for a can, a box, or a medicine bottle, and have it read back to them.

There would be several advantages to this approach over owning or purchasing a personal reader. First, the equipment investment cost could be very low, since fax machines are now dropping into the couple hundred dollar range. A fax machine which was send-only or had a very inexpensive printer could also be used, since the individual would not generally be receiving faxes in print form. More importantly, however, the OCR software and algorithms used could be extremely powerful techniques running on a much more powerful computer than the individual would normally want to pay for. In addition, they would have access to continually improving software, without having to go out and purchase a new set of software each year to keep up with this fast-changing area. Finally, this capability would be available to anyone wherever they were, even if they did not have any equipment with them. They would simply have to locate a fax machine (or carry a small fax machine with them).

There are, of course, disadvantages, and depending upon the cost of the "in pipeline" translation service it may be either more expensive or less expensive for different individuals to use this as a service versus purchasing the equipment themselves. There is also the question of which approach would be easier to use for individuals with different levels of technical expertise, etc. The potential, however, is there, and will undoubtedly be of interest and use to some individuals and not others. However, It will be very important to ensure that services such as this, and voice-to-text services for people with hearing impairments, are fully accessible to these individuals, and to individuals who have combinations of impairments. For example, a voice-to-text service which requires that the user be able to hear in order to set it up or use it would be a problem. Similarly, a voice-to- text service which requires good vision would be inaccessible to someone who had a hearing impairment and a visual impairment (as is common among people who are older).

It is also important for manufacturers or service providers to note that services such as this would not be used only by people who were deaf or had low vision. Many individuals who currently use the phone but have difficulty understanding what people are saying might find a simultaneous presentation of the speech on the screen to be of enormous benefit. Again, one of the primary audiences would be people who are older. Many older people have difficulty hearing and begin to avoid situations which require them to communicate with someone who is talking. However, they do not consider themselves "hearing impaired," and therefore would not use any other type of communication (such as a TDD) which they felt labeled them as having a disability. In these cases, services such as this would be best received if they were available via a device or system which was used for a number of different purposes rather than being a special service or device for people with hearing impairments or deafness.

Difficulty classifying information

The advent of these types of services will make increasingly difficult to classify information as being accessible or inaccessible.

If there are translators available in the pipeline between an information source and a destination, is the information considered accessible, even if it is not in an accessible form at the source?

The answer would probably be yes, if the cost and availability of a translated version of the information was the same as the cost and availability for the untranslated version. There would also, of course, have to be some assurance that the translator could handle the information. Having a speech translator available in the pipeline would not necessarily render all speech accessible. It would only render accessible that speech which could in fact be handled by that translator.

Fortunately, the test for this is fairly simple. Information can simply be fed through the translator and checked. If it translates successfully, it is accessible.

This also suggests the other strategy that can be used, even with marginal material or marginal translators. Those who are sourcing information could send it to themselves through a translator. Even if the material that came back had rough spots, these could be straightened out by the source and then the cleaned up translated version could be kept at the source. Users could then request either version. The translated version could also be bundled with the other version and both sent out whenever a request came in. The user could then select either or both presentations at the other end. For example, if audio recordings carried with them (buried in the audio signal) the text translation of the audio, anyone who received an audio recording could choose to have it presented to them either as audio or as text or as both.

For this to occur, it would require:

that there was an audio recording standard that allowed incorporation of text information,
that the transmission system or pipeline (which usually compresses this information) was able to successfully transfer the entire format to the user, and
that the user's viewer had the ability to present either or both forms of the information.

Since all transmission of information over the NII will be in digital form, this type of hybrid data format is eminently doable. In fact, the high-definition television transmission standards now being developed are being developed to support a large number of parallel data streams of different types for each television "channel."

Although these concepts (such as burying a text rendition of the speech in an audio recording) are being advanced from the perspectives of access by people with disabilities, it is imp, it is important to note that there is a wide variety of advantages and uses of this dual presentation of information commercially and for individuals who have no disabilities.

In the discussions above, examples are used which are oriented either to auditory information or to visual information. However, the underlying concepts are applicable to both. They are also applicable to individuals for whom English is a second language, but who have no disability. For example, automatic Spanish to English translating software is now available, although, like speech recognition and optical character recognition technologies, it is still developing.

Summary: Source and Pipeline Issues

In summary, the emerging NII and next-generation information technologies will introduce a number of new capabilities which can drastically change the way we think about information and accessibility. As these technologies become inexpensive and portable, these opportunities expand even further. People are currently envisioning pocketbook-sized or even pocket-sized communicators which would replace phone, fax, mail, radio, and even television. Large wall-sized displays would be available when you are at home or eventually most anywhere. The systems would operate like a cordless or cellular phone and allow you to send and receive images as well as text and voice anywhere you happened to be. As this happens, the availability of in-transmission or in- pipeline translating services will become even more important and will be used increasingly by everyone. A major use will be to amplify the computing power or features of their portable systems and have these capabilities whenever they need it without having to carry that computing power around with them as they move around.

It is critically important, however, that the disability access and usability issues surrounding these capabilities be addressed in the very early stages. It is so very easy to design a system which should provide great benefit, but which is inaccessible due to an oversight.

2) Access Issues Around Viewers

Assuming the information has been sent from the source in a form which could be made accessible to users, and assuming that the pipeline was designed to transmit that information fully (and/or translate it to make it even more accessible), the end user must have some piece of equipment to view the information. In this context, the term "view" is used generically to include audio, visual, tactile, or any other mode of presentation appropriate and effective for the user.

In order for the viewer to be useful to an individual, two things must be true:

It must be possible for the viewer to present the information in a form accessible to the user.
The individual must be able to operate the viewer.

Currently, many of the telecommunication and information tools are designed to present information in a single form. Telephones are designed to present information only auditorially, and fax machines only visually. Until recently, televisions were designed to present information auditorially and visually, but only some of the information was presented visually, and only some of the information was presented auditorially.

For viewers to be accessible, it should be possible for the user to specify how they would like each of the different types of information being sent to them to be presented or displayed. Using the television as an example, the individual should be able to request that the information which is traditionally presented auditorially be presented auditorially, visually (e.g., captions), or both (for mixed audiences, for individuals with residual hearing, and for children or others using the captions to learn to read). Similarly, the individual should be able to request that the information traditionally presented visually be presented visually, auditorially (for individuals who are blind, or who are driving a car or otherwise visually engaged), or have it presented in both forms simultaneously.

Accessibility of Computers Today

Currently, most access to the NII is through the use of computers. Because of the power and flexibility of computers, it is generally possible to have very flexible presentation of information. Especially if information is basically accessible in its source form, it should be possible to have users specify the form in which information should be presented. However, even though the technical capability is there, current software (both operating systems and applications) is not always designed to provide this.

It is important to note here that although manufacturers of operating systems are often singled out for critique in this regard, the developers of application software are quickly becoming the primary source of problems in this area. This is not to say that the operating system developers do not have a fair distance to go in making their operating systems accessible. It is just to note that even if the operating systems were designed to be perfectly accessible today, only about 10% of the application software would be fully accessible, since the application software developers actually have the final say in how their software programs operate.

Often, programs are inaccessible because they do not use the standard capabilities of the operating systems which access software developers rely on. For example, a screen reader may use the standard menu structures of the operating system to provide access to the menus. If application software developers use the tools provided by the operating system to create their menus, they will be accessible to the screen readers. If the application software developers create their own menus, or use alternate interface strategies to present control choices to the user, the screen readers may find it difficult or impossible to provide access. Thus, even if the operating system cooperated perfectly with screen readers, application software (word processors, spreadsheets, etc.) may not work with the screen readers because of the interface conventions implemented within the software. Interestingly, although operating system manufacturers strongly lobby application developers to use the standard system tools, many of these operating system companies ignore this advice when writing their own applications.

One example of an application's impact on accessibility can be seen in its ability to be operated from the keyboard (e.g., without using a mouse or trackball, etc.). The ability to operate a program entirely from the keyboard (e.g., no mouse required) is important both for individuals with physical impairments and for individuals with visual impairments or blindness. Even though mechanisms for accessing all of the menus, dialog boxes, etc., are provided within an operating system to allow keyboard use, application programs can write their interfaces in such a way that a mouse is required for operation of the full program. In some applications, such as drawing or painting programs, this may have some logic. However, in word processing programs, the logic for not allowing full keyboard access to the program is less clear.

A case example

An illustration of problems both in presentation of information and in operation of the program can be seen in the program called Mosaic, which is very widely used to access information on the current NII.

In making these comments, it should be noted that when NCSA (who developed Mosaic) learned of these accessibility difficulties, they launched an effort to examine the accessibility issues and to build accessibility into their product. This project, which includes individuals with disabilities working side-by-side with researchers and programmers, is currently ongoing. Some of the issues around the current version of Mosaic, however, are excellent examples of how a product that is designed without consideration of disability needs from the beginning can be created and disseminated on a mass basis with unnecessary barriers built into them.

Mosaic is a multimedia browsing tool for use on the Internet. A browsing tool is essentially a tool that lets you look around at the various information sources on the network. These include everything from an AT&T 800 number phone book, to discussion groups on every conceivable topic, to scientific databases, to recipes for chocolate chip cookies. Mosaic allows information to be presented using text, pictures, movies, or audio sound clips.

Although Mosaic for Windows allows access from the keyboard for many operations, some operations still require that you move or position the mouse. Most Mosaic pages of information include text mixed with graphics. Although there is a standard in place which allows information source developers to attach text descriptions to each of the graphics, and although there is a growing number of sites that are attaching such text descriptions to the graphics within their documents, the Mosaic program does not provide a mechanism the text descriptions of the graphics to be presented. Even though it has a mode which will present pages with the graphics removed (since it is much faster to transmit pages of information if they don't have pictures in them), it does not have a provision for presenting the text for the missing pictures. It simply presents a small icon, essentially saying "there is a picture here, and if you would like to see it please click on the icon." Thus, we have an instance where an alternate form of the information (e.g., the text description for the graphic) is available, but where it is not accessible to the user because the viewer was not designed to present that information. Interestingly, most individuals who are operating Mosaic with the pictures turned off would find this information very useful. In fact, it is used in text-only alternates to Mosaic (such as Lynx).

Mosaic also has a feature which allows you to embed a picture in a document which does different things depending upon where you click on the picture. For example, you could put up a map of the United States. Clicking on one of the states would take you to more information about that state. When you call up the information server at the White House, it presents a picture of the First Family, with a series of medallions around it, each with words on them. Clicking on the medallions will take you to more information about that aspect of the White House. Other servers use this feature to create very fancy menu bars. The menu bars essentially consist of four or five words on a bar stretched across the screen. This could be easily done in text, but it is rendered graphically to make it more visually pleasing. When you click on the menu bar graphic, the server determines which word you clicked on, and then takes you to different functions depending on where you clicked.

This feature, which is called an ImageMap, is completely inaccessible to any type of screen reader. The screen reader sees this as exactly what it is: a picture. As a result, a user who is blind has no ability to click on any of these pictures or even menu bars rendered in this fashion and embedded in the document.

In this case, access would depend not only on Mosaic but also on the information servers. The ImageMap structure would need to be modified in order to transmit not only the picture but also the full information about the choices presented on the picture. A user could then choose whether they wanted to have the picture or the list of choices in text form presented. Again, it is interesting to note that independent of any disability access issues, this approach to implementing the ImageMap function has been raised in two forms by the Internet community itself. It also would provide a much faster way for everyone to access pages of information which contained this type of information. By skipping the graphic presentation, you can receive the information on the page 5-50 times faster. This is often important to people who are accessing the information over a phone line using a modem, where they might have to wait for minutes to get pictures downloaded, depending upon the speed of their modem and the size of their picture.

It is interesting to note that, had disability issues been considered early in the process, this improved, faster mode for transmitting information would have been present in the initial implementation (to the benefit of all users) rather than being introduced as a speed and usability improvement at this latter stage, which requires rewriting all of the code, reimplementing it in the field, and all of the pains of transition from one mode to another. It also impairs the ability to implement it in an optimal format, since it is now necessary to consider issues of compatibility with the older format. It is again important to note that this reimplementation and transition is not being done for the disability community, but by the Internet community for the benefit of the Internet community as a whole.

Pens and touchscreens

Graphic user interfaces

For the NII to be commercially feasible, it must be widely used. For it to be used widely used, it must contain lots of different types of very useful information. It must also be very easy to use for as many people as possible.

Ease of use today generally involves heavy use of graphic user interfaces, since they are especially helpful for individuals who are not "technophiles." Ease of use is also increasingly coming to mean the use of touchscreens which are used with or without a small stylus or "pen." Touchscreens are appearing not only in kiosks but also cellular phones and personal digital assistants (PDAs) (pocket-sized electronic notebooks, calendars, e-mail systems, etc.). Graphic user interfaces make it possible to create screens which have a much lower cognitive lower requirements to either learn or operate. Touchscreens make it possible to vary the buttons and controls to fit the particular activity or type of information being presented. This is extremely helpful in multi-function devices.

In the past, access to graphic user interfaces and touchscreens has been problematic. Access to graphic user interfaces has caused problems for screen reading software. Touchscreens (whether finger- or stylus-based) have been impossible for people who are blind to access, except in limited "blind mode only" form.

While each of these systems has caused problems, they both are powerful interface systems which are not likely to go away. Even after voice input systems become commonplace (see below), access via touchscreens and the use of graphic user interfaces will continue to be common. An important part of access to next-generation systems will therefore be the development of effective strategies for working with these interface systems. It should also be noted that these very same interface strategies in fact cause barriers for one type of disability, but are assets to others. While the graphic user interface and touchscreen is currently a problem for people who are blind, they are an asset to individuals with lower cognitive skills. Voice response systems which would be an asset for people who are blind pose problems for people who are deaf or who have significant speech impairments. Thus, cross-disability accessibility will never be achieved by avoiding one particular interface or focusing on another. Flexible cross-modal systems, however, are believed to be the only viable approach.

Draft for NTIA Advisory Meeting
January 1995