U L T R A S O N I X
F R E Q U E N T L Y   A S K E D   Q U E S T I O N S

FAQ Version 1.0, last updated December 27, 1996

Maintainer: Keith Edwards
            Xerox PARC
            kedwards@parc.xerox.com

1.0 INTRODUCTION
    1.1 What is UltraSonix?
    1.2 Why do I sometimes see the software called Mercator or Sonic X?
    1.3 Licensing
    1.4 Porting
2.0 PROJECT HISTORY
    2.1 Early Days
    2.2 Heading Towards RAP
    2.3 The Current Architecture
    2.4 Current Status
3.0 REQUIREMENTS
    3.1 Development Requirements
    3.2 Runtime Requirements
    3.3 Hardware Requirements
4.0 USAGE
5.0 PUBLICATIONS AND RESOURCES
    5.1 What documentation is available?
    5.2 Is there a Web page on UltraSonix?
    5.3 What papers have been published on the system?
6.0 DESIGN OVERVIEW
    6.1 Connecting to the application
    6.2 Downloading widget information
7.0 HOW THE RAP, HOOKS OBJECT, AND ICE STUFF WORK TOGETHER
    7.1 The Hooks Object
    7.2 Rendezvous
    7.3 RAP

=============================================================

1.0 Introduction

1.1 What is UltraSonix?

UltraSonix is a prototype screenreader for the X Window System and UNIX. It provides speech and non-speech auditory representations of the applications on a user's X desktop, and can also generate Braille output of text areas. It can synthesize the input that applications "expect" (mouse clicks and key presses) from alternative input sources. The software works best with X11R6.1 applications built using the Xt toolkit and Motif.

The software was developed at Georgia Tech over a period of several years. Since then, the original developers have moved on to other projects. This release of the system is free for non-commercial use. See 1.3, Licensing, and 1.4, Porting, for more information. The work at Georgia Tech is no longer on-going, and the developers unfortunately have very limited resources for supporting the software.

The current version of the software is known to work only on Sun SPARCstations running Solaris 2.5 and the Common Desktop Environment (CDE).

1.2 Why do I sometimes see the software called Mercator or Sonic X?

The original project at Georgia Tech was called Mercator, after Gerhardus Mercator, a 16th-century cartographer who devised a projection of the Earth's surface in which latitude and longitude lines are parallel, easing navigation. It turned out that the name Mercator was trademarked, so we went with Sonic X as a potential new name. That name (or one close to it) was also taken, so we finally wound up with UltraSonix. The current distribution still has a number of references to Mercator and Sonic X in it.

1.3 Licensing

The UltraSonix software is copyrighted by the Georgia Tech Research Corporation. Use of the program is restricted to educational and non-commercial purposes only. Any inquiries regarding licensing should be directed to:

    Georgia Tech Research Corporation
    Technology Licensing
    Centennial Research Building - Rm. 275
    400 Tenth Street, N.W.
    Atlanta, Georgia 30332-0415

1.4 Porting

An effort is underway by a number of people to port the software to the Linux operating system on Intel hardware. Mark Novak of the Trace Center in Wisconsin is coordinating this effort. Mark can be contacted at menovak@facstaff.wisc.edu.

=============================================================

2.0 Project History

2.1 Early Days

The Mercator project was started at Georgia Tech in 1991, and was initially a collaboration between the Tech Center for Rehabilitation Technology and the Multimedia Computing Group.
The CRT had expertise in access software and a need for providing some form of access to X, and the MCG had a lot of experience hacking the internals of the X Window System. At this time, the project was funded by the NASA Marshall Space Flight Center and an interdisciplinary research grant from Georgia Tech.

When we started, we wanted to take a pragmatic approach toward engineering so that we could produce something that someone might actually be able to use, but also keep a research-oriented focus and try to gain some new insight into novel interaction techniques for auditory environments. One result of the research focus that fell by the wayside was an investigation of techniques for synthetically spatializing audio to produce "3D sound" from a digital source. While not incorporated into Mercator, our group did produce some rather nifty 3D sound software. David Burgess (now at Interval Research Corporation) was chiefly responsible for this work. http://www.cc.gatech.edu/gvu/multimedia/spatsound/spatsound.html has more details if you're interested.

Version 1.0 of the software was completed in 1992. Version 1.0 was architecturally very different from UltraSonix today, because it started from a presumption that turned out to be false: that we would not be able to alter the basic design of the X Window System to support screenreaders. Version 1.0 used an "external" approach to providing access. This means that it required no modifications of any kind to applications, toolkits, or window servers to operate. The system basically situated itself between the X server and client applications. To applications, Mercator appeared to be a generic X server, and to the "real" server Mercator appeared to be a client application.

A big problem with this approach is that the X protocol provides only very low-level information. The creation of an on-screen button, for example, would show up as a sequence of X protocol traffic specifying lines to be drawn at absolute pixel coordinates. To work around the low-level nature of the protocol traffic, version 1.0 of the system augmented the information derived from the protocol traffic with Xt-specific information available via the higher-level Editres protocol. Editres was originally designed as a tool for interactive customization, however, and it became clear that it wasn't suitable for our needs.

Mercator 1.0 is described in Technical Report GIT-GVU-92-05, available from Georgia Tech or from the Mercator web site. Despite the architectural differences between 1.0 and later versions, the hierarchical navigation scheme and auditory icons used by 1.0 have pretty much carried through to the latest version.

2.2 Heading Towards RAP

After version 1.0, we backed away from our requirement that we not modify X, and started to look at what minimal set of changes could be made to support screenreaders and similar applications. This resulted in an Editres-like protocol designed to gather information at the level of Xt widgets: buttons, scrollbars, and the like. This protocol was called (creatively enough) XtProto, and served as a testbed for us to see what types of services we could provide to screenreaders if we could communicate information about widget state.

The work on XtProto resulted in Mercator version 2.0. This work and XtProto itself were described at the X Technical Conference. Version 2.0 worked well enough that we believed it was the way to go for future screenreader work under X.
We also had a set of modifications to the Xt and Xlib libraries that could be used to provide information about GUI state to screenreader programs. This version also had the added side-effect that it let us show a proof-of-concept to folks at the X Consortium, and get the ball rolling on incorporating functionality like XtProto into future versions of X.

Versions 3.0 and 4.0 refined the internals of the system and cleaned up the protocol to the point that we felt comfortable with its stability and utility. Both of these versions added user interface functionality, but were based on a protocol derived from the XtProto used back in version 2.0. These later versions were supported by the NASA Marshall Space Flight Center, Sun Microsystems, and the National Security Agency.

2.3 The Current Architecture

At about this time, we took over the task of proposing a protocol and a set of Xt and Xlib extensions for screenreaders to the X Consortium. Broadly, these modifications fall into a number of categories:

- Changes to Xlib. We needed a way to get low-level protocol information from Xlib reliably; our old pseudo-server approach introduced too many race conditions to work well. So X11R6 shipped with a new client-side Xlib extension hook, called XESetBeforeFlush, that can catch this protocol information before it goes out over the wire.

- Changes to Xt. We needed to hook into Xt to trap information about widget state changes. Kaleb Keithley implemented a set of hooks in the R6 libXt that can capture widget state information. A number of people on the x-agent mailing list also contributed to this design.

- A rendezvous protocol. Will Walker at Digital Equipment Corporation proposed a protocol for initially connecting screenreaders (called "external agents") to running applications. This protocol shipped in X11R6.1, and is called the ICE X Rendezvous Mechanism. Look for a description of it in the Inter-Client Communication Conventions Manual (ICCCM). A number of other people also had input into the design of this protocol.

- A remote access protocol. Will Walker came up with the nifty acronym RAP, for Remote Access Protocol. RAP is basically a descendant of XtProto, cleaned up to use the hooks into Xlib and Xt, and Will's rendezvous mechanisms. Unfortunately, RAP has not yet been adopted by the Consortium as a standard.

Version 5.0 of the system was a complete overhaul, incorporating RAP and a number of other changes, including the ability to dynamically load new I/O modules without recompiling. This version is described in the ACM Symposium on User Interface Software and Technology (see Publications, below).

Version 6.0 added much better text-handling ability, lots of bug fixes, and support for the Common Desktop Environment (CDE). At this time, our main sponsors were Sun Microsystems and the National Security Agency. We shipped code to them in December of 1995, and the project officially ended at that time. None of the core people from the project are at Georgia Tech any longer.

Version 7.0 is the "external" release. It is exactly the same as the version 6.0 code we shipped to the NSA, but includes some legal disclaimers that the Georgia Tech Office of Technology Licensing wanted. A few people have version 6.0 source code that they got by signing individual-use licenses before version 7.0 hit the streets. We should probably rename version 7.0 to 1.0 or something, since no one will ever use any earlier versions anyway.
Past project members include:

    Elizabeth Mynatt
    Keith Edwards
    Tom Rodriguez
    Ian Smith
    Kathryn Stockton
    Sue Liebeskind
    Will Luo
    Stacy Ann Johnson
    Kevin Chen
    John Selbie
    David Burgess
    Phillip Seaver

Thanks to others who have played a large role in getting this software off the ground:

    John Goldthwaite
    Will Walker
    Gerry Higgins
    Craig Moore
    Gary Day
    Earl Johnson
    Sue Hartman
    Mayer Max
    Sheila Stanley
    Jim Hoover

2.4 Current Status

The current external release of the software is version 7.0. Work is ongoing under the auspices of the Trace Center to port UltraSonix to Linux (see 1.4, Porting, above).

=============================================================

3.0 REQUIREMENTS

3.1 Development Requirements

The project at Georgia Tech was only concerned with bringing the system up on Sun hardware running Solaris. The system as released from Georgia Tech is known to run on Sun SPARCstations running Solaris 2.5 and the Common Desktop Environment (CDE). The requirements below reflect this platform; ports to other platforms may have other requirements.

To build UltraSonix, you will need the following:

- An ANSI-compliant C compiler (we used Sun SPARCcompilers C 3.0.1).
- A reasonably good C++ compiler, with support for templates and exception handling (we used Sun SPARCcompilers C++ 3.0.1).
- Fairly POSIX-standard include files.
- The Rogue Wave Tools.h++ class libraries, version 7.0 or later.
- The Tcl scripting language.
- X11R6 or later.

See the Design Guide for more information.

3.2 Runtime Requirements

UltraSonix requires a reasonably quick machine to run well. We developed on SPARCstation 10-class hardware, but ran on everything down to a SPARCstation 2 with 32MB RAM.

To run the system, you will need:

- Hardware supported by the system (see 3.3, Hardware Requirements, below).
- The Sun audio device (/dev/audio).
- Either modified X11R5 or X11R6 for client applications.
- Clients dynamically linked against X11R5 or X11R6.

3.3 Hardware Requirements

The following hardware is currently supported:

- DECtalk DTC01 speech synthesizer
- DECtalk Express speech synthesizer
- Entropic TruTalk software-only speech synthesizer
- Alva 3/20 Braille terminal
- Alva 3/80 Braille terminal
- Genovations keypad

=============================================================

4.0 USAGE

... I'll be adding stuff here as the questions come in. :-)

=============================================================

5.0 PUBLICATIONS AND RESOURCES

5.1 What documentation is available?

A User's Guide and Design Document are included in the source distribution.

5.2 Is there a web page on UltraSonix?

The original project web page is at:

    http://www.cc.gatech.edu/gvu/multimedia/mercator/mercator.html

Be forewarned that it is significantly out-of-date. A page on the RAP and ICE Rendezvous efforts is at:

    http://www.x.org/x-agent/

5.3 What papers have been published on the system?

A subset of the papers are on the web page. These include:

Technical Report GIT-GVU-92-05: Mynatt, E. D., and Edwards, W. K., "The Mercator Environment: A Nonvisual Interface to the X Window System," February 1992.

Technical Report GIT-GVU-92-28: Mynatt, E. D., and Edwards, W. K., "New Metaphors for Nonvisual Interfaces," 1992.

Mynatt, E. D., and Weber, G., "Nonvisual Presentation of Graphical User Interfaces: Contrasting Two Approaches," in Proceedings of the 1994 ACM Conference on Human Factors in Computing Systems (CHI '94), Boston, MA, April 24-28, 1994.
Mynatt, E. D., "Auditory Presentation of Graphical User Interfaces," in Kramer, G. (ed.), Auditory Display: Sonification, Audification and Auditory Interfaces, Santa Fe. Addison-Wesley: Reading, MA, 1994.

Mynatt, E. D., and Edwards, W. K., "Mapping GUIs to Auditory Interfaces," in Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), 1992.

Edwards, W. K., and Rodriguez, T., "Runtime Translation of X Interfaces to Support Visually-Impaired Users," in Proceedings of the 7th Annual X Technical Conference, Boston, MA, January 8-20, 1993.

Edwards, W. K., Mynatt, E. D., and Rodriguez, T., "The Mercator Project: A Nonvisual Interface to the X Window System," The X Resource, Issue #7, Sebastopol, CA, 1993.

Mynatt, E. D., and Edwards, W. K., "New Metaphors for Nonvisual Interfaces," book chapter to appear in Extraordinary Human-Computer Interaction, Edwards, A. (ed.), Addison-Wesley, due 1994.

Edwards, W. K., Mynatt, E. D., and Stockton, K., "Providing Access to Graphical User Interfaces--Not Graphical Screens," in Proceedings of the ACM Conference on Assistive and Enabling Technologies (ASSETS), Marina Del Rey, CA, November 1994.

Edwards, W. K., and Mynatt, E. D., "An Architecture for Transforming Graphical Interfaces," in Proceedings of the ACM Conference on User Interface Software and Technology (UIST), Marina Del Rey, CA, November 1994.

=============================================================

6.0 Design Overview

6.1 Connecting to the application

When UltraSonix first starts, it creates an instance of the "XServer" object. It's the responsibility of this object to open a standard X protocol connection to the X server on whatever machine the user is sitting at. This object will inform UltraSonix whenever anything "interesting" happens in the X world.

(XServer is a subclass of FDInterest, which is an important class to understand if you're trying to figure out how the system works. Basically, classes that are subclasses of FDInterest can participate in UltraSonix' main loop of input dispatch. The XServer class is a variety of FDInterest that "watches" for things happening on the X server, and tells the rest of UltraSonix whenever anything occurs. See the design docs for a discussion of FDInterest.)

One of the important things the XServer object does is look for new top-level windows being created (by soliciting the X "SubstructureNotify" event on the root window). Whenever a new top-level window gets created by an application, UltraSonix gets an event from the X server. This event is received by the XServer object. After a bit, the code percolates down to the HandleMapNotify routine in XServer. It's this routine that's responsible for trying to connect with the application.
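If you want a feel for the mechanism, here's a minimal sketch, in plain Xlib, of watching the root window for newly mapped top-level windows. This is just an illustration of the X-level technique; UltraSonix's real code does this through the XServer/FDInterest machinery rather than a blocking event loop:

    #include <stdio.h>
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }

        /* Solicit SubstructureNotify events on the root window; these
         * fire whenever a child of the root (i.e. a top-level window)
         * is created, mapped, destroyed, and so on. */
        XSelectInput(dpy, DefaultRootWindow(dpy), SubstructureNotifyMask);

        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            /* A MapNotify for a child of the root is the cue to start
             * figuring out whether this is a new application. */
            if (ev.type == MapNotify)
                printf("window 0x%lx mapped\n", ev.xmap.window);
        }
    }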
The algorithm HandleMapNotify uses is a bit complicated...check the comments in HandleMapNotify for the details. The reason it's complicated is that we try to distinguish windows created by new applications we haven't seen before from windows (dialog boxes, say) created by applications we're already connected to. The event we get from the X server is the same in either case, and we have to do some trickery to figure out what's going on.

But basically the end result is that, if the new window seems to be a new top-level window for an application, we try to connect to the application. I won't discuss that process here--it's part of RAP and the ICE Rendezvous Protocol--but the end result of a successful connection is that UltraSonix does the following:

- It creates a new Client object that's used to represent the application we've just connected to. (There's one Client object for each application that we've connected to.)

- It creates a new RAP object that'll be used for communicating with the client. (RAP objects are subclasses of FDInterest so that they can cause UltraSonix to "do things" when interesting RAP events occur. There's one RAP object for each application we've connected to.)

- It sends, via RAP, a message to the application asking it to send us a RAP message whenever new widgets are created or destroyed, widget attributes are changed, etc. This ensures that any future changes in the state of the application will be sent to us.

- It sends, via RAP, a message to the client to ask for its entire widget state. The way it does this is to call the FullQueryTreeRequest() routine on the RAP object. See below for what happens next.

6.2 Downloading widget information

FullQueryTreeRequest sends, via the RAP protocol, a message to the client application asking for its widget state. In response, the client sends back a message containing information about:

- All of the widgets in the application, including parent/child relationships between them.

- All of the X "resource" information associated with the widgets. (In X, all of the attributes of a particular widget, like the text string displayed in a button or the position of a scrollbar, are called "resources.")

As this information comes in, UltraSonix builds the offscreen model of the application. The "top level" object in the offscreen model is called the AppModelMgr; there is only one of these. The AppModelMgr keeps a list of Client objects (described earlier), one for each application we've connected to. Clients keep a list of XtObject objects, each of which represents one widget in an application. Each XtObject keeps a list of Resource objects, each of which represents one resource of one widget. So basically the offscreen model is a big tree that looks something like this:

    AppModelMgr (only one of these for all of UltraSonix)
     |
     |---------- Client #1
     |---------- Client #2
     |---------- Client #3
     |             . . .
     |---------- Client #n
                   |
                   |---------- XtObject #1
                   |---------- XtObject #2
                   |             . . .
                   |---------- XtObject #n
                                 |
                                 |---------- Resource #1
                                 |---------- Resource #2
                                 |             . . .
                                 |---------- Resource #n

You can look at AppModelMgr.h, Client.h, XtObject.h, and Resource.h to get a feel for the kinds of information that's stored in the offscreen model. As an example, XtObjects maintain a lot of information about the widgets they model: parent, children, the window ID of the widget's window, position, border width, whether the widget is mapped (visible on the screen), and so on.
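As a rough C rendering of that containment hierarchy (the real classes are C++, live in the header files named above, and carry much more state--the names and fields below are simplified guesses, not the actual declarations):

    /* Illustrative only: a simplified sketch of the offscreen model. */

    typedef struct Resource {
        char            *name;       /* e.g. "labelString"             */
        char            *value;      /* current value                  */
        struct Resource *next;       /* next resource of this widget   */
    } Resource;

    typedef struct XtObject {        /* one widget in one application  */
        struct XtObject *parent;
        struct XtObject *children;   /* first child                    */
        struct XtObject *sibling;    /* next widget at this level      */
        unsigned long    window;     /* X window ID of the widget      */
        int              x, y, width, height, border_width;
        int              mapped;     /* visible on the screen?         */
        Resource        *resources;  /* this widget's resource list    */
    } XtObject;

    typedef struct Client {          /* one connected application      */
        struct Client *next;
        XtObject      *widgets;      /* root of the app's widget tree  */
    } Client;

    typedef struct AppModelMgr {     /* exactly one per UltraSonix run */
        Client *clients;
    } AppModelMgr;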
So basically, by the time the application replies to the FullQueryTreeRequest RAP message, we've got a Client object for it, and all of the stuff underneath it that describes the application's graphical user interface. If any changes occur in the interface--a dialog box gets created and popped up, for instance--we'll get RAP notification messages from the application that tell us the changes to make in the offscreen model. (I won't talk more about how UltraSonix handles changes in an application's interface once we've already connected to it. Some of this is fairly complex. Maybe I'll try to write up a separate message describing it.)

Our goal for the offscreen model code was to build a model that, to the rest of UltraSonix, looks like it automatically stays in sync with what's going on in the application. The RAP object itself handles the update messages from the applications and changes the model as necessary. The rest of the system never has to change the offscreen model by hand.

If you are looking for the particulars of how the offscreen model is generated based on information received from the RAP protocol, the places to look are in RAP.cc:

- ProcessFullQueryTreeReply builds the model based on the data received from the initial FullQueryTree message.

- Changes in the interface that occur after startup time are handled by ProcessRequestNotify, ProcessEventNotify, ProcessChangeNotify, ProcessConfigureNotify, ProcessCreateNotify, ProcessDestroyNotify, and ProcessGeometryNotify. Each of these handles a particular type of RAP message that comes from the application when something changes.

=============================================================

7.0 How the RAP, Hooks Object, and ICE Stuff Work Together

7.1 The Hooks Object

X11R6 included a new widget called the HooksObject. While the HooksObject is a widget, you can think of it as just a data structure that's maintained by the internals of Xt and used to store a collection of pointers to functions. There is only one HooksObject per Display structure in the X toolkit.

R6, in addition to putting this data structure in the toolkit, also made extensive modifications to nearly all of the Xt routines. All of the routines that can create, destroy, or in any way change a widget were modified so that they fetch the HooksObject maintained internally by Xt and call one of the functions stored there. The HooksObject has five slots that can hold pointers to functions:

- Create, which will be called by any Xt routine that creates widgets.

- Destroy, which will be called by any Xt routine that destroys widgets.

- Configure, which will be called by any Xt routine that changes the "configuration" of a widget--X, Y location, width and height, etc.

- Geometry, which will be called by any Xt routine that changes the "geometry" of a widget (basically the same information that Configure sees, but done in a slightly different way).

- Change, which is a grab-bag function that will be called from any Xt routine that makes any other kind of widget change. This includes changing resource values, translation tables, all kinds of stuff.

By default, there are *no* functions installed in any of these slots in the HooksObject, and the modified Xt routines that call out to the HooksObject simply skip the call if the function they find there is NULL.

You'll notice that the HooksObject--and the Xt modifications that call out to it--are completely independent of RAP, or even of screenreaders. The HooksObject is a general-purpose facility by which applications or the toolkit itself can install code that'll be executed whenever Xt does anything. The result of these modifications, though, is that by installing code in the HooksObject you can take some special action whenever Xt changes state. So what happens when UltraSonix connects to an application is that we stick pointers to some RAP-related functions in these slots. This allows us to run our code--which sends messages to an external screenreader--whenever Xt changes state. The process by which these functions get installed is described below.
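For concreteness, here's a sketch of installing a hook function through the public R6 Intrinsics interface, where the slots are exposed as callback lists on the hook object. This is an illustration of the general facility, not UltraSonix's RapInit() code, and the call-data details follow my reading of the R6 spec:

    #include <stdio.h>
    #include <X11/Intrinsic.h>
    #include <X11/StringDefs.h>

    /* Once installed, Xt calls this from every widget-creating routine. */
    static void create_hook(Widget hook, XtPointer client_data,
                            XtPointer call_data)
    {
        XtCreateHookData data = (XtCreateHookData) call_data;
        printf("widget created: %s\n", XtName(data->widget));
    }

    /* Call once after the display is opened. */
    void install_hooks(Display *dpy)
    {
        /* XtHooksOfDisplay() returns the per-Display hook object. */
        Widget hooks = XtHooksOfDisplay(dpy);
        XtAddCallback(hooks, XtNcreateHook, create_hook, NULL);
        /* XtNdestroyHook, XtNconfigureHook, XtNgeometryHook, and
         * XtNchangeHook are installed the same way. */
    }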
7.2 Rendezvous

[NOTE: The text below describes how UltraSonix performs rendezvous with clients now. The actual protocol for rendezvous became a Consortium standard in X11R6.1, but has not been implemented in UltraSonix. The version of the rendezvous protocol that the Consortium adopted is not too different from ours, though. Moving UltraSonix to the latest Consortium code should be on any TO DO list for the project.]

The X Consortium has provided, as of X11R6.1, a standard way for any external program (called an "external agent" in the Consortium docs) to initiate a "rendezvous" with a client application. Basically, this mechanism just gives two applications a means to jump-start a connection between them; it says nothing about what actually happens once a connection is established.

In R6.1, the Xt library has been modified so that any top-level application widgets have a special "event handler" installed on them. This event handler specifies that when an event in a certain format is received by the application, the toolkit will interpret the event as an attempt by an external agent to contact it.

To contact an application, an external agent (such as UltraSonix) has to go through several steps. First, it opens an ICE connection and begins listening on it. ICE is a general-purpose transport protocol that's used for inter-client communication in X11R6 and later releases. ICE is simply a transport layer--other protocols, such as RAP, are layered atop ICE.

In the rendezvous model, the agent listens for incoming connections and has the client connect to it, rather than the other way around. So to get the client to connect to the agent, the agent must communicate the "ICE address" that it's listening on to the client. It does this in three stages (sketched in code below):

- First, the agent creates a "Property" on the root window. A Property in X is simply a named piece of data that's associated with a window. Properties are a common way of exchanging data between clients.

- The agent stores the address of the ICE port it's listening on in the property.

- The agent then constructs a ClientMessage event that contains (1) the name of the property containing the address and (2) the name of the protocol it wishes to speak (in this case RAP), and sends this to the client.

This rather convoluted process is necessary because ClientMessages can only contain a few bytes of data, and ICE addresses are too large to be sent in a ClientMessage event. So we store the address in a property and communicate the property name to the client.
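Schematically, the agent's side of those three stages looks something like the following. The atom names and ClientMessage layout here are placeholders I've made up to show the shape of the exchange; the exact encoding is defined by the R6.1 standard (and by Xtea.c in UltraSonix):

    #include <string.h>
    #include <X11/Xlib.h>
    #include <X11/Xatom.h>

    void contact_client(Display *dpy, Window client_win,
                        const char *ice_address)
    {
        Window root = DefaultRootWindow(dpy);
        Atom prop  = XInternAtom(dpy, "AGENT_ICE_ADDRESS", False);
        Atom proto = XInternAtom(dpy, "RAP", False);
        Atom msg   = XInternAtom(dpy, "EXTERNAL_AGENT", False);
        XClientMessageEvent ev;

        /* Stages 1 and 2: create a root-window property and store the
         * ICE address we're listening on in it. */
        XChangeProperty(dpy, root, prop, XA_STRING, 8, PropModeReplace,
                        (unsigned char *) ice_address,
                        (int) strlen(ice_address));

        /* Stage 3: send the client a ClientMessage naming (1) the
         * property holding the address and (2) the protocol we want
         * to speak; the Xt event handler on the client catches it. */
        memset(&ev, 0, sizeof ev);
        ev.type         = ClientMessage;
        ev.window       = client_win;
        ev.message_type = msg;
        ev.format       = 32;
        ev.data.l[0]    = (long) prop;
        ev.data.l[1]    = (long) proto;
        XSendEvent(dpy, client_win, False, NoEventMask, (XEvent *) &ev);
        XFlush(dpy);
    }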
When the client receives this message, several things happen:

- The special event handler that all R6.1 Xt applications have built in "catches" the ClientMessage event and treats it as an attempt by an external agent to initiate communication. The event handler runs a function that does the client-side portion of the rendezvous protocol.

- This function gets the name of the property out of the event, and then fetches the value of the property--the ICE address of the agent--off of the root window where it's stored.

- Once the client has the address the agent is listening on, it initiates an ICE connection back to the agent.

- The client takes the protocol name that was sent to it (RAP in the case of UltraSonix) and looks in an internal table that maps from known protocol names to functions that should be run when an agent comes along and wants to speak that protocol.

This is the part of the protocol that's standardized by the Consortium. Note that this standard says nothing about what data is actually communicated between the client and the agent once the connection is established.

Since RAP hasn't been adopted by the Consortium as a standard, the protocol name "RAP" isn't known to the rendezvous code, and the required RAP code isn't linked into Xt. So we relink libXt, adding in the RAP code and putting the RAP initializer function in the protocol table on the client. This function is called RapInit(), and lives in RAP.c in UltraSonix.source/RAP/client. It installs some special RAP functions in those function-pointer slots in the HooksObject. After doing this, it goes ahead and connects to the agent--UltraSonix, in this case.

The code to handle the rendezvous process is in Xtea.c (Xtea for "Xt External Agent"), in UltraSonix.source/RAP/client.

7.3 RAP

At the end of all this processing, there should be an open ICE channel between UltraSonix and the application, and the application has installed our RAP functions in the slots in the HooksObject. Because Xt will call these functions whenever anything "interesting" happens, we have a way to trap things like widget creation, widget changes, and so forth. These RAP functions will send messages out over the ICE channel to UltraSonix describing what happens when things change in the interface.

The client will also listen on this ICE connection for messages coming from UltraSonix, or any other external agent program. So the screenreader can send commands to the application to change resource values, request that the widget tree be sent back to it, and several other things. This is how the FullQueryTree message that retrieves the application's state works, for instance.

If you look at the RAP protocol description, you'll notice that the protocol is broken into several types of messages:

- Requests go from the agent to the client, and ask that the client perform a specific action or return some information.

- Replies go from the client to the agent, and are sent in response to a request from the agent.

- Notifies go from the client to the agent, and are sent "asynchronously" (that is, without the agent having to ask for them) in response to some change in the interface. These messages get generated by the functions installed in the HooksObject.

The RAP code that goes in the HooksObject slots, along with the code that lives in the client and knows how to respond to incoming requests, is in client-out.c and client-in.c in UltraSonix.source/RAP/client. The lowest level of RAP processing that happens in the agent (UltraSonix) is implemented in the files agent-in.c and agent-out.c in UltraSonix.source/RAP/agent.
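To make the request/reply/notify split concrete, here's a toy dispatcher for the agent side. The type names and layout are invented for illustration; RAP's actual message formats are defined by the protocol description and handled in the files named above:

    /* Invented names, not RAP's real wire format. */
    typedef enum { RAP_REQUEST, RAP_REPLY, RAP_NOTIFY } RapClass;

    typedef struct {
        RapClass cls;
        int      opcode;    /* which request, reply, or notify this is */
        /* ... payload ... */
    } RapMessage;

    /* Agent side: requests only ever go out, so everything arriving
     * over the ICE channel is either a reply we asked for or an
     * unsolicited notify generated by the client's hook functions. */
    void agent_dispatch(const RapMessage *m)
    {
        switch (m->cls) {
        case RAP_REPLY:
            /* e.g. a FullQueryTree reply: build the offscreen model */
            break;
        case RAP_NOTIFY:
            /* e.g. a create/destroy/change notify: update the model */
            break;
        case RAP_REQUEST:
            /* clients don't send requests to the agent */
            break;
        }
    }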