Voice User Interfaces
A Voice Manual for Mobile Users
Manuals usually help users become familiar with a newly purchased device by supporting them in learning new tasks. With a paperback manual, users are limited by their own ability to find the desired information and by the clarity of the manual's structure. Searching online manuals or forum pages can also be frustrating, inefficient and time-consuming. A voice interface could instead offer an alternative: supporting users with information in real time while they perform tasks with the device.
Challenges
Cognitive load: Voice user interfaces (VUIs) communicate ONLY through non-persistent messages: the user hears a message, and then it is gone. The pace of the interaction is dictated by the speed of the output (i.e. speech). This interaction style can be very demanding in terms of cognitive load.
Interaction pattern: Unlike typical GUIs, which can introduce their own interface conventions, a VUI must anticipate the conversational conventions and interaction patterns users already have rather than inventing new conversational elements. This is because language is an ability people learn from childhood.
Language variability: The same thing can be expressed in natural language in many different ways, so the error rate in interpreting user input is much higher for a VUI than for a GUI.
Content: Mobile phones have dozens of functions. However, it is not known in advance which of them users find particularly difficult to grasp.
Visual elements to speech: Manuals often present explanatory information in visual form; for example, setup steps are explained through successive sequences of icons. Translating icons into a verbal representation is another challenge.
Cultural aspects: Creating the right VUI persona, tailored to Singaporean customers, is crucial for a good user experience. However, Singapore is a multicultural society, so which aspects should be prioritized?
Solutions
Cognitive load: To decrease the overall load, the VUI's phrases were designed to be short, each conveying a single piece of information. Further, users could ask the system to repeat a prompt if they had misunderstood it or found it too fast. In this way, users had a tool to control the pace.
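The repeat mechanism described above can be sketched as a small prompt player. This is a minimal Python sketch under our own assumptions, not the system's actual implementation: the class `PromptPlayer`, the set `REPEAT_COMMANDS` and the phrases in it are hypothetical, and speech recognition and synthesis are replaced by plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical phrases a user might say to hear the last prompt again.
REPEAT_COMMANDS = {"repeat", "again", "say that again", "pardon"}


@dataclass
class PromptPlayer:
    """Plays short, single-item prompts one at a time and can replay the last one."""
    prompts: List[str]
    position: int = 0
    last_prompt: Optional[str] = None

    def respond(self, user_utterance: str = "") -> Optional[str]:
        text = user_utterance.strip().lower()
        # A repeat request replays the previous prompt without advancing.
        if text in REPEAT_COMMANDS and self.last_prompt is not None:
            return self.last_prompt
        if self.position >= len(self.prompts):
            return None  # all prompts have been delivered
        # Any other utterance acknowledges the prompt and moves to the next one.
        self.last_prompt = self.prompts[self.position]
        self.position += 1
        return self.last_prompt
```

For example, with the prompts "To type an SMS, press the blue plus button." and "Tap the arrow icon to send the SMS.", saying "repeat" after the first prompt plays it again, and any other reply moves on to the second. Because a repeat request does not advance `position`, the user rather than the speech rate decides when the next piece of information arrives.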
Interaction patterns: The prompts were designed to follow the instruction patterns and vocabulary typically found in user manuals. Here is an example:
User: Where is my downloaded document?
System: For files, please swipe up or tap the Folder icon on the home screen.
Language variability: To decrease input variability, system questions were formulated as closed-ended questions, i.e. inviting a ‘yes’ or ‘no’ answer. However, handling questions formulated by users required collecting data on how users actually phrase them.
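Closed-ended questions keep the set of expected answers small enough to normalize with a simple lookup. The Python sketch below assumes the recognizer already returns a text transcript; the function `interpret_yes_no` and the word lists `YES_WORDS` and `NO_WORDS` are hypothetical and would in practice be derived from collected user data. Returning `None` signals that the system should ask the question again.

```python
from typing import Optional

# Hypothetical vocabularies; a real system would build these from collected user data.
YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay", "correct"}
NO_WORDS = {"no", "nope", "nah", "not really"}


def interpret_yes_no(utterance: str) -> Optional[bool]:
    """Map a recognised answer to True/False, or return None so the system re-asks."""
    text = utterance.strip().lower().rstrip(".!?")
    # Whole-utterance match covers multi-word entries such as "not really".
    if text in YES_WORDS:
        return True
    if text in NO_WORDS:
        return False
    # Otherwise look only at the first word, so "yes please" or "no, thanks" still count.
    words = text.split()
    first = words[0].strip(",.!?") if words else ""
    if first in YES_WORDS:
        return True
    if first in NO_WORDS:
        return False
    return None
```

Ambiguous replies such as "maybe" or "not sure" fall through to `None`, which keeps the dialogue on the closed-ended path instead of guessing.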
Content: Understanding user needs and interaction patterns is fundamental for any project dealing with direct user input: when users would need the help of a manual, which topics would be relevant, and how users would formulate their questions – see Experiment A. To prevent participants' familiarity with the phone brands most common among locals from interfering, we chose a Motorola smartphone – see figure 1.
Visual elements to speech: Using the Motorola user manual, phone screens were carefully analyzed and translated into verbal representations that closely mirrored the visual description. The verbal representations were then tested for adequacy with the help of several test users – see Experiment B.
Cultural aspects: Since the voice is a central feature of our system, both accent and speech patterns can come into play as cultural aspects. In an experiment with several local users, we studied the impact of voice accent on how users evaluated the system – see Experiment C.
User Testing
EXPERIMENT A
Eight test participants were invited to a first round of requirements gathering. None of them was familiar with the Motorola phone. They were asked to complete a set of tasks with the phone (e.g. send a message, change settings) and to write down any topics related to the phone's functionality that they found unclear. Almost half of the topics indicated by our participants were related to messaging:
- problems encountered when sending and/or receiving text messages
- using or installing new T9 language packages
- saving pictures from messages in a dedicated folder
- sorting messages and searching them for keywords
We therefore decided to focus our first prototype on these topics. Next, the same eight participants were asked to generate questions on these topics. We searched online for the answers and compiled a small Q&A database on our topics of interest.
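One way to query such a small Q&A database is to match a user's question against the stored questions by word overlap. The source does not say how questions were matched; the Python sketch below swaps in plain Jaccard word overlap with a hypothetical stop-word list and threshold (`min_overlap`). The database entries are hypothetical examples whose answers reuse the SMS prompts described under Experiment B.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical entries: answers reuse the SMS prompts described in this document.
QA_DATABASE: Dict[str, str] = {
    "How do I write a new text message?":
        "To type an SMS, press the blue plus button.",
    "How do I attach a picture to a message?":
        "While typing the message, you can tap the plus sign to attach a picture or a video.",
    "How do I send a message?":
        "Tap the arrow icon to send the SMS.",
    "How do I reply to a message?":
        "To reply to a message, tap on the message, then enter your response "
        "in the text box at the bottom and tap the arrow icon to send it.",
}

# Hypothetical stop words that carry no topic information.
STOP_WORDS = {"how", "do", "i", "a", "an", "the", "to", "can", "my"}


def content_words(text: str) -> Set[str]:
    """Lower-case words of a question, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def best_answer(question: str, min_overlap: float = 0.3) -> Optional[str]:
    """Return the answer whose stored question has the highest Jaccard overlap."""
    query = content_words(question)
    best_score, best = 0.0, None
    for stored_question, answer in QA_DATABASE.items():
        stored = content_words(stored_question)
        union = query | stored
        score = len(query & stored) / len(union) if union else 0.0
        if score > best_score:
            best_score, best = score, answer
    # Below the threshold, report no match so the dialogue can fall back.
    return best if best_score >= min_overlap else None
```

With these entries, `best_answer("How can I reply to a text message?")` returns the reply instruction, while an unrelated question returns `None`, which could trigger a fallback prompt. Word overlap is crude, which is one reason the user-generated questions matter: they show which wordings the matcher has to cover.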
EXPERIMENT B
Generating the Q&A, however, turned out to be more complicated than we initially thought, as explanatory information was often presented in visual form (see Fig. 2).
Our approach was to analyze the screens in detail and translate them into a verbal representation that closely mirrored the visual description:
Create: To type an SMS, press the blue plus button.
Attach: While typing the message, you can tap the plus sign to attach a picture or a video.
Send: Tap the arrow icon to send the SMS.
Reply: To reply to a message, tap on the message, then enter your response in the text box at the bottom and tap the arrow icon to send it.
An initial dialogue script was designed and tested with five participants. Conversations were recorded and manually transcribed. The transcriptions were compared with the original dialogue script to identify potentially problematic dialogue units. The test also allowed us to gather additional data on language variability to further improve system accuracy.
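In the study, transcriptions were compared with the script manually. As a hypothetical aid for that step, the Python sketch below scores each dialogue unit by word-level similarity using the standard library's `difflib.SequenceMatcher` and flags units that fall below a threshold. The function names and the 0.8 threshold are assumptions, and the sketch assumes the script and the transcript have already been aligned unit by unit.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def words(text: str) -> List[str]:
    """Lower-case word list with punctuation removed."""
    return re.findall(r"[a-z0-9']+", text.lower())


def flag_problem_units(script: List[str], transcript: List[str],
                       threshold: float = 0.8) -> List[Tuple[int, float]]:
    """Return (unit index, similarity) for dialogue units that diverge from the script."""
    if len(script) != len(transcript):
        raise ValueError("script and transcript must have the same number of units")
    flagged = []
    for index, (planned, observed) in enumerate(zip(script, transcript)):
        # Word-level similarity in [0, 1]; 1.0 means the unit was said exactly as scripted.
        similarity = SequenceMatcher(None, words(planned), words(observed)).ratio()
        if similarity < threshold:
            flagged.append((index, round(similarity, 2)))
    return flagged
```

Flagged units are only candidates: a low score can mean a genuine dialogue problem or simply a legitimate alternative phrasing, and the latter is exactly the language-variability data worth feeding back into the system.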
EXPERIMENT C
One distinctive feature Singaporeans have in common is their accent. Accent is a voice characteristic that designers can easily manipulate to create systems tailored to particular user groups.
Our hypothesis was that Singaporean customers would prefer to interact with a VUI that has a local accent rather than a more standard one, such as a British accent.
To verify our hypothesis, we performed an experiment with 59 participants who interacted with our voice user manual.
We used two voice settings: one system spoke with a Singaporean accent (Singaporean Standard English – SSE), while the other spoke with a British accent (British Standard English – BSE).
The content exchanged with the two VUIs was identical. The voices were also identical in terms of quality, timbre and pitch, as they came from the same speaker: a voice talent trained to speak both Singaporean- and British-accented English.
Participants evaluated the VUIs in terms of voice quality, politeness, ease of dialogue and trustworthiness. Contrary to our expectation, and despite the identical information content, the British VUI was ranked higher than its Singaporean counterpart in all categories. The difference was significant for politeness, voice quality and ease of dialogue.
The experiment showed us that the voice accent is a critical design issue for VUIs as it strongly affects users’ perceptions of other system features, such as dialogue structure or sound quality.
On the other hand, widely recognized stereotypes such as ‘similarity attracts’ might not hold in situations where cultural and psychological biases interfere: a deeper analysis of our results revealed that Singaporeans are very fond of British English (the language of the former colonial power) and tend to avoid Singlish in formal settings. As such, designing interfaces that mindlessly conform to such stereotypes might be unjustified and even detrimental to user satisfaction.
More details about this experiment can be found in:
“Impact of English regional accents on user acceptance of voice user interfaces”