Shihui Ruan
Research-first project · Master’s thesis · Georgia Tech · Aug ’20 – Apr ’21

Voice Interface
Design for
Automobile

This is a research project, not a product launch. The goal was to understand how voice interfaces for cars should be designed — by studying 55 real-world cases, talking to drivers and industry practitioners, and synthesizing it all into frameworks that the field could actually use.

Role

UX Researcher & Designer

Format

Individual · Academic Research

Instructor

Dr. Wei Wang, Georgia Tech

Research before design.
Always.

Voice interaction in cars is new territory. When I started this project, there was no established design framework for it — only scattered industry implementations (Siri, Alexa Auto, Google Assistant) with generally low user satisfaction.

Instead of jumping to solutions, I spent the first half of the project deeply understanding the problem space: what’s out there, what fails, what drivers actually need, and what the industry wants from voice AI.

The “design” output isn’t a polished interface. It’s two research-derived toolkits that give voice designers a structured way to make decisions — about personality, use cases, and cognitive load. That’s the contribution.

55

real-world VUI cases analyzed

4

research methods deployed

2.2/5

avg. satisfaction with current voice assistants

3

attention levels mapped in taxonomy

Drivers are frustrated.
And they have a point.

A preliminary survey surfaced a consistent pattern: people use voice assistants in cars because they have to — hands-free laws, road safety. But satisfaction averaged just 2.2 out of 5. Why?

01

It doesn’t understand you

Users frequently said their input “was hard to understand by the device” — especially with accents, background noise, or natural speech patterns.

02

Recovery is broken

When the system misunderstands you, recovering is difficult — users often have to start the entire conversation over. No graceful fallback, no partial recovery.

03

It creates new safety problems

Intended to reduce distraction, voice assistants often force users to look at the screen to verify commands. The solution became the problem.

04

Touch is still faster

For simple tasks, users said touch screens were faster than voice. The assistant wasn’t worth the cognitive overhead of activating it and rephrasing commands until they worked.

Voice assistants currently do not fully understand natural language, and do not fully utilize the advantages of voice interaction.
— Key finding from preliminary survey synthesis

Four methods to understand
one deeply human problem.

Each method answered a different question. Together they built a complete picture of what voice interaction in cars actually needs to be.

01 — Desktop Research & Taxonomy

55 cases. Every input and output method mapped.

I started with a thorough literature review: how does voice interaction work across different in-car systems? What are the HMI components? How do attention requirements vary by task?

The review led to a taxonomy of 55 real case studies, each categorized by HMI type, input method, output method, attention level (focused / peripheral / implicit), and task type (driving, alerts, navigation, infotainment).

The most important insight: different voice interaction modes correspond to different levels of driver attention. This became the organizing principle for the entire project.
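To make the taxonomy's structure concrete, here is a minimal sketch of how one case could be recorded and grouped by attention level. The class names, field names, and the three sample entries are hypothetical placeholders for illustration, not the actual schema or data from the thesis; only the five dimensions and their listed values come from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Attention(Enum):
    """The three attention levels named in the taxonomy."""
    FOCUSED = "focused"
    PERIPHERAL = "peripheral"
    IMPLICIT = "implicit"


class Task(Enum):
    """The four task types named in the taxonomy."""
    DRIVING = "driving"
    ALERTS = "alerts"
    NAVIGATION = "navigation"
    INFOTAINMENT = "infotainment"


@dataclass(frozen=True)
class VuiCase:
    """One taxonomy row. Field names are assumed, not taken from the thesis."""
    name: str
    hmi_type: str
    input_method: str
    output_method: str
    attention: Attention
    task: Task


def count_by_attention(cases):
    """Count cases per attention level, including levels with zero cases."""
    counts = Counter(case.attention for case in cases)
    return {level: counts.get(level, 0) for level in Attention}


# Hypothetical entries, invented purely to show the shape of the data.
EXAMPLE_CASES = [
    VuiCase("example-a", "center display", "voice", "voice + screen",
            Attention.FOCUSED, Task.NAVIGATION),
    VuiCase("example-b", "instrument cluster", "none", "chime",
            Attention.PERIPHERAL, Task.ALERTS),
    VuiCase("example-c", "cabin system", "context sensing", "ambient audio",
            Attention.IMPLICIT, Task.INFOTAINMENT),
]

summary = count_by_attention(EXAMPLE_CASES)
```

Grouping on the attention field first reflects the insight above: attention level, rather than feature set, is the dimension the rest of the project is organized around.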

Figure: Full taxonomy of 55 in-vehicle VUI cases, categorized by HMI type, input/output method, attention level, and task type. This is the organizing framework for the entire project.
Figure: Competitive analysis dot chart comparing voice interaction features across 4 system types.
Figure: The attention level model (focused, peripheral, and implicit interaction), the key conceptual output from the taxonomy research.

02 — Preliminary Survey

Current voice assistants score 2.2 out of 5.

I surveyed current drivers about their experiences with in-car voice assistants: Siri, Alexa Auto, Google Assistant, and built-in systems. The results were clear and consistent.

  • Generally low satisfaction — 2.2 out of 5 average
  • User input is frequently misunderstood by the device
  • Recovering from errors means starting the entire conversation over
  • Safety concern: users still look at the screen to verify commands
  • Touch is faster for simple tasks — voice felt like overhead

The data confirmed the opportunity: these systems do try to help with hands-free interaction, but they fail in the specifics — understanding natural language, recovering from errors, and knowing when not to require active attention.

Figure: Survey instrument, with questions about satisfaction, pain points, and usage patterns with in-car voice assistants.

03 — In-Depth Stakeholder Interview

Three things a manufacturer expects from voice AI.

I interviewed a voice interaction designer at an automotive manufacturer to understand what “success” looks like from the industry side — not just from users. Their expectations shaped what the design toolkits needed to address.

1

Understand vague instructions

The assistant should parse natural, incomplete language the way a human would, filling in the gaps from context. It should not require rigid command syntax.

2

Be one step ahead

Anticipate what the user will need next based on context (current location, time, calendar, previous behavior) rather than waiting to be asked.

3

Feel like a human being

Natural dialogue rhythm, personality, the ability to handle ambiguity gracefully. Not robotic command-response patterns.

Figure: Research method sequence diagram: interview, co-design, focus group, Wizard of Oz evaluation.

04 — Contextual Inquiry

Three drivers. Observed in real conditions.

I conducted contextual inquiries with three drivers in realistic driving environments — sitting in the back seat, observing behavior and asking questions during the drive. This method was chosen because voice interaction behavior changes significantly in context: people use different vocabulary, shorten commands, and respond differently to failures when they’re also managing the road.

The contextual inquiry grounded the taxonomy in real behavior, and revealed patterns the survey couldn’t: which failure modes caused genuine frustration vs. mild annoyance, which tasks drivers tried voice for even when they expected failure, and how they developed workarounds.

Figure: Contextual inquiry session, with the researcher in the back seat observing driver behavior in a realistic driving environment.
Figure: Session 1 notes (contextual inquiry observation worksheet).
Figure: Session 2 notes (contextual inquiry observation worksheet).
Figure: Session 3 notes (contextual inquiry observation worksheet).

Not an interface.
Two frameworks for making one.

In research projects, the design output isn’t always a screen. Here, it’s two toolkits — structured frameworks that give voice designers a principled way to make decisions that the research revealed were being made arbitrarily or not at all.

Toolkit 01

Personality Framework

Derived from the Big Five personality model (OCEAN), this toolkit gives designers a vocabulary and decision structure for defining the voice assistant’s personality — because the research showed this was a major differentiator in user trust and satisfaction.

Instead of making personality decisions intuitively or inconsistently, designers can use the framework to deliberately choose traits and trace them through the interaction design: how the assistant handles ambiguity, failure, and proactive suggestions.

Figure: Toolkit 1, a personality framework grid based on the Big Five model (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism). Designers choose traits and trace them through interaction decisions.

Toolkit 02

Use Case Scenario Cards

The second toolkit is a set of use case scenario cards covering the key interaction contexts in a car: music recommendation, route recommendation, advance fork/turn warnings, entertainment, and more.

Each card defines what the driver might say, what the assistant should say, and the alternative or fallback interaction. This gives designers a systematic starting point for conversation design rather than building each scenario from scratch.
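As a rough illustration, the three parts of a card map naturally onto a small record type. The sketch below is an assumption about how a card could be represented in code, not the format used in the toolkit; the field names, the `respond` helper, and the sample utterances are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioCard:
    """One scenario card: the use case plus the three parts each card defines."""
    use_case: str
    driver_says: str      # what the driver might say
    assistant_says: str   # what the assistant should say
    fallback: str         # the alternative path when the main one fails


def respond(card, understood):
    """Return the main response, or the fallback if the request was not understood."""
    return card.assistant_says if understood else card.fallback


# Hypothetical card content, written only to show the shape of a card.
music_card = ScenarioCard(
    use_case="music recommendation",
    driver_says="Play something relaxing.",
    assistant_says="Here is a calm playlist for the drive.",
    fallback="I can play a playlist, an artist, or a radio station. Which would you like?",
)

main_reply = respond(music_card, understood=True)
fallback_reply = respond(music_card, understood=False)
```

Making the fallback a required field means a card cannot be written without a recovery path, which speaks directly to the survey finding that error recovery is where current assistants break down.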

Figure: Toolkit 2, a grid of use case scenario cards (music recommendation, routes, warnings, entertainment, and blank templates) defining driver utterances, assistant responses, and alternative paths for each use case.

What this project taught me
about research.

01

Research is a design artifact

Toolkits, frameworks, and decision structures are design outputs — they just have a different user. In this case, the users are other designers. Building something useful for them required the same user-centered thinking as building for end users.

02

Context changes everything in in-car UX

Survey data told one story. The contextual inquiry told another. Drivers rationalize their behavior after the fact — but in the moment, frustration is immediate and recovery strategies are improvised. You have to be in the car to see it.

03

The attention model opened more questions than it closed

The focused / peripheral / implicit attention framework is a useful starting point, but applying it rigorously would require longitudinal studies with more participants across more driving conditions. This project sketched the map; filling it in is future work.