This post combines sessions 7 and 8 of my course on Social Robot Design (2025/2026) into one write-up, structured around the in-depth tool design assignment.
In-Depth Tool Design: CardsAgainstRobots
8.1 Motivated direction choice
We chose scenario design, in the form of a card-based improvisation game called CardsAgainstRobots. Our case is about supporting student mental wellbeing with the Miro-E robot. Splitting actions into human cards, robot cards, and wildcards let us combine and recombine many small human-robot scenarios quickly, without writing out full interactions by hand, and gave the whole team a shared, playful way to explore ideas together instead of one person proposing suggestions alone.
The push toward this specific direction came from teacher feedback on an earlier version of the game. At that point the game only generated scenario ideas: pick some cards, imagine what would happen. The feedback was that this made the tool feel too much like a generative suggestion box. What was missing was the validation of the scenarios: after thinking up a scenario, something needed to actually happen to check whether it would work out that way in real life, for example through puppeteering or acting the scenario out. That is why we added the acting and puppeteering manual and the evaluation form on top of the card game itself, and why scenario design (interactive testing, involving the actual robot) fit our case better than treating this as a pure ideation exercise. We are not extending an existing course tool here; the course covered general design methods but not a concrete way to test many robot-behaviour scenarios against a real device, so we built our own.
8.2 Visualisation: conceptualisation and physicalisation
Conceptualisation
We iterated on the game rules before settling on a final version. We tried three set-ups:
- Pick random cards: draw a robot card and act it out with Miro-E, then draw a human card and act out the scenario, with a wildcard round to add complications. This lowered the threshold to start acting things out, but felt too random to compare outcomes.
- Pick a human action, then choose a matching robot action: draw one human card, lay out five robot cards, and discuss which robot response fits best, worst, and why. This let us rank and compare responses directly, so we kept this as our main format.
- Rejected option: one robot card plus five human cards. This did not work well because the human action naturally comes first in the timeline, so it made more sense to react from the robot side.
We also designed an evaluation form to score each acted-out scenario on eight criteria (effective, motivational, clear, kind, natural, empathic, predictable, trustworthy), each on a 1 to 5 scale, so that the ranking discussions would be grounded in something more concrete than gut feeling.
Physicalisation
We built the Miro-E puppeteering setup so we could act out scenarios live, controlling the robot by hand rather than through autonomous behaviour. We wrote a step-by-step instruction manual for both games, printed the cards, and folded them into a booklet together with the game guidelines, the evaluation guidelines, and the evaluation form itself (see the booklet layout below).

The evaluation form went through the same process, from a rough table sketch during session 7 to the printed version with rating scales shown below.

8.3 Test plan and method
We tested the game with the real Miro-E robot, puppeteered Wizard-of-Oz style through its controller, since this let us try out any card combination without needing the robot to be programmed for every possible action beforehand. Riek’s review of Wizard-of-Oz practice in HRI notes that puppeteering is a common and useful way to explore robot behaviour before committing to autonomous implementation1, which matches how we used it here.
The setup was: one team member left the room, another drew and enacted a random robot card with Miro-E, and a third person drew and enacted the matching human card, based on our first game mode. For the second game mode, we drew one human card and five robot cards, laid the robot cards face-up, and discussed as a group how the human would react to each one, then ranked them.
After each round we filled in the evaluation form based on what we had just observed, rating the scenario on the eight criteria. We ran multiple rounds across both games, repeating with wildcards to see how an added complication changed the ranking.
One thing changed as we went: we originally planned only the “pick one robot card, then react with several human cards” version, but dropped it once we noticed the human action logically had to come first. We replaced it with the “one human card, five robot options” version instead, which we kept.
8.4 Observations: tool in action
Puppeteering Miro-E worked well for quick improvisation. Once someone was holding the controller and cards were drawn, it took very little time to get from a scenario idea to seeing it played out with the real robot.
Some observations from the sessions:
- Wildcards sometimes changed a scenario meaningfully, but other times had no visible effect, or pushed the scene in a direction unrelated to the original cards. We treated “no effect” as a valid outcome in itself rather than a failed round, and simply redrew when a wildcard produced something unusable.
- Ranking the five robot responses against one human card, then discussing why the worst-fitting one failed, was more useful for generating design insight than just improvising freely. It forced us to articulate what made a response feel awkward or unkind rather than just noting that it did.
- Technically puppeteering Miro-E was a bit slow: it required a computer to be hooked up to the controller, which made it less suited for evaluating polished, final behaviour choices, even though it was fine for early exploration.
- The evaluation form worked well as a lightweight way to record scores right after each round, but we only had a limited number of printed copies, so people sometimes had to share.
8.5 Resulting redesign
To go from the ranked, evaluated scenarios back to actual design decisions, we used a requirements-matching step. We laid out a table of our design requirements against the scenarios we had tested, and went through it to cross off which requirements were actually met and which were not. We also ranked our requirements by importance beforehand, so that when a scenario satisfied one requirement but conflicted with another, we knew which one should win.
This is where the facial-expressiveness limitation became a real redesign input rather than just an observation: several highly-ranked human robot combinations were still capped in how kind or empathic they could score, because Miro-E could not show a clear emotional reaction (lack of degrees of freedom). Since expressive feedback ranked as an important requirement for the wellbeing case, this pushed us to explicitly treat facial/emotional expressiveness as a constraint to design around, rather than something we could puppeteer our way past (again, due to hardware limitations).
8.6 Evaluation and reflection
Design outcome. The tool gave us a more grounded picture of which Miro-E behaviours work for the student wellbeing case. Instead of guessing, we now have ranked evidence for which responses come across as kind, clear, and trustworthy, and a concrete limitation to design around (facial expressiveness). The poster below summarises the case, the tool, and this result.

Tool quality. The game is specific enough to be constraining: the card categories and evaluation criteria keep discussions focused rather than open-ended. It is usable by people outside our team, since the booklet contains full game guidelines, evaluation guidelines, and an evaluation form in one document, and the rules do not assume prior knowledge of Miro-E. Wölfel and Merritt’s survey of card-based design tools highlights that reusability and clear structure across purpose, methodology, and material qualities are what separates a usable card tool from a one-off exercise2, and we tried to design with that reusability in mind by keeping the card categories generic (human action, robot action, wildcard) rather than tied to Miro-E specifically.
The game is also fairly generalisable: any team designing a social robot’s response to a set of human actions could reuse the same three-category structure and evaluation form with different card content. What would need to change is the puppeteering step. It depended on Miro-E’s specific controller and a laptop connection, so another team would need to adapt that part to whatever robot or Wizard-of-Oz setup they have available. We would recommend the tool for early-stage exploration and comparison of robot responses, but not as the final evaluation step before deployment, since puppeteered behaviour is not the same as tested autonomous behaviour.
This was done with:
- Liz van Ginderen (s27349745)
- Anna Hornman (s3056600)
- Oyindrila Sen Gupta (s3697762)
- Sarah Mans (s2306379)
Riek, L. D. (2012). Wizard of Oz studies in HRI: A systematic review and new reporting guidelines. Journal of Human-Robot Interaction, 1(1), 119–136. https://doi.org/10.5898/JHRI.1.1.Riek ↩︎
Wölfel, C., & Merritt, T. (2013). Method card design dimensions: A survey of card-based design tools. In Human-Computer Interaction – INTERACT 2013 (pp. 479–486). Springer. https://doi.org/10.1007/978-3-642-40483-2_34 ↩︎