Critical Play with Large Language Models

Thursday, October 16

Facilitated by Zach Muhlbauer

Teaching and Learning Center | Interactive Technology & Pedagogy Lab

1 min

Welcome and introductions

10 min

Please share with the group:

  • Name and pronouns
  • Field of study or work
  • A favorite game you play with family, friends, or students

Game mechanics as analytical scaffolds

10 min

Game mechanics serve as analytical scaffolds that reveal AI limitations in situ.

Example: When Chess.com blogger "Nightly-Knight" played against ChatGPT, the model repeatedly made illegal moves, including an attempt to capture by moving a pawn horizontally (pawns capture only diagonally). In the blogger's account, ChatGPT "forgets the position of the game" and makes moves that violate basic chess rules rather than accepting disadvantageous positions.
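
The rule at stake in this example is simple enough to write down, which is part of why the failure is so visible. The following is a minimal sketch, not a chess engine: the function names are hypothetical, and it checks only the geometry of a pawn capture (one file sideways, one rank forward), ignoring board occupancy and en passant.

```python
def square_to_coords(square: str) -> tuple[int, int]:
    """Convert algebraic notation such as 'e4' to zero-based (file, rank)."""
    file_index = ord(square[0]) - ord("a")
    rank_index = int(square[1]) - 1
    return file_index, rank_index


def is_pawn_capture_shape(src: str, dst: str, white: bool) -> bool:
    """Return True if src -> dst has the geometry of a pawn capture.

    A pawn captures one file to the side and one rank forward.
    "Forward" means up the board for White and down the board for Black.
    Assumption: this ignores what is actually on the squares.
    """
    src_file, src_rank = square_to_coords(src)
    dst_file, dst_rank = square_to_coords(dst)
    forward = 1 if white else -1
    one_file_sideways = abs(dst_file - src_file) == 1
    one_rank_forward = (dst_rank - src_rank) == forward
    return one_file_sideways and one_rank_forward


# A diagonal capture passes; the horizontal "capture" from the example does not.
print(is_pawn_capture_shape("e4", "d5", white=True))
print(is_pawn_capture_shape("e4", "d4", white=True))
```

A check like this can be run by hand or in code against each move the model proposes, turning the game's rules into a pass/fail test of whether the model is tracking the board.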

Jeopardy! Board Emulator

This interactive Jeopardy! emulator lets you input custom categories and watch the LLM generate clues in real time, exposing how it handles knowledge at different difficulty levels and revealing confabulation patterns when pushed beyond its training data.

The format requires the LLM to generate:

  • Coherent categories
  • Sliding difficulty levels
  • Factual clues

This makes confabulation immediately visible when it fails.

Prompting AI-generated Jeopardy! board

Navigate to Jeopardy!

We'll test three category types to progressively stress the model's knowledge boundaries:

  • Simple category (accurate baseline)
  • Obscure real category (mixed results)
  • Fictitious category (confabulation trigger)

Debrief discussion

Respond in chat:

What did you notice from the Jeopardy! board demonstration? What was most striking?

What games offer similar affordances in their ability to expose a large language model for the bullshit machine that it is?

Jeopardy board interface

Critical play with Mary Flanagan

Mary Flanagan's "critical play" uses game design to challenge conventions and reveal hidden systems. We apply this iterative design model to expose AI limitations through playful constraints.

Traditional iterative game design model:

  • Set a design goal
  • Develop rules
  • Develop playable prototype
  • Playtest
  • Revise goal
  • REPEAT

Playful interactions with LLMs

Interrogating AI: Characterizing emergent playful interactions with ChatGPT (via r/ChatGPT)

Type        | What it does                                          | Try this
------------+-------------------------------------------------------+------------------------------
Reflecting  | Prompting AI to self-represent and express "opinions" | Ask about self-understanding
Jesting     | Generating humor and nonsensical exchanges            | Request absurd combinations
Imitating   | Requesting persona or character mimicry               | Ask it to role-play
Challenging | Testing capabilities until failure                    | Push logical limits
Tricking    | Attempting deception/boundary bypassing               | Try jailbreak techniques
Contriving  | Creating impossible or fabricated content             | Request non-existent things

Quick demo session

10 min

LLMs generate responses by predicting statistically likely continuations learned from training data. Internally, words are represented as vectors, and related concepts sit close together in that semantic space: given "teacher," the model maps to nearby concepts such as "classroom," "student," and "education."
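
To make "nearby in semantic space" concrete, here is a toy sketch of cosine similarity, a common way to measure how close two vectors are. The three-dimensional vectors below are invented for illustration; real models learn vectors with hundreds or thousands of dimensions, and the numbers here come from no actual model.

```python
import math

# Invented toy vectors (assumption: for illustration only, not model output).
TOY_VECTORS = {
    "teacher": [0.9, 0.8, 0.1],
    "classroom": [0.8, 0.9, 0.2],
    "student": [0.85, 0.7, 0.15],
    "volcano": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors):
    """Rank every other word by similarity to `word`, most similar first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for other, score in nearest("teacher", TOY_VECTORS):
    print(f"{other}: {score:.3f}")
```

With these made-up numbers, "classroom" and "student" score far closer to "teacher" than "volcano" does, which is the kind of association the games in this session try to disrupt.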

But what happens when we constrain these associations through game rules? Can we force semantic breakdown by limiting the model's ability to select from its most probable outputs?

Exquisite Corpse system prompt

"You are participating in a game of Exquisite Corpse. Respond only to the user's most recent word. Reply with exactly one word per turn, with no punctuation or commentary, and label your response in sequence (Turn 2, Turn 3, etc.). Continue this pattern until Turn 20. When that occurs, stop producing numbered turns and compose a short poem using only the words that appeared in this conversation: nothing added, removed, or borrowed from elsewhere. Arrange them with line breaks and spacing to create a cohesive poem, then ask if the user would like a close reading of it."
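
The final rule in this prompt (a poem using only the words from the conversation, "nothing added, removed, or borrowed") is easy to check mechanically, which makes instruction-following failures visible. The sketch below is one possible checker, not part of the workshop tooling; the function names and the sample words are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def audit_poem(conversation_words, poem):
    """Compare the poem's words with the words played during the game.

    Returns (added, dropped):
      added   = words in the poem that were never played
      dropped = played words that the poem left out
    Repeated words are counted, so using a word more often than it was
    played also shows up in `added`.
    """
    played = Counter(tokenize(" ".join(conversation_words)))
    used = Counter(tokenize(poem))
    added = sorted((used - played).elements())
    dropped = sorted((played - used).elements())
    return added, dropped


# Hypothetical four-word game and a poem that sneaks in an extra word.
words = ["moon", "river", "glass", "sleep"]
poem = "Moon river\nglass of sleep"
added, dropped = audit_poem(words, poem)
print("added:", added)
print("dropped:", dropped)
```

An empty `added` and an empty `dropped` would mean the model followed the constraint exactly; anything else is evidence to bring to the debrief.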

https://openwebui.cuny.qzz.io/

Critical design activity

15 min

Use this worksheet to design a game that reveals AI limitations using 2-3 types of playful interactions.

Choose 2-3 game formats that might be interesting to combine or explore:

  • 20 Questions
  • Exquisite Corpse
  • Two Truths and a Lie
  • Word Association
  • Trivia/Quiz Games
  • Riddles/Puzzles
  • Chess/Game Annotation
  • Role Play/Improv
  • Debate/Argument
  • Mad Libs
  • Other: _______

Target AI limitations

Select which AI weakness(es) you want to expose:

  • Hallucination/confabulation
  • Logic inconsistency/reasoning failures
  • Sycophancy (excessive agreement)
  • Instruction following failures
  • Semantic breakdown
  • Other: _______

Example: Sycophancy - Tell the AI an obviously false "fact" and ask it to explain why it's true. AI often agrees with incorrect user statements rather than challenging them.

Craft your prompts

System prompt: Configure the AI's behavior and constraints

Example: "We're playing Mad Libs. I'll give you a sentence with blanks labeled with parts of speech. For each blank, provide two options: (1) a probable word that fits the context, and (2) a statistically improbable word that still matches the part of speech but disrupts semantic coherence. Label them clearly."

User prompt: Your first message to start the game

Example: "The [adjective] teacher walked into the [noun] and began to [verb]."
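
To compare the model's "probable" and "improbable" answers side by side, it can help to fill the template programmatically. This is a minimal sketch under the assumption that blanks are written in square brackets, as in the example user prompt; the function names and the sample fill-in words are hypothetical, not model output.

```python
import re

# Matches blanks written as [part_of_speech], e.g. [adjective].
BLANK = re.compile(r"\[(\w+)\]")


def list_blanks(template):
    """Return the parts of speech requested by the template, in order."""
    return BLANK.findall(template)


def fill_blanks(template, words):
    """Replace each blank, left to right, with the corresponding word."""
    blanks = list_blanks(template)
    if len(words) != len(blanks):
        raise ValueError(f"expected {len(blanks)} words, got {len(words)}")
    remaining = iter(words)
    return BLANK.sub(lambda match: next(remaining), template)


template = "The [adjective] teacher walked into the [noun] and began to [verb]."
print(list_blanks(template))

# Hypothetical fill-ins standing in for the model's two labeled options.
print(fill_blanks(template, ["patient", "classroom", "lecture"]))
print(fill_blanks(template, ["molten", "asterisk", "evaporate"]))
```

Reading the two filled sentences aloud makes it easier to judge whether the "improbable" option really disrupts coherence or merely swaps in a slightly less common word.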

Expected outcomes

What do you want to reveal about AI abilities/limitations?

What do you predict will happen? What failure modes might emerge? How will game mechanics make limitations visible?

Shareback and playtest

15 min

Navigate to the Open WebUI demo site and sign up: https://openwebui.cuny.qzz.io/

Test your game design:

  • Select a model from the dropdown menu at the top
  • Click the configuration icon to open the settings panel
  • Input your system prompt in the "System Prompt" field
  • Adjust optional settings (temperature, max tokens) if desired
  • Begin with your conversation starter in the main chat
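
For those adjusting the temperature setting, the sketch below shows the textbook way temperature reshapes a probability distribution over candidate words. It is an illustration only: the scores are invented, and how a given model or the demo site applies the setting internally is an assumption not confirmed here.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, scaled by temperature.

    Low temperature sharpens the distribution toward the top score;
    high temperature flattens it, giving unlikely words more chance.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Invented scores for three candidate next words (assumption: illustrative).
candidates = ["classroom", "student", "volcano"]
scores = [3.0, 2.5, 0.5]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    summary = ", ".join(f"{w}={p:.2f}" for w, p in zip(candidates, probs))
    print(f"T={temperature}: {summary}")
```

When playtesting, running the same game at a low and a high temperature is one way to see whether a failure is a stable pattern or an artifact of sampling.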

Share what you discover about the AI's limitations

Resources and Q&A

Interactive Tools

Research & Citations

  • Flanagan, M. (2009). Critical Play: Radical Game Design. MIT Press.
  • Nikghalb, M. R., & Cheng, J. (2024). Interrogating AI: Characterizing Emergent Playful Interactions with ChatGPT. arXiv preprint.
  • Palisade Research (2025). Playing chess against a stronger opponent can trigger frontier AI agents to cheat. TIME Magazine. Article
  • Nightly-Knight (Chess.com). Playing chess against ChatGPT | It is a cheater! Blog post
  • Acher, M. (2024). Debunking the Chessboard: Confronting GPTs Against Chess Engines. Research blog
  • r/ChatGPT community discussions on playful AI interactions