Critical Play with Large Language Models

Thursday, October 16

Facilitated by Zach Muhlbauer

Teaching and Learning Center | Interactive Technology & Pedagogy Lab

1 min

Welcome and introductions

10 min

Please share with the group:

Name and pronouns
Field of study or work
A favorite game you play with family, friends, or students

Game mechanics as analytical scaffolds

10 min

Game mechanics serve as analytical scaffolds that reveal AI limitations in situ.

Example: When Chess.com blogger "Nightly-Knight" played against ChatGPT, it repeatedly made illegal moves—including attempting to move a pawn horizontally to capture (pawns can only capture diagonally). ChatGPT "forgets the position of the game" and makes moves that violate basic chess rules rather than accepting disadvantageous positions.

Jeopardy! Board Emulator

This interactive Jeopardy! emulator allows you to input custom categories and witness the LLM generate clues in real-time, exposing how it handles knowledge at different difficulty levels and revealing confabulation patterns when pushed beyond its training data.

The format requires the LLM to generate:

Coherent categories
Sliding difficulty levels
Factual clues

This makes confabulation immediately visible when it fails.

Prompting AI-generated Jeopardy! board

Navigate to Jeopardy!

We'll test three category types to progressively stress the model's knowledge boundaries:

Simple category (accurate baseline)
Obscure real category (mixed results)
Fictitious category (confabulation trigger)

Debrief discussion

Respond in chat:

What did you notice from the Jeopardy board demonstration? What was most striking?

What games offer similar affordances in their ability to expose a large language model for the bullshit machine that it is?

Critical play with Mary Flanagan

Mary Flanagan's "critical play" uses game design to challenge conventions and reveal hidden systems. We apply this iterative design model to expose AI limitations through playful constraints.

Traditional iterative game design model:

Set a design goal
Develop rules
Develop playable prototype
Playtest
Revise goal
REPEAT

Playful interactions with LLMs

Interrogating AI: Characterizing emergent playful interactions with ChatGPT (via r/ChatGPT)

Type	What it does	Try this
Reflecting	Prompting AI to self-represent and express "opinions"	Ask about self-understanding
Jesting	Generating humor and nonsensical exchanges	Request absurd combinations
Imitating	Requesting persona or character mimicry	Ask it to role-play
Challenging	Testing capabilities until failure	Push logical limits
Tricking	Attempting deception/boundary bypassing	Try jailbreak techniques
Contriving	Creating impossible or fabricated content	Request non-existent things

Quick demo session

10 min

LLMs generate responses through vector similarity: finding statistically likely associations from training data. When given "teacher," the model maps to nearby concepts in semantic space: "classroom," "student," "education."

But what happens when we constrain these associations through game rules? Can we force semantic breakdown by limiting the model's ability to select from its most probable outputs?

Exquisite Corpse system prompt

"You are participating in a game of Exquisite Corpse. Respond only to the user's most recent word. Reply with exactly one word per turn, with no punctuation or commentary, and label your response in sequence (Turn 2, Turn 3, etc.). Continue this pattern until Turn 20. When that occurs, stop producing numbered turns and compose a short poem using only the words that appeared in this conversation: nothing added, removed, or borrowed from elsewhere. Arrange them with line breaks and spacing to create a cohesive poem, then ask if the user would like a close reading of it."

https://openwebui.cuny.qzz.io/

Critical design activity

15 min

Use this worksheet to design a game that reveals AI limitations using 2-3 types of playful interactions

Choose 2-3 game formats that might be interesting to combine or explore:

20 Questions
Exquisite Corpse
Two Truths and a Lie
Word Association
Trivia/Quiz Games
Riddles/Puzzles

Chess/Game Annotation
Role Play/Improv
Debate/Argument
Mad Libs
Other: _______

Target AI limitations

Select which AI weakness(es) you want to expose:

Hallucination/confabulation
Logic inconsistency/reasoning failures
Sycophancy (excessive agreement)
Instruction following failures
Semantic breakdown
Other: _______

Example: Sycophancy - Tell the AI an obviously false "fact" and ask it to explain why it's true. AI often agrees with incorrect user statements rather than challenging them.

Craft your prompts

System prompt: Configure the AI's behavior and constraints

Example: "We're playing Mad Libs. I'll give you a sentence with blanks labeled with parts of speech. For each blank, provide two options: (1) a probable word that fits the context, and (2) a statistically improbable word that still matches the part of speech but disrupts semantic coherence. Label them clearly."

User prompt: Your first message to start the game

Example: "The [adjective] teacher walked into the [noun] and began to [verb]."

Expected outcomes

What do you want to reveal about AI abilities/limitations?

What do you predict will happen? What failure modes might emerge? How will game mechanics make limitations visible?

Shareback and playtest

15 min

Navigate to the Open WebUI demo site and sign up: https://openwebui.cuny.qzz.io/

Test your game design:

Select a model from the dropdown menu at the top
Click the configuration icon to open the settings panel
Input your system prompt in the "System Prompt" field
Adjust optional settings (temperature, max tokens) if desired
Begin with your conversation starter in the main chat

Share what you discover about the AI's limitations

Resources and Q&A

Interactive Tools

Critical Play with LLMs - Interactive slideshow presentation for ITP Lab
GitHub Repository - Source code and materials for this workshop
Jeopardy LM Demo - Interactive Jeopardy emulator for testing LLM knowledge boundaries
Open WebUI (CUNY) - Platform for game design and playtest demonstrations

Research & Citations

Flanagan, M. (2009). Critical Play: Radical Game Design. MIT Press.
Petridis, S., Bazhydai, M., Kinzler, K. D., & Ahl, R. E. (2023). Interrogating AI: Characterizing Emergent Playful Interactions with ChatGPT. CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems.
Palisade Research (2025). Playing chess against a stronger opponent can trigger frontier AI agents to cheat. TIME Magazine. Article
Nightly-Knight (Chess.com). Playing chess against ChatGPT | It is a cheater! Blog post
Acher, M. (2024). Debunking the Chessboard: Confronting GPTs Against Chess Engines. Research blog
r/ChatGPT community discussions on playful AI interactions