Midterm – Pitchy Bird

Following my initial progress on a backend-backed idea, I faced challenges managing the complexities of API usage and LLM output, so I switched to another idea briefly mentioned at the start of my last documentation.

Core Concept

For my midterm project, I wanted to explore how a fundamental change in user input could completely transform a familiar experience. I landed on the idea of taking a game—Flappy Bird, known for its simple tap-based mechanics—and re-imagining it with voice control. Instead of tapping a button, the player controls the bird’s height by changing the pitch of their voice. Singing a high note makes the bird fly high, and a low note brings it down.

The goal was to create something both intuitive and novel. Using your voice as a controller is a personal and expressive form of interaction. I hoped this would turn the game from a test of reflexes into a more playful and, honestly, sillier challenge.

How It Works (and What I’m Proud Of)

At its core, the project uses the p5.js library for all the visuals and game logic, combined with the ml5.js library to handle pitch detection. When the game starts, the browser’s microphone listens for my voice. The ml5.js pitchDetection model (surprisingly lightweight) analyzes the audio stream in real time and spits out a frequency value in Hertz. My code then takes that frequency and maps it to a vertical position on the game canvas. A higher frequency means a lower Y-coordinate, sending the bird soaring upwards.
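
A minimal sketch of that pipeline, assuming p5.sound for the microphone and the ml5 CREPE model files served from a ./model/ directory (the path and the 100–400 Hz range are placeholders, not the game’s actual values):

let mic, pitch;
let birdY = 200;

function setup() {
  createCanvas(400, 600);
  mic = new p5.AudioIn();
  mic.start(() => {
    // ml5's pitchDetection wraps the CREPE model; it needs the audio context and stream
    pitch = ml5.pitchDetection('./model/', getAudioContext(), mic.stream, getPitch);
  });
}

function getPitch() {
  pitch.getPitch((err, frequency) => {
    if (frequency) {
      // Higher frequency -> smaller Y, so the bird soars upwards
      birdY = map(constrain(frequency, 100, 400), 100, 400, height, 0);
    }
    getPitch(); // poll again as soon as a result arrives
  });
}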

Click here to access the game.

I’m particularly proud of a few key decisions I made that really improved the game feel.

First was the dynamic calibration. Before the game starts, it asks you to stay quiet for a moment so it can measure the ambient background noise, which is used to set a volume threshold; that way the game doesn’t react to the hum of a fan or distant chatter. Then it has you sing your lowest and highest comfortable notes. This personalizes the control scheme for every player, adapting to their unique vocal range, which I think is an important design choice for any voice-controlled game.
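
In sketch form, the two calibration phases might look like this (the helper names and thresholds are illustrative, not the exact game code):

// Phase 1: while the player stays quiet, record the loudest ambient level
let noiseFloor = 0;
function sampleNoise() {
  noiseFloor = max(noiseFloor, mic.getLevel());
}

// Phase 2: while the player sings, track their lowest and highest notes
let lowHz = Infinity, highHz = 0;
function sampleRange(frequency) {
  lowHz = min(lowHz, frequency);
  highHz = max(highHz, frequency);
}

// In play: gate out background noise, then map the player's own vocal
// range onto the full height of the canvas
function pitchToY(frequency) {
  if (mic.getLevel() < noiseFloor * 1.5 + 0.01) return null; // too quiet to count
  return map(frequency, lowHz, highHz, height * 0.9, height * 0.1, true);
}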

Another technical decision I’m happy with was implementing a smoothing algorithm for the pitch input. Early on, the bird was incredibly jittery because the pitch detection is so sensitive. To fix this, I stored the last five frequency readings in an array and used their average to position the bird. This filtered out the noise and made the bird’s movement feel much more fluid and intentional. Finally, instead of making the bird fall like a rock when you stop singing, I gave it a gentle downward drift. This “breath break” mechanic keeps the game fair and acknowledges the physical reality of needing to breathe, which was a small but important game design tweak.
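
The smoothing is a plain moving average, and the drift is a small constant fall; a sketch building on the pitchToY() helper above (bird is an assumed object, and the drift value is illustrative):

const readings = []; // the last few detected frequencies

function smoothPitch(frequency) {
  readings.push(frequency);
  if (readings.length > 5) readings.shift(); // keep only the last five readings
  return readings.reduce((sum, f) => sum + f, 0) / readings.length;
}

function updateBird(frequency) {
  const y = frequency ? pitchToY(smoothPitch(frequency)) : null;
  if (y !== null) {
    bird.y = y;
  } else {
    bird.y += 1.5; // "breath break": drift down gently instead of plummeting
  }
}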

Challenges and Future

My biggest technical obstacle was a recurring bug where the game would crash on replay. It took a lot of console-logging and head-scratching, but it ultimately turned out that stopping and restarting the microphone doesn’t work the way I’d thought. The audio stream becomes invalid after the microphone stops, and it can’t be reused. The solution was to completely discard the old microphone object and create a brand new one every time a new game starts.
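
The pattern that finally worked, roughly (reusing the getPitch() loop from the sketch above; resetGameState() is a hypothetical helper standing in for the pipe, score, and bird resets):

function restartGame() {
  // The old stream is dead after mic.stop(), so discard both objects
  // and rebuild them from scratch on every new game
  if (mic) mic.stop();
  mic = new p5.AudioIn();
  mic.start(() => {
    pitch = ml5.pitchDetection('./model/', getAudioContext(), mic.stream, getPitch);
  });
  resetGameState(); // hypothetical: reset pipes, score, and bird position
}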

In addition, there are definitely areas I’d love to improve. The calibration process, while functional, is still based on setTimeout, which makes it rigid. A more interactive approach, where the player clicks to confirm their high and low notes, would be a much better user experience. Additionally, the game currently only responds to pitch. It might be fascinating to incorporate volume as another control dimension, perhaps making the bird dash forward or shrink to fit through tight gaps when you sing louder.

A more ambitious improvement would be to design the game in a way that encourages the player to sing unconsciously. Right now, you’re very aware that you’re just “controlling” the bird in another way. But what if the game’s pipe gaps prompted you to hum a simple melody? The pipes could be timed to appear at moments that correspond to the melody’s high and low notes. This might subtly prompt the player to hum along with the music, and in doing so, they would be controlling the bird without even thinking.

Week 5 – Midterm Progress

After three days of painstaking brainstorming for my midterm, I came up with two directions: one was a game-like networking tool to help people start conversations, and the other was a version of Flappy Bird controlled by the pitch of your voice.

I was undoubtedly fascinated by both, but as I thought more about it, I wanted to explore generative AI further. Therefore, I combined the personal, identity-driven aspect of the networking tool with a novel technical element.

The Concept

“Synthcestry” is a short, narrative experience that explores the idea of heritage. The user starts by inputting a few key details about themselves: a region of origin, their gender, and their age. Then, they take a photo of themselves with their webcam.

From there, through a series of text prompts, the user is guided through a visual transformation. Their own face slowly and smoothly transitions into a composite, AI-generated face that represents the “archetype” of their chosen heritage.

Designing the Interaction and Code

The user’s journey is the core of the interaction design. Having already come across game-state design in class, I broke the experience down into distinct states, which became the foundation of my code structure:

  1. Start: A simple, clean title screen to set the mood.
  2. Input: The user provides their details. I decided against complex UI elements and opted for simple, custom-drawn text boxes and buttons for a more cohesive aesthetic. The user can type their region and gender, and select an age from a few options.
  3. Capture: The webcam feed is activated, allowing the user to frame their face and capture a still image with a click.
  4. Journey: This is the main event. The user presses the spacebar to advance through 5 steps. The first step shows their own photo, and each subsequent press transitions the image further towards the final archetype, accompanied by a line of narrative text.
  5. End: The final archetype image is displayed, offering a moment of finality before the user can choose to start again.

My code is built around a gameState variable, which controls which drawing function is called in the main draw() loop. This keeps everything clean and organized. I have separate functions like drawInputScreen() and drawJourneyScreen(), and event handlers like mousePressed() and keyPressed() that behave differently depending on the current gameState. This state-machine approach is crucial for managing the flow of the experience.
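
Condensed, the structure looks something like this (the screen functions beyond the two named above are my shorthand for the remaining states):

let gameState = 'start'; // 'start' | 'input' | 'capture' | 'journey' | 'end'

function draw() {
  if (gameState === 'start') drawStartScreen();
  else if (gameState === 'input') drawInputScreen();
  else if (gameState === 'capture') drawCaptureScreen();
  else if (gameState === 'journey') drawJourneyScreen();
  else if (gameState === 'end') drawEndScreen();
}

function keyPressed() {
  // The same key can mean different things depending on the state
  if (gameState === 'journey' && key === ' ') {
    advanceJourneyStep(); // one step closer to the archetype
  }
}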

The Most Frightening Part

The biggest uncertainty in this project was the visual transition itself. How could I create a smooth, believable transformation from any user’s face to a generic archetype?

To minimize the risk, I engineered a detailed prompt that instructs the AI to create a 4-frame “sprite sheet.” This sheet shows a single face transitioning from a neutral, mixed-ethnicity starting point to a final, distinct archetype representing a specific region, gender, and age.

To test this critical step, I wrote the startGeneration() and cropFrames() functions in my sketch. startGeneration() builds the asset key and uses loadImage() to fetch the correct file. The callback then triggers cropFrames(), which uses p5.Image.get() to slice the sprite sheet into an array of individual frame images. The program isn’t fully functional yet, but you can see the functions in the code base.
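
A sketch of the pair as described (the asset-key format is a placeholder for whatever naming scheme the asset library ends up using):

let frames = []; // individual transition frames, filled by cropFrames()

function startGeneration() {
  // e.g. 'east-asia_female_40.png'; region, gender, and age come from the input screen
  const assetKey = `${region}_${gender}_${age}.png`;
  loadImage(`assets/${assetKey}`, (sheet) => cropFrames(sheet));
}

function cropFrames(sheet) {
  // The sheet holds 4 equal-width frames side by side; slice them out with get()
  const fw = sheet.width / 4;
  for (let i = 0; i < 4; i++) {
    frames.push(sheet.get(i * fw, 0, fw, sheet.height));
  }
}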

As for the image assets, I had two choices: make a live AI API generation call, or build a pre-generated asset library. The latter would be easier and less error-prone, I admit; but given the abundance of nationalities on campus, I have little choice but to use a live API call. I’ll figure this out next week.

 

Week 5 – Reading Response

After reading Golan Levin’s “Computer Vision for Artists and Designers,” I’m left with a deep appreciation for the creativity that arose from confronting technical limitations. The article pulls back the curtain on interactive art, revealing that its magic often lies in a clever and resourceful dialogue between the physical and digital worlds, not in lines of complex code. As it turns out, the most effective way to help a computer “see” is often to change the environment, not just the algorithm.

Levin shows that simple, elegant techniques like frame differencing or brightness thresholding can be the building blocks for powerful experiences, in contrast to my earlier assumption that a powerful CV system was required. The LimboTime game, conceived and built in a single afternoon by novice programmers who found a large white sheet of Foamcore, sealed this change in my perspective. They didn’t need a sophisticated algorithm; they just needed a high-contrast background. It suggests that creativity in this field is as much about physical problem-solving as it is about writing code. It’s a reminder that we don’t live in a purely digital world, and that the most compelling art often emerges from the messy, inventive bridge between the two.
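
Frame differencing really is that simple; a minimal p5.js sketch of the idea (my own illustration, not code from the article):

let video, prev;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  prev = createImage(width, height);
}

function draw() {
  video.loadPixels();
  prev.loadPixels();
  let motion = 0;
  for (let i = 0; i < video.pixels.length; i += 4) {
    // Sum how much each pixel's red channel changed since the last frame
    motion += abs(video.pixels[i] - prev.pixels[i]);
  }
  prev.copy(video, 0, 0, width, height, 0, 0, width, height);
  // More movement in front of the camera -> brighter screen
  background(min(255, motion / (video.pixels.length / 4) * 10));
}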

The article also forced me to reflect on the dual nature of this technology. On one hand, computer vision allows for the kind of playful, unencumbered interaction that Myron Krueger pioneered with Videoplace back in the 1970s. His work was a call to use our entire bodies to interact with machines, breaking free from the keyboard and mouse. Then as now, there is real joy in having our physical presence draw, play, and connect with a digital space in an intuitive way.

On the other hand, the article doesn’t shy away from the darker implications of a machine that watches. The very act of “tracking” is a form of surveillance. Artists like David Rokeby and Rafael Lozano-Hemmer confront this directly. Lozano-Hemmer’s Standards and Double Standards, in particular, creates an “absent crowd” of robotic belts that watch the viewer, leaving a potent impression that I would not have expected from visual technology in the early 2000s.

Ultimately, this reading has shifted my perspective. I see now that computer vision in art isn’t just a technical tool for creating interactive effects. It is a medium for exploring what it means to see, to be seen, and to be categorized. The most profound works discussed don’t just use the technology; they actively raise questions about the technology. They leverage its ability to create connection while simultaneously critiquing its capacity for control. I further believe that true innovation often comes from embracing constraints, and that the most important conversations about technology could best be articulated through art.

Week 4 – Reading Response

One thing that drives me crazy is the overcomplicated interface of most modern microwaves. Even as someone who’s pretty tech-savvy, I groan every time I use a new one. There are 10+ buttons for “defrost poultry,” “bake potato,” “reheat pizza,” and more, when all I usually need is to heat up leftovers. Half the time with an unfamiliar microwave, I end up pressing random buttons, wasting as much time as the basic 2-minute timer I was trying to set. It feels like designers prioritize “showing off” features over usability, exactly as Norman warns. They cram in functions to make the microwave seem advanced, but they forget the core user need: simplicity.

This frustration ties directly to Norman’s principles. These microwaves lack good signifiers, as there’s no clear visual cue (like a large, labeled “Quick Heat” button) to guide basic use. The mapping is muddled, too: why is “Popcorn” next to “Sensor Cook” when most users reach for quick heating first? These are all matters of “affordances”—I don’t like this word, as we could simply call it design “friendliness”—which, the author argues, sometimes conflict with designers’ desire to “show off.”

I agree to a certain extent. On the other hand, however, a designer IS capable of prioritizing human-centered design as their end goal—think of Steve Jobs or any Apple designer—and excelling at design friendliness is itself a process of refinement. Picture this simple, intuitive microwave you might have seen: a large digital dial for time (natural mapping) and one “Start” button, plus a small “More Functions” menu for niche uses. This keeps discoverability high, because even a first-time user would know to twist the dial, and it reserves extra features for those who need them, without cluttering the experience.

Design doesn’t have to choose between “impressive” and “usable.” Apple proves this by making complex tech feel intuitive, and microwaves could do the same if designers focused on what users actually do instead of what they think looks good.

Week 4 – Torrent of Transience

For this week’s assignment, I wanted to take the NYC Leading Causes of Death data and turn it into generative text art in p5.js. Initially, I was torn between a “Data Rain” concept, where statistics fall like a cascade, and an “Unraveling Text” idea, where words literally dissolve, mirroring entropy.

After some back-and-forth, I settled on combining the two into what I’m calling “Torrent of Transience”. The core idea is a continuous stream of disease names falling from the top of the screen. But it’s not just falling. Each word is actively dissolving, blurring, and fading as it descends, vanishing before it even reaches the bottom. It’s meant to evoke a waterfall of ink words, where the ink itself is dissolving as it flows.

The challenge was mapping the data in a way that felt intuitive and impactful. I decided that the Deaths count for each cause would determine the textSize – larger words for more fatalities, making their presence felt more strongly. The Age Adjusted Death Rate also became useful, as it controls both how fast the word falls and how quickly it dissolves. So, a cause with a high death rate will rush down the screen and disappear rapidly, a stark visual metaphor for its devastating impact.

I also made sure to clean up the data. Those ICD-10 codes in the Leading Cause column were a mouthful, so I’m stripping them out, leaving just the disease name for clarity. And I’m filtering for valid Deaths and Death Rate entries, because a null isn’t going to map to anything meaningful.
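
Concretely, the cleaning and mapping look roughly like this, assuming each row is a plain object parsed from the CSV (the output ranges are tuned by eye):

function makeParticle(row) {
  const deaths = Number(row.Deaths);
  const rate = Number(row['Age Adjusted Death Rate']);
  if (!deaths || !rate) return null; // skip nulls and malformed rows

  // Strip trailing ICD-10 codes like "(I00-I09, I11, ...)" from the cause name
  const name = row['Leading Cause'].replace(/\s*\(.*\)\s*$/, '');

  return {
    name,
    size: map(deaths, 0, 20000, 12, 64, true), // more deaths -> larger text
    speed: map(rate, 0, 300, 0.5, 4, true),    // higher rate -> faster fall
    fade: map(rate, 0, 300, 0.5, 3, true),     // ...and quicker dissolve
    x: random(width),
    y: -20,
    alpha: 255,
  };
}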

For the “unraveling” effect, I knew textToPoints() on every single particle would crash the sketch. My solution was a bit of a cheat, but effective: I draw each word a few times, with slight random offsets, and increase that offset as the word fades. This creates a ghosting, blurring effect that visually implies dissolution. Coupled with a semi-transparent background, it gives the impression of words literally melting into the ether.
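
In code, the ghosting amounts to drawing each word a few times with a jitter that widens as the word fades; a sketch using the particle fields from above:

function drawWord(p) {
  textSize(p.size);
  // The offset grows as alpha drops, so dissolving words blur apart
  const spread = map(p.alpha, 255, 0, 0, 8);
  for (let i = 0; i < 4; i++) {
    fill(0, p.alpha / 4); // each ghost copy carries a quarter of the opacity
    text(p.name, p.x + random(-spread, spread), p.y + random(-spread, spread));
  }
  p.y += p.speed;    // descend...
  p.alpha -= p.fade; // ...while dissolving
}

// In draw(), a semi-transparent background leaves the short-lived trails:
// background(255, 40);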

Right now, the dataSample is a curated list of causes to get the demo running smoothly. If this were a full-blown project, I’d implement a way to dynamically load and parse the entire CSV, allowing the user to select a year and see a completely different torrent. That’s a future enhancement, but for now, the sample gives a good impression of the dynamic effect.

Week 3 – Exquisite Candidate

Inspiration

I found myself thinking about the current state of political discourse—how it often feels chaotic, random, and almost nonsensical. Arguments and personas can seem totally random, as if different parts had been stitched together to form a strange new whole.

This immediately brought to mind ancient myths, like the centaur (人头马身), a creature with the head and torso of a human and the body of a horse. This became my core visual metaphor: what if I could create political “centaurs”? I could randomly pair the heads of recognizable political figures with symbolic, abstract bodies to represent the absurdity and randomness of political rhetoric.

The project needed a name that captured this idea. I was inspired by the Surrealist parlor game, “Exquisite Corpse,” where artists collaboratively draw a figure without seeing the other sections. My program does something similar, but with political figures, or “candidates.” The name clicked almost instantly: Exquisite Candidate.

Description

Exquisite Candidate is an interactive artwork that explores the chaotic nature of political identity. By clicking the mouse, the viewer generates a new “candidate”—a hybrid figure composed of a randomly selected head and a randomly selected body.

The heads are abstract but recognizable vector drawings of political figures. The bodies are symbolic and thematic, representing concepts like power (“suit”), vulnerability (“stripped_down”), foolishness (“sheep”), or emotional immaturity (“baby with tears”). The resulting combinations are surprisingly humorous or poignant (at least to me, the creator), creating a visual commentary on the fragmented and performative nature of public personas. To bring these abstract figures to life, Gemini helped me generate some of the many vector-based drawing functions for the assets.

Code

The program is built on an Object-Oriented structure with three main classes: Head, Body, and Creature. This keeps the code clean, organized, and easy to expand.

A challenge I encountered was with the “baby with tears” body. My initial design was simple: the Body object would draw itself, and the Head object would draw itself. But the tears needed to be drawn on the face, which is part of the Head object. How could the Body object know where the head was going to be drawn? Unfortunately, as of submission, I haven’t figured out how to implement this successfully.
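
One direction I may try after submission: since Creature owns both parts, it could hand the head’s coordinates to the body when drawing (a hedged sketch, not working code from the project; type and drawTears() are hypothetical names):

class Creature {
  constructor(head, body) {
    this.head = head;
    this.body = body;
  }
  draw() {
    this.body.draw();
    this.head.draw();
    // Creature knows where the head sits, so face overlays like tears
    // can be drawn by the body using the head's coordinates
    if (this.body.type === 'baby_with_tears') {
      this.body.drawTears(this.head.x, this.head.y); // hypothetical method
    }
  }
}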

Week 3 – Reading Response

The author differentiates “tools” for utility, such as the hypothetical Nintendo fridge, from things that are “fun” and “interactive.” He raises the question “Is interactivity utterly subjective?” only to discuss the process of interactivity as a subjective flow. In particular, he argues that the thinking that spurs creativity—and a certain level of randomness, as we discussed last week—is a crucial element of interactivity.

I agree that the thinking behind creative responses is the most important part. In the past, and even now, some low- and mid-level interactive creations, as the author would categorize them, depend solely on a set of rules that merely attempt to be generative. Their output carries no meaning in itself; it only reflects part of a bigger scene defined by the rule-setter. Ideally, however, every output should prompt further thinking in the receiver of the response, the originator of the conversation. That is fairly difficult to achieve, and was especially so in the past.

The advent of generative AI could bring some change, especially since it seems largely untouched in the interactive visual art sphere. What if, I say what if, some code were written in real time, following a certain broader set of rules? What if, in addition to a fixed set of rules, new impromptu visual rules were created in real time?

Week 2 – Looped

This week we’re asked to create an artwork that incorporates the “loop” concept in code. I had seen dynamic squares before, and I wanted to create a grid that gently “breathes” and drifts. Each square’s size and brightness are driven by layered sine waves using a shared time phase, so the whole field feels organic and connected, like a low-key pixel ocean.

Below is the code for core motion + styling logic (the vibe engine).

// Inside the nested grid loops: x and y are cell indices, px and py the
// cell's pixel position, cell the cell size, and phase advances each frame.
// Assumes colorMode(HSB, 360, 100, 100, 1).
const w = sin(phase + (x * 0.35) + (y * 0.45));  // wave seed in [-1, 1]
const s = map(w, -1, 1, cell * 0.25, cell * 0.85);  // size pulse
const dx = sin(phase * 0.7 + x * 0.3) * 6;  // horizontal drift
const dy = cos(phase * 0.7 + y * 0.3) * 6;  // vertical drift
const hueVal = (x * 8 + y * 6 + frameCount * 0.4) % 360;  // slowly cycling hue
const bri = map(w, -1, 1, 35, 90);  // brightness follows the wave
fill(hueVal, 60, bri, 0.9);
rect(px + dx, py + dy, s, s, 6);  // drift applied so the squares actually wander
  • What works: simple math, no arrays or heavy state, so it scales nicely with window size. The motion feels smooth and unified.
  • Limitations: all squares animate uniformly, and interaction is missing. The colors follow a fixed formula, so longer viewing gets predictable.

To be frank, this implementation still lacks the smooth “sea wave” vibe that I was looking for. In particular, I would have liked the edges to transform into non-linear lines like waves. But I would call this a first prototype as a p5.js beginner.

However, I tried a smaller square size, and I was surprised that such a minor change created something perceptually different. Finally, I implemented a super cool mouse-click effect, which in a way achieved another level of dynamic aesthetics.

Week 2 – Reading Reflection

In the video, Casey Reas starts with the age-old tension between order and chaos. He explains, “chaos is what existed before creation, and order is brought about by god or gods into the world.” For centuries, creation was a divine act of imposing regularity on a chaotic world. As humans, we sought to display our own “god-likeness” through patterns and symmetry.

By contrast, the early 20th-century Dadaists inverted this relationship. Against a “new” world era confined by scientific laws and societal logic (which had arguably helped lead to the chaos of war), they embraced chance as fundamentally human, taking apart what they saw as “the reasonable frauds of men.” The whole point of their “chance operations” was to set up artworks in which chance creates beauty out of chaos. Artists like Jean Arp and Marcel Duchamp used chance operations not to create chaos, but to rebel against a rigid order they no longer trusted and to escape the confines of their own preconceptions, creating something truly unexpected.

Yet this embrace of randomness, of what is unexpected to human eyes, is not a complete surrender to chaos. Rules, much like the physical laws of nature, flow secretly underneath. As Reas’s own work demonstrates, his generative systems show that a small amount of “noise” is essential to prevent static homogeneity. More importantly, why do simple, inorganic rules create such a sophisticated spectacle? I explored this dynamic, emergent complexity, the assembly of the crowd, in the course Robota Psyche.

My presentation, “Power of the Mass”, discussed how simple, inorganic rules governing a crowd can produce an incredibly sophisticated and life-like spectacle. The boundary of rules allows for randomness, but it is the assembly of the crowd that breathes life into the system. It raises the question of whether true creativity lies not in meticulous control, but in designing elegant systems that balance intention with unpredictability.

I would like to end my reflection with a Gerhard Richter quote.

“Above all, it’s never blind chance: it’s a chance that is always planned, but also always surprising. And I need it in order to carry on, in order to eradicate my mistakes, to destroy what I’ve worked out wrong, to introduce something different and disruptive. I’m often astonished to find how much better chance is than I am.”

Week 1 – Settling a Tree

My idea was to create a portrait that represents growth, experience, and the things that shape us. Instead of a face, I wanted to make my self-portrait a fluid, continuous scene, through which we see the diorama of a life being lived.

  • The Tree: Me. It starts as nothing and grows over time, with its branches reaching out in unique, unpredictable directions. Its placement on the field is random, symbolizing the random circumstances we’re all born into.
  • Yellow Lights: These are the fleeting, positive moments. They could be ideas, bursts of inspiration, happy memories, or moments of creativity. They fall gently, glow for a while, and then fade away, leaving a subtle impression and, more lastingly, imprints on the tree leaves that catch them.
  • Grey Stones: These represent the heavier, more permanent things in life. They could be foundational beliefs, significant life lessons, or more often than not, burdens and responsibilities. They fall with more weight, and once they hit the ground, they settle and become part of the landscape permanently. A sufficient number of stones would pave a road through the screen, from left to right.

The entire process is automated. I press play, and the code “paints” the portrait for a set amount of time before freezing, leaving a final, static image that is unique every time it’s run.

The choice of background was a key point of hesitation during the creative process. I first tried a pure black canvas, but the branches and their few leaves seemed too sparse and lonely. My next step was a semi-transparent black background, which created lovely trails but didn’t feel quite right visually. I finally settled on a semi-transparent dark grey, as it softened the high contrast while preserving the beautiful “ghosting” effect.

Below is the first version of the background.

One of my favorite tiny inventions for this project was a simple interactive feature that lets you control the flow of time: by holding down the mouse button, the entire animation slows to a crawl, creating a quiet, reflective moment as the scene unfolds. It doesn’t alter the final portrait, but it changes how you experience its creation.

// Mouse press animation, simple but I found quite effective
if (mouseIsPressed === true) {
  frameRate(10); // Slow down the animation
} else {
  frameRate(60); // Resume normal speed
}