Week 5 Reading Response

Computer vision isn’t really “vision” in the way humans experience it; it’s more like a giant calculator crunching patterns in pixels. Where we see a friend’s smile and immediately read context, emotion, and memory, the computer just sees light values and tries to match them against models. It’s fast and can process way more images than a person ever could, but it lacks our built-in common sense. That’s why artists and developers often need to guide it using things like face detection, pose estimation, background subtraction, or optical flow to help the machine focus on what’s actually interesting. Tools like MediaPipe, which can map out your skeleton for gesture-based games, or AR apps that segment your hand so you can draw in mid-air, help bridge the gap between human intuition and machine literalism.
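
To make that guidance concrete, here’s a minimal sketch of skeleton tracking in p5.js using ml5.js’s PoseNet wrapper (an assumption on my part: this targets the older ml5 0.x API, and both p5.js and ml5.js need to be loaded on the page):

// Minimal skeleton tracking: draw a dot on each confidently detected joint.
let video;
let poses = [];

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  // Load the PoseNet model and collect pose estimates as they arrive.
  const poseNet = ml5.poseNet(video, () => console.log('model ready'));
  poseNet.on('pose', results => { poses = results; });
}

function draw() {
  image(video, 0, 0);
  fill(0, 255, 0);
  noStroke();
  for (const p of poses) {
    for (const kp of p.pose.keypoints) {
      if (kp.score > 0.5) circle(kp.position.x, kp.position.y, 10); // skip low-confidence joints
    }
  }
}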

But once you start tracking people, you’re also borrowing from the world of surveillance. That’s a double-edged sword in interactive art. On one hand, it opens up playful experiences. On the other, the same tech is what powers CCTV, facial recognition in airports, and crowd analytics in malls. Some artists lean into this tension: projects that exaggerate the red boxes of face detection, or that deliberately misclassify people to reveal bias, remind us that the machine’s gaze is never neutral. Others flip it around, letting you “disappear” by wearing adversarial patterns or moving in ways the system can’t follow. So computer vision in art isn’t just about making the computer “see”; it’s also about exposing how that seeing works, what it misses, and how being watched changes the way we move.

You can also invert the logic of surveillance: instead of people being watched, what if the artwork itself is under surveillance by the audience? The camera tracks not you but the painting, and when you “stare” at it too long, the work twitches as if uncomfortable. Suddenly, the power dynamics are reversed.

Week 5 – Reading Reflection

Reading the essay Computer Vision for Artists and Designers made me realize how differently computers and humans actually “see.” Our eyes and brains process the world in ways that feel natural: we recognize faces instantly, understand depth, guess intentions from gestures, and fill in missing details without even noticing. Computers, on the other hand, don’t have that intuitive grasp. They just see pixels and patterns. A shadow or a little blur can confuse them. Where we understand context, like knowing a cat is still a cat even if half hidden, computers rely on strict rules or training data, and they often fail when something doesn’t match what they’ve been taught to expect.

To bridge that gap, a lot of effort goes into helping machines track what we want them to notice. Instead of raw pixels, we give them features: edges, colors, corners, or textures. Algorithms can then use those features to keep track of an object as it moves. More recently, deep learning has allowed computers to learn patterns themselves, so they can recognize faces or bodies in a way that feels closer to human intuition (though still fragile). Sometimes, extra sensors like depth cameras or infrared are added to give more reliable information. It’s almost like building a whole toolkit around vision just to get machines to do what we take for granted with a single glance.
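
As a tiny illustration of feature-based tracking, here’s a p5.js sketch that follows one very simple feature, color: it scans the webcam frame for the pixel closest to a target color and circles it (the red target value is just an assumed example; in practice you’d tune it to your object and lighting):

// Find and mark the webcam pixel closest to a target color.
let video;
const target = [255, 0, 0]; // assumed target: a bright red object

function setup() {
  createCanvas(640, 480);
  pixelDensity(1);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
}

function draw() {
  video.loadPixels();
  image(video, 0, 0);
  let best = Infinity, bx = 0, by = 0;
  // Sample every 4th pixel to keep the per-frame cost low.
  for (let y = 0; y < video.height; y += 4) {
    for (let x = 0; x < video.width; x += 4) {
      const i = 4 * (y * video.width + x);
      const d = sq(video.pixels[i] - target[0]) +
                sq(video.pixels[i + 1] - target[1]) +
                sq(video.pixels[i + 2] - target[2]);
      if (d < best) { best = d; bx = x; by = y; }
    }
  }
  noFill();
  stroke(0, 255, 0);
  circle(bx, by, 24); // mark the best match
}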

Thinking about how this plays into interactive art is both exciting and a little unsettling. On one hand, the ability to track people makes art installations much more engaging — an artwork can respond to where you’re standing, how you move, or even who you are (as I observed at TeamLab). That creates playful, immersive experiences that wouldn’t be possible without computer vision. But the same technology that enables this interactivity also raises questions about surveillance. If art can “see” you, then it’s also observing and recording in ways that feel uncomfortably close to security cameras. I think this tension is part of what makes computer vision so interesting in art: it’s not just about making something interactive, but also about asking us to reflect on how much we’re being watched.

Week 5 – Midterm Project Progress

For my midterm project, I decided to make a little balloon-saving game. The basic idea is simple: the balloon flies up toward the sky and faces obstacles along the way that the player needs to avoid.

Concept & Production

Instead of just popping balloons, I wanted to make the balloon itself the main character. The player controls it as it floats upward, while obstacles move across the screen. The main production steps I’ve worked on so far include:

  • Making the balloon move upwards continuously.
  • Adding obstacles that shift across the screen.
  • Writing collision detection so that the balloon “fails” if it hits something.
  • Bringing back the buttons and menu look from the beginning, so the game starts cleanly.

It’s been fun turning the balloon from a simple object into something the player actually interacts with.

The Most Difficult Part
By far, the trickiest part has been getting the balloon popping to work without errors. Sometimes collisions were detected when they shouldn’t have been, which gave me a bunch of false pops. Fixing that took way more trial and error than I expected, but I think I finally have it working in a way that feels consistent (I used help from AI and YouTube).
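
For anyone curious, here’s a sketch of one way to cut down false pops (not necessarily my exact fix): a circle-versus-rectangle overlap test where the obstacle’s hitbox is shrunk by a small margin, so grazing contact doesn’t count as a hit. The balloon and obs field names are illustrative:

// Circle-vs-rectangle collision with a forgiveness margin.
// balloon = { x, y, r }, obs = { x, y, w, h } (illustrative shapes).
function hitsObstacle(balloon, obs, margin = 6) {
  // Shrink the obstacle's hitbox by `margin` pixels on every side.
  const left = obs.x + margin;
  const right = obs.x + obs.w - margin;
  const top = obs.y + margin;
  const bottom = obs.y + obs.h - margin;
  // Find the closest point on the shrunken rectangle to the balloon's center.
  const cx = constrain(balloon.x, left, right);
  const cy = constrain(balloon.y, top, bottom);
  // Collide only if that point lies inside the balloon.
  return dist(balloon.x, balloon.y, cx, cy) < balloon.r;
}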

Risks / Issues
The main risk right now is that the game sometimes lags. Most of the time it works fine, but once in a while the balloon pops out of nowhere at the very beginning. I’m not sure if it’s how I’m handling the objects or just the browser being picky. I’ll need to look into optimizing things as I add more features.

Next Steps
From here, I want to polish the interactions, add sound effects, and make sure the game is fun to play for longer than a few seconds while also looking more visually appealing. But overall, I feel good that the “scariest” part (getting rid of the balloon-popping errors) is mostly handled.

Week 5 – Reading Reflection

What stood out to me in the reading is how limited computer vision really is compared to human vision. As humans, we don’t think twice about recognizing objects, adjusting to poor lighting, or making sense of what we see in context. A computer, on the other hand, needs specific rules and conditions to function. It does not actually understand meaning but instead works through pixels, patterns, and features. If something changes in the environment, like lighting or background, the system can easily fail. That made me realize how much of computer vision is not about “seeing” the world the way we do but about narrowing down what the computer is expected to detect.

To make computer vision work, artists and designers often shape the environment so the system has fewer obstacles. This can be done by using clear contrasts, better lighting, or markers that help the camera distinguish what matters. There are also tools like background subtraction and motion tracking that simplify what the computer needs to follow. Hardware choices such as certain cameras, filters, or infrared technology also help in making the vision system more reliable.
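
As a sketch of the background subtraction idea in p5.js (the threshold value below is an assumption you’d tune to the lighting): press a key while the scene is empty to store a reference frame, and from then on any pixel that differs enough from that reference shows up as white foreground:

// Background subtraction: compare each frame to a stored empty-scene frame.
let video;
let bgPixels = null;
const THRESHOLD = 40; // assumed value; tune for your scene

function setup() {
  createCanvas(320, 240);
  pixelDensity(1); // keep canvas pixels 1:1 with video pixels
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
}

function keyPressed() {
  // Capture the background while nobody is in the frame.
  video.loadPixels();
  bgPixels = video.pixels.slice();
}

function draw() {
  video.loadPixels();
  loadPixels();
  for (let i = 0; i < video.pixels.length; i += 4) {
    let diff = 0;
    if (bgPixels) {
      diff = abs(video.pixels[i] - bgPixels[i]) +
             abs(video.pixels[i + 1] - bgPixels[i + 1]) +
             abs(video.pixels[i + 2] - bgPixels[i + 2]);
    }
    const v = diff > THRESHOLD ? 255 : 0; // white = new/foreground
    pixels[i] = pixels[i + 1] = pixels[i + 2] = v;
    pixels[i + 3] = 255;
  }
  updatePixels();
}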

In interactive art, I think this creates both opportunities and challenges. On the positive side, computer vision allows art to respond to the presence and movements of people, turning viewers into active participants. It makes installations feel alive and immersive in ways that would not be possible without tracking. At the same time, it carries the same logic as surveillance because the system is always watching and recording behavior. For me, this makes computer vision powerful but also a little unsettling, since it forces us to think about what it means to be observed and how that shapes the experience of art.

Week 5 – Reading Response – Shahram Chaudhry

One thing that really stood out to me from this week’s reading is how different computer vision is from human vision. We take it for granted that we can look at a scene and instantly make sense of it. We can tell if it’s day or night, if there’s someone in the frame, if they’re walking or just waving – all without thinking. But to a computer, a video is just a bunch of colored pixels with no meaning. It doesn’t “know” what a person or object is unless we explicitly program it to. There are several techniques to help computers track what we care about. For example, frame differencing, which compares consecutive frames and highlights motion, can help detect someone walking across a room, while background subtraction reveals new people or objects that appear against a static scene. These sound simple, but they’re super powerful in interactive media.
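
For reference, here’s a minimal p5.js take on frame differencing: compare each webcam frame to the previous one and sum the change into a rough motion score (reading only the red channel is a simplification I’m assuming for speed):

// Frame differencing: how much did the image change since the last frame?
let video;
let prev = null;

function setup() {
  createCanvas(320, 240);
  pixelDensity(1);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
}

function draw() {
  video.loadPixels();
  image(video, 0, 0);
  let motion = 0;
  if (prev) {
    for (let i = 0; i < video.pixels.length; i += 4) {
      motion += abs(video.pixels[i] - prev[i]); // red channel only, for speed
    }
  }
  prev = video.pixels.slice();
  fill(255);
  noStroke();
  text('motion: ' + motion, 10, 20); // spikes when someone walks through the frame
}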

What makes this especially interesting is how computer vision’s ability to track things brings up both playful and serious possibilities. On one hand, it’s fun: you can build games that react to your body like a mirror or let users move objects just by waving. But on the other hand, it opens doors to surveillance and profiling. Installations like The Sorting Daemon use computer vision not just to interact, but to critique how technology can be used for control. Or take the Suicide Box, which supposedly tracked suicides at the Golden Gate Bridge. It made me wonder: did it actually alert authorities when that happened, or was it just silently recording? That blurred line between passive tracking and ethical responsibility is something artists can explore in powerful ways.

Also, while humans can interpret scenes holistically and adapt to new contexts or poor lighting, computer vision systems tend to be fragile. If the lighting is off, or the background is too similar to a person’s clothes, the system might fail. No algorithm is general enough to work in all cases; each has to be tailored to a specific task. We process thousands of images and scenes every day without even trying; for a machine to do the same, I am assuming it would need countless hours (or even years) of training. Nevertheless, clever engineering and artistic intuition mean that we can still make good interactive art with the current state of computer vision.



Week 5 – Midterm Progress

For my midterm, I knew I wanted to incorporate a machine learning library, specifically for gesture recognition. I initially explored building a touchless checkout interface where users could add items to a cart using hand gestures. However, I realized the idea lacked creativity and emotional depth.

I’ve since pivoted to a more expressive concept: a Mind Palace Experience (not quite a game), where symbolic “memories” float around the screen – some good, some bad. The user interacts with these memories using gestures: revealing, moving, or discarding them. The experience lets users metaphorically navigate someone’s inner world and discard unwanted memories, ideally the painful ones. Here’s a basic canvas sketch of what the UI could look like.

At this stage, I’ve focused on building and testing the gesture recognition system using Handsfree.js. The core gestures (index finger point, pinch, open palm, and thumbs down) are working and will be mapped to interaction logic as I build out the UI and narrative elements next.

The code for the different gestures:

function isPinching(landmarks) {
  // Pinch = thumb tip and index tip nearly touching.
  // Landmark coordinates are normalized (0–1), so 0.05 is ~5% of the frame.
  const thumbTip = landmarks[4];
  const indexTip = landmarks[8];
  const d = dist(thumbTip.x, thumbTip.y, indexTip.x, indexTip.y);
  return d < 0.05;
}

function isThumbsDown(landmarks) {
  // Thumbs down = thumb tip below the wrist (y grows downward)
  // while all four fingers are curled.
  const thumbTip = landmarks[4];
  const wrist = landmarks[0];
  return (
    thumbTip.y > wrist.y &&
    !isFingerUp(landmarks, 8) &&   // index
    !isFingerUp(landmarks, 12) &&  // middle
    !isFingerUp(landmarks, 16) &&  // ring
    !isFingerUp(landmarks, 20)     // pinky
  );
}

function isOpenPalm(landmarks) {
  // Open palm = all four fingers extended (the thumb is ignored).
  return (
    isFingerUp(landmarks, 8) &&
    isFingerUp(landmarks, 12) &&
    isFingerUp(landmarks, 16) &&
    isFingerUp(landmarks, 20)
  );
}

function isFingerUp(landmarks, tipIndex) {
  // A finger counts as "up" when its tip sits clearly above its middle (PIP) joint.
  const midIndex = tipIndex - 2;
  return (landmarks[midIndex].y - landmarks[tipIndex].y) > 0.05;
}

The sketch link:

https://editor.p5js.org/sc9425/full/n6d_9QDTg

Week 5 – Midterm Assignment Progress

Concept

For my midterm project, I’m building an interactive Hogwarts experience. The player starts by answering sorting questions that place them into one of the four houses. Then they get to choose a wand and receive visual feedback to see which wand truly belongs to them. After that, the player will enter their house’s common room and either explore various components in the room or play a minigame to earn points for their house.

The main idea is to capture the spirit and philosophy of each Hogwarts house and reflect it in the minigames, so the experience feels meaningful and immersive. Instead of just random games, each minigame will be inspired by the core traits of Gryffindor, Hufflepuff, Ravenclaw, or Slytherin.

Design

I want the project to feel smooth and interactive, with a focus on simple controls mostly through mouse clicks. Each stage (from sorting, to wand choosing, to the common room minigames) will have clear visual cues and feedback so the player always knows what to do next.

For the minigames, I’m aiming for gameplay that’s easy to pick up but still fun, and thematically tied to the house’s values. The design will mostly use basic shapes and animations in p5.js to keep things manageable and visually clean.

Challenging Aspect

The part I’m still figuring out and find the most challenging is designing minigames that really match each house’s philosophy but are also simple enough for me to implement within the project timeline. It’s tricky to balance meaningful gameplay with code complexity, especially because I already have a lot of different systems working together.

Risk Prevention

To manage this risk, I’ve been brainstorming minigames that are easy to build, like simple clicking games for Gryffindor’s bravery or memory games for Ravenclaw, while still feeling connected to the houses’ themes. I’m focusing on minimal input and straightforward visuals so I can finish them reliably without overwhelming the code.

Week 5 – Reading Reflection

What are some of the ways that computer vision differs from human vision?
Computer vision is really context-dependent compared to human vision. We have eyes and can generally differentiate objects and act on almost any visual input, but no computer vision algorithm is completely autonomous. Each algorithm depends on its code and its assumptions about the specific scene it is analyzing. If conditions such as an absence of movement or poor lighting are present, the algorithm may fail.

What are some techniques we can use to help the computer see / track what we’re interested in?
As mentioned in the article, we can increase contrast so that the computer vision system can differentiate between the background environment and people’s movements. Techniques include lighting that silhouettes people and contrasting costumes. Also, infrared illumination improves the signal-to-noise ratio in low-light conditions, and retroreflective marking materials make tracked points easier to pick out.

Choosing the right imaging hardware is essential too: for example, telecentric lenses so that an object’s magnification is independent of its distance, polarizing filters to reduce glare from reflective surfaces, and a deliberate choice of cameras for high resolution, high frame rate, short exposure, dim light, UV, or thermal imaging.

How do you think computer vision’s capacity for tracking and surveillance affects its use in interactive art?
It is, for sure, a core mechanism and an engine that powers interactive art. I strongly believe that computer vision’s ability to detect, track, and measure presence, motion, color, and size has completely changed how interactive art is perceived since its invention. Techniques such as background subtraction or frame differencing, even though simple, are profound in how they have enabled thousands of modern interactive art installations. For example, advanced tools like EyesWeb focus specifically on tracking and surveillance and provide ‘analysis and processing of expressive gesture’. Now it is not just about detecting movement, but about interpreting it for specific musical or visual artistic purposes. I also think the first interactive piece, Videoplace, which I read about in my other IM class, gives the audience agency: computer vision acts as a bridge between human input and the computer’s output, much like a computer mouse, but one that detects human movement and gestures instead.

Reading Reflection – Week 5

As Levin noted in the article, there is a wide range of opportunities to utilize computer vision for interactive projects in the real world. On the surface level, human vision and computer vision seem similar, but at their core, the differences between them are striking. Human sight is based on context and shaped by years of lived experience, but computer vision starts from nothing more than raw pixel data. Computer vision depends on how well the image matches what the algorithm was built to handle. If we give it an image of a person in different lighting or from a new angle, the processing can produce unexpected results, even though our human vision easily identifies that it’s the same person.

To help computers track what we’re interested in, I think it comes down to building contrast between the object we wish to scan and its immediate surroundings. The author mentioned several techniques for doing this, such as frame differencing, which compares changes between video frames; background subtraction, which identifies what is new compared to a static scene; and brightness thresholding, which isolates figures using contrasts between light and dark. What I found most interesting was the use of differences in movement in the Suicide Box project, where the odd vertical motion of a person was the contrasting event in the image, and what the computer consequently identified as the target.
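
Brightness thresholding is simple enough to sketch in a few lines of p5.js. Assuming a backlit setup where the figure reads dark against a bright background, every pixel above a cutoff becomes white and the rest black (the cutoff here is an assumed value to tune):

// Brightness thresholding: split the frame into light and dark regions.
let video;
const CUTOFF = 128; // assumed threshold; depends on your lighting

function setup() {
  createCanvas(320, 240);
  pixelDensity(1);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
}

function draw() {
  video.loadPixels();
  loadPixels();
  for (let i = 0; i < video.pixels.length; i += 4) {
    const bright = (video.pixels[i] + video.pixels[i + 1] + video.pixels[i + 2]) / 3;
    const v = bright > CUTOFF ? 255 : 0; // white above the cutoff, black below
    pixels[i] = pixels[i + 1] = pixels[i + 2] = v;
    pixels[i + 3] = 255;
  }
  updatePixels();
}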

That said, computer vision’s capacity for tracking and surveillance makes its use in interactive art complicated. On one hand, it can make artworks feel so much more alive, and on the other, like in the Suicide Box project, it leads to significant controversy and even disbelief that the recordings could be real. It’s also interesting to think that what computer vision did in the Suicide Box project, human vision could never do, at least without causing the observer lifelong trauma. So computer vision does not just enable interactive art, but helps raise questions about privacy and control, and reflects cultural unease with the idea of being watched. 

I would also like to add how cool I find it that I’m now learning about these technologies in detail, when as a child I would go to art and science museums to see artworks that used this technology and leave feeling like I had just witnessed magic; a similar feeling to when I got my Xbox One and the sports games would map my movements onto the characters’.

Week 5 – Reading Reflection

What I enjoyed most in this piece is how it drags computer vision down from the pedestal of labs and military contracts into something artists and students can actually play with. The examples, from Krueger’s Videoplace to Levin’s own Messa di Voce, remind me that vision doesn’t have to mean surveillance or soulless AI pipelines. It can also mean goofy games, poetic visuals, or even awkward belt installations that literally stare back at you. I like this take: it makes technology feel less like a monolith and more like clay you can mold.

That said, I found the constant optimism about “anyone can code this with simple techniques” a little misleading. Sure, frame differencing and thresholding sound easy enough, but anyone who’s actually tried live video input knows it’s messy. Lighting ruins everything, lag creeps in, and suddenly the elegant vision algorithm thinks a chair is a person. The text does mention physical optimization tricks (infrared, backlighting, costumes), but it still downplays just how finicky the practice is. In other words, the dream of democratizing vision is exciting, but the reality is still a lot of duct tape and swearing at webcams.

What I take away is the sense that computer vision isn’t really about teaching machines to “see.” It’s about choosing what we want them to notice and what we conveniently ignore. A suicide detection box on the Golden Gate Bridge makes one statement; a silly limbo game makes another. Both rely on the same basic tools, but the meaning comes from what artists decide to track and why. For me, that’s the critical point: computer vision is less about pixels and algorithms and more about the values baked into what we make visible.