One thing that really stood out to me from this week’s reading is how different computer vision is from human vision. We take it for granted that we can look at a scene and instantly make sense of it. We can tell if it’s day or night, if there’s someone in the frame, and if they’re walking or just waving, all without thinking. But to a computer, a video is just a grid of colored pixels with no meaning. It doesn’t “know” what a person or object is unless we explicitly program it to. There are several techniques that help computers detect and track what is happening in a scene. For example, frame differencing compares two consecutive frames and highlights the pixels that changed, which could help detect someone walking across a room. Background subtraction compares each frame to a stored image of the empty scene to reveal new people or objects that appear. These sound simple, but they’re super powerful in interactive media.
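To make these two techniques concrete for myself, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not how any particular artwork or library does it: the frames are tiny made-up grayscale grids, the helper names `changed_pixels` and `count_true` are hypothetical, and the threshold of 30 is an arbitrary choice.

```python
# Illustrative sketch only: frames are tiny grayscale images stored as lists
# of rows, with pixel values from 0 (black) to 255 (white). The helper names
# and the threshold value are assumptions made for this example.

def changed_pixels(frame_a, frame_b, threshold=30):
    """Mark each pixel whose brightness differs by more than `threshold`."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

def count_true(mask):
    """Count how many pixels in the mask were flagged as changed."""
    return sum(cell for row in mask for cell in row)

# Stored image of the empty scene, used for background subtraction.
background = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

# A bright "person" standing in column 1.
frame_1 = [
    [10, 200, 10, 10],
    [10, 200, 10, 10],
    [10, 200, 10, 10],
]

# The same person one frame later, now in column 2.
frame_2 = [
    [10, 10, 200, 10],
    [10, 10, 200, 10],
    [10, 10, 200, 10],
]

# Frame differencing: compare two consecutive frames to find motion.
motion = changed_pixels(frame_1, frame_2)

# Background subtraction: compare the current frame to the empty scene.
presence = changed_pixels(background, frame_2)

# If nothing moves between frames, frame differencing flags nothing.
still = changed_pixels(frame_2, frame_2)

print("pixels flagged as motion:", count_true(motion))
print("pixels flagged as presence:", count_true(presence))
print("pixels flagged when standing still:", count_true(still))
```

In this toy example, frame differencing flags both where the person was and where they are now, while background subtraction flags only where they currently stand. It also shows a limitation: a person who stands perfectly still disappears from frame differencing but still shows up against the stored background.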
What makes this especially interesting is how computer vision’s ability to track things brings up both playful and serious possibilities. On one hand, it’s fun: you can build games that react to your body like a mirror, or let users move objects just by waving. On the other hand, it opens doors to surveillance and profiling. Installations like The Sorting Daemon use computer vision not just to interact, but to critique how technology can be used for control. Or take the Suicide Box, which supposedly tracked suicides at the Golden Gate Bridge. It made me wonder: did it actually alert authorities when that happened, or was it just silently recording? That blurred line between passive tracking and ethical responsibility is something artists can explore in powerful ways.
Also, while humans can interpret scenes holistically and adapt to new contexts or poor lighting, computer vision systems tend to be fragile. If the lighting is off, or the background is too similar to a person’s clothes, the system might fail. No algorithm is general enough to work in all cases; each system has to be built or trained for a specific task. We process thousands of images and scenes every day without even trying. For a machine to do the same, I assume it would need countless hours (or even years) of training. Nevertheless, clever engineering and artistic intuition mean that we can still make good interactive art with the current state of computer vision.