It’s easy to forget that computers don’t actually see anything. When we look at a video feed, we instantly recognize a person walking across a room. A computer just registers a grid of numbers where pixel values shift over time. Because of this, computer vision is incredibly fragile. Every tracking algorithm relies on strict assumptions about the real world. If the lighting in a room changes, a tracking algorithm might completely break. The computer doesn’t see the general picture with its context; it only knows the math it was programmed to look for.
Basic Tracking Techniques
To work around this blindness, developers rely on a few basic techniques that let the software track, and react to, the things they are interested in.
- Frame differencing: comparing the current video frame to the previous one. If the pixels changed, the software assumes motion happened in that exact spot.
- Background subtraction: memorizing an image of an empty room. When a person walks in, it subtracts the “empty” image from the live feed to isolate whatever is new.
- Brightness thresholding: tracking a glowing object in a dark room by telling the software to ignore everything except the brightest pixels.
- Simple object tracking: This involves looking at the color or pixel arrangement of a specific object and looking for those same values as they move across the screen.
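To make the first three techniques concrete, here is a minimal sketch in plain Python. It is only an illustration under simple assumptions: frames are tiny grayscale grids (lists of rows with values from 0 to 255), and the function names, the threshold of 30, and the brightness cutoff of 200 are all made up for this example. Real projects usually lean on an image library instead. The `centroid` helper is a crude stand-in for tracking: it just averages the positions of the flagged pixels.

```python
# Frames are modeled as small grayscale grids: lists of rows, values 0-255.
# All names, thresholds, and sample frames here are illustrative assumptions.

def frame_difference(prev, curr, threshold=30):
    """Frame differencing: flag pixels that changed by more than `threshold`."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]

def subtract_background(background, frame, threshold=30):
    """Background subtraction: flag pixels that differ from the empty scene."""
    return frame_difference(background, frame, threshold)

def brightest_pixels(frame, cutoff=200):
    """Brightness thresholding: keep only pixels at or above `cutoff`."""
    return [[value >= cutoff for value in row] for row in frame]

def centroid(mask):
    """Average (row, col) position of the flagged pixels, or None if empty."""
    points = [
        (r, c)
        for r, row in enumerate(mask)
        for c, hit in enumerate(row)
        if hit
    ]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)

# A dark, empty room and two frames in which one bright pixel moves right.
empty_room = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame_a = [[10, 10, 10, 10],
           [10, 250, 10, 10],
           [10, 10, 10, 10]]
frame_b = [[10, 10, 10, 10],
           [10, 10, 250, 10],
           [10, 10, 10, 10]]

motion = frame_difference(frame_a, frame_b)
foreground = subtract_background(empty_room, frame_b)
glow = brightest_pixels(frame_b)

print(centroid(motion))      # (1.0, 1.5): old and new positions both changed
print(centroid(foreground))  # (1.0, 2.0)
print(centroid(glow))        # (1.0, 2.0)

# The fragility in action: brighten the whole room slightly and background
# subtraction flags every pixel as "new", even though nothing moved.
brighter = [[min(255, value + 40) for value in row] for row in frame_b]
false_alarm = subtract_background(empty_room, brighter)
print(sum(sum(row) for row in false_alarm))  # 12 of 12 pixels flagged
```

Note how frame differencing reports motion at both the spot the object left and the spot it arrived at, and how a small lighting change breaks background subtraction completely. Each function only answers the narrow question it was written for.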
Surveillance in Art
I find it very interesting that people use technology made for surveillance and the military to create art. Taking tools built for control and turning them into art is truly impressive: it flips our understanding of the technology, or at least shows that it has two sides. The interactivity that tracking makes possible is hugely varied and can feel magical and deeply emotional, but it still comes from a computer tracking, analyzing, and reacting to every move of the person in front of it. Such art takes the invisible, unsettling surveillance we live with every day and makes it extremely present.
Honestly, this military baggage explains a lot of computer vision’s blind spots. If you’re designing a system just to monitor crowds or track moving targets, you don’t need it to understand the whole scene in all its detail. You just need fast analysis of tiny differences, like a shift in pixels.
However, I feel that in interactive media details are very important, and that art runs on them. So while computer vision has not yet reached the point where it can analyze everything at once, artists have to come up with algorithms that try to do it instead.