From reading this I understood just how different “seeing” actually is from one person or entity to the next.
As a human, I don’t have to think about the process of seeing. I see a person, and I immediately understand a face, an expression, perhaps even an intention. But computer vision doesn’t see meaning, it sees numbers. The article clearly states that high-level image understanding is still a very difficult task, whereas low-level image understanding is much more feasible. A computer doesn’t see “a person walking,” it sees the differences in pixel values.
For instance, frame differencing involves subtracting one frame from another to identify movement. This seems so very mechanical in contrast to how easily we, as humans, can identify movement. Background subtraction involves comparing a live image with a pre-stored image of the background to identify what is out of place. As humans, we can easily identify a person regardless of how much the lighting changes. However, for a computer, lighting, contrast, and setup are very important.
One thing that I found particularly interesting is that rather than trying to make computers see the world as we do, the reading proposes that we design the physical world so that it is easier for the computer to see, using high contrast, controlled lighting, and reflective surfaces. This is a reversal of the situation, rather than trying to make the algorithm “smarter,” we are trying to make reality more computable. For me this is fascinating because interaction design is not just about the digital world, it is also about the physical world.
However, with interactive art, this is even more complicated. Tracking technologies can enable very powerful experiences of embodiment, such as in early works such as Videoplace, where silhouettes and motion become interactive elements. I find the concept of being able to have meaningful artistic experiences through simple detection technologies very appealing. The system does not fully “understand” the body but simply tracks enough information to react to it.
However, at the same time, the reading points out works such as Sorting Daemon, which emphasize surveillance and profiling. This was somewhat uncomfortable. The same technology that enables playful interaction can also extract, categorize, and analyze individuals. In the context of interactive art, being tracked can be very engaging. In other contexts, it can be very invasive.
I think it’s this tension that makes computer vision so potent in interactive media. It turns the body into data, but this data can either be expressive and interactive or controlling and analytical. As artists and designers, we’re not simply using tracking as a tool, we’re making choices about visibility and power.
This reading has made me more conscious of the fact that computer vision isn’t about simulating human vision. It’s about finding patterns that the machine can calculate. And perhaps it is interactive art that is where human vision and machine vision intersect.