Reading Reflection – Week #5

  • What are some of the ways that computer vision differs from human vision?

No computer vision algorithm can perform its intended function (e.g., distinguishing humans from the background) on every kind of input video, unlike the human eye and brain, which work together to handle such tasks in general. Instead, an object-detection or tracking algorithm relies crucially on specific assumptions about the real-world scene it is meant to study. If those assumptions are not met, the algorithm may perform poorly, produce results of little value, or fail entirely.

Take frame differencing as a first example: it is a computer vision technique that detects objects by detecting their movement. This is achieved by comparing the corresponding pixels of two frames and finding the difference in color and/or brightness between them. The frame-differencing algorithm therefore performs accurately only given "relatively stable environmental lighting" and "a stationary camera (unless it is the motion of the camera which is being measured)." Hence, videos with a great deal of active movement, such as NBA games, would be much more suitable input than videos of people sitting still in an office.

Beyond frame differencing, background subtraction and brightness thresholding are further examples where presumptions about the scene are important for computer vision tasks. Background subtraction "locates visitor pixels according to their difference from a known background scene," while brightness thresholding uses "hoped-for differences in luminosity between foreground people and their background environment." Considerable contrast in color or luminosity between foreground and background is therefore important for accurate recognition of objects; otherwise, as in nighttime scenes, the algorithm may incorrectly classify objects in the scene as background.
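The frame-differencing idea can be sketched in a few lines of Python with NumPy; the threshold value, array sizes, and function name here are illustrative assumptions of mine, not details from the reading:

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=30):
    """Flag pixels whose brightness changed by more than `threshold`
    between two grayscale frames (a minimal frame-differencing sketch)."""
    # Widen the dtype before subtracting so uint8 values don't wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold  # boolean "motion" mask

# Two tiny 4x4 "frames": one pixel brightens sharply, the rest stay put.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1, 2] = 200  # simulated moving object

mask = frame_difference(prev, curr)
print(mask.sum())  # → 1 (only the changed pixel is flagged)
```

Background subtraction is the same pixel-wise comparison, except the second frame is replaced by a stored image of the empty scene, which is why both techniques fail when lighting drifts or the camera moves.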
On the other hand, I personally feel that the human visual system remarkably combines these three approaches, and perhaps more, to detect objects, which allows it to perform extraordinarily well compared to current computer vision.

 

  • What are some techniques we can use to help the computer see / track what we’re interested in?

It is of great importance to design a physical environment whose conditions best suit the computer vision algorithm and, conversely, to select the software techniques that work best with the physical conditions at hand. Several examples stood out to me for enhancing the suitability and quality of the video input provided to the algorithm. I believe that infrared illumination (as used in night-vision goggles) should complement conventional black-and-white security cameras, since it can massively boost the signal-to-noise ratio of video taken in low-light conditions. Polarizing filters are useful for handling glare from reflective surfaces, especially in celebrity shows. Of course, there are many specialized cameras to consider as well, optimized for "conditions like high-resolution capture, high-frame-rate capture, short exposure times, dim light, ultraviolet light, or thermal imaging."

 

  • How do you think computer vision’s capacity for tracking and surveillance affects its use in interactive art?

Computer vision’s capacity for tracking and surveillance opens doors for interactivity between the computer and the human body: gestures, facial expressions, and dialogue. Some sophisticated algorithms can already identify facial expressions correctly, which could be used to gauge a person’s emotional state and applied in mental-health initiatives to help people who are suffering emotionally. This might relate to Cheese, an installation by Christian Möller. Additionally, as in Videoplace, participants could create shapes using gestures, and their silhouettes in different postures can form different compositions. If computer vision were combined with audio and language processing, such systems could become even more interactive as their affordances increase.
