When I think of computer vision, the first thing that comes to mind is a coder on social media called the Poet Engineer, who creates the most insane visuals purely from a camera capturing their hand movements. They have the coolest programs ever. I also love it when artists make videos of themselves creating cool things with their hands purely through code, and one of my favourite examples of using code to create art is Imogen Heap’s MiMu gloves. And then there’s the monkey meme face recognizer I keep seeing everywhere (photo attached). It still baffles me that we can use our hands and our expressions to control a device that we usually interact with through touch! So this was one of my favourite readings so far, because it discussed one of the main concepts that hooked me into interactive media in the first place.
From what I understood of the text, the primary difference between computer and human vision is that a human observer can understand symbols, people, or environmental context (like whether it’s day or night), while a computer, unless programmed otherwise, perceives video simply as pixels. Computer vision uses algorithms to make assertions about those raw pixels, and even then, designers need to optimize the physical environment to make it “legible” to the software, for example by using backlighting to create silhouettes or by using high-contrast and retroreflective materials. Despite these limitations, isn’t it still insane that we’ve come far enough to make a computer identify specific things? The fact that computers can now have hardware that goes beyond our own capabilities, such as infrared illumination, polarizing filters and more, is almost scary to think about. I’d also say that computer vision is much more objective than human vision. Is it possible for computers to suffer from inattentional blindness like we do? For example, when we enter a room and fail to see something, and then we come back and the object is right there and never moved, is a computer capable of the same thing?
I liked that this reading laid out the different techniques used in computer vision, because when I first learned about CV, I was overwhelmed by the number of things it could sense. These are the techniques as I understood them (I’m listing them here so I can refer to them later as well):
- Frame Differencing / Detecting Motion: Detects motion by comparing each pixel in a video frame to the corresponding pixel in the next frame.
- Background Subtraction / Detecting Presence: Detects the presence of objects by comparing the current video frame to a stored image of an empty background.
- Brightness Thresholding: Isolates objects based on luminosity, by comparing each pixel’s brightness to a set threshold. (I did an ASCII project a few years ago that captured your image, figured out the contrast and brightness, and then replicated the live video input as letters, numbers and symbols. I would like to recreate that project with this concept now!)
- Simple Object Tracking: Program the computer to find the brightest or darkest pixel in a frame to track a single point.
- Feature Recognition: Once an object is located, the computer can compute specific characteristics like area or center of mass (this is CRAZY).
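
To make the list above more concrete, here is a minimal sketch of all five techniques in plain Python. It is only an illustration under some loud assumptions: a "frame" is a hypothetical grayscale image stored as a list of rows of brightness values (0 to 255), the function names and the default `tolerance` and `threshold` values are made up for this example, and a real project would read frames from a camera through a library rather than build them by hand.

```python
# Frames here are hypothetical grayscale images: a list of rows,
# each row a list of brightness values from 0 (black) to 255 (white).

def frame_difference(prev, curr):
    """Frame differencing: total per-pixel change between two frames."""
    return sum(abs(c - p)
               for prev_row, curr_row in zip(prev, curr)
               for p, c in zip(prev_row, curr_row))

def background_subtraction(background, curr, tolerance=30):
    """Presence mask: True where a pixel differs from the stored
    empty-background image by more than `tolerance` (assumed value)."""
    return [[abs(c - b) > tolerance for b, c in zip(bg_row, curr_row)]
            for bg_row, curr_row in zip(background, curr)]

def brightness_threshold(frame, threshold=128):
    """Mask of pixels brighter than `threshold` (assumed value)."""
    return [[p > threshold for p in row] for row in frame]

def brightest_pixel(frame):
    """Simple object tracking: (x, y) of the brightest pixel."""
    best_xy, best_value = None, -1
    for y, row in enumerate(frame):
        for x, p in enumerate(row):
            if p > best_value:
                best_xy, best_value = (x, y), p
    return best_xy

def area_and_centroid(mask):
    """Features of a located object: pixel count and center of mass."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) for on in row if on]
    area = len(xs)
    if area == 0:
        return 0, None  # nothing was detected in the mask
    return area, (sum(xs) / area, sum(ys) / area)

def blank(width=4, height=4, value=10):
    """Helper for the demo: a dim, empty frame."""
    return [[value] * width for _ in range(height)]

# Tiny demo: a bright spot moves one pixel right and grows downward.
background = blank()
prev = blank()
prev[1][1] = 200
curr = blank()
curr[1][2] = 200
curr[2][2] = 250

motion = frame_difference(prev, curr)                # how much changed
presence = background_subtraction(background, curr)  # where something is
bright = brightness_threshold(curr)                  # where it is bright
tracked = brightest_pixel(curr)                      # single tracked point
area, centroid = area_and_centroid(presence)         # size and center
print(motion, tracked, area, centroid)
```

In this toy scene, background subtraction and brightness thresholding pick out the same pixels because the only thing that differs from the background is also the only bright thing. In a real room those two masks would usually disagree, which is why the lighting and physical setup matter so much.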
There are definitely more techniques out there, but I’ll start off with the basics, since I’m a complete beginner at this. I do want to try pairing feature recognition with simple object tracking, a combination I noticed is used in hand tracking (and the monkey video. LOL).
I mentioned the objectivity of CV earlier, but what happens if the datasets a CV program is trained on are biased? What if the creator behind the program builds their own biases into it? I like how Sorting Daemon (2003) looks at the social and racial environment, because I was wondering about situations where CV could be programmed to unintentionally (or intentionally) discriminate on the basis of traits such as race, gender, or disability. Surveillance is a scary concept to me too, because what happens to the question of consent? While computer vision could be used to reveal hidden data in environments that are often overlooked, create programs that can help people without the need for a human to be present (e.g. Cheese), and so many other cool things, it could also be used in a negative way. I need to make sure that any programs I create with CV are inclusive and can’t be used with ill intent.