Week 5 – Reading Response

In the article “Computer Vision for Artists and Designers: Pedagogic Tools and Techniques for Novice Programmers”, the author surveys many existing works, common themes and applications of computer vision, and basic methods by which computers represent visual data.

To now directly address the questions asked of me: computer vision indeed differs greatly from human vision. The goal, however, remains the same as that of human vision, i.e. to represent physical (extended) objects in a manner where no significant detail is lost, to identify them, and then, if needed, to perform computations upon those representations.

Humans are unfathomably complex beings. The human eye alone contains over 100 million rods (cells for low-light conditions and peripheral vision) and several million cones (cells for detail and color) (source: Cleveland Clinic); even by today's standards, many machines come nowhere close to us biological entities. Furthermore, operating at an average of about 20 watts (source: National Library of Medicine), our brains are incredibly efficient at managing and responding to input from the incredible complexity of our eyes, every other sensory system, and every square inch of skin.

Now that I am done marveling at ourselves, I return to humble the computers. Computers are inefficient, slow, blocky, prone to faults, and can really only function on binary numbers and logic (though in recent years other number systems and logics are being explored). The challenge lies first in the sensors scanning the environment and relaying analog data to the computer; next, that analog data must be converted into a digital format (fundamentally 1s and 0s), and then processed by a program in an efficient manner. Typically, video from this sensory data is stored as “a stream of rectangular pixel buffers”, which, according to the paper, tells us little on its own about what the computer is actually being fed through the system.

The paper moves on to mention several different schemes and standards for representing real-world data that computer vision encodings may follow, and underscores that there is no unified convention among these schemes. Techniques a basic algorithm may use to discern motion from stillness include comparing two adjacent frames of a video to see which pixel values changed (frame differencing), as well as background subtraction. I now extend upon this independently: it is probably wiser to first subtract the background before measuring any pixel-value changes or points of reference, as we don't want background pixel noise to impact accuracy.
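As a rough sketch of those two techniques, here is how frame differencing and background subtraction might look in numpy. The frames, threshold value, and function names are my own illustrative choices, not from the paper:

```python
import numpy as np

def background_subtraction(frame, background, threshold=30):
    """Return a boolean mask of pixels that differ from a stored background image."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return diff > threshold

def frame_difference(prev_frame, curr_frame, threshold=30):
    """Return a boolean mask of pixels that changed between two adjacent frames."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return diff > threshold

# Toy example: a black 4x4 background with one bright "object" that moves.
background = np.zeros((4, 4), dtype=np.uint8)
frame_a = background.copy()
frame_a[1, 1] = 200              # object at (1, 1) in frame A
frame_b = background.copy()
frame_b[2, 2] = 200              # object has moved to (2, 2) in frame B

motion = frame_difference(frame_a, frame_b)          # where pixels changed
foreground = background_subtraction(frame_b, background)  # non-background pixels
```

Note how the two masks answer slightly different questions: `motion` flags both the old and new object positions (two pixels), while `foreground` flags only where the object currently is.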

What I really found interesting was how we may be able to build basic interactions on these two – albeit simple – methods alone. Once the silhouette of a person has been detected, their motion or boundary can be used as a collider for free-falling objects, for example. Alternatively, we may even be able to recolor a person and their environment in grayscale, or B&W, for intriguing stylistic effects. Perhaps it is only I who yearns for aged B&W technology; there is something oddly simple yet calming I find in it.
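That recoloring idea can be sketched briefly as well. Assuming a frame arrives as an RGB array (my own assumption; the luminance weights are the standard ITU-R BT.601 coefficients), grayscale and thresholded B&W conversion might look like:

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted luminance conversion of an RGB frame (BT.601 coefficients)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def to_black_and_white(gray, threshold=128):
    """Threshold grayscale values into pure black (0) or white (255)."""
    return np.where(gray >= threshold, 255, 0)

# Toy 1x2 frame: one pure-red pixel, one pure-white pixel.
frame = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=float)
gray = to_grayscale(frame)       # red -> ~76, white -> 255
bw = to_black_and_white(gray)    # red -> black, white -> white
```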

And so, I have extended upon the parts of the reading which held significance to me. Though I would like to mention: the more we try to personify computers, i.e. implement traditionally biological processes in them, the more I marvel at our own biological complexity!
