Week #5 Reading – Computer Vision

Introduction

Computer vision is the amalgamation of mathematical formulae and computational algorithms, together with the computational tools capable of carrying out the procedure. What was once deemed too expensive and too specialized (limited to experts in AI and signal processing), computer vision has now become readily available. Various software libraries and suites give student programmers the ability to run the algorithms required for object detection to work. The cherry on top: with mass refinement and wider availability of computer hardware, at a fraction of what it would have cost in the early 1990s, now anyone, and by anyone I mean any institution, can access it and tinker around with it.

Difference between computer and human vision:

Computer vision works within a designated perimeter, scanning an array of pixels vertically and horizontally. Upon detecting a change in the shade of a pixel, it infers a detection. Using complex algorithms applied in the back end, it is able to analyze and detect movement among various other traits, such as character recognition. Techniques like "detection through brightness thresholding" are implemented. Human vision works along similar lines: our retinas capture the light reflecting from various surfaces, and our brain translates the upside-down projection into something comprehensible. Our brain is trained to interpret objects, while computer vision requires algorithmic understanding and the aid of artificial intelligence. With AI, the computer is trained on a data set, whether supervised or not, to learn how to react to a given matrix of pixels, i.e., the scanned image.
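To make the brightness-thresholding idea concrete, here is a minimal sketch in NumPy. It is my own illustration, not code from the reading: a frame is just a matrix of pixel brightnesses, and "detection" is marking every pixel whose shade crosses a chosen threshold.

```python
import numpy as np

def brightness_threshold(gray, thresh=128):
    """Return a binary mask: True where pixel brightness exceeds thresh."""
    return gray > thresh

# Toy 4x4 grayscale "image": a bright 2x2 square on a dark background.
frame = np.zeros((4, 4), dtype=np.uint8)
frame[1:3, 1:3] = 200  # the bright object

mask = brightness_threshold(frame, thresh=128)
print(mask.sum())  # number of "detected" pixels -> 4
```

A real system would then group the marked pixels into connected blobs to locate objects, but the core operation is just this one comparison over the pixel matrix.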

Ways to make computer vision efficient:

As mentioned in the reading and the paper, one of the techniques I love is 'background subtraction': the capability to isolate the desired object from its surroundings. In my opinion, tracking several objects with this technique, combined with a varied training data set, helps produce more accurate and precise judgments, especially when many objects are present at the same time. Other techniques, such as 'frame differencing' and 'brightness thresholding', exist as well. From other readings, the larger the data set and the longer the training time, the better the accuracy. However, acquiring image data comes with ethical dilemmas and added cost.
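Background subtraction can be sketched in a few lines of NumPy. This is a hedged illustration of my own (the function name and threshold value are assumptions, not from the reading): store a reference frame of the empty scene, then mark any pixel in the current frame that differs from it by more than a tolerance.

```python
import numpy as np

def subtract_background(frame, background, thresh=30):
    """Mark pixels whose brightness differs from the stored background."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > thresh

# Reference shot of the empty scene: uniform mid-gray.
background = np.full((4, 4), 50, dtype=np.uint8)

# Current frame: an "object" enters the top-left corner.
frame = background.copy()
frame[0:2, 0:2] = 200

mask = subtract_background(frame, background)
print(mask.sum())  # 4 foreground pixels isolated from the background
```

Frame differencing works the same way, except the comparison is against the previous frame rather than a fixed background, which detects motion instead of presence.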

Computer vision's surveillance and tracking capability, and its implementation in interactive media:

Works like Videoplace and Messa di Voce are examples of early demonstrations combining interactive media and computer vision. Such installations can track and respond to human input. This 'feedback loop' triggers a sense of immersion and responsiveness. In my humble opinion, the use of computer vision frees the user from traditional input techniques and gives them the freedom to act as they will. Though it is also true that the computer makes sense of the input relative to its trained data set, and a totally random input might lead the system to fail. This is where the idea of 'degree of control' comes into play. Personally, I believe that as long as we have a combination of interactive components, the user will never get tired of running inside the same maze, and the use of computer vision definitely makes the experience feel less tangible and more user-centered. Hence, I decided to use it for my midterm project as well!
