Computer vision is, at bottom, bits and bytes. There is no perfect algorithm for a computer to understand the scene in front of it; the computer seems to see the whole scene only as a series of pixels. This means that without a human to define what these pixels represent, the computer does not know what to do with the image in its vision. In other words, the image data is a stream of nonsense to the computer.
The way the computer analyzes changes in its vision is similar, I think, to how sensors work. For example, in one of the techniques mentioned in the reading, “detecting motion” is done by comparing the pixels of the first frame (F1) with those of the subsequent frame (F2). Any change in pixels detected by the processor can be taken to mean that there was movement in the frame. However, this technique clearly cannot recognize the shape or the color of the moving objects. And since the algorithm is built only to detect motion, it does not understand the overall image it is capturing; a rough sketch of this frame-differencing idea follows.
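Here is a minimal sketch of that frame-differencing idea in Python, assuming OpenCV as the library; the threshold and pixel-count values are arbitrary illustration choices, not anything from the reading.

```python
import cv2

# Hypothetical video source: a webcam, but any video file path works too.
cap = cv2.VideoCapture(0)

# Read the first frame (F1) and convert it to grayscale, so each pixel
# is a single intensity value that can be compared directly.
ok, f1 = cap.read()
f1_gray = cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY)

while True:
    # Read the subsequent frame (F2).
    ok, f2 = cap.read()
    if not ok:
        break
    f2_gray = cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY)

    # Per-pixel absolute difference between F1 and F2: any nonzero
    # value means that pixel changed between the two frames.
    diff = cv2.absdiff(f1_gray, f2_gray)

    # Ignore small differences (sensor noise); 25 is an arbitrary
    # sensitivity value chosen for this sketch.
    _, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # If enough pixels changed, conclude "there was movement" -- note
    # the algorithm knows nothing about shape or color, only change.
    if cv2.countNonZero(motion_mask) > 500:
        print("Motion detected")

    # The current frame becomes F1 for the next comparison.
    f1_gray = f2_gray

cap.release()
```

Notice that the sketch only ever asks “did pixels change?”, which is exactly why it cannot say what shape or color the moving object has.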
Therefore, computer vision can only do well at the jobs it is assigned.