Computer vision seems much harder than the article makes it sound. Even though the author breaks the idea down into three elementary techniques, it’s still hard to get a grasp of what is meant by frame differencing, background subtraction, and brightness thresholding. Reading about how computer vision works is tricky for me (it looks like there are many mathematical steps for isolating objects that aren’t explained), but with a demonstration it probably won’t seem as hard. Computer vision also seems frustrating to work with because there’s so much room for error. Regardless, I’m excited to use some computer vision concepts in any interactive projects I work on, since it’s great at tracking people’s activities. Now that I think about it, many of the art galleries and videos shown in class have a lot to do with computer vision.
Final Project:
For the final project, I worked on the ProcessingJS part of it. It’s basically a class of ripples. For the time being, the ripples only appear when you click on the screen. Other than the ripples, I also created a pattern with points coming from where the mouse is pressed. Currently, the way it’s programmed, only one pattern or the other will show up, since I comment out code depending on which pattern I want. I am planning to use the Arduino to pick which pattern runs when the user plays with the program; the choice will probably be mapped to the analog value of a sensor attached to an EL wire.
Here are the two patterns so far.
Additionally, I tested the tilt sensors to see if they would trigger the patterns in Processing. Currently, any tilt will cause a pattern, but it still doesn’t decide which pattern to run. I will test it out more, but I really want to get the EL wires working first, so I have to wait for them to arrive.
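In the meantime, here is a rough sketch of how I imagine the switching could work, using a placeholder pattern variable that the Arduino’s mapped analog value would eventually set, instead of commenting code in and out (the class and values here are simplified stand-ins, not my final code):

// Rough sketch: switching between the two patterns with one variable
// instead of commenting code in and out. "pattern" would eventually be
// set from the Arduino's mapped analog value over serial.
ArrayList<Ripple> ripples = new ArrayList<Ripple>();
int pattern = 0;  // 0 = ripples, 1 = points (placeholder values)

void setup() {
  size(600, 600);
}

void draw() {
  background(0);
  if (pattern == 0) {
    // update and draw the ripples, removing the ones that have faded out
    for (int i = ripples.size() - 1; i >= 0; i--) {
      Ripple r = ripples.get(i);
      r.update();
      r.display();
      if (r.done()) ripples.remove(i);
    }
  } else {
    // the point pattern would be drawn here instead
  }
}

void mousePressed() {
  if (pattern == 0) ripples.add(new Ripple(mouseX, mouseY));
}

// Placeholder ripple class: an expanding, fading circle.
class Ripple {
  float x, y, r, alpha;
  Ripple(float x, float y) {
    this.x = x;
    this.y = y;
    r = 0;
    alpha = 255;
  }
  void update() {
    r += 2;
    alpha -= 3;
  }
  void display() {
    noFill();
    stroke(255, alpha);
    ellipse(x, y, r * 2, r * 2);
  }
  boolean done() {
    return alpha <= 0;
  }
}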
I worked intensively on my final project this weekend and I think I am about 80% done; the rest of what I need to complete is fine-tuning animations and creating a few physical components to be used along with my joystick. I initially used a button at the beginning of my creation process, but found that it was not engaging enough. I am now using two force sensors: one for harvesting resources and the other for interacting with the animals.
I plan to put one sensor into a stuffed animal’s head, requiring the user to “pet” the animal. To harvest materials, users must hit the other force sensor, which I plan to hide under a piece of wood, with a mallet I plan to construct.
I have completed the most difficult aspects of the project by coding nearly all the stages and the mechanics and animations. Although some of the timings and animations are not what I intended, I think I can complete this within a few days. I have also completed the Arduino circuit to receive information from the joystick and the two force sensors which are completely usable in the game.
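For my own reference, here is a rough sketch of what the Processing side of reading those values might look like, assuming the Arduino prints the joystick axes and the two force readings as one comma-separated line per loop; the serial port index and the message format are assumptions for this example, not the actual game code:

// Rough sketch of the Processing side: reading joystick + two force
// sensor values sent from the Arduino as "x,y,force1,force2\n".
// Port index and message format are assumptions for this example.
import processing.serial.*;

Serial port;
int joyX, joyY, forceHarvest, forcePet;

void setup() {
  size(400, 400);
  printArray(Serial.list());              // check which port the Arduino is on
  port = new Serial(this, Serial.list()[0], 9600);
  port.bufferUntil('\n');                 // fire serialEvent once per full line
}

void draw() {
  background(0);
  fill(255);
  text("joystick: " + joyX + ", " + joyY, 20, 40);
  text("harvest sensor: " + forceHarvest, 20, 60);
  text("pet sensor: " + forcePet, 20, 80);
}

void serialEvent(Serial p) {
  String line = p.readStringUntil('\n');
  if (line == null) return;
  int[] vals = int(split(trim(line), ','));
  if (vals.length == 4) {
    joyX = vals[0];
    joyY = vals[1];
    forceHarvest = vals[2];
    forcePet = vals[3];
  }
}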
Although I am not incorporating computer vision into my final project, this is still a field of interactive media that I am very interested in. What I found most interesting was that this technology has been used for art since the 1960s and 1970s. I thought that computer vision’s use for art and media was relatively new, and that computer vision in general was a field that has only become useful in the past two decades. The reading mentions that much of its use has been limited to law-enforcement and military purposes, so it really makes me wonder how much computer vision for art would change if this technology were declassified.
I am extremely interested in how computer vision can be further used for media and art, but I cannot help but feel intimidated by the sophistication of it. At least to me, computer vision seems like technology out of a Star Wars movie, but this article allowed me to gain insight into how this technology is not as new as I thought and how it is very accessible to interactive media artists.
For the initial stage of the final project, WaterBox, the main focus this week was getting the Kinect working for the specific purpose of this project. For this project, I have decided to use the Kinect v2. After spending some time figuring out how the Kinect v2 works, I encountered several issues that took me a few hours to solve.
One of the main problems I encountered was that the Kinect v2’s video image has different dimensions from the other feeds (IR, depth, and registered). Since I want to use the depth functionality as well as the RGB value of pixels at different points in the image, using the video image together with the depth image was not a viable option, because an index into the depth image did not point to the same position in the video image. I therefore decided to use the registered image, which combines the RGB and depth views, to read the RGB value at a given pixel.
For both versions, I set the minimum threshold to 480 and the maximum threshold to 830, but these values are subject to change as I test on water. The following videos show some of the try-outs and tests with the Kinect v2:
The first version is a test to see whether the Kinect accurately captures a certain color value (in this case, pixels whose blue value is higher than 200). As the test medium, I used a blank blue screen on my mobile phone. As you can see in the video above, the Kinect successfully captures the designated values.
import org.openkinect.processing.*;

Kinect2 kinect2;

// depth thresholds (raw depth values); only pixels in this range are considered
float minThresh = 360;
float maxThresh = 830;

PImage img, video;
int[][] screen;   // marks which (x, y) positions are currently recognized

void setup() {
  size(512, 424);
  kinect2 = new Kinect2(this);
  kinect2.initRegistered();   // RGB aligned to the depth image
  kinect2.initDepth();
  kinect2.initDevice();

  // start with no recognized pixels
  screen = new int[kinect2.depthWidth][kinect2.depthHeight];
  for (int x = 0; x < kinect2.depthWidth; x++) {
    for (int y = 0; y < kinect2.depthHeight; y++) {
      screen[x][y] = 0;
    }
  }
  img = createImage(kinect2.depthWidth, kinect2.depthHeight, RGB);
}

void draw() {
  background(255);
  img.loadPixels();
  video = kinect2.getRegisteredImage();
  int[] depth = kinect2.getRawDepth();

  // mark pixels that are within the depth thresholds and sufficiently blue
  for (int x = 0; x < kinect2.depthWidth; x++) {
    for (int y = 0; y < kinect2.depthHeight; y++) {
      int index = x + y * kinect2.depthWidth;
      int d = depth[index];
      if (d > minThresh && d < maxThresh && blue(video.pixels[index]) > 230) {
        img.pixels[index] = color(0, 0, blue(video.pixels[index]));
        screen[x][y] = 1;
      } else {
        img.pixels[index] = color(255);
        screen[x][y] = 0;
      }
    }
  }
  img.updatePixels();

  // draw an ellipse at every recognized pixel, colored by its depth
  noStroke();
  for (int x = 0; x < kinect2.depthWidth; x++) {
    for (int y = 0; y < kinect2.depthHeight; y++) {
      int index = x + y * kinect2.depthWidth;
      if (screen[x][y] == 1) {
        int mapVal = (int)map(depth[index], minThresh, maxThresh, 255, 0);
        fill(mapVal, 0, 0);
        ellipse(x, y, 2, 2);
      }
    }
  }
}
Since I have to track where these recognized points are within the canvas, I created a separate two-dimensional array that records the x and y coordinates at which the Kinect recognizes a pixel. The video above is a test that draws ellipses at the pixels whose blue value is higher than 200, with each ellipse colored on a gradient based on its depth between the thresholds. Later in the project, the depth will be used as the value that changes the pitch or speed of the sound being created.
import org.openkinect.processing.*;

Kinect2 kinect2;

// depth thresholds (raw depth values)
float minThresh = 360;
float maxThresh = 830;

PImage img, video;
int[][] screen;   // marks which (x, y) positions are currently recognized

void setup() {
  size(512, 424);
  kinect2 = new Kinect2(this);
  kinect2.initRegistered();
  kinect2.initDepth();
  kinect2.initDevice();

  screen = new int[kinect2.depthWidth][kinect2.depthHeight];
  for (int x = 0; x < kinect2.depthWidth; x++) {
    for (int y = 0; y < kinect2.depthHeight; y++) {
      screen[x][y] = 0;
    }
  }
  img = createImage(kinect2.depthWidth, kinect2.depthHeight, RGB);
}

void draw() {
  background(255);
  img.loadPixels();
  video = kinect2.getRegisteredImage();
  int[] depth = kinect2.getRawDepth();

  // counters for recognized pixels in each quadrant of the screen
  int countTL = 0;
  int countTR = 0;
  int countBL = 0;
  int countBR = 0;

  for (int x = 0; x < kinect2.depthWidth; x++) {
    for (int y = 0; y < kinect2.depthHeight; y++) {
      int index = x + y * kinect2.depthWidth;
      int d = depth[index];
      if (d > minThresh && d < maxThresh && blue(video.pixels[index]) > 230) {
        img.pixels[index] = color(0, 0, blue(video.pixels[index]));
        screen[x][y] = 1;

        // count the recognized pixel and highlight the quadrant it falls in
        if (x < kinect2.depthWidth/2 && y < kinect2.depthHeight/2) {
          countTL++;
          fill(0, 255, 0);
          rect(0, 0, kinect2.depthWidth/2, kinect2.depthHeight/2);
          text("hello", 0, 0);
        }
        if (x > kinect2.depthWidth/2 && y < kinect2.depthHeight/2) {
          countTR++;
          fill(0, 0, 255);
          rect(kinect2.depthWidth/2, 0, kinect2.depthWidth, kinect2.depthHeight/2);
        }
        if (x < kinect2.depthWidth/2 && y > kinect2.depthHeight/2) {
          countBL++;
          fill(255, 0, 0);
          rect(0, kinect2.depthHeight/2, kinect2.depthWidth/2, kinect2.depthHeight/2);
        }
        if (x > kinect2.depthWidth/2 && y > kinect2.depthHeight/2) {
          countBR++;
          fill(100, 100, 100);
          rect(kinect2.depthWidth/2, kinect2.depthHeight/2, kinect2.depthWidth, kinect2.depthHeight);
        }
      } else {
        img.pixels[index] = color(255);
        screen[x][y] = 0;
      }
    }
  }
  img.updatePixels();

  // draw an ellipse at every recognized pixel, colored by its depth
  noStroke();
  for (int x = 0; x < kinect2.depthWidth; x++) {
    for (int y = 0; y < kinect2.depthHeight; y++) {
      int index = x + y * kinect2.depthWidth;
      if (screen[x][y] == 1) {
        int mapVal = (int)map(depth[index], minThresh, maxThresh, 255, 0);
        fill(mapVal, 0, 0);
        ellipse(x, y, 2, 2);
      }
    }
  }
}
For the third version, I wanted to check whether I can track how many pixels are in a certain part of the screen; in this case the sections are top-left, top-right, bottom-left, and bottom-right. Here, I am tracking not only where the recognized pixels are located on the screen, but also storing a count of how many recognized pixels fall within each area.
The next step of the project will be integrating sound and testing on a water surface, for which I will need to start building the container.
I feel like this is the breaking point for me. Computer vision is so cool and I want to use it in absolutely everything I do, and that’s a horrible mindset. It’s one of those things that should only be used if it’s absolutely essential to the project. If it can be done in another way, do it in another way. But if it’s what makes the project unique or adds a key element, then do it. As I continue working with IM, it’ll be hard to suppress the urge to incorporate computer vision into everything.
This reading was very technical, but I think it was a good basis for understanding how computer vision works, what the right conditions are for it to work, and other things we will need to know as we take our very first steps into computer vision, for example setting up the physical environment, especially for the sort of festival/installation vibe we have going for the final show. Making sure that the space is high contrast and low noise and is able to communicate properly with the computer will be an essential part of the process.
However, I really want to talk about an experience I had a little less than a year ago at the Museum of Contemporary Art in Montreal. They had recently installed Zoom Pavilion, an interactive art piece by Rafael Lozano-Hemmer and Krzysztof Wodiczko. In it, all of the walls as well as the ceiling were covered in video footage…of yourself. You would walk into the room and cameras would pick up on your face, gestures, gait, and the group you walked in with. Then the cameras would start pairing you with people to mark your relationship with them. It was absolutely incredible. I walked in with a group of 18-year-olds, all of us extremely good friends, so the cameras drew a large square around the entire group. However, as we started exploring the space, it kept drawing connections between us. It noticed that I kept looking at my good friend Adam, so it would draw a line between us and write “friendly”. I would move far away from my other friend and it would write “hostile”.
This was all fun and games, however, until another group of teenagers walked in. At that point, my friend group had gathered in a corner of the room and was making funny faces into the camera. The other teenagers walked in and, almost immediately, we had two boxes, one around each group, and a line between us that read “Potential Friendly”. How had this machine figured out, just by observing a few seconds of their movement and interaction with the space, that a bunch of people we could easily connect with had entered the room? I will never forget this. It was absolutely incredible.
I definitely enjoyed this week’s reading since it brought up numerous critical points and inspired a lot of thoughts in me.
To begin with, I was inspired by the fact that computer vision has moved from being an esoteric, highly privileged domain to something that everyday artists and users can utilize to add to their performances. For instance, I was amazed by “Cheese”, the installation by Christian Moller, because it has a very intriguing objective, and the process of using computer vision to achieve it is interesting as well. Basically, from what I understood, there is a level meter that detects the actress’s smile and analyzes how happy she seems. Unfortunately, I wish there were a more technical explanation, since I was very curious about how exactly it worked rather than just what it was.
Other than that, I admired the way the author introduced the mechanisms behind each installation, along with the fundamentals of computer vision techniques and algorithms in the latter part of the article. Since Paulin and I are also using computer vision in our final project, this article has been very helpful in terms of what to use, when to use it, and how. Now that I’ve read the article, maybe we can switch things up for our project and even try out infrared illumination with a dark panel for our Iron Man glove. It is definitely an option, and we will have to discuss it more to find the optimal solution.
I thought it was really interesting how Myron Krueger, who developed Videoplace, one of the first interactive works of art using computer vision, did so because he believed that “the entire human body ought to have a role in our interactions with computers”. There is something very profound about how we’ve come to find both practical and artistic uses for our bodies and their movement in relation to computers. What’s even more fascinating is that Krueger’s work preceded computer mice!
This reading was very helpful, especially as I read it while beginning to work on my final project, where I’m using computer vision with Processing for the first time. It really helped me narrow down what I wanted my code to do; since computer vision has many uses and capabilities, I had to think about what would work best with my project’s aim. Since I want to separate the foreground from the background, to sense whether the person in the camera’s view is in front of the display or simply walking by in the background, brightness thresholding seemed to be the best option. Aaron and I discussed Daniel Shiffman’s OpenKinect library for Processing yesterday, and we looked through the examples to see how I could set a brightness or depth threshold.
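Based on those examples, a minimal depth-thresholding sketch might look something like the following; I am using the Kinect2 class here as an assumption about which sensor I will end up with, and the threshold values are placeholders that I would tune in front of the actual display:

// Minimal depth-threshold sketch based on the OpenKinect examples.
// Pixels between minThresh and maxThresh are treated as "in front of
// the display"; both values are placeholders to be tuned on site.
import org.openkinect.processing.*;

Kinect2 kinect2;
int minThresh = 500;   // raw depth, to be tuned
int maxThresh = 1500;  // raw depth, to be tuned
PImage img;

void setup() {
  size(512, 424);
  kinect2 = new Kinect2(this);
  kinect2.initDepth();
  kinect2.initDevice();
  img = createImage(kinect2.depthWidth, kinect2.depthHeight, RGB);
}

void draw() {
  background(0);
  int[] depth = kinect2.getRawDepth();
  img.loadPixels();
  int count = 0;  // how many pixels fall inside the threshold band
  for (int i = 0; i < depth.length; i++) {
    if (depth[i] > minThresh && depth[i] < maxThresh) {
      img.pixels[i] = color(255);  // foreground: someone close to the display
      count++;
    } else {
      img.pixels[i] = color(0);    // background
    }
  }
  img.updatePixels();
  image(img, 0, 0);

  // if enough pixels are "close", treat it as a person in front of the display
  boolean personPresent = count > 5000;
  fill(255, 0, 0);
  text(personPresent ? "person in front" : "no one in front", 10, 20);
}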
Tori put some words in a document and made them randomize on screen. She also figured out how to call specific characters, which will be useful later. Also, we got a dope hangman sketch. Here’s the code and a video:
Text text;
PImage hangman;

// which parts of the hangman have been drawn (keys 1-4)
boolean one = false;
boolean two = false;
boolean three = false;
boolean four = false;

void setup() {
  fullScreen();
  hangman = loadImage("hangman.gif");  // load once instead of every frame
  text = new Text();
}

void draw() {
  background(255);
  image(hangman, 100, 100, 1000, 1000);
  noFill();
  strokeWeight(5);

  // draw body parts depending on which keys have been pressed
  if (one == true) {
    ellipse(785, 355, 150, 150);   // head
  }
  if (two == true) {
    line(785, 430, 785, 720);      // body
  }
  if (three == true) {
    line(785, 720, 755, 780);      // legs
    line(785, 720, 815, 780);
  }
  if (four == true) {
    line(785, 480, 755, 580);      // arms
    line(785, 480, 815, 580);
    fill(0);                       // text needs a fill to be visible
    text("GAME OVER", width/2, height/2);
  }

  text.run();
}

void keyPressed() {
  // number keys 1-4 reveal the hangman piece by piece
  if (key == '1') {
    one = true;
  }
  if (key == '2') {
    two = true;
  }
  if (key == '3') {
    three = true;
  }
  if (key == '4') {
    four = true;
  }
}

class Text {
  String[] message;
  float[] data;                     // currently unused
  int number = int(random(0, 6));
  int size = 60;

  Text() {
    message = loadStrings("words.txt");  // load the word list once
  }

  void run() {
    data = float(split(message[number], ","));  // currently unused
    fill(0);
    textSize(size);
    text(message[number], width/2 + 400, 700);
    text(message[number].charAt(3), width/2, height/2); // now how do I get that to respond to an input... I don't think I can right now without connecting it to an input
    // space bar picks a new random word
    if (keyPressed && (key == ' ')) {
      number = int(random(0, 6));
    }
  }
}
I’m controlling the word randomization using the space bar. The hangman appears piece by piece using the number keys 1-4 (in the future this will be based on the number of errors).
Physical Stuff:
Kyle focused more on the physical stuff this weekend. He prototyped a panel using conductive copper fabric that we got from the Engineering Design Studio, with some felt in between the two layers. He then got some poky pins and made a circuit using his Arduino. He ran a bunch of trials to make sure that he could complete the circuit multiple times (upwards of 50-100) without the circuit breaking or wearing down. We expect to complete more of these tests in the future, especially when we receive the darts, which we ordered today and which should arrive on Monday or Tuesday.
Here is a picture of the failed attempt with conductive foam. We realized that this was not conductive enough to complete the circuit.
Here’s a video and some pictures of the final prototype panel!
Computer Vision for Artists and Designers: Pedagogic Tools and Techniques for Novice Programmers
This week’s reading introduced us to “computer vision”, a method that allows computers to make use of digital input such as videos and images and make inferences from it. With the new wave of digital art that has come forth, this way of working gives artists a lot of room for creativity. The article mentions four ways computer vision can work: (1) detecting motion, where the movements of people within the video frame are detected and quantified using frame differencing; (2) detecting presence, through background subtraction; (3) detection through brightness thresholding, where objects of interest are distinguished from their surroundings by comparing their brightness to a threshold value; and (4) simple object tracking, which finds the location of the single brightest pixel in every fresh frame of video.
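To make the first technique more concrete for myself, here is a rough frame-differencing sketch using Processing’s video library; the motion threshold at the end is just a guess and would need tuning:

// Rough frame-differencing sketch: compare each new frame of video with
// the previous one and sum up how much the pixels changed. A large sum
// means a lot of motion in front of the camera. The threshold is a guess.
import processing.video.*;

Capture video;
PImage prevFrame;

void setup() {
  size(640, 480);
  video = new Capture(this, width, height);
  video.start();
  prevFrame = createImage(width, height, RGB);
}

void captureEvent(Capture video) {
  // store a copy of the last frame before reading the new one
  prevFrame.copy(video, 0, 0, video.width, video.height,
                 0, 0, video.width, video.height);
  video.read();
}

void draw() {
  image(video, 0, 0);
  video.loadPixels();
  prevFrame.loadPixels();
  float totalMotion = 0;
  for (int i = 0; i < video.pixels.length; i++) {
    color current = video.pixels[i];
    color previous = prevFrame.pixels[i];
    // per-pixel difference between the two frames
    totalMotion += abs(red(current) - red(previous))
                 + abs(green(current) - green(previous))
                 + abs(blue(current) - blue(previous));
  }
  float avgMotion = totalMotion / video.pixels.length;
  // red circle = motion detected, green = still (threshold of 10 is a guess)
  fill(avgMotion > 10 ? color(255, 0, 0) : color(0, 255, 0));
  ellipse(width - 40, 40, 40, 40);
}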
For my final project, this text really encouraged me not to forget about the physical components of my work, since I may get too involved in the software and forget about the other constraints. The physical aspects of the project are just as important as the software, so it reminded me to work out which software techniques will be most compatible with the available physical conditions. As I am hoping to use an IR camera and simple object tracking, I noted that “using IR significantly improves the signal-to-noise ratio of video captured in low-light circumstances”. Since light is critical, I need to make sure I test my project beforehand at the location of the exhibition.
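As a starting point for the simple object tracking, here is a rough brightest-pixel sketch that uses a regular webcam as a stand-in for the IR camera; with IR illumination, the brightest pixel should correspond to the tracked light source:

// Rough simple-object-tracking sketch: find the single brightest pixel
// in each frame and mark it. With an IR camera and an IR light source,
// the brightest pixel should correspond to the tracked object.
import processing.video.*;

Capture video;

void setup() {
  size(640, 480);
  video = new Capture(this, width, height);
  video.start();
}

void captureEvent(Capture video) {
  video.read();
}

void draw() {
  image(video, 0, 0);
  video.loadPixels();
  float brightest = -1;
  int brightestX = 0;
  int brightestY = 0;
  for (int y = 0; y < video.height; y++) {
    for (int x = 0; x < video.width; x++) {
      float b = brightness(video.pixels[x + y * video.width]);
      if (b > brightest) {
        brightest = b;
        brightestX = x;
        brightestY = y;
      }
    }
  }
  // mark the brightest point found in this frame
  noFill();
  stroke(255, 0, 0);
  strokeWeight(3);
  ellipse(brightestX, brightestY, 20, 20);
}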