Week 5 – Midterm Progress Zere

Concept of the project: I decided to create an interactive piece called “Day in the Life of a Cat”, where the user gets to experience the everyday things cats do.

Why did I choose this concept? Well, I have a hairless sphynx cat named Luna, and I love her VERY MUCH. I miss her a lot since she lives with my grandparents, and I decided to remind myself of her by creating this interactive experience.

What and How: The user gets to experience morning, day, and night as a cat, clicking on different items and discovering what they can do with them. I decided to keep the interface very simple, since overcomplicating things would make it hard for me personally to produce a good interactive experience. Here is my sketch for now:
I think one of the most important parts of the sketch is the “meow” sound played when users click on the paw. That is why I created a test program for sound playback. It may be simple and flawed for now, but I think it’s an easy solution to the problem if it arises. Here is the link for the program: LINK.

Midterm Progress

Concept & Interaction

What I love even more than horror games is psychological, story-based visual novels: games that hold you in place, extremely focused and afraid even to blink in case you miss something important (or something that will get you in trouble). I also really love when innocent, soft, childlike things are framed in a way that makes you uncomfortable, creating a two-sided feeling of nostalgia and comfort mixed with unsettling disturbance.

More than that, I have wanted to experiment with computer vision and body capture for a very long time, so I decided to combine these two things in my midterm.

What I want to make is a game controlled by the user’s video input. In Russia, we play a clapping game called “Ладушки” (ladushki; I believe in English it’s called Patty Cake), where you need to match the rhythm of the other person clapping, as well as their hands (right to right, left to left, two hands to two hands). I want the user to play this game with the computer. There will be a girl in a room who welcomes the player to play the game with her. Her clapping will follow a set sequence, and the player just has to match her hand state with their own hands.

The twist is that if the player fails to match the girl’s rhythm and hand state, she will get angry. And as the user makes more mistakes, she will get angrier. As the anger level increases, the whole picture and game will become distorted: video glitching (in later phases: disappearing), the rhythm becoming unstable and/or much faster, unfair hand detection, intentional mistakes in detecting the state of the hands, distorted sound, the girl’s post-mistake phrases turning aggressive, and her appearance shifting as well. If the user makes it to anger level 100, there will be a jumpscare using their own distorted video (I figured that, of all jumpscares, it will make the most impact).
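To make this concrete for myself, here is a minimal sketch of how the anger level could map onto the distortion effects described above. The function name and the exact phase thresholds are my own placeholders, not final values:

```javascript
// Hypothetical sketch: mapping the anger level (0–100) to distortion
// settings. distortionFor and the thresholds are assumptions for now.
function distortionFor(anger) {
  const a = Math.max(0, Math.min(100, anger)); // clamp to the scale
  return {
    glitch: a >= 25,        // video glitching starts after the calm phase
    videoVisible: a < 75,   // video disappears in the last phase
    tempoScale: 1 + a / 100, // rhythm gets faster as she gets angrier
    jumpscare: a >= 100,    // final screamer at anger level 100
  };
}
```

The draw loop could then read these flags each frame to decide which effects to apply.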

Some details about the concept can be found as comments in my code, and the general outline is planned to look like the picture below. To create a creepy atmosphere, I plan to use really soft colors and a cute art style that won’t match the gameplay and plot.

Code Outline

Right now I decided to focus on the technical side of the project and make the algorithm work, so afterwards I will have the core mechanic. After that, I will focus on visual and sound design: drawing sprites, finding suitable sounds, creating glitch effects, scales, text, etc.

This is my current plan for what the code should include:

  • class Girl with one object initiated and the following methods:
    • talking method (reaction when user fails to match the girl’s tempo)
    • changing hands states method
    • comparing Girl’s to user’s states method
    • drawing girl method (sprites needed)
    • anger level scale draw method
  • Functions in the general code block:
    • detecting user’s handpose function
    • displaying user’s video
    • video distortion function
    • sound implementation + sound distortion methods
    • final video screamer function
  • Assembled game in setup() and draw() with a restart option (maybe pause + exit buttons?)
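The outline above could start as a class skeleton like the one below. The method and property names are my own guesses at how the plan translates to code, with the drawing and sound methods left out for now:

```javascript
// Minimal skeleton of the planned Girl class. Names and phrase text are
// placeholders; sprites, sound, and drawing methods are not included yet.
class Girl {
  constructor() {
    this.anger = 0;
    this.hands = "both"; // "both", "left", or "right"
    this.phrases = ["Again!", "Focus!", "You're making me mad."];
  }
  changeHands() {
    // Cycle through the clapping states in a fixed sequence.
    const order = ["both", "left", "right"];
    this.hands = order[(order.indexOf(this.hands) + 1) % order.length];
  }
  compareStates(userHands) {
    // Returns true on a match; a mismatch raises anger by 10 (capped at 100).
    if (userHands === this.hands) return true;
    this.anger = Math.min(100, this.anger + 10);
    return false;
  }
  talk() {
    // Pick a reaction phrase; harsher lines come at higher anger.
    return this.phrases[Math.min(2, Math.floor(this.anger / 40))];
  }
}
```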
Code Made

Sketch requires camera access!

This is my sketch so far and the code I made. As I said, I’m focusing on the technical part for now. Currently, the code can:

  • detect the user’s handpose
  • display the user’s video in the corner
  • run the talking method (reaction when the user fails to match the girl’s tempo) (Girl class)
  • run the hand-state-changing method (Girl class)
  • run the method comparing the Girl’s state to the user’s (Girl class)

Instead of drawing, the code currently just outputs the anger level and the state of the girl’s hands. The code compares the video input and the user’s hand positions with the girl’s hand state. When the user makes a mistake, the anger level increases by 10 and a text phrase is displayed on the screen (3 phrases for each sector, with 4 sectors depending on the anger level). However, the text doesn’t stay on screen yet (to be fixed). Also, the game loop stops once the anger level reaches 100.
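The sector-based phrase lookup described above could look something like this. The phrases here are placeholders, not the game’s actual lines, and the sector boundaries are my reading of “4 sectors depending on the anger level”:

```javascript
// Hedged sketch of the phrase lookup: four anger sectors (0–24, 25–49,
// 50–74, 75–100), three placeholder phrases each.
const PHRASES = [
  ["Oops!", "Try again!", "Almost!"],                   // sector 0: calm
  ["Hey, focus.", "Wrong hand.", "Hmm."],               // sector 1: annoyed
  ["You keep failing.", "Watch closely.", "Enough."],   // sector 2: angry
  ["STOP IT.", "WRONG.", "LAST CHANCE."],               // sector 3: furious
];

function pickPhrase(anger, mistakeCount) {
  const sector = Math.min(3, Math.floor(anger / 25));
  return PHRASES[sector][mistakeCount % 3]; // rotate through the sector's lines
}
```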

The base code for the hand detection function is adapted from the ml5 HandPose reference page. I also used this Coding Train video about HandPose.

Complex Part

I believe the most difficult part of my midterm is working with video input and hand detection. It’s a pretty new concept for me, and it’s also pretty hard to implement not just as a small interactive component but as the core concept around which the game is built. The risk of improper pose detection, poor video input, and glitching is quite high. However, I built this part first, and it turned out not to be too difficult. After testing my code for some time, I defined three poses the computer should recognize: two hands open, left hand open, right hand open. Ideally, to fit my concept and the Ladushki gameplay, I would also need a pose for the clap, but the problem is that when the hands are clapped and face the camera edge-on, hand detection disappears. Since this could possibly break the game and unfairly register a user mistake when there isn’t one, I decided to ignore this state entirely and check only for claps facing the camera, when the palm is turned toward the computer.

Also, to avoid random poses appearing and being detected, I added a confidence level: if the detection confidence for the hands is lower than this set level, the computer won’t register it as a pose. This really helps avoid identifying some random pose or stray movement as a user mistake.
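The confidence-gated pose check could be sketched like this. The detection objects are simplified stand-ins for what the hand detection returns (I’m assuming each detected hand carries a handedness label and a confidence score; the real ml5 result shape may differ):

```javascript
// Sketch of the confidence-gated pose classifier. Detection shape
// ({handedness, confidence}) is an assumption, not ml5's exact output.
const MIN_CONFIDENCE = 0.8; // tune after testing with a real camera

function classifyPose(detections) {
  // Keep only hands the model is confident about.
  const hands = detections.filter((d) => d.confidence >= MIN_CONFIDENCE);
  const left = hands.some((d) => d.handedness === "Left");
  const right = hands.some((d) => d.handedness === "Right");
  if (left && right) return "both";
  if (left) return "left";
  if (right) return "right";
  return "none"; // low confidence or no hands: don't count it as a mistake
}
```

Returning "none" instead of a guess is what keeps stray movements from being scored as mistakes.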

Now, the most challenging part for me will be the visual design. Unlike coding, I don’t have much experience with this type of creative work, so creating sprites, building an environment that serves the goal of the game and suits its atmosphere, and arranging everything properly without overloading the screen will be a bit hard for me. To check my progress and track the aesthetic impact and suitability, I will ask my friends for feedback, and I will also consult AI for some more theoretical guidance on how things should be arranged in the final outcome according to basic design rules.

I believe this project is really fun and much easier to make than I expected, since the hardest part is mostly complete already!

Some of my inspirations for the design, concept, and aesthetic are Needy Streamer Overload and DDLC.

Week 5 – Reading Response

What are some of the ways that computer vision differs from human vision?

Previously, I always linked computer vision with machine learning. I assumed there was some use of machine learning to identify the different objects in a given video and to really understand the movements and interactions within it. However, after reading this article, I feel I’ve gained a much clearer understanding of how computer vision actually works, as well as of the limitations of the available technology. While both computers and humans can probably identify where a person is in a video and track their movements, humans are usually also able to predict their next movements. Humans are familiar with how people interact with objects, while computers really depend on data, which can miss anomalous cases or outliers. An example that may seem a bit far-fetched: for someone who has only four fingers, human vision can obviously comprehend that, while I assume computer vision may not be able to tell that something is missing in the image, since it’s only programmed to work with the norm.

In terms of computer vision’s capacity for tracking and surveillance and its effect on its uses in interactive art, I think one of the examples from the article, Suicide Box, combines those two ideas nicely. The tracking and surveillance aspect of computer vision was used to create an art piece (of sorts) about suicide and to emphasize irregularities in data. An issue that immediately comes up for me with computer vision is privacy. A tool once so heavily used for tracking and surveillance now being used in interactive art may seem suspicious to viewers. Viewers may be paranoid that these art pieces are collecting data about them; however, I’m not sure if this is a common concern, considering most art pieces we’ve looked at that use computer vision have been well received.

Week 5 – Reading Reflection

It was interesting to learn how computers actually see, and what stood out to me were the various methods a computer employs to see and then make decisions or create art. The choice of computer vision technique adds complexity to interactive works and alters how one can interact with them. The right technique must also be selected to minimize errors and ensure consistency, since some techniques are known not to perform well in certain conditions.

One possible application of this is how an interactive artwork involving computer vision can be placed strategically in an arts exhibition to accentuate or improve the vision of the work. Carefully selected pieces can be placed around the work to generate the needed contrast, brightness, or effects for the computer vision, just like how the white Foamcore was used for the LimboTime game.

The use of surveillance to generate art was also something worth examining. Are there any privacy restrictions or laws protecting the identities of the people in these forms of art, and how is their privacy protected? The work Suicide Box by the Bureau of Inverse Technology makes me question whether artists actually have the right to use data or information like this to create a piece of work. It gives me the impression that they are making amusement out of tragedy. I am also left with the question: how do they respect the dignity of those who jumped off the bridge?

Week 5 – Reading Response | COMPUTER VISION FOR ARTISTS AND DESIGNERS

When I think of Computer Vision, the first thing that comes to my head is this coder called the Poet Engineer on social media who uses computer vision to create the most insane visuals purely from the camera capturing their hand movements. They have the coolest programs ever. I also love it when artists make videos of them creating cool things with their hands purely through code, and one of my favourite examples of using code to create art is Imogen Heap’s MiMu gloves. And, also, the monkey meme face recognizer I keep seeing everywhere (photo attached). It still baffles me that we can use our hands and our expressions to control things on a device that usually interacts with touch! So, this reading was one of my favourite readings so far, because it discussed one of the main concepts that hooked me into interactive media in the first place. 

From what I understood of the text, the primary difference between computer and human vision is that while a human observer can understand symbols, people, or environmental context (like whether it’s day or night), a computer (unless programmed otherwise) perceives video simply as pixels. Computer vision now uses algorithms to make assertions about raw pixels, and even then, designers need to optimize the physical environment to make it “legible” to the software, such as using backlighting to create silhouettes or using high-contrast and retroreflective materials. Despite these limitations, isn’t it still insane that we’ve come so far that we can make computers identify specific things? The fact that computers can now have hardware that goes beyond our own capabilities, such as infrared illumination, polarizing filters, and more, is almost scary to think about. I’d also say that computer vision is much more objective than human vision. Is it possible for computers to suffer from inattentional blindness as much as we do? For example, when we enter a room and fail to see something, then come back and find the object right there, having never moved: is a computer capable of the same thing?

I liked that this reading laid out the different techniques used in computer vision, because when I first encountered CV, I was overwhelmed by the number of things it could sense. I understood these techniques (and I’m listing them here so I can refer to them later):

  1. Frame Differencing / Detecting Motion: Detects motion by comparing each pixel in a video frame to the corresponding pixel in the next frame.
  2. Background Subtraction / Detecting Presence: Detects the presence of objects by comparing the current video frame to a stored image of an empty background.
  3. Brightness Thresholding: Isolates objects based on luminosity, by comparing brightness to a set threshold. (I did an ascii project a few years ago, where it would capture your image, figure out the contrast and brightness and then replicate the live video input as letters, numbers and symbols. I would like to replicate that project with this concept now!)
  4. Simple Object Tracking: Program the computer to find the brightest or darkest pixel in a frame in order to track a single point.
  5. Feature Recognition: Once an object is located, the computer can compute specific characteristics like area or center of mass (this is CRAZY). 
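As a tiny illustration of technique 3 (brightness thresholding), here is a toy version operating on a flat grayscale pixel array. This is my own simplification, not code from the reading, but it is the same idea my old ASCII project relied on:

```javascript
// Toy brightness thresholding on grayscale pixel values (0–255):
// everything at or above the threshold is kept as the "object" (white),
// everything else becomes background (black).
function thresholdPixels(pixels, threshold) {
  return pixels.map((p) => (p >= threshold ? 255 : 0));
}
```

In a real sketch the same comparison would run over the camera's pixel buffer every frame.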

There are definitely more techniques out there, but I’ll start with the basics, since I’m a complete beginner at this. I do want to try using feature recognition paired with simple object tracking, something I noticed is used in hand tracking (and the monkey video, LOL).

I mentioned the objectivity of CV earlier, but what happens if the datasets that they are trained on are biased? What if the creator behind the program has their own biases that they implement into the program? I like how Sorting Daemon (2003) mentioned looking at the social and racial environment, because I was wondering about situations where CV could be programmed to unintentionally (or intentionally) discriminate against certain traits such as race, gender, or disabilities. Surveillance is a scary concept to me too, because what happens to the question of consent?  While computer vision could be used to reveal hidden data in environments that are often overlooked, create programs that can help people without the need for a human to be present (e.g. Cheese), and so many other cool things, it could also be used in a negative way. I need to make sure to find a way that any programs I create with CV are inclusive and not used for ill intent.

Midterm Progress Report

Concept:

Throughout the assignments, I really fell in love with Assignment 3, where I made a mesmerizing colorful display. Even while developing that piece, I saw that there was more to be made, and playing around with some of the variables inspired me to make it the core focus of my midterm project. If time allows, I really want to create a magnificent interactive display, one that will connect with its viewer.

The main concept is customization of the colored canvas. I plan to add options so that the user can interact with key parts of the project, such as sliders for the direction of the balls on screen (in both the X and Y directions). There will also be an option for the user to change the RGB values to get the exact color they wish. But the main thing I want to incorporate is the text from Assignment 4, which would be surrounded by the colorful balls. I could also have the mouse interrupt the flow of the balls, similar to how the mouse interrupts the text in Assignment 4.

Design

The design process mainly consists of extending and adding more features to the colorful concoction project. Firstly, there’s going to be an intro screen, where the user will be guided through what exactly the project is and given an overview of what’s to come. There will also be instructions on how the user can interact further with the project.

Then, when the user is ready, it will switch to the generative artwork. There are going to be sliders, or possibly text boxes, where the user will enter a value that changes something in the artwork. This includes the range of colors, the direction and speed of the balls, and a text box so that text can be displayed on screen. Finally, there will be a button so the user can take a picture of their final artwork.

Challenging Aspects:

I think the biggest challenge is implementing the text and getting it to act as a blockade for the balls so that they surround it. In a sense, the balls need to recognise the letters as walls, so that they not only surround them but also bounce off when they change. It’ll be a case of playing around with direction vectors.

Another challenging aspect is making the sliders, as I don’t have any experience with sliders that dynamically change different parts of an artwork.

Mitigating Risk:

In terms of implementing the text, I plan to experiment with it and see how it is affected by other objects. As a starting place, I could take the code I used to keep the balls from going outside the walls and try to apply it to the letters. From there, I can manipulate the variables to get the desired effect.
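As a first experiment with that wall-style approach, I could treat each letter as a bounding box and flip a ball's velocity when it would enter the box. The shapes and names below are my own simplification, not final project code:

```javascript
// Sketch of the "letters as walls" idea: ball is {x, y, vx, vy}, box is
// the letter's bounding box {x, y, w, h}. The axis that carried the ball
// into the box gets its velocity reversed, like the canvas-wall bounce.
function bounceOffBox(ball, box) {
  const next = { x: ball.x + ball.vx, y: ball.y + ball.vy };
  const inside =
    next.x > box.x && next.x < box.x + box.w &&
    next.y > box.y && next.y < box.y + box.h;
  if (inside) {
    if (ball.x <= box.x || ball.x >= box.x + box.w) ball.vx *= -1;
    if (ball.y <= box.y || ball.y >= box.y + box.h) ball.vy *= -1;
  }
  ball.x += ball.vx;
  ball.y += ball.vy;
  return ball;
}
```

A per-letter bounding box is crude (the balls would bounce off empty corners of wide letters), so a later version might test against the letter outline points instead.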

For the sliders, I will read up on how they’re implemented. Most likely our friends at the Coding Train have made a video about using sliders, so that will be a great starting point. From there, I can extend them so the sliders can manipulate variables such as the color or direction of the balls.
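Even before wiring up real sliders, the mapping from raw slider values to artwork parameters can be kept in one pure helper. In p5, createSlider() would supply the raw values each frame; the function below and its parameter names are placeholders for whatever the sketch ends up using:

```javascript
// Hypothetical helper mapping raw slider readings onto artwork settings.
// Color sliders are assumed to range 0–255, the direction slider 0–100.
function sliderSettings(rSlider, gSlider, bSlider, dirSlider) {
  return {
    color: [rSlider, gSlider, bSlider],      // RGB fill for the balls
    direction: (dirSlider / 100) * 10 - 5,   // mapped to a velocity of -5..5
  };
}
```

Keeping this separate from the drawing code makes it easy to add more sliders later without touching the draw loop.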

 

Week 5 – Reading Reflection

It’s easy to forget that computers don’t actually see anything. When we look at a video feed, we instantly recognize a person walking across a room. A computer just registers a grid of numbers whose pixel values shift over time. Because of this, computer vision is incredibly fragile. Every tracking algorithm relies on strict assumptions about the real world. If the lighting in a room changes, a tracking algorithm might completely break. The computer doesn’t see the “general” picture with context, since it only knows the math it was programmed to look for.

Basic Tracking Techniques

To work around this blindness, developers use a few techniques to track and react to the things they are interested in.

    • Frame differencing: comparing the current video frame to the previous one. If the pixels changed, the software assumes motion happened in that exact spot.

    • Background subtraction: memorizing an image of an empty room. When a person walks in, it subtracts the “empty” image from the live feed to isolate whatever is new.

    • Brightness thresholding: tracking a glowing object in a dark room by telling the software to ignore everything except the brightest pixels.

    • Simple object tracking: This involves looking at the color or pixel arrangement of a specific object and looking for those same values as they move across the screen.
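The first of these techniques is simple enough to sketch directly. Here is a toy frame-differencing function over two grayscale frames; the function name and threshold are my own choices for illustration:

```javascript
// Toy frame differencing: compare two grayscale frames pixel by pixel and
// report the fraction of pixels that changed, a rough "amount of motion".
function motionAmount(prevFrame, currFrame, threshold = 30) {
  let changed = 0;
  for (let i = 0; i < currFrame.length; i++) {
    if (Math.abs(currFrame[i] - prevFrame[i]) > threshold) changed++;
  }
  return changed / currFrame.length;
}
```

The threshold exists precisely because of the fragility mentioned above: without it, sensor noise and small lighting shifts would register as constant motion.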

Surveillance in Art

I find it very interesting that people use technology made for surveillance and the military to create art. Using technology built for control to create art is truly impressive: it flips the understanding of this technology, or even makes it double-sided. While the interactivity that comes with such tracking technology is hugely varied, and sometimes feels magical and extremely emotional, it comes from the computer tracking, analyzing, and reacting to every move of the person in front of it. Such art turns the invisible, unsettling surveillance we live with every day into a work of art that makes it extremely present.

Honestly, this military baggage explains a lot of computer vision’s blind spots. If you’re designing a system just to monitor crowds or track moving targets, you don’t need it to understand the whole scene and all details. You just need fast analysis of tiny differences, like a shift in pixels.

However, I feel that in interactive media details are very important, and that art runs on them. So, while computer vision has not yet reached the state where it can analyze everything at once, artists have to come up with algorithms that try to do it instead.

Reading Reflection Week 5: The visionary difference between a Computer and a Human

I found it quite interesting to see how computer vision actually differs from human vision. Initially I assumed that computer vision, being full of the knowledge we provide on the AI side, would be able to at least analyze what an image is. However, I was surprised to find out that computers only really see grids of pixels and are fully reliant on mathematical algorithms to get a cleaner picture of what is on screen. Whereas we humans are able to distinguish an object from a background under different lighting, computers have a hard time even telling when a shadow is passing across a room.

However, with regards to the use of tracking and surveillance, I would say it honestly opens up a world of possibilities: body tracking as a controller for games and all kinds of interactive media artworks. The coolest one I’ve personally seen so far is Just Dance. It uses a camera for motion tracking so that it’s able to give an accurate assessment of whether your dance moves match the computer’s example. Its main concept isn’t just a gimmick but the crux of the game’s functionality. And it’s the implementation, where you get an accurate assessment of whether you followed the dance moves plus instant feedback through sound effects, that is so useful. With regards to interactive media, this will allow people to interact with our art in a deeper way, so that they can genuinely feel immersed in the art in question.

Week 4 – Generative Text

For this assignment I created a kinematic typography sketch using the word “MADINA.” I wanted the word to feel like it is in motion. My main inspiration was Patt Vira’s kinetic typography work, where letters shift in rhythm. I liked how those examples use simple motion to give a word a stronger presence, so I focused on one word and explored movement across time.

I used p5.js together with opentype.js and geomerative. First I loaded the font “BebasNeue-Regular.ttf” and converted the word “MA D I NA” into a vector path. Then I resampled the outlines into many points. In draw, I repeated those points multiple times in vertical layers. I applied a sine function to the x position and a gradual offset to the y position, so each layer moves like a wave. I kept the color palette minimal with a dark blue background, white strokes, and semi transparent blue fills. Patt Vira’s kinetic typography guided my decisions about rhythm and repetition.

I wrote the sketch in p5.js and used geomerative to work with vector text. In setup, I created the canvas, set the angle mode to degrees, and loaded the font file “BebasNeue-Regular.ttf” with opentype.load. After the font loaded, I called font.getPath on the string “MA D I NA” with a large font size, then wrapped the commands in a geomerative Path object. I resampled this path by length so the letters turned into a dense list of points. I looped through the commands and, whenever I encountered a move command “M,” started a new sub-array in points. For each drawing command that was not “Z,” I pushed the x and y coordinates into the current sub-array as p5 vectors.

In draw, I cleared the background to a dark blue color, set the stroke weight and stroke color, and translated the origin so the word appears centered on the canvas. I used a nested loop: the outer loop moves through the number of layers, from num down to one, and the inner loop moves through each letter’s group of points. For some letter indices I used noFill to keep only outlines, and for others a semi-transparent blue fill. Inside beginShape and endShape, I looped over the points and applied a sine-based offset to the x coordinate with r * sin(angle + k * 20), and a vertical offset of k * 10 to the y coordinate. This creates layered copies of the word that shift in x and y as angle increases. At the end of draw, I incremented angle by 3 so the sine function changes over time and the typography keeps moving.

let font;
let msg = "MA D I NA";
let fontSize = 200;
let fontPath;
let path;
let points = [];

let num = 20;  // number of layered copies
let r = 30;    // horizontal wave amplitude
let angle = 0; // animation phase

function setup() {
  createCanvas(700, 400);
  angleMode(DEGREES);

  opentype.load("BebasNeue-Regular.ttf", function (err, f) {
    if (err) {
      console.log(err);
      return; // stop here if the font failed to load
    }
    font = f;

    // Convert the text into a vector path and resample it into dense points
    fontPath = font.getPath(msg, 0, 0, fontSize);
    path = new g.Path(fontPath.commands);
    path = g.resampleByLength(path, 1);

    // Each "M" (move) command starts a new letter contour
    for (let i = 0; i < path.commands.length; i++) {
      if (path.commands[i].type == "M") {
        points.push([]);
      }
      if (path.commands[i].type != "Z") {
        points[points.length - 1].push(
          createVector(path.commands[i].x, path.commands[i].y)
        );
      }
    }
  });
}

function draw() {
  background(0, 0, 139);
  strokeWeight(3);
  stroke(255);
  translate(40, 170);

  // Draw num layered copies of the word, each offset in x by a sine wave
  for (let k = num; k > 0; k--) {
    for (let i = 0; i < points.length; i++) {
      // Keep some contours as outlines, fill the rest with translucent blue
      if (i == 1 || i == 3) {
        noFill();
      } else {
        fill(0, 0, 255, 100);
      }
      beginShape();
      for (let j = 0; j < points[i].length; j++) {
        vertex(points[i][j].x + r * sin(angle + k * 20), points[i][j].y + k * 10);
      }
      endShape(CLOSE);
    }
  }
  angle += 3; // advance the wave over time
}

 

Week 4 – Reading Reflection

One thing that always confuses me is the variety of modes on some household items. When using an iron, I see that spinning the circle increases the steam production, and for people who have no idea which level is needed for which clothes, they write the names of the materials on the same circle respectively. What drives me mad is that washing machines and dryers are NEVER intuitive. What’s the difference between Cupboard Dry and Cupboard Dry+ if they take the same time and operate at the same temperature? What is the difference between Gentle and Hygiene, and why is the time difference there 3 hours? And to actually figure out the difference, you have to find the name of the machine (which will never match its actual name), look it up in some 2008 PDF file on the very last Google page, and it still won’t answer the question. I always use Mixed washing and Cupboard Dry just because it works, and I have no idea how the other regimes work. And as Norman says, it’s not me being stupid, but the design allowing for these mistakes.

“The same technology that simplifies life by providing more functions in each device also complicates life by making the device harder to learn, harder to use”

I think my example perfectly supports this idea: the bad design of all these items, with no signifiers, no clear affordances, and no conceptual model formed either through life experience or through using the item, just creates more confusion and makes the user always choose one method instead of exploring the huge variety of (probably) useful and functional ones.

I think one way to fix it is to provide some sort of manual; even a tiny table on the edge of the machine would help so much to at least explain what each method does and how they differ. Another way is to display something on the small screen that almost every machine has, like the characteristics and statistics unique to each method, or short warnings and instructions. Yet another option is small illustrations near each method that actually depict what it does. Genuinely, this would help unleash the potential of these machines and help people use them.

Talking about interactive media, I think the principles Norman talks about are really applicable and foundational.

Sometimes great art pieces with very interesting and complex interactions can be overlooked just because people can’t figure out how to interact with them. I believe that it is very important to design the piece in a very intuitive or guiding way, a way that encourages the user to make the interaction that the author created. As Norman says, humans are really predictable, and in this way, some silent guiding design (not notes, not manuals, but the design itself) should trigger the interaction that is meant to be done in order to experience the art.