Week 5 – Reading Response

What are some of the ways that computer vision differs from human vision?

While humans rely on context, experience, and intuition to recognize objects and interpret the scenes in front of them, computers process raw pixel data and need algorithms to make sense of whatever visual input they are “seeing”. In addition, human vision naturally adapts to different lighting and to lighting changes, whereas computer vision can struggle with color perception under varying illumination. Similarly, motion recognition in humans is intuitive and predictive, whereas a computer depends on techniques like frame differencing or object tracking to detect movement.

What are some techniques we can use to help the computer see / track what we’re interested in?

According to the paper, to help computers see and track objects we’re interested in, frame differencing can be used. When using this technique, frames are compared, and if pixels change between these frames, the computer sees it as movement. Another technique is brightness thresholding, which separates objects from the background based on their brightness levels. In simple terms, the process involves setting a specific brightness value (aka the threshold), and any pixel brighter or darker than that value is considered part of the object or background. For example, in an image, if the threshold is set to a certain brightness level, pixels brighter than that will be identified as the object, and those darker will be treated as the background.
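To make the frame-differencing idea concrete, here is a minimal p5.js sketch of how it could be implemented; the motionThreshold value and the variable names are my own illustrative assumptions rather than anything specified in the paper:

// Minimal frame-differencing sketch (p5.js); motionThreshold is an arbitrary illustrative value.
let video;
let prevFrame;
const motionThreshold = 40; // minimum per-pixel brightness change counted as motion

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  prevFrame = createImage(width, height);
}

function draw() {
  video.loadPixels();
  prevFrame.loadPixels();

  let movedPixels = 0;
  for (let i = 0; i < video.pixels.length; i += 4) {
    // Compare the brightness of each pixel with the same pixel in the previous frame
    const currB = (video.pixels[i] + video.pixels[i + 1] + video.pixels[i + 2]) / 3;
    const prevB = (prevFrame.pixels[i] + prevFrame.pixels[i + 1] + prevFrame.pixels[i + 2]) / 3;
    if (abs(currB - prevB) > motionThreshold) {
      movedPixels++;
    }
  }

  // Store the current frame for the next comparison
  prevFrame.copy(video, 0, 0, width, height, 0, 0, width, height);

  image(video, 0, 0);
  fill(255);
  noStroke();
  text('moved pixels: ' + movedPixels, 10, 20);
}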

How do you think computer vision’s capacity for tracking and surveillance affects its use in interactive art?

I think computer vision’s capacity for tracking and surveillance has really expanded the possibilities of interactive art by allowing artists to create work with real-time audience engagement and far more personalized experiences. Installations can now respond dynamically to movement and different gestures, creating immersive environments that evolve based on the viewer’s presence. By using computer vision, interactive art becomes more fluid and responsive, transforming the traditionally passive viewing of art into something active and engaging. As a whole, I think this technology not only enhances the storytelling and emotional impact of art, but also opens new doors for large-scale public art and immersive installations that blur the line between the digital and physical worlds.

Mid Term Project

Concept

“Stock Picker Fun” is a fast-paced, simplified stock market simulation game. The player’s goal is to quickly decide whether to buy or sell stocks based on their recent price trends. The game features:

  • Simplified Stocks: Three fictional stocks (AAPL, GOOGL, MSFT) with fluctuating prices.
  • Quick Decisions: Players must make rapid buy/sell decisions based on visual cues.
  • Visual History: Mini-graphs display each stock’s recent price history, aiding in decision-making.
  • Clear UI: A clean and intuitive user interface with color-coded indicators.
  • Progressive Difficulty: The speed of stock price changes increases over time, adding challenge.
  • Profit/Loss Tracking: A simple display of the player’s money and score.

A Highlight of Some Code That You’re Particularly Proud Of

I’m particularly proud of the drawGraph() function:

function drawGraph(data, x, y) {
    // Draw a white polyline of recent prices, scaled relative to the first value
    stroke('#fff');
    noFill();
    beginShape();
    for (let i = 0; i < data.length; i++) {
        vertex(x + i * 5, y + 40 - (data[i] - data[0]) * 0.5);
    }
    endShape();
    noStroke();
}
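As a rough illustration of how it could be called for each stock (the stocks array with a priceHistory field per stock is a stand-in here, not the actual game state):

// Illustrative call in draw(); a `stocks` array holding a priceHistory per stock is assumed
for (let i = 0; i < stocks.length; i++) {
    drawGraph(stocks[i].priceHistory, 20, 60 + i * 100);
}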

Embedded Sketch

Reflection and Ideas for Future Work or Improvements

Reflection:

This game successfully simplifies the stock market experience, making it accessible and engaging for a wide audience. The visual history and clear UI provide valuable feedback, allowing players to quickly grasp the mechanics and make informed decisions. The progressive speed adds a layer of challenge, keeping the gameplay dynamic.

Ideas for Future Work or Improvements:

  1. More Data Visualization:
    • Add candlestick charts or other advanced visualizations to provide more detailed stock information.
    • Implement real-time data streaming from an API to simulate live market conditions.
  2. Advanced Trading Features:
    • Introduce different order types (limit orders, stop-loss orders).
    • Add the ability to short stocks (bet on price declines).
    • Include options trading.
  3. Dynamic News Events:
    • Generate random news events that impact stock prices, adding an element of unpredictability.
    • Use visual cues or animations to indicate the impact of news.
  4. User Profiles and Persistence:
    • Implement user profiles to save game progress and track performance over time.
    • Use local storage or a database to persist data.
  5. Sound Effects and Animations:
    • Add sound effects for buy/sell actions, price changes, and game events.
    • Incorporate more animations to enhance the visual feedback and create a more immersive experience.
  6. More stock types:
    • Add more stock types with different volatilities.
  7. Game over conditions:
    • Add game over conditions, such as running out of money.
  8. Pause feature:
    • Add the ability to pause the game.
  9. Mobile optimization:
    • Optimize the game for mobile devices, using touch controls and responsive design.

By implementing these improvements, the game can be transformed into a more comprehensive and engaging stock market simulation.

Week #5 Reading – Computer Vision

Introduction

Computer vision is the amalgamation of various mathematical formulae and computational algorithms, accompanied by the computational tools capable of carrying out the procedure. What was once deemed too expensive and high-level (limited to experts in AI and signal processing), computer vision has now become readily available. Various software libraries and suites give student programmers the ability to run and execute the algorithms required for object detection to work. The cherry on top: with mass refinement and wider availability of computer hardware, at a fraction of the cost of the early 1990s, now anyone, and by anyone I mean all kinds of institutions, can access it and tinker around with it.

Difference between computer and human vision:

Computer vision works within a designated perimeter, scanning the array of pixels vertically and horizontally. Upon detecting a change in the shade of a pixel, it infers detection. Using complex algorithms applied in the back end, it is able to analyze and detect movement, among other tasks such as character recognition. Various techniques like detection through brightness thresholding are implemented. Along similar lines is human vision: our retinas capture the light reflecting from various surfaces, and our brain translates the upside-down projection into something comprehensible. Our brain is trained to interpret objects, while computer vision requires algorithmic understanding and the aid of artificial intelligence. With AI, training is done on a data set, supervised or not, to teach the computer how to react to a certain matrix of pixels, i.e., the scanned image.

Ways to make computer vision efficient:

As mentioned in the reading, one of the techniques I love is ‘background subtraction’: the capability to isolate the desired object. In my opinion, tracking several objects using this technique, together with variety in the trained data set, helps with more accurate and precise judgment, especially if many objects are present at the same time. Other techniques such as ‘frame differencing’ and ‘brightness thresholding’ exist as well. Also, from other readings, the larger the data set and the longer the training time, the higher the accuracy. However, acquiring image data comes with ethical dilemmas and added cost.
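To make the background-subtraction idea concrete, here is a minimal p5.js sketch of how it could work; the key press for capturing the reference background and the diffThreshold value are my own illustrative assumptions, not details from the paper:

// Rough background-subtraction sketch (p5.js); diffThreshold is an arbitrary illustrative value.
let video;
let backgroundFrame;
let haveBackground = false;
const diffThreshold = 60;

function setup() {
  createCanvas(320, 240);
  pixelDensity(1); // keep canvas pixels aligned with video pixels
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  backgroundFrame = createImage(width, height);
}

function keyPressed() {
  // Press 'b' while the scene is empty to store the reference background
  if (key === 'b') {
    backgroundFrame.copy(video, 0, 0, width, height, 0, 0, width, height);
    haveBackground = true;
  }
}

function draw() {
  image(video, 0, 0);
  if (!haveBackground) return;

  video.loadPixels();
  backgroundFrame.loadPixels();
  loadPixels();
  for (let i = 0; i < video.pixels.length; i += 4) {
    // Sum of channel differences against the stored background
    const diff =
      abs(video.pixels[i] - backgroundFrame.pixels[i]) +
      abs(video.pixels[i + 1] - backgroundFrame.pixels[i + 1]) +
      abs(video.pixels[i + 2] - backgroundFrame.pixels[i + 2]);
    const v = diff > diffThreshold ? 255 : 0; // foreground pixels become white
    pixels[i] = v;
    pixels[i + 1] = v;
    pixels[i + 2] = v;
    pixels[i + 3] = 255;
  }
  updatePixels();
}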

Computer Vision’s surveillance and tracking capability, and its implementation in interactive media:

Works like Videoplace and Messa di Voce are early demonstrations of the combination of interactive media and computer vision. Installations can track and respond to human input, and this ‘feedback loop’ triggers a sense of immersion and responsiveness. In my humble opinion, the use of computer vision takes the user away from traditional input techniques and gives them the freedom to act as they will. Though it is also true that the computer makes sense of the input only relative to its trained data set, and a totally random input might lead the system to fail. This is where the idea of ‘degree of control’ comes into play. Personally, I believe that as long as we have a combination of interactive components, the user will never get tired of running inside the same tiring maze, and the use of computer vision definitely makes the experience feel less tangible and more user-centered. Hence, I decided to use it for my midterm project as well!

Week #5 – Midterm Progress

1. Concept

I was inspired by the ‘coffee shop expo’ project, where the user can explore a place and click on different buttons freely; but I wished for more room for exploration and especially more freedom, in the sense that the user can control a sprite using keys to move around. Then, if this sprite lands on a specific place, a new place of discovery opens up.

I spent a considerable amount of time developing the concept: a casual 2D RPG adventure game in a pixelated world with a medieval setting, with story snapshots (as clear, not-so-pixelated images) appearing from time to time as a quest is completed. The user takes on the role of a character (represented with a sprite) who has migrated to a new city and wants a new job to earn some money. There is a task board in a tavern where villagers come to post various miscellaneous tasks. The user can pick a job from it; some examples are below, though my goal is to incorporate at least two of the following:

    • fetch water from the well,
    • harvest a number of carrots,
    • doctor’s mission – fetch herbs for medicinal purposes,
    • help me find my dog,
    • blacksmithing job,
    • help deliver letter.

I was wondering whether it would be strange to have both pixelated and non-pixelated graphics. After I explained my concept to a friend, they thought of an existing game like that: “Omori.” Omori has pixelated sprites but clear CG scene snapshots, as well as iconic music – things to get inspired by – and a fan wiki for sprites which I could try to make good use of.

Interactive elements are to be present, such as a mouse click for opening a door and revealing a character’s speech. Audio will also be heard in different story scenes and should be chosen to fit the atmosphere – e.g. music with a sense of urgency at the task board, relaxing music in the village, music with a sense of triumph and excitement when the quest is completed, etc. On-screen text could be used to beckon the user and to convey narration, instructions, and dialogue.

After the experience is completed, there must be a way to restart the experience again (without restarting the sketch).

2. Code Highlights

I think the main challenge would be in the OOP, specifically making the character classes. I found a useful resource for collecting sprites: Spriter’s Resource. In particular, I would like to use village sprites from Professor Layton and the Curious Village. Here are the Character Profiles. I selected the following from the various characters in Professor Layton and the Curious Village:

  • Luke (user’s character)
  • Franco (farmer who needs help with harvesting carrots),
  • Ingrid (neighbour grandma who needs help with delivering a letter),
  • Dahlia (noblewoman with a lost cat),
  • Claudia (Dahlia’s cat),
  • Lucy (young girl who needs help with getting herbs for her sick mother),
  • Flora (mysterious herb seller).

Since the challenge lies in OOP, I would like to practise making an object, namely the user’s character “Luke.”

In developing the code, I found it challenging to adapt the walk animation code (discussed in class with Professor Mang) in two ways: (1) into an OOP format and (2) to my spritesheet, which does not have different frames of walking and does not have different directions that the sprite faces. With (1):

  • I decided to have variables from the walk animation placed into the constructor as the class’s attributes.
  • Instead of keyPressed() as in the walk animation, I used move() and display() methods, since keyPressed() cannot be defined inside a class (p5.js expects it to be a global event function). A rough sketch of move() appears after the class definition below.
class User_Luke {
  constructor() {
    this.sprites = [];
    this.direction = 1;  // 1 = right, -1 = left
    this.step = 0;
    this.x = width/20;
    this.y = height/15;
    this.walkSpeed = 3;
    this.scaleFactor = 0.2; // Scaling factor
    
    // 6 images across, 2 down, in the spritesheet

    let w = int(luke_spritesheet.width / 6);
    let h = int(luke_spritesheet.height / 2);

    for (let y = 0; y < 2; y++) {
      this.sprites[y] = [];
      for (let x = 0; x < 6; x++) {
        this.sprites[y][x] =
          luke_spritesheet.get(x * w, y * h, w, h);
      } // iterate over columns
    } // iterate over rows

    // Start in the middle of the canvas
    this.x = width / 2;
    this.y = height / 2;

    imageMode(CENTER);

    // Display first sprite
    image(this.sprites[0][0], this.x, this.y);
  }
  move() {
    ...
  }
  display() {
    ...
  }
}

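For reference, here is a minimal sketch of what the move() method could look like with these attributes; the arrow-key handling and the constrain() bounds are placeholders rather than my final game logic:

  // Minimal move() sketch: arrow keys update position and facing direction.
  // The constrain() bounds are placeholders, not the final game boundaries.
  move() {
    if (keyIsDown(LEFT_ARROW)) {
      this.x -= this.walkSpeed;
      this.direction = -1;
      this.step++;
    } else if (keyIsDown(RIGHT_ARROW)) {
      this.x += this.walkSpeed;
      this.direction = 1;
      this.step++;
    }
    if (keyIsDown(UP_ARROW)) {
      this.y -= this.walkSpeed;
      this.step++;
    } else if (keyIsDown(DOWN_ARROW)) {
      this.y += this.walkSpeed;
      this.step++;
    }
    // Keep Luke inside the canvas
    this.x = constrain(this.x, 0, width);
    this.y = constrain(this.y, 0, height);
  }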
 

With (2), I set conditions for whether direction is 1 (facing right) or -1 (facing left). Since my spritesheet only shows the sprites facing one direction, I used an image transformation:

class User_Luke {
  constructor() {
    ...
  }
  move() {
    ...
  }
  display() {
    let spriteWidth = this.sprites[0][0].width * this.scaleFactor;
    let spriteHeight = this.sprites[0][0].height * this.scaleFactor;
    
    // Finally draw the sprite
    // The transparent areas in the png are not
    // drawn over the background
    if(this.direction === -1) {
      image(this.sprites[0][0], this.x, this.y, spriteWidth, spriteHeight);
    } 
    else if(this.direction === 1) {
      // We will use the scale() transformation to reverse the x-axis.
      // The push and pop functions save and reset the previous transformation.
      push();
      // Scale -1, 1 means reverse the x axis, keep y the same.
      scale(-1, 1);
      // Because the x-axis is reversed, we need to draw at different x position.
      image(this.sprites[0][0], -this.x, this.y, spriteWidth, spriteHeight);
      pop();
    }
  }
}

I also noticed my sprite appeared at a huge size; to deal with this, I applied a scale factor to spriteWidth and spriteHeight (already shown in the code above).

3. Embedded Sketch

4. Reflection and Next Steps

I experienced multiple challenges along the way, but I gained valuable experience with OOP. I feel that making the next sprites won’t be so challenging, since I can use Luke’s code as a reference and adapt it. I think it will be important for me to plan deadlines, since this midterm project has big subtasks, including:

  • Finding background(s)
  • Finding snapshots
  • Coding all the sprites
  • Interactive elements – door open animation, ‘choose a job’ buttons

Reading Reflection – Week#5

  • What are some of the ways that computer vision differs from human vision?

No computer vision algorithm is universally able to perform its intended function (e.g. recognizing humans vs. background) given any kind of input video, unlike the human eye and brain, which can generally work together to perform that function. Instead, an object detection or tracking algorithm relies crucially on distinctive assumptions about the real-world scene it is meant to analyze. If the algorithm’s assumptions are not met, it may perform poorly and produce results of little value, or fail completely.

Take a first example: frame differencing is a computer vision technique that detects objects by detecting their movements. This is achieved by comparing two frames pixel by pixel, finding the difference in color and/or brightness between all corresponding pixels. Thus, the frame differencing algorithm depends on “relatively stable environmental lighting” and on “having a stationary camera (unless it is the motion of the camera which is being measured).” Hence, videos with a lot of active movement, like NBA games, would be much more suitable input than videos of focused people in an office.

In addition to frame differencing, background subtraction and brightness thresholding are further examples where certain presumptions are important for computer vision tasks. Background subtraction “locates visitor pixels according to their difference from a known background scene,” while brightness thresholding uses “hoped-for differences in luminosity between foreground people and their background environment.” Thus, considerable contrast in color or luminosity between foreground and background is important for accurate recognition of objects; otherwise, as in nighttime scenes, the algorithm may incorrectly classify objects in the video as background. On the other hand, I personally feel that the human eye remarkably uses a combination of these three algorithms, and perhaps more, to detect objects, which allows it to perform extraordinarily well compared to current computer vision.

 

  • What are some techniques we can use to help the computer see / track what we’re interested in?

It is of great importance to design a physical environment with conditions best suited to the computer vision algorithm and, conversely, to select software techniques that work best with the physical conditions at hand. A few examples stood out to me for enhancing the suitability and quality of the video input provided to the algorithm. I believe that infrared illumination (as used in night vision goggles) can complement conventional black-and-white security cameras, massively boosting the signal-to-noise ratio of video taken in low-light conditions. Polarizing filters are useful for handling glare from reflective surfaces, especially in celebrity shows. Of course, there are also many specialized cameras to consider, optimized for “conditions like high-resolution capture, high-frame-rate capture, short exposure times, dim light, ultraviolet light, or thermal imaging.”

 

  • How do you think computer vision’s capacity for tracking and surveillance affects its use in interactive art?

Computer vision’s capacity for tracking and surveillance opens doors for interactivity between the computer and the human body, gestures, facial expressions, and dialogue. Already, some sophisticated algorithms can correctly identify facial expressions, which could be used to gauge someone’s emotional state and be applied in mental health initiatives to help people suffering emotionally. This might relate to Cheese, an installation by Christian Möller. Additionally, as in Videoplace, participants could create shapes using gestures, and their silhouettes in different postures can be used to form different compositions. If computer vision were combined with audio and language, such systems could become even more interactive as affordances increase.

Week 5 – Computer Vision for Artists and Designers

Week 5 – Reading Response

->  Computer Vision for Artists and Designers

I believe that computer vision differs from human vision in several ways. First, unlike human vision, which is connected to cognition, experience, and contextual understanding, computer vision processes images as raw pixel data. This means that it does not know what it is looking at unless it is programmed to recognise specific patterns. Second, humans are able to recognise objects in various lighting conditions and angles, while computer vision usually has a hard time with these unless trained on extensive datasets. Additionally, human vision combines real-time sensory data with prior knowledge, while computer vision relies on predefined algorithms to extract meaningful information. Lastly, humans can instinctively infer meaning from abstract images or recognise emotions in facial expressions, but computer vision needs complex models to achieve even basic levels of recognition.

Because computer vision lacks human intuition, we need techniques to improve its ability to recognise and track objects. This may be done by controlling lighting conditions to ensure the subject is distinguishable from the background (as in Myron Krueger’s Videoplace). We can also enhance visibility in low-light conditions by using infrared light, which is not visible to humans but detectable by cameras. Computer vision’s ability to track people and objects has a significant impact on interactive art: artists can now use vision-based systems to create interactive installations that respond to body movement, voice, or gestures. It can also be used to highlight issues of privacy, surveillance, and control (Sorting Daemon). Overall, computer vision can reshape interactive art by allowing new forms of engagement; however, its surveillance capabilities also raise questions about privacy and ethics. While this technology enables creative expression, it can also be a tool for control, highlighting the importance of artists and designers handling these implications thoughtfully.

Week 5 – Midterm Progress

Concept 

For my midterm project, I am developing a game called Balloon Popper, which incorporates everything we have learned so far. In this game, balloons fall from the top like rain, and the player must pop them before they reach the bottom. The more balloons the player pops, the faster they fall and the more balloons appear, increasing the challenge dynamically. The score is based on the number of balloons popped.

Code Structure

The game will be structured around object-oriented programming (OOP) principles, utilizing classes and functions to manage different elements:

Balloon Class: Defines properties such as position, speed, size, and color. Handles movement and collision detection.

// Balloon Class
class Balloon {
  constructor(x, y, size) {
    this.x = x;
    this.y = y;
    this.size = size;
    this.color = color(random(255), random(255), random(255));
  }

  move() {
    this.y += 2; // moving balloon downwards
  }

  display() {
    fill(this.color);
    noStroke();
    ellipse(this.x, this.y, this.size, this.size * 1.3); // Oval balloon shape
  }
}

 

Shooter Class: Represents a player-controlled shooter at the bottom of the screen, used to aim and pop balloons.

Game Manager: Handles overall game logic, including score tracking, difficulty scaling, and user interactions.

Interactivity: The player moves the shooter left and right and fires projectiles to pop balloons.
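Since the shooter is still being designed, below is a rough sketch of how the Shooter class might come together; the key bindings, speeds, and the popped flag on balloons are placeholder assumptions rather than final decisions:

// Rough Shooter class sketch; keys, speeds, and the `popped` flag are placeholder assumptions.
class Shooter {
  constructor() {
    this.x = width / 2;
    this.y = height - 30;
    this.speed = 5;
    this.projectiles = []; // each projectile is a {x, y} object
  }

  move() {
    if (keyIsDown(LEFT_ARROW)) this.x -= this.speed;
    if (keyIsDown(RIGHT_ARROW)) this.x += this.speed;
    this.x = constrain(this.x, 0, width);
  }

  shoot() {
    // Called from keyPressed() when the spacebar is hit
    this.projectiles.push({ x: this.x, y: this.y });
  }

  update(balloons) {
    // Move projectiles upward and mark any balloon they touch as popped
    for (let p of this.projectiles) {
      p.y -= 8;
      for (let b of balloons) {
        if (dist(p.x, p.y, b.x, b.y) < b.size / 2) {
          b.popped = true; // the game manager would remove popped balloons and update the score
        }
      }
    }
    this.projectiles = this.projectiles.filter((p) => p.y > 0);
  }

  display() {
    fill(255);
    rect(this.x - 15, this.y, 30, 15); // simple shooter body
    for (let p of this.projectiles) {
      ellipse(p.x, p.y, 6, 6);
    }
  }
}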

Challenges and Uncertain Aspects

One of the most complex aspects of this project is implementing multiple difficulty levels (Easy, Medium, Hard). I am unsure of how feasible it will be within the project timeline. Additionally, I was initially uncertain about whether to allow players to pop balloons using a shooter at the bottom or direct mouse clicks.

Risk Mitigation and Adjustments

To ensure feasibility, I decided to focus on dynamic speed increase as the main difficulty progression instead of distinct levels. This allows the game to scale naturally in difficulty without the need for predefined level transitions. However, I may still explore the possibility of adding a multi-level aspect if time permits. Additionally, I have chosen to implement shooters at the bottom rather than mouse clicking, as this adds an extra layer of interactivity and skill to the game.

Week 5 – Reading Reflection

  • What are some of the ways that computer vision differs from human vision?

Human vision involves cognitive processes that allow us to interpret context, recognize objects regardless of lighting conditions and angles, and make intuitive inferences. In contrast, computer vision relies on algorithms that analyze pixel data without context or intuition. Unlike human vision, which naturally adapts to varying conditions, computer vision relies on structured methods such as frame differencing, background subtraction, and brightness thresholding to detect motion, presence, or objects of interest.

  • What are some techniques we can use to help the computer see / track what we’re interested in? 

As noted in the paper, one of the greatest challenges in computer vision is enabling computers to make accurate detections and distinguish between “what is” and “what was”, a key factor in motion and presence detection. Several techniques help achieve this:

  • Frame Differencing: detects motion by comparing consecutive frames, identifying areas where pixel values have changed.
  • Background Subtraction: captures an image of an empty scene as a reference and then compares incoming frames against it; any changes are flagged as new objects. However, it is highly sensitive to lighting variations.
  • Brightness Thresholding: controlled illumination and surface treatments (such as high-contrast materials or backlighting) help distinguish objects based on their brightness levels, making tracking more effective in interactive environments.

By combining these methods, computer vision can better track motion, recognize objects, and adapt to artistic applications.
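As a small illustration of the brightness-thresholding idea in p5.js, a sketch like the following could mark every pixel brighter than a chosen value as foreground; the threshold of 128 and the capture setup are my own assumptions:

// Tiny brightness-thresholding sketch (p5.js); the threshold of 128 is arbitrary.
let video;

function setup() {
  createCanvas(320, 240);
  pixelDensity(1); // keep canvas pixels aligned with video pixels
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
}

function draw() {
  video.loadPixels();
  loadPixels();
  for (let i = 0; i < video.pixels.length; i += 4) {
    const b = (video.pixels[i] + video.pixels[i + 1] + video.pixels[i + 2]) / 3;
    const v = b > 128 ? 255 : 0; // brighter than the threshold = foreground
    pixels[i] = v;
    pixels[i + 1] = v;
    pixels[i + 2] = v;
    pixels[i + 3] = 255;
  }
  updatePixels();
}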

  • How do you think computer vision’s capacity for tracking and surveillance affects its use in interactive art?

The surveillance capacity and tracking ability of computer vision can be used to store and present anomalous data in a creatively artistic way. Many artists have integrated these capabilities to create interactive installations that respond to human movement and behavior. Myron Krueger’s Videoplace (1969-1975), for example, allowed participants to interact with digital graphics using only their silhouettes, demonstrating how computer vision can enable body-driven interaction. Similarly, Messa di Voce (2003) used head-tracking and speech analysis to create a dynamic visual experience where graphics appeared to emerge from performers’ mouths, merging performance with real-time digital augmentation.

Week 5 – Reading Response

Computer vision differs from human vision in many ways. One of the key differences is that human vision naturally adapts to differences in lighting and can naturally understand objects, track motion, and use context to recognize different emotions and patterns. On the other hand, computer vision relies heavily on mathematical algorithms to detect objects and track movement. Differences in lighting often cause computer vision to break or work improperly, and it is generally impossible for computers to understand context without the use of advanced AI.

There are several ways we can help the computer “see” and track what we want it to track. One of them is frame comparison, where we tell the computer to compare consecutive frames and detect changes that indicate movement. To further improve this, we can also use background subtraction techniques, which help us isolate the objects we want to track and ignore everything in the background.

Computer vision has wide potential use in interactive media. Many artists have used it to create amazing interactive art, which in my opinion feels more interactive than just clicking buttons. Artists use computer vision to create playful interactive experiences that fully immerse the user, who feels in complete control of the movement of the object. I believe that in the future, in combination with AI, computer vision will completely take over the interactive media industry.

Week 5 – Midterm Progress

Concept:
For my midterm project, I wanted to create a game based on something I love: cats! Growing up and living in Abu Dhabi, I’ve noticed that there are a lot of stray cats, so I wanted to design a game where the player drives around a city, rescues stray cats, and takes them to a shelter. I was inspired by a photography project I did last semester about the spirit of street cats in Abu Dhabi, where I went around the city and captured these cats’ lives and the environment they live in (link to the photos). The game will combine movement mechanics, object interactions, and a simple pet care system. The goal of the game is to rescue and rehome all the stray cats before the game ends.

User Interaction and Design:
For the interaction, I would like to implement a way for the player to control the car using arrow keys to move around. The stray cats will be at random locations in the city and if the car touches a cat, it is rescued and sent to a shelter. I was also thinking of adding more to the game, where inside the shelter, the player can click on the cat to heal or feed them. Finally, once all the cats are healthy, the game ends and displays a win screen.
→ Visual:
Start Screen: Shows the game instructions and a “start” button.
Game Screen: Has a city background with a moving car, stray cats, and a shelter section.
End Screen: Congratulates the player and has a restart button.

Code Structure:
In order to ensure the code is organized, I plan to use Object-Oriented Programming by creating three main classes.
Car Class:
– Player movement (arrow keys).
– Checks for collisions with pets.
Pet Class:
– Stores pet location and condition (hungry, injured).
– Moves to the shelter when rescued.
Shelter Class:
– Displays rescued pets.
– Tracks pet status and healing progress.

Challenging Part & How I’m Addressing It:
I think the most frightening part of this project is implementing collision detection between the car and the pets. Because the game involves movement, I need a way to detect when the car “rescues” a pet. To tackle this, I wrote a small sample using the dist() function to check if two objects are close enough to interact. This reduces my risk by confirming that collision detection works before I use it in the full game.

function checkCollision(car, pet) {
  let d = dist(car.x, car.y, pet.x, pet.y);
  return d < 30; // If distance is small, they collide
}
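A quick illustration of how this check could be used inside draw(); the car and cats variables and the rescued flag are placeholders for the Car, Pet, and Shelter classes described above:

// Illustrative use in draw(); `car`, `cats`, and the `rescued` flag are placeholders
for (let c of cats) {
  if (!c.rescued && checkCollision(car, c)) {
    c.rescued = true; // the Shelter class would then take over this pet
  }
}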