I was interested in the Detecting Motion (code listing 1) concept in this week’s reading. Frame differencing is a simple technique that can be used to detect and quantify the motion of humans (or other objects) within a video frame.
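A minimal sketch of the idea in p5.js, using the webcam (the motion score and its scaling are arbitrary choices of mine, not from the reading):

// Frame differencing: sum the per-pixel brightness change between the
// current and previous webcam frames as a rough "amount of motion".
let video;
let prev;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  prev = createImage(width, height);
}

function draw() {
  image(video, 0, 0);
  video.loadPixels();
  prev.loadPixels();

  let motion = 0;
  for (let i = 0; i < video.pixels.length; i += 4) {
    const curr = (video.pixels[i] + video.pixels[i + 1] + video.pixels[i + 2]) / 3;
    const last = (prev.pixels[i] + prev.pixels[i + 1] + prev.pixels[i + 2]) / 3;
    motion += abs(curr - last);
  }

  // Remember this frame for the next comparison.
  prev.copy(video, 0, 0, width, height, 0, 0, width, height);

  fill(255, 0, 0);
  text('motion: ' + int(motion / 1000), 10, 20);
}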
The reading emphasizes how computer vision algorithms are not naturally capable of comprehending the subtleties of various physical surroundings, despite their advanced capacity to process and analyze visual input. The physical environments in which these algorithms function can greatly increase or decrease their effectiveness. The use of surface treatments like high contrast paints or controlled illumination like backlighting, for instance, is discussed in the reading as ways to enhance algorithmic resilience and performance. This suggests that the software and the real environment need to have a symbiotic connection in which they are both designed to enhance one another.
This concept reminded me of The Rain Room. As I understand it, the installation uses motion sensors, acting much like a computer vision system, to let people move through a space where it is raining everywhere except where they are standing. The precise calibration of that sensing system with the physical environment is crucial to the immersive experience’s success: it is what allows the sensors to recognize human movement and stop the rain from falling on the visitors.
I am super jealous 😤 of people who are extremely good at one thing. Hayao Miyazaki doesn’t have to think that much, he just keeps making animated movies… he’s the greatest artist I look up to. Slash and Jimi Hendrix don’t get confused, they just play guitar all the time, because they are the greatest musicians. Joseph Fourier did just mathematics and physics, with a little history on the side… but he’s still a mathematician.
My lifelong problem is that I specialize in everything, in extreme depth. When you are a competent artist, engineer, musician, mathematician, roboticist, researcher, poet, game developer, filmmaker and storyteller all at once, it’s really, really hard to focus on one thing…
which is a problem that can be solved by doing EVERYTHING.
Hence, for my midterm, I am fully utilizing all (okay, a fraction) of my skills to create the most beautiful interactive narrative game ever executed in the history of the p5.js editor, one where I can control the game by improvising on my guitar in real time. The story will be told in the form of a poem I wrote.
Ladies and gents, I present to you G-POET – the Guitar-Powered Operatic Epic Tale 🎸 – a live performance with your host, Pi.
This is not showing off. This is devotion, to prove my eternal loyalty and love for the arts! I don’t even care about the grades. For me, art is a matter of life or death. The beauty and the story arc of this narrative should reflect the overflowing, exploding emotions and feelings I have for the arts.
Also, despite it being a super short game made in JavaScript, I want it to be on the same level as, if not better than, the most stunning cinematic 2D games ever made – titles like Ori and the Blind Forest, Forgotton Anne, and Hollow Knight. It takes whole studios to produce that quality; I want to show what a one-man army can achieve in two weeks.
Design
I am saving the story for the hype, so below are sneak peeks of the bare minimum. It’s an open world. No more spoilers.
(If the p5 sketch below loads, use the arrow keys to move left and right, and Space to jump.)
And below is my demonstration of controlling my game character with the guitar. If you think I can’t play the guitar… no no no. Pi plays the guitar and narrates the story; you interact with him and tell him, “Oh, I wanna go to that shop.” And Pi will go, “Sure, so I eventually went to that shop…”, improvise some tunes on the spot to accompany his narration, and the game character will do that thing in real time.
See? There’s a human storyteller in the loop; if this is not interactive, I don’t know what is.
People play games alone… This is pathetic.
~ Pi
💡Because let’s face it. People play games alone… This is pathetic. My live performance using the G-POET system will bring back the vibes of a community hanging out around a bonfire, listening to stories told by a storyteller… the same experience, but on steroids, cranked up to 11.
No, to my knowledge, such a guitar-assisted interactive performance in a real-time Ghibli-style game has not been done before.
(In the video, you might notice that low-pitched notes cause the player to go right, and high-pitched notes make it go left. There is some noise, which I will filter out later.)
To plug my guitar into the p5.js editor, I wrote a native C++ app that calculates the frequency of my guitar notes through a Fast Fourier Transform and maps particular ranges of frequencies to key press events, which are propagated to the p5.js browser tab. The core of that frequency-to-key mapping is sketched below.
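A minimal browser-only version of the idea using p5.sound’s FFT (an illustration under assumed thresholds, not the actual C++ key-injection code):

// Find the loudest FFT bin, convert it to Hz, and treat its range as a
// left/right "key press". Illustrative only; the real system does this
// in a native C++ app and injects OS-level key events.
let mic, fft;

function setup() {
  createCanvas(400, 120);
  mic = new p5.AudioIn();
  mic.start();
  fft = new p5.FFT(0.8, 1024); // smoothing, number of frequency bins
  fft.setInput(mic);
}

function draw() {
  background(0);
  const spectrum = fft.analyze(); // amplitude (0–255) per bin

  // Locate the loudest bin and convert its index to Hz.
  let peak = 0;
  for (let i = 1; i < spectrum.length; i++) {
    if (spectrum[i] > spectrum[peak]) peak = i;
  }
  const hz = (peak * sampleRate()) / (2 * spectrum.length);

  fill(255);
  if (spectrum[peak] > 120) { // crude noise gate; threshold is arbitrary
    // Low notes steer right, high notes steer left (as in the video).
    text(hz < 200 ? 'RIGHT' : 'LEFT', 20, 60);
  }
  text(nf(hz, 0, 1) + ' Hz', 20, 90);
}

function mousePressed() {
  userStartAudio(); // browsers require a user gesture before audio starts
}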
Of course, I need the characters. Why would I browse the web for low-quality graphics (or ones which do not meet my art standards), when I can create my own graphics tailored specifically to this game?
So I created a sprite sheet of myself as a game character, in my leather jacket, with my red hair tie, leather boots and sunglasses.
But is it not time-consuming? Not if you are lazy and automate it 🫵. You just model an Unreal MetaHuman of yourself, plug the FBX model into Mixamo to rig it, bring it into Unity, do the wind and cloth simulation, and animate. Then apply non-photorealistic cel shading to give it a hand-drawn feel, use the Unity Recorder to capture each animation frame, clean up the images a bit with ffmpeg, assemble the sprite sheet in TexturePacker, and voilà… a quality sprite sheet of “yourself” in half an hour.
Also, when improvising on the guitar during the storytelling performance, I need the game’s background music to (1) be specifically tailored to my game and (2) follow a particular key and chord progression so that I can improvise on the spot in real time without messing up. Hence, I am composing the background music myself; below is a backing track from the game.
In terms of code, there are a lot (a lot!) of refactored classes I am implementing, including the Data Loader, player state machine, Animation Controllers, Weather System, NPC system, Parallax scrolling, UI system, Dialogue System, the Cinematic and Cutscene systems, Post Processing Systems, and Shader loaders. I will elaborate more in the actual report, but for now, I will show an example of my Sprite Sheet loader class.
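A minimal sketch of what such a loader can look like (the real class is larger; the frame size and asset path here are illustrative assumptions):

// Loads a horizontal strip of equally sized frames and draws one of them.
class SpriteSheet {
  constructor(img, frameW, frameH) {
    this.img = img;
    this.frameW = frameW;
    this.frameH = frameH;
    this.numFrames = floor(img.width / frameW);
  }

  // Draw frame index i (wrapping around) at canvas position (x, y).
  drawFrame(i, x, y) {
    const sx = (i % this.numFrames) * this.frameW;
    image(this.img, x, y, this.frameW, this.frameH,
          sx, 0, this.frameW, this.frameH);
  }
}

let walkImg;
let walk;

function preload() {
  walkImg = loadImage('assets/pi_walk.png'); // hypothetical asset path
}

function setup() {
  createCanvas(400, 200);
  walk = new SpriteSheet(walkImg, 64, 64); // hypothetical 64x64 frames
}

function draw() {
  background(220);
  walk.drawFrame(floor(frameCount / 6), 168, 68); // ~10 fps animation
}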
And I am also writing some of the GLSL fragment shaders myself, so that the visuals can be enhanced to match studio-quality games. An example of this kind of in-game shader is given below (it creates a plasma texture overlay on the entire screen).
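A minimal p5.js plasma sketch in that spirit (illustrative; the palette and scale constants are arbitrary, not the game’s actual shader):

// A classic plasma: a sum of phase-shifted sine waves, evaluated per pixel
// in a fragment shader and animated with a time uniform.
const vert = `
attribute vec3 aPosition;
void main() {
  vec4 pos = vec4(aPosition, 1.0);
  pos.xy = pos.xy * 2.0 - 1.0; // map p5's rect geometry to clip space
  gl_Position = pos;
}`;

const frag = `
precision mediump float;
uniform float uTime;
void main() {
  vec2 p = gl_FragCoord.xy / 100.0;
  float v = sin(p.x + uTime) + sin(p.y + uTime)
          + sin(p.x + p.y + uTime) + sin(length(p) * 2.0);
  vec3 rgb = 0.5 + 0.5 * sin(3.14159 * v + vec3(0.0, 2.0, 4.0));
  gl_FragColor = vec4(rgb, 1.0);
}`;

let plasma;

function setup() {
  createCanvas(400, 400, WEBGL);
  noStroke();
  plasma = createShader(vert, frag);
}

function draw() {
  shader(plasma);
  plasma.setUniform('uTime', millis() / 1000.0);
  rect(0, 0, width, height); // full-screen quad driven by the shader
}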
Yes, there were a lot of frightening aspects. I frightened my computer by forcing it to do exactly what I want.
Challenges? Well, I just imagine what I want. In the name of my true and genuine love for the arts, God revealed to me, through the angels, all the code and skills required to make my thoughts into reality.
Hence, the implementation of this project is like Ariana Grande’s 7 Rings lyrics.
I see it, I like it, I want it, I got it (Yep)
Risk Prevention
Nope, no risk. The project is completed, so I know there are no risks to be prevented; I am just showing a fraction of it because this is the midterm “progress” report.
The game I am planning to create will take inspiration from “Diary of a Wimpy Kid,” but with a twist of our university’s culture. That said, I titled it “Diary of an NYUAD Kid,” with each game representing a struggle or something we, as students, relate to. At first, the player will be introduced to a menu screen with options to play. To avoid simplicity, I decided to gather a group of games, 4 to be specific (or 3.5, as one of them is a doodling notepad), and implement them to provide users with a variety of choices. All games will be single-player.
Design & Implementation
The first game is a simple game of trying to avoid “the cheese touch,” referenced from the movie. If you fail to avoid the cheese touch, you lose points; if you succeed, you gain points. The goal here is to gather as many falcon points as possible.
The second game is a memory game titled “Have We Met?” It aims to depict the struggle of being new to campus and meeting so many people, with each card representing a character.
The third game is an elevator game, which I’m planning to title “Rush Hour” or “Elevator Rush.” The goal is to get as many students in the elevator as possible to prevent them from being late to class, inspired by the slow elevators on our campus, specifically C2.
Finally, the fourth semi-game is a student’s notebook where you can sketch or take notes, and then save your sketch as a PNG to your laptop (a bare-bones sketch of that mechanism follows below). With the vision set & the base code established, my next step is to digitally design the game aesthetics.
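The save-to-PNG part is covered by p5.js’s built-in saveCanvas(); a minimal sketch of the idea (the drawing logic is a placeholder, not the actual notebook):

// Drag the mouse to draw; press 's' to download the canvas as a PNG.
function setup() {
  createCanvas(600, 400);
  background(255);
}

function draw() {
  if (mouseIsPressed) {
    stroke(0);
    strokeWeight(3);
    line(pmouseX, pmouseY, mouseX, mouseY);
  }
}

function keyPressed() {
  if (key === 's') {
    saveCanvas('nyuad-notebook', 'png');
  }
}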
CHALLENGES
Although I have the base code designed for most of the games (with many bugs), I believe it will be a challenge to implement them all perfectly without errors. My goal is to complete all four games, which I know is ambitious, yet I have faith. Another challenge I expect is maintaining consistency while recreating the same aesthetic as the sketches in the original “Diary of a Wimpy Kid,” but with time and effort, I believe it is possible.
So far I’ve got my game running, but it’s still in progress. I still have to embed sound and change a couple of things, but you get the point of the game.
I found it hard to make the sprite and the ball move within the same function, but with the help of Pi (thanks, Pi!) I was able to do that.
Overall, I still have to add a couple of things, as I have a lot of ideas that I will try to put into action. But so far, I’m happy with my progress, even though my game is simple.
This is some of my code that’s related to the game:
var ball_diameter = 30;
var bomb_diameter = 10;
var xpoint;
var ypoint;
var zapperwidth = 6;
var numofbombs = 20;
var bombposX = [];
var bombposY = [];
var bombacceleration = [];
var bombvelocity = [];
var time = 0;
var timeperiod = 0;
var score = 0;
var posX;

function setup() {
  createCanvas(640, 480);

  // Simulate one falling bomb to count how many frames it takes to cross
  // the canvas; that frame count becomes the bomb respawn period.
  var temp00 = 0, temp01 = -20;
  while (temp01 < height) {
    temp00 += 0.02;
    temp01 += temp00;
    timeperiod++;
  }

  posX = zapperwidth + 0.5 * ball_diameter - 2;
  xpoint = 0.5 * width;
  ypoint = height - 0.5 * ball_diameter + 1;
  initbombpos();
}

function draw() {
  background(137, 209, 245);

  // The "zapper" strip along the left edge.
  fill(239, 58, 38);
  rect(0, 0, zapperwidth, height);
  scoreUpdate();

  // Draw the falling bombs.
  fill(255);
  noStroke();
  for (var i = 0; i < numofbombs; i++) {
    ellipse(bombposX[i], bombposY[i], bomb_diameter, bomb_diameter);
  }
  updatebombpos();

  // The player's ball drifts left and moves right while the mouse is pressed.
  fill(31, 160, 224);
  ellipse(xpoint, ypoint, ball_diameter, ball_diameter);
  xpoint -= 3;
  if (mouseIsPressed && (xpoint + 0.5 * ball_diameter) < width) {
    xpoint += 6;
  }

  // Game over if the ball touches the zapper or any bomb.
  if (xpoint <= posX || bombCollisionTest()) {
    gameover();
  }
  time += 1;
}

function updatebombpos() {
  for (var i = 0; i < numofbombs; i++) {
    bombvelocity[i] += bombacceleration[i];
    bombposY[i] += bombvelocity[i];
  }
  // Respawn the whole wave of bombs once the period has elapsed.
  if (time > timeperiod) {
    initbombpos();
    time = 0;
  }
}

function initbombpos() {
  for (var i = 0; i < numofbombs; i++) {
    bombacceleration[i] = random(0.02, 0.03);
    bombvelocity[i] = random(0, 5);
    bombposX[i] = random(zapperwidth + (0.5 * ball_diameter), width);
    bombposY[i] = random(height / 4, height / 2); // spawn between one-fourth and half of the canvas height
  }
}

function bombCollisionTest() {
  var temp = 0.5 * (ball_diameter + bomb_diameter) - 2;
  var distance;
  for (var i = 0; i < numofbombs; i++) {
    distance = dist(xpoint, ypoint, bombposX[i], bombposY[i]);
    if (distance < temp) {
      return true;
    }
  }
  return false;
}

function gameover() {
  fill(255);
  textSize(32);
  textAlign(CENTER, CENTER);
  text("GAME OVER", width / 2, height / 2 - 20);
  textSize(15);
  text("Press space to restart", width / 2, height / 2 + 20);
  noLoop();
}

function scoreUpdate() {
  score += 10;
  fill(255);
  text("SCORE: " + int(score / timeperiod), width - 65, 15);
}

function keyPressed() {
  if (keyCode === 32) { // space bar
    restartGame();
  }
}

function restartGame() {
  time = 0;
  score = 0;
  posX = zapperwidth + 0.5 * ball_diameter - 2;
  xpoint = 0.5 * width;
  ypoint = height - 0.5 * ball_diameter + 1;
  initbombpos();
  loop();
}
This week’s reading revisits a familiar situation from the Interactive Media Lab, where a shadow projected on the lab’s TVs welcomes us as we enter. It represents the combination of technology and creativity, opening the way for human exploration.
Building on our prior conversations, the text emphasises a key point: technology alone is not enough for achievement. The individuals who drive innovation make a true impact. Myron Krueger’s Videoplace is a good illustration of how skill and vision can transcend both time and technology, leaving an enduring effect on the industry.
Furthermore, the reading goes into the many ways that artists use interactive media. From experimental installations to immersive experiences, it demonstrates the limitless potential of artistic expression in the digital era. It emphasises the significance of user experience and technological affordability in producing meaningful interactive art pieces.
Overall, this reading talks about the different approaches that artists take with regard to interactive media. It also takes into account some of the things you have to consider in order for your art piece to work.
It has been 10 years since Flappy Bird was taken down. What a time!
So, when we were required to make a game, this is exactly where my mind went. The concept of the game will be the same, but I want to make it NYU-themed. Instead of the bird, it will be the falcon: FLAPPY FALCON!
In terms of potential challenges I will have to be careful with the code, but I have faith in myself! Let’s get to work!
Levin’s approach to computer vision in the arts serves as a potent democratizing force, effectively breaking down barriers that have traditionally separated the realms of advanced technology and creative expression. In a field that might appear daunting due to its technical complexities, Levin’s narrative fosters an inclusive environment. By presenting computer vision as an accessible tool for artistic exploration, he invites individuals from diverse backgrounds to engage with technology in a creative context. This democratization is crucial because it empowers a wider array of voices and perspectives to contribute to the evolving dialogue between technology and art. It challenges the notion that one must have an extensive background in computer science or fine arts to participate in this innovative intersection, thus fostering a more diverse and vibrant community of creators. The implication is clear: the future of art and technology is not reserved for a select few but is an open field for exploration by anyone with curiosity and creativity.
Moreover, Levin delves into the ethical landscape encountered by artists who utilize this technology to craft pieces that interact with and react to human actions. Issues of privacy, consent, and surveillance emerge as critical considerations. As such, the capability of computer vision to breach personal spaces, or to be deployed in ways that exploit or inaccurately portray individuals, warrants careful scrutiny.
On Feb 16, 2024, OpenAI released a preview of SORA, a text-to-video diffusion transformer model. With that, almost everyone will be able to (to an extent) generate the videos they imagine. We have come a long, long way since Myron Krueger’s 1989 Videoplace (gosh, his implementation makes all my VR experiences look weak). In previous years, a lot of public computer vision models came out and became accessible – YOLO, GANs, Stable Diffusion, DALL-E, Midjourney, etc. The entire world was amazed when DALL-E showed off its in-painting functionality. However, it should be noted that such capabilities (or at least the theories behind them) have been around far longer (e.g., PatchMatch is a 2009 inpainting algorithm, which later got integrated into Photoshop as the infamous Content-Aware Fill tool).
What a time to be alive.
And back in 2006, Golan Levin, another artistic engineer, wrote Computer Vision for Artists and Designers. He gave a brief overview of the state of computer vision and discussed frame differencing, background subtraction, and brightness thresholding as extremely simple algorithms that artists can utilize, then gave us links to some Processing code as examples at the end. I wish the writing contained a bit more of a how-to guide and figures on how to set up the Processing interface and so on.
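Of the three, brightness thresholding is probably the quickest to try; here is a minimal p5.js version (the cutoff value is an arbitrary choice):

// Paint pixels white where the webcam image is brighter than a fixed
// cutoff, black elsewhere.
let video;
const cutoff = 128; // arbitrary threshold; tune for your lighting

function setup() {
  createCanvas(320, 240);
  pixelDensity(1); // keep the canvas buffer the same size as the video
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
}

function draw() {
  video.loadPixels();
  loadPixels();
  for (let i = 0; i < video.pixels.length; i += 4) {
    const b = (video.pixels[i] + video.pixels[i + 1] + video.pixels[i + 2]) / 3;
    const v = b > cutoff ? 255 : 0;
    pixels[i] = pixels[i + 1] = pixels[i + 2] = v;
    pixels[i + 3] = 255;
  }
  updatePixels();
}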
Golan wanted to stress that, in his own words, “a number of widely-used and highly effective techniques can be implemented by novice programmers in as little as an afternoon”, bringing the power of computer vision to the masses. However, in order to get computer vision to the masses, there are certain challenges… mainly not technology, but digital literacy.
The Digital Literacy Gap in Utilizing Computer Vision
From observation, a stunning number of people (including the generation that grew up with iPads) lack basic digital literacy. There are some “things” you have to figure out yourself once you have used a computer for some time: for instance, to select multiple files at once, hold the Ctrl key and click on the files. On Windows, your applications are most likely installed in C:\Program Files (x86). If an app is not responding, fire up the Task Manager and kill the process on Windows, force quit on Mac, or use the pkill command on Linux. If you run an application and the GUI is not showing up, it is probably running as a process in the system tray, etc.
However, many of the masses who have used computers on a daily basis for nearly a decade (a.k.a. my dad, and a lot more people, even young ones) still struggle to navigate around their computer. For them, Golan Levin’s article is not a novice programmer tutorial but already an intermediate one – you have to have installed Processing on your computer, set up Java before that, and so on. Personally, I feel that a lot of potential artists give up on integrating technology because of the barrier to entry of the environment setup (for code-based tools and computer vision). Hence, as soon as an enthusiastic artist tries to run some OpenCV code from GitHub and their computer says “Could not find a version that satisfies the requirement opencv”, they just give up.
Nevertheless, things are becoming a lot more accessible. Nowadays, if you want to do this kind of computer vision processing without writing code, there are Blender Geometry Nodes and Unity Shader Graphs, where you can drag nodes around to get things done. For code demonstrations, there is Google Colaboratory, where you can run Python OpenCV code without dealing with any Python dependency errors (and even get GPUs if your computer is not powerful enough).
Golan mentioned: “The fundamental challenge presented by digital video is that it is computationally ‘opaque.’ Unlike text, digital video data in its basic form — stored solely as a stream of rectangular pixel buffers — contains no intrinsic semantic or symbolic information.” This is no longer true in 2024: you can use semantic segmentation, or plug your image into any transformer model, to have each of your pixels labeled. Computers are no longer dumb.
The Double-Edged Sword of User-Friendly Computer Vision Tools
With more computer vision and image generation tools such as DALL-E, you can type text to generate images, of course with limitations. I had an amusing time watching a friend try to generate his company logo in DALL-E with the text in it; it failed to spell the text correctly, and he kept typing the prompt again and again, getting frustrated with the wrong spelling.
In such cases, I feel that technology has gone too far. This is the type of computer vision practitioner that these new generations of easy tools are going to produce: ones who will never bother to open up an IDE and try coding a few lines, or to just get Photoshop or GIMP and place the letters themselves. Just because the tools get better does not mean you don’t have to put in any effort to get quality work. The ease of use of these tools might discourage people from learning the underlying principles and skills, such as basic programming or graphic editing.
However…
The rate of improvement of these tools is really alarming.
Initially, I was also gonna say that the masses need to step up their game and upgrade their tech skills, but anyway… at this rate of improvement in readily available AI-based computer vision tools, computer vision may really have reached the masses.
I personally found Don Norman’s view on the design world and its rules valid. The main goal of a designer is to find innovative, creative ways to make designs easier to use and more efficient, so when something as simple as how to open a door gets too complicated, I too would write about it.
What makes a good design is how quickly users understand how it works. Norman also argues that as technology develops, design must play a role in easing the interaction between technology and people. This idea hits close to home: at every family gathering, I am the designated technology expert whenever my grandparents want to post something on Instagram or update their Facebook status. It would be nice if these platforms measured user experience and user feedback, which was also a principle of design that Norman wrote about. This issue of technology not being inclusive of older generations must be addressed, because it is a global issue.
I wanted to take the route of generative text. I wanted something with a motion of swaying side to side, giving some sort of hypnotic effect. Additionally, I wanted the text to be in a pixelated font, mainly for the aesthetic element; I used the font Silkscreen to achieve that look.
As a beginner in p5.js I had to use online resources such as the p5.js reference page, and YouTube: Patt Vira’s videos.
I am especially proud of being able to figure out the following formulas and that block of code; it might seem basic, but it took me a while to wrap my head around it. The following block is what gives the element of swaying: there are two layers overlapping each other, with the top layer mimicking the motion of the layer under it. Updating the angle based on a sinusoidal function creates the dynamic movement in the pattern, which is what produces the swaying motion. I also added the interaction element of being able to switch the colors of the text randomly – the colors are randomized and the letters are transparent.
function draw() {
  background(0);
  stroke(255);

  // Offset that both the line endpoints and the text sway by.
  let x = r * cos(angle);
  let y = r * sin(angle);

  translate(20, 300);
  // Bottom layer: one line per outline point of the text.
  for (let i = 0; i < points.length; i++) {
    line(points[i].x, points[i].y, points[i].x + x, points[i].y + y);
  }

  // Top layer: the text itself, drawn at the swaying offset.
  fill(textColor);
  textSize(size);
  textFont(font);
  text(msg, x, y);

  // Update the angle with a sinusoidal increment to produce the sway.
  let increment = 2 * sin(t);
  t++;
  angle += increment;
}

function mousePressed() {
  // Change text color on mouse click.
  textColor = color(random(255), random(255), random(255), 100);
  // Introduce noise to the points.
  for (let i = 0; i < points.length; i++) {
    points[i].x = originalPoints[i].x + random(-10, 10);
    points[i].y = originalPoints[i].y + random(-10, 10);
  }
}
I initially wanted to add the Dubai landscape behind the text; however, that was a complete failure. I couldn’t figure out what went wrong, but that is the only thing I would change about my code.
Here is my entire code:
let font;
let points = [];
let originalPoints = [];
let msg = "dubai";
let size = 250;
let r = 15;
let angle = 0;
let t = 0;
let textColor;

function preload() {
  font = loadFont("fonts/Silkscreen-Regular.ttf");
}

function setup() {
  createCanvas(850, 400);
  points = font.textToPoints(msg, 0, 0, size);
  originalPoints = points.map(point => createVector(point.x, point.y));
  angleMode(DEGREES);
  textColor = color(255, 100);
}

function draw() {
  background(0);
  stroke(255);
  let x = r * cos(angle);
  let y = r * sin(angle);
  translate(20, 300);
  for (let i = 0; i < points.length; i++) {
    line(points[i].x, points[i].y, points[i].x + x, points[i].y + y);
  }
  fill(textColor);
  textSize(size);
  textFont(font);
  text(msg, x, y);
  let increment = 2 * sin(t);
  t++;
  angle += increment;
}

function mousePressed() {
  // Change text color on mouse click
  textColor = color(random(255), random(255), random(255), 100);
  // Introduce noise to the points
  for (let i = 0; i < points.length; i++) {
    points[i].x = originalPoints[i].x + random(-10, 10);
    points[i].y = originalPoints[i].y + random(-10, 10);
  }
}