Pi : Week 10 Reading – Aliens, Tom Igoe and converting Monologues to Dialogues

Aliens

I’ve always had this theory that if aliens ever visited Earth 👽, they’d mistake our art galleries for silent, sacred worship spaces. I mean, where else would you find humans voluntarily hushed, tiptoeing about like they’re scared of disturbing the air itself?

Then I read Tom Igoe’s article, “Making Interactive Art: Set the Stage, Then Shut Up and Listen,” and it made me reconsider that opinion from another point of view. Maybe our future cosmic friends would understand that in the quietude of an art space, there’s a loud, unspoken conversation between human and creation. And that’s precisely what Igoe is advocating for – an unscripted, raw dialogue where the artwork doesn’t just speak but listens too.

In “Making Interactive Art: Set the Stage, Then Shut Up and Listen,” Igoe takes a poke at would-be über-ambitious creators who helicopter-parent their creations. I can’t help smiling at his blunt advice not to interpret your own work.

It sounds very much as if one would tell a parent, “Yeah, you gave birth to it, but don’t you dare tell it what to be!” 😂

What an interesting notion, as I have to do the complete opposite more often than not in my professional life as a computer engineer: spoon-feeding users about what to do with my creation. I once preached exactly that in the IM class, remember? “If an app needs a manual, you’ve done it wrong.” Yes. That point Tom made about the art being “a conversation,” not a monologue, is a beauty. You don’t dictate; you just help guide the process.

Now, moving on to “Physical Computing’s Greatest Hits (and misses)”: it was really hard not to chuckle at Igoe’s catalogue of evergreen project themes and how they keep getting recycled.

It’s like that old joke about there being only seven original plots in all of literature 😂 (yes, according to some, there are only seven types of stories in the world). Sure, we’ve seen a hundred theremin-like instruments or video mirrors, but it’s the individual spin – that touch of personal madness – that makes them fresh and edgy. It’s like cooking: the same ingredients can yield many different flavors. And as a computer engineer who has seen too many cliché tech implementations, I find this idea a breath of fresh air.

What really floats my boat (and tickles my pickles) in this article is that Igoe gives a subtle nod toward the creative process: even in the most clichéd themes, there’s a window for innovation. This is comforting, especially when one has to look at yet another “innovative” app idea that feels like déjà vu. Maybe it’s not what you build but how you spin it.

In both articles, Igoe offers insightful perspectives on art and interactivity that echo many of my own beliefs. As creators, our job is not to impose but to propose. We set the stage, provide the tools, and, in the most beautiful act of humility, step back and let the symphony of interaction play. In that symphony there is learning, growth – perennial room for innovation. Time to make more systems, in art and technology, that don’t just talk at us but with us.

Because the best kind of conversation isn’t the one in which you’re the only one speaking.

Pi : Fwitch – A Flute Controlled Steampunk Switch

Guitar is overrated 🎸😒. Anyone impressed by an electric-guitar-controlled cyberpunk game should seriously raise their standards.

When I am not working on brain-controlled robots, I look after my child, The Tale of Lin & Lang, a fictional universe where the Industrial Revolution began in East Asia (China/Korea/Japan) – a re-imagining of alternate history. In that world, there are steampunk inventors who build computers, clockwork, and machines… and there are also artisans who play the traditional bamboo flute (笛子 – dízi).

Well, that’s fiction… or is it 🤔? In real life, I am also a bamboo flute player, an inventor, and a steampunk enthusiast… so I present Fwitch, a flute-controlled steampunk switch.

Below is the complete demonstration.

HOW IT WORKS

It’s a switch, so nothing complicated: one end of a wire needs to go and meet another wire… I am just driving the motion using two steampunk-style gears I 3D-printed and painted.

When I blow the flute, the laptop mic listens to the flute’s volume; above a particular threshold, the script sends a command over the serial connection to the Arduino, telling the servo motor to rotate to a particular angle. With another note from the flute, it toggles the switch back. Simple.

The servo I am using is quite large (because the gears are large), hence it needs an external power supply. It is hidden in the container below to keep things neat and tidy.

And yes, I am using a Chinese clone Mega board.

Below are close-up shots.

CODE

The following Python code listens to my computer’s microphone; above a particular volume threshold, it sends switch-on and switch-off signals over serial to the Arduino. I could have used a microphone module and done everything on the Arduino, but I could not find one, so I decided to use my laptop mic.

import pyaudio
import numpy as np
import os
import time
import serial
import serial.tools.list_ports

switch_on = False
volume_threshold = 30   # threshold in "stars" (mean |amplitude| / 100)
switch_toggled = False  # ensures one toggle per blow, until the volume drops again

def clear_screen():
    # Clear the console screen.
    os.system('cls' if os.name == 'nt' else 'clear')

def list_serial_ports():
    ports = serial.tools.list_ports.comports()
    return ports

def get_volume(data, frame_count, time_info, status):
    global switch_on, switch_toggled

    audio_data = np.frombuffer(data, dtype=np.int16)
    if len(audio_data) > 0:
        volume = np.mean(np.abs(audio_data))
        num_stars = max(1, int(volume / 100))

        if num_stars > volume_threshold and not switch_toggled:
            switch_on = not switch_on
            ser.write(b'180\n' if switch_on else b'0\n')
            switch_toggled = True
        elif num_stars <= volume_threshold and switch_toggled:
            switch_toggled = False

        clear_screen()
        print(f"Switch:{switch_on}\nVolume: {'*' * num_stars}")

    return None, pyaudio.paContinue

# List and select serial port
ports = list_serial_ports()
for i, port in enumerate(ports):
    print(f"{i}: {port}")
selected_port = int(input("Select port number: "))
ser = serial.Serial(ports[selected_port].device, 9600)
time.sleep(2)  # Wait for serial connection to initialize
ser.write(b'0\n')  # Initialize with switch off

# Audio setup
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024

audio = pyaudio.PyAudio()

# Start the stream to record audio
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK,
                    stream_callback=get_volume)

# Start the stream
stream.start_stream()

# Keep the script running until you stop it
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    # Stop and close the stream and serial
    stream.stop_stream()
    stream.close()
    ser.close()
    audio.terminate()

And this Arduino sketch listens to the serial data from Python and rotates the servo accordingly.

#include <Servo.h>

Servo myservo;  
int val;        

void setup() {
  myservo.attach(9); 
  Serial.begin(9600);
  Serial.println("Servo Controller Ready");
}

void loop() {
  if (Serial.available() > 0) {
    String input = Serial.readStringUntil('\n'); // read the string until newline
    val = input.toInt(); // convert the string to integer

    val = constrain(val, 0, 180);

    myservo.write(val);      
    Serial.print("Position set to: ");
    Serial.println(val);     
    delay(15);               
  }
}

Hope you enjoy.

Remarks

Well, since the assignment rubric required the use of an Arduino, I am using the Arduino. Had it been the original assignment without Arduino, things could have gotten more interesting 🤔. Arduino is a tool; transistors are tools. Many people are inclined to believe that in order to implement programmable logic, we need electronics.

🚫 NOOOO!!!!

My inner computer engineer says logic can be implemented anywhere with the proper mechanism, and if you can implement logic, anything is a computer.

  • 🧬 Human DNA is just a program that reads sequences of nucleobases and produces proteins based on them.
  • 💧 We can use water and pipes to implement logic gates and design our hydro switch.
  • 🍳 If I wanted to, even the omelette on my breakfast plate could be a switch.

We don’t even need 3-pin transistors; we can design purely “mechanical” logic gates and build the switch from those. But oh well… putting my celestial mechanics back into my pocket.
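To make that universality claim concrete, here is a toy Python sketch of mine (nothing from the assignment; the `nand` primitive and gate names are purely illustrative): if a medium – water pipes, gears, or omelettes – can realize a single NAND, every other logic function follows by composition.

```python
# Toy illustration: assume the physical medium gives us ONE universal
# primitive, NAND (a water valve or a gear linkage could realize it).
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

# Every other gate is just plumbing between NANDs.
def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

def xor_(a: bool, b: bool) -> bool:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Sanity check: XOR built purely from the single "mechanical" primitive
assert xor_(True, False) and not xor_(True, True)
```

Swap the body of `nand` for any mechanism with the same truth table – hydraulic, mechanical, or culinary – and the rest of the “computer” comes along for free.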

Week 8a Reading : Norman and the Nuances of Aesthetic Practicality

Written on my dorm wall, there is a manifesto that reads “Aesthetical Practicality Manifesto…There is no right or wrong, only what is practical and what is not…Anything which is not of utmost practicality, intense beauty, or ideally, a blend of both, is bullshit, and must be always avoided.”

Reading Norman’s “Emotion & Design: Attractive Things Work Better” was, to me, like hearing a harmonic echo of my own thoughts. Norman’s juxtaposition of practicality and aesthetics in design is exactly what I so strongly support. In short, my life is super simple: I follow that manifesto in almost everything I do.

First up, Norman’s opening on affect and design captures a simple truth: “Advances in our understanding of emotion and affect have implications for the science of design.” This has always resonated with me. In my quest to blend the roles of an engineer and an artist, I came to understand that emotion is not a byproduct but the guiding force of the process. Norman puts it nicely: “Positive affect can make complex tasks feel easier; negative affect does the opposite.” His story of color displays in computers – once thought superfluous, yet later turning out to feel utterly necessary despite offering no measurable practical benefit – is representative of this principle. I still think about his words whenever I beautify my computer terminals with themes and everything; in reality there is nothing of practical value there, but having a Batman/Ironman-like user interface improves my motivation to work a hundredfold.

Norman’s teapot examples are a treasure trove of insights. Carelman’s deliberately unusable pot, Nanna’s awkward yet serviceable teapot, and Ronnefeldt’s tilting pot represent, in a nutshell, the whole spectrum of design philosophies: from the functionally impractical, through the aesthetically awkward but purposeful, all the way to functional sophistication. Just as Norman believes, I think form is function.

I do not see aesthetics and usability in opposition but as dance partners; one reinforces the other’s strength. When Norman poses, “Why not beauty and brains, pleasure and usability?” I find myself nodding in agreement. This mirrors the core tenet of my manifesto: achieving palpable practicality in the most beautiful aesthetic way possible. That is key in my work at the intersection of technology and art. Take, for example, user interface development in software engineering: a visually appealing interface that confounds the user is as ineffective as a drab but functional one.

We really need interfaces that are pleasant to the eyes while guiding the user around effectively – basically, the nice skeuomorphic designs I have always admired.

Another reflection I find deep is Norman’s argument about affect in design. He shows how our emotions and fears can altogether change our perception of a task’s difficulty, using the example of a plank placed at different heights. In the professional world, I have seen firsthand that the right emotional design can make even a complex piece of software feel approachable – or perhaps even actually fun. Norman’s statement that “positive affect can make it easier to do difficult tasks” supports exactly my belief that attractive design improves the interaction with the product and melts away the perceived difficulty of the task. If anything, the critique I might level against Norman concerns how he treats the friction between usability and aesthetics: while he fully grasps that friction, I think there is leeway for a more intricate exploration of how the two can be married more cohesively in the design process.

My manifesto demands that once tangible practicality is demonstrated, achieving aesthetics is paramount.

This is where the art of design truly lies – not just in balancing but in synthesizing function and form. Norman’s reflection on the importance of good human-centered design in stressful situations is something I absolutely agree with. He underscores the need for empathetic designs that take the user’s emotional state into account. This principle transfers from physical products to digital interfaces, where ease and clarity of navigation can do a lot to keep users unstressed. In summary, Norman’s discourse reaffirmed what I had believed all along: aesthetics and practicality are ultimately one blend.

Pi Midterm : The Deal

(Instructions : If you have a guitar, you can plug in your guitar and play the game. However, without a guitar, you can play with arrow keys. More instructions inside the game.)

Pi’s P5 Midterm : https://editor.p5js.org/Pi-314159/full/FCZ-y0kOM
If something goes wrong with the link above, you can also watch the full gameplay below.

Overall Concept

The Deal is a highly cinematic cyberpunk narrative I am submitting as my midterm project.

Meet Pi, the devil’s engineer, in a world where technology rules and danger lurks in every shadow. After receiving a mysterious call from a woman offering a lucrative but risky job, he’s plunged into a world of corporate espionage over brain-controlled robots.

The inspiration comes from my daily life, where I have to deal with tech clients, and… yes, I actually do brain-controlled robots in my lab: we scan brain waves through electroencephalography (EEG) and feed the signal through a neural network that translates it into robot movements. It’s at a very early stage, but it is moving.

How the project works

The game is meant to be played with a guitar. In fact, it is intended not as a game but as a storytelling tool, used by a guitar-playing orator in front of a live audience. Below is a demonstration of the game.

A number of tools and a not-so-complicated workflow were used to create this game in a week. The full workflow is illustrated below.

For a midterm project, this is rather huge. I have 16 individual JavaScript modules working together to run the player, cinematics, background music, enemies, rendering, and parallax movement. Everything is refactored into classes for optimization and code cleanliness. For example, my Player class begins as follows.

class Player {
  constructor(scene, x, y, texture, frame) {
    // Debug mode flag
    this.debugMode = false;
    this.debugText = null; // For storing the debug text object
    this.scene = scene;
    this.sprite = scene.matter.add.sprite(x, y, texture, frame);
    this.sprite.setDepth(100);
    // Ensure sprite is dynamic (not static)
    this.sprite.setStatic(false);

    // Setup animations for the player
    this.setupAnimations();

    // Initially play the idle animation
    this.sprite.anims.play("idle", true);

    // Create keyboard controls
    this.cursors = scene.input.keyboard.createCursorKeys();

    // Player physics properties
    this.sprite.setFixedRotation(); // Prevents player from rotating

    // Current speed; broadcast to other systems by the update method
    this.speed = 0;

    this.isJumpingForward = false; // New flag for jump_forward state

    // Walking and running
    this.isWalking = false;
    this.walkStartTime = 0;
    this.runThreshold = 1000; // milliseconds threshold for running
    // Debugging - Enable this to see physics bodies
  }

  //Adjust Colors
  // Set the tint of the player's sprite
  setTint(color) {
    this.sprite.setTint(color);
  }

What I am proud of

I am super proud that I was able to use so many of my skills in making this.

For almost all the game resources, including the soundtrack, I either created them myself or generated them through AI, then plugged them into the rest of the workflow for final processing. In the image above, you can see me rigging and animating a robot sprite in Spine2D; the original 2D art was generated in Midjourney.

Problems I ran into

During testing, everything went well. But during the live performance, my character got confused between moving left and moving right: when I played guitar notes for the character to move left, it moved right instead. This is because I declared the key mappings from the FFT (Fast Fourier Transform) signal as frequency bands, which map to the right and left arrow keys respectively.

In theory, it should work. In practice, as in the diagram below, once the pitch leaves the left-key band, the signal inadvertently sweeps through the right-key band (since the frequency does not drop with an exactly vertical slope), causing unintentional right-key presses.

I had to resort to the FFT→keymapping workflow since I cannot directly access the JavaScript runtime from my external C++ FFT program. Had the game been implemented as a native game (e.g., in Unity or Godot), I could have sent unique UDP commands instead of keymapping regions, which would resolve the issue.
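For the record, there is also a software-side mitigation that could work within the FFT→keymapping workflow (this is a sketch of mine, not code from the project; the band edges and the `BandFilter` class are made up for illustration): require the detected pitch to dwell inside a band for several consecutive FFT frames before the key event fires, so a fast sweep through the wrong band is ignored.

```python
# Hypothetical dwell-time filter: a band only "fires" after the pitch has
# held inside it across several consecutive FFT frames. Band edges here
# are invented for illustration.
LEFT_BAND = (400, 700)   # Hz, maps to "left"
RIGHT_BAND = (150, 400)  # Hz, maps to "right"
DWELL = 3                # frames the pitch must hold after entering a band

class BandFilter:
    def __init__(self, bands, dwell=DWELL):
        self.bands = bands      # e.g. {"left": (lo, hi), "right": (lo, hi)}
        self.dwell = dwell
        self.candidate = None   # band the pitch is currently inside
        self.count = 0          # consecutive frames spent inside it

    def update(self, freq):
        """Feed one FFT frame's dominant frequency; return a key name or None."""
        current = None
        for name, (lo, hi) in self.bands.items():
            if lo < freq <= hi:
                current = name
                break
        if current != self.candidate:
            # Entered a new band (or left all bands): restart the dwell clock
            self.candidate, self.count = current, 0
            return None
        if current is None:
            return None
        self.count += 1
        # Fire exactly once, on the frame the dwell requirement is met
        return current if self.count == self.dwell else None

# A note in the left band, then a fast sweep down through the right band:
# the transitional frames never accumulate enough dwell to fire "right".
f = BandFilter({"left": LEFT_BAND, "right": RIGHT_BAND})
events = [f.update(freq) for freq in [600, 600, 600, 600, 300, 100, 100]]
```

With a dwell of a few frames at a typical FFT hop size, a deliberately held note still registers almost instantly, while the brief pass-through frames of a pitch sweep are discarded.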

Rubric Checklist :

  • Make an interactive artwork or game using everything you have learned so far (This is an interactive Game)
  • Can have one or more users (A single player game, with the player acting as the storyteller to a crowd)
  • Must include

At least one shape

The “BEGIN STORY” button is a shape, fulfilling this requirement. Everything else is images and sprites.

At least one image

We have a whole lot of images and Easter eggs. The graphics were generated in Midjourney and edited in Photoshop.

At least one sound

Pi composed the original soundtrack for the game. It is in A minor pentatonic for easy improvisation over it. In addition, there are loads of ambient sounds.

For the prologue monologue, I used this : https://youtu.be/Y8w-2lzM-C4

At least one on-screen text

We got voice acting and subtitles.

Object Oriented Programming

Everything is a class. We have 18 classes in total handling many different things, from cinematics to the data preloader to the background music manager to parallax background management. Below is the Cinematic implementation.

class Cinematic {
  constructor(scene, cinematicsData, player) {
    this.scene = scene;
    this.cinematicsData = cinematicsData;
    this.player = player; // Store the player reference
    this.currentCinematicIndex = 0;
    this.subtitleText = null;
    this.isCinematicPlaying = false;
    this.collidedObject = null;
    this.lastSpawnSide = "left"; // Track the last spawn side (left or right)
    // Game objects container
    this.gameObjects = this.scene.add.group();
    this.phonecallAudio = null; // Add this line
  }

  create() {
    // Create the text object for subtitles, but set it to invisible
    this.subtitleText = this.scene.add
      .text(this.scene.scale.width / 2, this.scene.scale.height * 0.5, "", {
        font: "30px Arial",
        fill: "#FFFFFF",
        align: "center",
      })
      .setOrigin(0.5, 0.5)
      .setDepth(10000)
      .setVisible(false);

    // Setup collision events
    this.setupCollisionEvents();
  }

  executeAction(action) {
    switch (action) {
      case "phonecall":
        console.log("Executing phone call action");
        // Play the 'nokia' audio in a loop with a specified volume
        if (!this.phonecallAudio) {
          this.phonecallAudio = this.scene.sound.add("nokia", { loop: true });
          this.phonecallAudio.play({ volume: 0.05 });
        }
        break;
      // ... remaining cases elided ...
    }
  }
}

  • The experience must start with a screen giving instructions and wait for user input (button / key / mouse / etc.) before starting (The main menu waits for the user click)
  • After the experience is completed, there must be a way to start a new session (without restarting the sketch) (After the story, it goes back to main menu)

Interaction design (is clear to user what they are controlling, discoverability, use of signifiers, use of cognitive mapping, etc.)

(We have super simple keyboard and guitar input instructions)


Pi Week 5 Midterm Progress : G-POET – Guitar-Powered Operatic Epic Tale

Concept

I am super jealous 😤 of people who are extremely good at one thing. Hayao Miyazaki doesn’t have to think that much; he just keeps making animated movies… he’s the greatest artist I look up to. Slash and Jimi Hendrix don’t get confused; they just play guitar all the time, because they are the greatest musicians. Joseph Fourier did just mathematics and physics, with a little history on the side… but he’s still a mathematician.

My lifelong problem is that I specialize in everything, in extreme depth. When you are a competent artist, engineer, musician, mathematician, roboticist, researcher, poet, game developer, filmmaker, and storyteller all at once, it’s really, really hard to focus on one thing…

which is a problem that can be solved by doing EVERYTHING.

Hence, for my midterm, I am fully utilizing a fraction of my skills to create the most beautiful interactive narrative game ever executed in the history of the p5.js editor, where I control the game by improvising on my guitar in real time. The story will be told in the form of a poem I wrote.

Ladies and gents, I present you G-POET – the Guitar-Powered Operatic Epic Tale 🎸 – a live performance with your host, Pi.

This is not a show-off. This is devotion, to prove my eternal loyalty and love for the arts! I don’t even care about the grades. For me, art is a matter of life or death. The beauty and the story arc of this narrative should reflect the overflowing, exploding emotions and feelings I have for the arts.

Also, despite it being a super short game made in JavaScript, I want it on the same level as, if not better than, the most stunning cinematic 2D games ever made – titles like Ori and the Blind Forest, Forgotton Anne, and Hollow Knight. Those took whole studios to reach that quality; I want to show what a one-man army can achieve in two weeks.

Design

I am saving the story for the hype, so below are sneak peeks of the bare minimum. It’s an open world. No more spoilers.

(If the p5 sketch below loads, use the arrow keys to move left and right, and space to jump.)

And below is my demonstration of controlling the game character with the guitar. If you think I can’t play the guitar… no no no. Pi plays the guitar and narrates the story; you interact with him and tell him, “oh, I wanna go to that shop.” And Pi will go, “Sure, so I eventually went to that shop…”, improvising some tunes on the spot to accompany his narration, while the game character does that thing in real time.

See? There’s a human storyteller in the loop, if this is not interactive, I don’t know what is.

People play games alone… This is pathetic.

~ Pi

💡 Because let’s face it: people play games alone… and that is pathetic. My live performance using the G-POET system will bring back the vibes of a community hanging out around a bonfire listening to a storyteller – the same experience, but on steroids, cranked up to 11.

To my knowledge, no such guitar-assisted interactive performance of a real-time Ghibli-style game has been done before.

(In the video, you might notice that low-pitch notes cause the player to go right, and high-pitch notes make it go left. There is some noise; I will filter it later.)

To plug my guitar into my p5.js editor, I wrote a native C++ app that calculates the frequency of my guitar notes through a Fast Fourier Transform and maps particular ranges of frequencies to key press events, which are propagated to the p5.js browser tab. A fraction of the C++ code that simulates the key presses is:

char buffer[1024];
bool leftArrowPressed = false;
bool rightArrowPressed = false;
CGKeyCode leftArrowKeyCode = 0x7B;  // key code for the left arrow key
CGKeyCode rightArrowKeyCode = 0x7C; // key code for the right arrow key

while (true) {
    // Leave room for the null terminator to avoid overflowing the buffer
    int n = recv(sockfd, buffer, sizeof(buffer) - 1, 0);
    if (n > 0) {
        buffer[n] = '\0';
        int qValue = std::stoi(buffer);
        // std::cout << "Received Q Value: " << qValue << std::endl;

        if (qValue > 400 && qValue <= 700 && !leftArrowPressed) {
            std::cout << "Moving Left" << std::endl; // debug log
            simulateKeyPress(leftArrowKeyCode, true);
            leftArrowPressed = true;
            if (rightArrowPressed) {
                std::cout << "Stop" << std::endl; // debug log
                simulateKeyPress(rightArrowKeyCode, false);
                rightArrowPressed = false;
            }
        } else if ((qValue <= 400 || qValue > 700) && leftArrowPressed) {
            std::cout << "Stop" << std::endl; // debug log
            simulateKeyPress(leftArrowKeyCode, false);
            leftArrowPressed = false;
        }

 

Of course, I need characters. Why should I browse the web for low-quality graphics (or ones that don’t meet my art standards) when I can create my own graphics tailored specifically to this game?

So I created a sprite sheet of myself as a game character, in my leather jacket, with my red hair tie, leather boots, and sunglasses.

But isn’t that time-consuming? Not if you are lazy and automate it 🫵. You just model yourself as an Unreal MetaHuman, plug the FBX model into Mixamo for rigging, bring it into Unity, do wind and cloth simulation, and animate. Then apply non-photorealistic cel shading to give it a hand-drawn feel, use the Unity Recorder to capture each animation frame, clean up the images a bit with ffmpeg, then assemble the sprite sheet in TexturePacker, and voilà… a quality sprite sheet of “your own” in half an hour.

Also, when improvising on the guitar during the storytelling performance, I need the game’s background music to be (1) specifically tailored to the game and (2) in a particular key and chord progression so that I can improvise on the spot in real time without messing up. Hence, I am composing the background music myself; below is a backing track from the game.

In terms of code, there are a lot of refactored classes I am implementing, including the data loader, player state machine, animation controllers, weather system, NPC system, parallax scrolling, UI system, dialogue system, cinematic and cutscene systems, post-processing systems, and shader loaders. I will elaborate more in the actual report, but for now, here is my sprite sheet loader class as an example.

class TextureAtlasLoader {
  constructor(scene) {
    this.scene = scene;
  }

  loadAtlas(key, textureURL, atlasURL) {
    this.scene.load.atlas(key, textureURL, atlasURL);
  }

  createAnimation(key, atlasKey, animationDetails) {
    const frameNames = this.scene.anims.generateFrameNames(atlasKey, {
      start: animationDetails.start,
      end: animationDetails.end,
      zeroPad: animationDetails.zeroPad,
      prefix: animationDetails.prefix,
      suffix: animationDetails.suffix,
    });

    this.scene.anims.create({
      key: key,
      frames: frameNames,
      frameRate: animationDetails.frameRate,
      repeat: animationDetails.repeat,
    });
  }
}

And I am also writing some of the GLSL fragment shaders myself, so the visuals can be pushed toward studio-quality games. An example of the in-game shaders is given below (it overlays a plasma texture on the entire screen).

precision mediump float;

uniform float     uTime;
uniform vec2      uResolution;
uniform sampler2D uMainSampler;
varying vec2 outTexCoord;

#define MAX_ITER 4

void main( void )
{
    vec2 v_texCoord = gl_FragCoord.xy / uResolution;

    vec2 p =  v_texCoord * 8.0 - vec2(20.0);
    vec2 i = p;
    float c = 1.0;
    float inten = .05;

    for (int n = 0; n < MAX_ITER; n++)
    {
        float t = uTime * (1.0 - (3.0 / float(n+1)));

        i = p + vec2(cos(t - i.x) + sin(t + i.y),
        sin(t - i.y) + cos(t + i.x));

        c += 1.0/length(vec2(p.x / (sin(i.x+t)/inten),
        p.y / (cos(i.y+t)/inten)));
    }

    c /= float(MAX_ITER);
    c = 1.5 - sqrt(c);

    vec4 texColor = vec4(0.0, 0.01, 0.015, 1.0);

    texColor.rgb *= (1.0 / (1.0 - (c + 0.05)));
    vec4 pixel = texture2D(uMainSampler, outTexCoord);

    gl_FragColor = pixel + texColor;
}

Frightening / Challenging Aspects

Yes, there were a lot of frightening aspects. I frightened my computer by forcing it to do exactly what I want.

Challenges? Well, I just imagine what I want. In the name of my true and genuine love for arts, God revealed all the codes and skills required to me through the angels to make my thoughts into reality.

Hence, the implementation of this project is like Ariana Grande’s 7 Rings lyrics.

I see it, I like it, I want it, I got it (Yep)

Risk Prevention

Nope, no risk. The project is already complete, so I know there are no risks to prevent; I am just showing a fraction of it because this is the midterm “progress” report.

Week 5 Reading : The Evolution of Computer Vision – From Myron Krueger to OpenAI’s SORA

On Feb 16, 2024, OpenAI released a preview of SORA, a text-to-video diffusion transformer model. With that, almost everyone will be able (to an extent) to generate the videos they imagine. We have come a long, long way since Myron Krueger’s Videoplace (gosh, his implementation makes all my VR experiences look weak). In recent years, a lot of public computer vision models came out and became accessible – YOLO, GANs, Stable Diffusion, DALL-E, Midjourney, etc. The entire world was amazed when DALL-E showed its in-painting functionality. However, such capabilities (or at least the theories behind them) have existed for ages (e.g., PatchMatch is a 2009 inpainting algorithm, which later got integrated into Photoshop as the famous content-aware fill tool).

What a time to be alive.

And back in 2006, Golan Levin, another artistic engineer, wrote Computer Vision for Artists and Designers. He gave a brief overview of the state of computer vision and discussed frame differencing, background subtraction, and brightness thresholding as extremely simple algorithms that artists can utilize, then gave links to some Processing code as examples. I wish the writing contained a bit more of a how-to guide, with figures on how to set up the Processing environment and so on.

Golan wanted to stress, in his own words, that “a number of widely-used and highly effective techniques can be implemented by novice programmers in as little as an afternoon” and thereby bring the power of computer vision to the masses. However, to actually get computer vision to the masses, the main challenges are not technological but a matter of digital literacy.
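He has a point: frame differencing, the first technique he lists, really is afternoon-sized. Here is a minimal sketch of the idea using synthetic NumPy frames (my illustration, not Levin’s Processing examples; a real setup would grab consecutive webcam frames instead).

```python
import numpy as np

def frame_difference(prev, curr, threshold=30):
    """Return a boolean motion mask: True where pixels changed notably."""
    # Work in a signed type so the subtraction cannot wrap around in uint8
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > threshold

# Two synthetic 8x8 grayscale frames: a bright "object" moves one pixel right
prev = np.zeros((8, 8), dtype=np.uint8)
curr = np.zeros((8, 8), dtype=np.uint8)
prev[3, 2] = 255
curr[3, 3] = 255

motion = frame_difference(prev, curr)
print(motion.sum())  # prints 2: the pixel the object left, and the one it entered
```

That is the entire algorithm; everything beyond this (smoothing, region tracking) is refinement.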

The Digital Literacy Gap in Utilizing Computer Vision

From observation, a stunning number of people (including the generation that grew up with iPads) lack basic digital literacy. There are some “things” you have to figure out yourself once you have used a computer for a while: to select multiple files at once, hold the Ctrl key and click on the files; on Windows, your applications are most likely installed in C:\Program Files (x86); if an app is not responding, fire up the Task Manager and kill the process on Windows, force quit on Mac, or use the pkill command on Linux; if you run an application and its GUI does not show up, it is probably running as a process in the system tray; etc., etc.

However, many people who have used computers daily for nearly a decade (a.k.a. my dad, and a lot more people, even young ones) still struggle to navigate around their machines. For them, Golan Levin's article is not a novice programmer tutorial but already an intermediate one – you have to have installed Processing on your computer, set up Java prior to that, and so on. Personally, I feel that a lot of potential artists give up on integrating technology because of the barrier to entry of environment setup (for code-based tools and computer vision). As soon as an enthusiastic artist tries to run some OpenCV code from GitHub and their computer says "Could not find a version that satisfies the requirement opencv", they just give up.

Nevertheless, things are becoming a lot more accessible. Nowadays, if you want to do such computer vision processing but don't want to write code, there are Blender Geometry Nodes and Unity Shader Graphs, where you drag nodes around instead of typing. For code demonstrations, there is Google Colaboratory, where you can run Python OpenCV code without dealing with any Python dependency errors (and even get GPUs if your computer is not powerful enough).

Golan mentioned: "The fundamental challenge presented by digital video is that it is computationally "opaque." Unlike text, digital video data in its basic form — stored solely as a stream of rectangular pixel buffers — contains no intrinsic semantic or symbolic information." This opacity no longer exists in 2024, since you can either use semantic segmentation or plug your image into a transformer model and have every pixel labeled. Computers are no longer dumb.

The Double-Edged Sword of User-Friendly Computer Vision Tools

With more computer vision and image generation tools such as DALL-E, you can type text to generate images, of course with limitations. I had an amusing time watching a friend try to generate his company logo in DALL-E with the text in it: it kept misspelling the name, and he kept typing the prompt again and again, getting more and more frustrated with the wrong spelling.

In such cases, I feel that technology has gone too far. This is the type of computer vision practitioner that these new generations of easy tools are going to produce: ones who will never bother to open up an IDE and code a few lines, or to just get Photoshop or GIMP and place the letters themselves. Just because the tools get better does not mean you don't have to put in any effort to get quality work. The ease of use of these tools might discourage people from learning the underlying principles and skills, such as basic programming or graphic editing.

However…

The rate of improvement of these tools is really alarming. 

Initially, I was also gonna say that the masses need to step up their game and upgrade their tech skills, but anyway… at this rate of improvement in readily available AI-based computer vision tools, computer vision may really have reached the masses.

Week 3: [Data Viz] Some sort of data visualization

Below is the p5 sketch.

1) 🤔 Conceptualization

Pi’s Some Sort of Data Visualization is literally some sort of data visualization, stripped of everything that is not necessary. Inspired by Saturday Morning Breakfast Cereal, the data shows true but useless facts.

2) ⚙️ Technical Plan of Attack & Implementation

Once we get the data, drawing the bars is just defining some parameters and placing rectangles accordingly.

  // Calculate dynamic dimensions
  let padding = 200; 
  let graphWidth = width - 2 * padding;
  let barWidth = graphWidth / data.length; 
  let colors = []; // Array to hold the bar colors
  for (let i = 0; i < data.length; i++) {
    colors.push(color(255, 105 + i * 10, 0)); // Gradually changing the color
  }
  // Draw the bars
  for (let i = 0; i < data.length; i++) {
    fill(colors[i]);
    noStroke();
    rect(padding + i * barWidth, height - padding - data[i] * barWidth, barWidth - 1, data[i] * barWidth);
  }
// ... and so on

I could have loaded the data from a CSV file, but the dataset is small enough to hard-code.
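For the record, the CSV route would only be a few lines. A minimal sketch in plain JavaScript (a hand-rolled parseCsv helper standing in for p5's loadTable; the column names and data here are made up):

```javascript
// Parse a small CSV string of "label,value" rows into objects with
// numeric values (naive: assumes no commas inside the labels).
function parseCsv(text) {
  const [header, ...rows] = text.trim().split("\n");
  return rows.map((row) => {
    const [label, value] = row.split(",");
    return { label, value: Number(value) };
  });
}

const csv = "fact,value\nuseless fact A,3\nuseless fact B,7";
const data = parseCsv(csv);
console.log(data.map((d) => d.value)); // → [3, 7], ready for the bar loop
```

In the actual sketch, loadTable in preload() would replace the string literal, and the bar-drawing loop stays unchanged.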

3) 🎨 Artistic Plan of Attack & Implementation

Just to keep things not boring, I played with some automatic orange gradient for the bar colors by using

  let colors = []; // Array to hold the bar colors
  for (let i = 0; i < data.length; i++) {
    colors.push(color(255, 105 + i * 10, 0)); // Gradually changing the color
  }

4) 💪 Challenges

No challenge.

5) 💡 Potential Improvements

No further improvements are needed. I need to learn to restrain myself.

6) 🖥️ Source code

🖥️ Source code is just a single sketch.js file at : https://github.com/Pi-31415/Intro-To-IM/blob/main/Assignment-4/sketch.js


Week 4 : Breastmilk, flies and piano stairs – Pi’s Top Human-Centered Design Examples

“The problem with the designs of most engineers is that they are too logical.”

😱 Ouch, ouch! Don Norman’s quote above, from “The Psychopathology of Everyday Things,” was harsh enough to get my blood boiling. Despite feeling personally attacked, both the devil and the angel on my shoulders say, “Wait, wait, he has got a point.” I fully join Don in a standing ovation for his idea of Human-Centered Design. When I wrote in my last post that “good design speaks for itself; the learning curve is so smooth that users enlighten themselves without any guidance or hitting F1 for user manuals,” I was making exactly the same point. Discoverability is a vital concept… the user should be able to discover how the thing works without it being explicitly stated… and how do you mainly achieve this? Nudges!! Although Don does not mention it in Chapter 1, I would like to highlight the Nudge Theory to the rescue… with some examples.

The Nudge Theory

I don’t have a human kid, but I know for a fact that at one point, they have to stop breast milk 🍼 and start to eat actual solid food 🍗. This process is called weaning, and back home, they start weaning babies off breast milk at around 6 months to 1 year of age. Like cocaine addicts with withdrawal symptoms, the babies will cry endlessly and become super desperate whenever they come into close contact with their mother… then these little zombies will reach out for the breast.

[Image Source]

This behavior has to change, of course. Back home, there is a traditional method of weaning… where you literally “spice 🌶️ things up” by putting turmeric powder on the mother’s nipples, so that the next time the baby goes for breastmilk, it goes “What the blistering barnacles?” with eyes popping out 😳. Lesson learnt: “Chicken wings are tastier than breastmilk from now on.”

[Turmeric Powder – Source]

Cruel, effective, spicy…🔥

Woah woah Pi, how is this spice powered weaning method related to Human-Centered Design?

Wait wait, I am just explaining the idea of a “nudge theory”.

This is an example of nudge theory in action – a nudge is a gentle push towards a desirable behavior without any force or mandate. Here, the baby discovers on its own that it should switch from breastmilk to chicken.

Don Norman’s discoverability in action!

In similar but less spicy ways, nudges can be applied to aid discoverability in a lot of human-centered designs, to gaslight humans into figuring stuff out on their own. In the rest of the post, I would like to share three of my favorite examples of this discoverability in action.

Applying Nudge Theory to Everyday Design

Flies 🪰

My personal favorite of all time solves the age-old problem of men’s restrooms: the messy urinal area. The aftermath is a nightmare for anyone tasked with cleanup. But here comes the nudge solution, as simple as it is ingenious.

A tiny sticker of a fly, or sometimes a target, is placed strategically in the urinal. It’s almost laughable how such a small thing can redirect a grown man’s attention. Yet, it works!

Men, either out of amusement or subliminal inclination, aim at the sticker.

The result? A cleaner urinal area, less spillage, and a sigh of relief from janitors everywhere.
It’s fun, it’s effective, and the best part? It doesn’t need a user manual or a ‘how-to’ guide. Men just get it, and they go along with it, often without even realizing they’re being nudged into better behavior.

World’s Deepest Bin 🗑️

Traditional bins are all but invisible to the average person. Enter the world’s deepest bin – not literally the deepest, but it sure sounds like it.

The nudge here is a bin that, when used, emits a humorous, exaggerated deep sound. It’s like dropping a piece of trash into a never-ending well. The sound is so unexpected, so comically over the top, that it draws people in. The result is as effective as it is entertaining: people actually look for things to throw away just to hear that sound again.

It turns an ordinary act of disposing of trash into a mini-adventure. And just like that, littering is reduced.

People are engaged, amused, and more importantly, they are nudging themselves and others to keep the surroundings clean.

Piano Stairs 🎹

The last example is a delightful play on human nature’s love for music: the piano stairs. The problem is clear: given the choice, most people opt for escalators or elevators, shunning the stairs, missing out on an easy opportunity for some physical activity.

The nudge solution? Transform a staircase next to an escalator into a giant working piano. Each step is a piano key that makes a sound when you step on it. The result is magical. People are drawn to the stairs, curious and excited. They hop, skip, and jump on the stairs, creating music as they go. What was once a mundane climb turns into a playful experience.

People actually go out of their way to use the stairs, sometimes repeatedly.

It’s human-centered design at its most whimsical and effective.

Conclusion

In each of these examples, the key factor is the design’s ability to communicate and guide behavior without explicit instructions. The fly in the urinal doesn’t need a sign explaining what to do. The World’s Deepest Bin doesn’t have a manual on how to use it. Piano Stairs don’t come with a user guide. They work because they tap into human instincts and make the desired action the more appealing choice. This is the essence of human-centered design – creating solutions that are so in tune with human behavior and needs that they guide us subtly towards better habits.

Week 3: [Object Oriented] Coding my own p5js Game Engine Part 1

Below is the p5 sketch; hover 👆 anywhere to interact with the robot.

In case the p5.js Editor website is down, below is a recording of the working demo on YouTube.

TL;DR : Conceptualization

Pi’s Practicality Walker is an Inverse-Kinematics-powered, procedurally animated simulation of a giant mechanical walker in p5.js. None of the animations are hard-coded by Pi; they are calculated on the spot, on demand. You can hover the mouse pointer to move the body of the walker robot around, and the leg movements will adjust as they should.

1) 🤔 Long-Winded Conceptualization

I was watching Doctor Strange in the Multiverse of Madness, and having a good time fantasizing myself deep in the movie… as Dr. Stephen Strange Pi, the Sorcerer Engineer Supreme, the Master of the Mystic Arts Engineering Arts.

The only difference is that unlike Doctor Strange, I am the perfect boyfriend to all my ex-girlfriends.

And then I suddenly saw this delicious octopus.

In the class, we are learning Object Oriented Programming, and I am feeling the urge to write my own little mini game engine in p5js (in preparation for my midterm project). And I love mechanical things soooo sooo much. Hence, a giant mechanical octopus walking over the land, controllable with the mouse is a perfect idea.

Hence the piece “Pi’s Practicality Walker” is born.

To get such a walking animation, Japanese animation master Hayao Miyazaki would pour his heart and soul into his artwork and draw every frame of the animation (that is 24 frames for 1 second of motion). But I am not Hayao Miyazaki.

But I am not Hayao Miyazaki.

~ Pi (2024)

Hence, I need to utilize my super lazy sneaky hacks to make this happen. Luckily, if you have a robotics background, Inverse Kinematics and Procedural Animation techniques come in handy. Instead of going through the blood, sweat and tears of drawing or hard-coding the animations, these mathematical goodies let us generate animation automatically in real time, allowing for a far more diverse range of actions than predefined animations could achieve without tedium.

2) ⚙️ Technical Plan of Attack & Implementation

The part of the code I am very proud of is, of course, objectifying my octopus/spider 🕷️. Since it has a central body composed of multiple legs, I can easily define the MechanicalLeg class and the Body class as follows.

//This is the class for the individual legs
class MechanicalLeg {
  constructor(numSegments, segmentLength, isRightFacing = true) {
    this.numSegments = numSegments;
    this.segmentLength = segmentLength;
    this.isRightFacing = isRightFacing; // New parameter to determine the facing direction
    this.angleX = 0;
    this.angleY = 0;
    this.points = [];
    this.totalLength = this.segmentLength * (this.numSegments - 1);
  }

  update(targetX, targetY, canvasWidth, canvasHeight) {
    this.totalLength = this.segmentLength * (this.numSegments - 1);
    this.angleX = 0;
    this.angleY = 0;
    this.legLength = max(
      dist(targetX, targetY, canvasWidth / 2, canvasHeight / 2),
      2
    );

    let initialRotation = atan2(
      targetY - canvasHeight / 2,
      targetX - canvasWidth / 2
    );
    let rotation
// ... and so on

Then you just spawn the legs on the body, which takes care of object and instance creation.

//Then, attach the legs to the body instance from the body class below
//Spider is walking and draggable
class SpiderBody {
  constructor(x, y) {
    this.position = createVector(x, y);
    this.baseY = y; // Base y-position to oscillate around
    this.dragging = false;
    this.dragOffset = createVector(0, 0);
    this.oscillationAmplitude = 30; // Amplitude of the up-and-down movement
    this.oscillationSpeed = 0.05; // Speed of the up-and-down movement
  }

  update() {
    this.position.x = mouseX - 50;
    // Apply a sin motion when not dragging
    this.position.y =
      mouseY +
      sin(frameCount * this.oscillationSpeed) * this.oscillationAmplitude;
  }
//...

As per the project requirement, an array (legs) is used to hold the leg instances that make up the walker robot.

// Line 436
function setup() {
  createCanvas(windowWidth, windowHeight);
  gaitHorizontalDistance = windowWidth / 0.7;
  spiderBody = new SpiderBody(width / 2, height / 2 + 100);
  // Initialize leg instances and add them to the legs array
  legs.push(new MechanicalLeg(4, 180, true)); // Right-facing leg
  legs.push(new MechanicalLeg(4, 180, false)); // Left-facing leg
  legs.push(new MechanicalLeg(5, 150, true)); // Another right-facing leg
  legs.push(new MechanicalLeg(5, 150, false)); // Another left-facing leg
  legs.push(new MechanicalLeg(4, 200, true)); // And so on...
  legs.push(new MechanicalLeg(4, 200, false));

Now we have a giant machine with legs, and the code is reusable and modular, but it is not moving yet. Inverse Kinematics is the art of calculating joint angles: given a target end-effector coordinate, the robot works out which joint angles will get it to that target point. Hence, the animations can be automated this way.

Inverse Kinematics & Procedural Animation

I stole the mathematical model from the University of Illinois lecture slides here : Introduction to Robotics Lecture 11: Inverse Kinematics (https://publish.illinois.edu/ece470-intro-robotics/files/2021/10/ECE470Lec11-2.pdf)

The key idea is an algorithm which iteratively adjusts the angles of each segment to ensure the end effector reaches, or points towards, the target. The mathematics primarily involves trigonometry to calculate the angles and positions of each segment in 2D space. The model I am using is below.

Step 1 : Initial Tangent Alignment – point the whole leg towards the target: θ₀ = atan2(targetY − canvasHeight/2, targetX − canvasWidth/2).

Step 2 : Desired Leg Length Calculation – legLength = max(dist(targetX, targetY, canvasWidth/2, canvasHeight/2), 2).

Step 3 : Iterative Angle Adjustments
  • Initialize θᵢ = 0 for all segments.
  • Iteratively adjust the angles to stretch or contract the leg.
  • Apply the incremental angle change, θᵢ ← θᵢ + Δθ, updating each segment.

Step 4 : Segment Position Calculation – xᵢ₊₁ = xᵢ + segmentLength·cos(θᵢ), yᵢ₊₁ = yᵢ + segmentLength·sin(θᵢ).

Step 5 : Check Total Leg Length – the iterative process continues until the total length of the leg aligns with the desired length legLength. This is the stopping condition.

Step 6 : Rotation Adjustment – if the legs are at the back, we have to measure the angle in the mirrored way, so mirror the angles.

The formal definitions of the funky symbols above are:

  • numSegments – number of segments in the leg
  • segmentLength – length of each segment
  • θᵢ – angle of the i-th segment
  • (xᵢ, yᵢ) – coordinates of the end of the i-th segment
  • (targetX, targetY) – target coordinates (mouse position)
  • canvasWidth, canvasHeight – width and height of the canvas
  • legLength – desired total length of the leg
  • θ₀ – initial rotation angle to point towards the target
  • Θ – total rotation angle of the leg
  • Δθ – incremental angle change per iteration
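To show the trigonometry without the whole sketch, here is a stripped-down two-segment IK solver in plain JavaScript. Note this is the textbook closed-form law-of-cosines solution, not the iterative loop my walker uses; the function names and the test target are mine, not from the project:

```javascript
// Two-segment inverse kinematics via the law of cosines.
// Given segment lengths l1, l2 and a reachable target (tx, ty),
// return joint angles that place the end effector on the target.
function twoLinkIK(l1, l2, tx, ty) {
  const d = Math.hypot(tx, ty); // distance from the leg root to the target
  // Law of cosines gives the bend at the "elbow" joint
  const cosElbow = (d * d - l1 * l1 - l2 * l2) / (2 * l1 * l2);
  const elbow = Math.acos(Math.min(1, Math.max(-1, cosElbow)));
  // Root angle = direction to target, minus the offset the bent elbow causes
  const shoulder =
    Math.atan2(ty, tx) -
    Math.atan2(l2 * Math.sin(elbow), l1 + l2 * Math.cos(elbow));
  return { shoulder, elbow };
}

// Forward kinematics to verify: where does the end effector actually land?
function endEffector(l1, l2, shoulder, elbow) {
  return {
    x: l1 * Math.cos(shoulder) + l2 * Math.cos(shoulder + elbow),
    y: l1 * Math.sin(shoulder) + l2 * Math.sin(shoulder + elbow),
  };
}

const { shoulder, elbow } = twoLinkIK(1, 1, 1.2, 0.5);
const tip = endEffector(1, 1, shoulder, elbow);
// tip.x ≈ 1.2, tip.y ≈ 0.5 — the leg reaches the target
```

The iterative scheme in my sketch trades this closed form for a loop that works with any number of segments, which is why the giant legs can have 4 or 5 joints each.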

 

3) 🎨 Artistic Plan of Attack & Implementation

Once the robot is working, we enhance the aesthetics by adding parallax grass, bringing back the earlier swarms and gears, and playing a good old western slide-guitar cowboy song, visualized through a fast Fourier transform (FFT) in the style of Ryoichi Kurokawa.
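For the curious, the spectrum magnitudes that drive such a visualization can be computed by hand with a naive discrete Fourier transform. This O(n²) toy version is just to show the math; in the sketch itself, p5.sound's FFT does the same job far faster:

```javascript
// Naive DFT: for each frequency bin k, correlate the signal against
// a complex exponential and take the magnitude.
function dftMagnitudes(signal) {
  const n = signal.length;
  const mags = [];
  for (let k = 0; k < n; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const phase = (-2 * Math.PI * k * t) / n;
      re += signal[t] * Math.cos(phase);
      im += signal[t] * Math.sin(phase);
    }
    mags.push(Math.hypot(re, im));
  }
  return mags;
}

// A pure sine wave completing one cycle over 8 samples
const n = 8;
const sine = Array.from({ length: n }, (_, t) => Math.sin((2 * Math.PI * t) / n));
const mags = dftMagnitudes(sine);
// mags[1] ≈ 4 (n/2); every bin except 1 and its mirror (7) is ≈ 0
```

Each bin's magnitude becomes the height of one bar in the visualizer, which is how a single guitar note lights up one region of the spectrum.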

4) 💪 Challenges

Again, no challenge. This was an enjoyable exercise.

5) 💡 Potential Improvements

To make the movements of the walker more realistic, as always, I could have used a proportional–integral–derivative (PID) controller. My current model just moves at a constant speed.
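As a sketch of what that could look like (generic textbook PID, not code from this project; the gains and the 60 fps timestep are made-up numbers), the controller's output would drive the joint's angular velocity instead of a constant step:

```javascript
// Minimal PID controller: output = kp*error + ki*integral + kd*derivative
class PID {
  constructor(kp, ki, kd) {
    this.kp = kp;
    this.ki = ki;
    this.kd = kd;
    this.integral = 0;
    this.prevError = 0;
  }
  update(setpoint, measured, dt) {
    const error = setpoint - measured;
    this.integral += error * dt;
    const derivative = (error - this.prevError) / dt;
    this.prevError = error;
    return this.kp * error + this.ki * this.integral + this.kd * derivative;
  }
}

// Ease a joint angle towards a target instead of moving at constant speed
const pid = new PID(2.0, 0.0, 0.2);
const target = 1.0; // radians
const dt = 1 / 60; // one animation frame
let angle = 0;
for (let i = 0; i < 200; i++) {
  angle += pid.update(target, angle, dt) * dt; // output = angular velocity
}
// after ~3 seconds of frames, angle is within 0.01 of the target
```

The appeal over constant speed is that the motion naturally decelerates as it approaches the target, which reads as weight and inertia on a giant mechanical leg.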

6) 🖥️ Source code

🖥️ Source code is just a single sketch.js file at : https://github.com/Pi-31415/Intro-To-IM/blob/main/Assignment-3/assignment3.js

📖 References :

“Good artists copy, great artists steal.” Of course, I stole everything by googling the publicly available stuff below 🤫😉.

Week 3 : Chris Crawford Reading – “Redefining Interactivity” by Pi

Once, a very thirsty traveler came to the bar and asked, “Water!”

The bartender, raising an eyebrow, says, “Sure, sire, would you like room-temperature or ice-cold water 🥛?”

“Uh, cold please.”

“Do you lean towards distilled, mineral, or perhaps a sparkling variety?”

The traveler, scratching his head, says, “Just regular water is fine.”

The bartender goes, “In terms of regular water, we have classic spring regular water or purified tap…”


Judging by Chris Crawford’s Interactive Design definition, this is an interactive process.

  • There are two actors – the bartender and the traveler
  • They actively listen to each other,
  • and think (whether the bartender thinks is debatable)
  • and speak

The only catch here is that this interactivity “did not solve the problem.” They did interact, information flowed between them, but the problem remains unsolved.

Just as Crawford ranted about how the people of his day rebranded “The Same Old Stuff” as “New Interactive Technology” with hype, and criticized how “plays” rank about a 0.01 on a 10-point Crawford Scale of Interactivity, I am going to use this writing to rant about how his definition of interactivity ranks pretty low (say, around 3.14ish) on the 100-point Pi Scale of Aesthetic Practicality. This definition of [Interactivity = “two actors” AND “listen” AND “think” AND “speak”] ought to be expanded, at the very least, to be applicable.

Expanding the definition of Interactivity

Personally, when I encounter “Interactivity,” I see it not as a “process” (unless you are dealing with human-to-human problems, where you have to Talk It Out Loud ™). Normally, in the context of human-software interaction and UI/UX design, interactivity is about how efficiently the system can teach the user everything they need, so that they can utilize it with minimal guidance.

In more formal terms, if we ignore the video game industry (because, by definition, games have to be interactive), I see interactivity as a measure of “the rate of transfer of information between two agents (i.e. human and computer), where this transferred information helps solve human problems using the computer with minimal input in minimal time,” just as in the diagram below.

Note that my definition explicitly states that the more interactive the system is,

  1. the more time it saves and
  2. the less guidance it needs to give the user in the future.

Otherwise, if we go by Crawford’s definition, we fall into the danger of the “Impractical Interactivity Deadlock,” where two parties keep on interacting without results, just like in the bartender joke above.

In short, the holy grail of “Interactivity” is, ironically, to minimize the need for more “interactivity” in the future. Because if you have to keep interacting… if you have to keep going to the bank because customer service keeps tossing you back and forth between departments and your credit card issue is still not solved, then the “interactivity” is simply… not “interacting.”

In short, the holy grail of “Interactivity” is, ironically, to minimize more “interactivity” in the future.

~ Pi

Best Interactivity is Implicit Interactivity, change my mind

Personally, I agree with Crawford that “Good Interactivity Design integrates form with function.” However, the only pebble in my shoe is the explicit “speak” component in his definition. In well-designed software, you don’t necessarily have to explicitly speak. Good design speaks for itself; the learning curve is so smooth that users enlighten themselves without any guidance or hitting F1 for user manuals.

There was a good old time in UI design when “Skeuomorphism” reigned – a design approach which adds real-world design cues to virtual objects to make those objects more understandable.

This is the perfect marriage of form and function.

For instance, just look at the Garage Band guitar User Interface.

Super short, sweet and simple. Anyone who has intimately slid their fingers up a fretboard does not need an additional tutorial in order to play the Garage Band guitar. It is intuitive. There is no need for an expanding speech bubble saying “In order to use the overdrive/flanger pedal, tap here.”

Also, the interface is just beauty in purest form 😌👌.

The design itself is already intuitive and interactive.

However, just like the average American marriage [source], after 8 years, form and function got a divorce ☠️… and the entire world caught the minimalism/flat-design virus, to the extent that intuition is murdered (yes, I am looking at you, Material Design and Neumorphism).

The best example of such a UI nightmare is the audio mute/unmute icon.

 

After years of experience during COVID, and after using Zoom in countless professional settings, my dad still cannot tell whether the audio is muted or not just by looking at the button. (Does red mean that it is already muted? Or will clicking the red mute it?)

Whereas a sensible, more intuitive audio on/off button will look more like this.

(Flip the switch: light means it is currently on, no light means it is currently off… Everyone knows this from interacting with other electronic gadgets; there is no need to specially train the user.)

Hence, when this intuition is not built into the original design, explicit interactivity (a.k.a. helper texts here and there, or your IT support guy) has to come in unnecessarily. This kind of interactivity is just bloat in the system, and a waste of space and resources.

Well, as they say, “Communication is key,” so I appreciate the importance of interactivity in human-software interactions. However, in the context of good software, such “talk-back” explicit interactivity should be the last resort a good designer falls back to.

A good doctor doesn’t easily prescribe antibiotics… this is supposed to be the last resort.

Hence, from the artistic engineer’s point of view: when designing anything, intuitive function has to come first, then the form, and only then throw in the “explicit” when there is no other way out.

Perhaps it is time we rebrand Crawford’s definition as Practically-Aesthetic-Interactivity (abbreviated to PAI? Hahaha, very punny, Pi, very punny), and we may… just may… see more intuitive software in the future.