Assignment 4: Data Visualization of World Cities

Inspiration / Concept

For this assignment, I opted for data visualization. During the pre-coding phase, I came across a GitHub repository containing numerous datasets. The data was scraped from the Aneki World Cities site and covers 10,567 unique cities across 170 countries. I cloned the repository and used its CSV file as my primary data source. Using the CSV file, I display a simple world map by plotting all the cities and their information; furthermore, inspired by last week’s reading, the project provides a level of user interaction that reveals additional information about each city.

 

Code

The project was relatively simple in nature. Before I started writing code, I divided the project into three primary sections, each of which depends on one or more functions.

  1. Data Processing
  2. Data Representation
  3. User Interactions

Data Processing

This is the building block of the entire project. In this section, the dataset is loaded (into a variable named ‘table’) using the inbuilt preload() function of p5.js. A single for loop, which keeps the time complexity linear, then extracts the data from each row and stores it in the respective arrays. For instance, when the program reads the first row, the names of the country and the city are stored in the country_arr[] and city_arr[] arrays respectively, whereas the latitude, longitude and elevation data are stored in the lat_arr[], long_arr[] and elevation_arr[] arrays respectively.
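A minimal sketch of this extraction step is shown below. It is not the original code: the helper name extractColumns(), the column names ("country", "city", "latitude", "longitude", "elevation") and the stub table are assumptions made for illustration. The table argument only needs getRowCount(), getString() and getNum(), which a p5.Table provides.

```javascript
// Hypothetical helper: copy each CSV column into its own array in one pass.
// `table` only needs getRowCount(), getString() and getNum(), like a p5.Table.
function extractColumns(table) {
  const columns = {
    country_arr: [],
    city_arr: [],
    lat_arr: [],
    long_arr: [],
    elevation_arr: [],
  };

  const num_rows = table.getRowCount();
  for (let r = 0; r < num_rows; r++) {
    // Column names are assumed; adjust them to the actual CSV header.
    columns.country_arr.push(table.getString(r, "country"));
    columns.city_arr.push(table.getString(r, "city"));
    columns.lat_arr.push(table.getNum(r, "latitude"));
    columns.long_arr.push(table.getNum(r, "longitude"));
    columns.elevation_arr.push(table.getNum(r, "elevation"));
  }
  return columns;
}

// Minimal stand-in for a p5.Table so the sketch also runs outside p5.js.
function makeStubTable(rows) {
  return {
    getRowCount: () => rows.length,
    getString: (r, col) => String(rows[r][col]),
    getNum: (r, col) => Number(rows[r][col]),
  };
}

// Made-up placeholder rows, not taken from the real dataset.
const demoTable = makeStubTable([
  { country: "Country A", city: "City A", latitude: "10.5", longitude: "20.25", elevation: "300" },
  { country: "Country B", city: "City B", latitude: "-12.5", longitude: "-77.5", elevation: "150" },
]);
const demoColumns = extractColumns(demoTable);
console.log(demoColumns.city_arr, demoColumns.lat_arr);
```

In a p5.js sketch, the same helper could be called from setup() on the table loaded in preload(), so the arrays are filled once rather than on every frame.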

Then, the findMinMaxVal() function is used to find the minimum and the maximum values for each numerical element (longitude, latitude and elevation). 

function findMinMaxVal()
{
  for (let r = 0 ; r < num_rows; r++)
  {
    // If the latitude in the current row is greater than the value stored in max_lat, update max_lat
    // Similarly for longitude and elevation
    if (table.getNum(r, lat_col) > max_lat)
      max_lat = table.getNum(r, lat_col);
    
    // If the latitude in the current row is lower than the value stored in min_lat, update min_lat
    // Similarly for longitude and elevation
    if (table.getNum(r, lat_col) < min_lat)
      min_lat = table.getNum(r, lat_col);
    
    if (table.getNum(r, long_col) > max_long)
      max_long = table.getNum(r, long_col);
    
    if (table.getNum(r, long_col) < min_long)
      min_long = table.getNum(r, long_col);
    
    if (table.getNum(r, elevation_col) > max_elevation)
      max_elevation = table.getNum(r, elevation_col);
    
    if (table.getNum(r, elevation_col) < min_elevation)
      min_elevation = table.getNum(r, elevation_col);
  }
}

 

As seen in the code, the program loops through every row of the table and uses if-conditions to determine whether the values being read are greater or less than the values stored in the variables; if a condition is true, the corresponding variable is updated, and the program continues to the next iteration. For example, in the first conditional, the program checks if the latitude currently being read is greater than the maximum latitude recorded so far; if there is a new maximum (i.e. the condition evaluates to true), max_lat is updated. In the same way, the function checks for the minimum and maximum values of the longitude, latitude and elevation data.

After going through the entire CSV file, the global variables (max_lat, min_lat, max_long, min_long, max_elevation and min_elevation) hold the maximum and the minimum values of their respective fields.
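This approach only works if the globals start from suitable values; for example, a maximum initialised to 0 would never be updated if every value in a column were negative. The initial values are not shown here, so the following is only a sketch of one safe way to do it, with a hypothetical helper named findRange() that starts from Infinity and -Infinity.

```javascript
// Hypothetical helper: find the minimum and maximum of an array in one pass.
function findRange(values) {
  // Start from +/-Infinity so the first value always replaces both bounds.
  let min = Infinity;
  let max = -Infinity;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return { min, max };
}

// Made-up latitudes, all negative, to show that the maximum is still found.
const latRange = findRange([-12.5, -33.9, -4.1]);
console.log(latRange.min, latRange.max); // -33.9 -4.1
```

The same helper could be applied to the longitude and elevation arrays, which keeps the comparison logic in one place.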

Finally, using an additional for-loop, the program iterates through each array, maps every value onto a new range (so that it can be represented on a canvas of size width by height) and stores the results in new arrays. For instance, elevation is mapped onto a new range from 0 to 100 and the modified value is stored in the elevation_arr_mapped[] array, longitude is mapped onto a new range from 10 to width divided by 1.2, and so forth.
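The re-mapping itself is a linear interpolation, which p5.js offers as map(). The sketch below writes the arithmetic out by hand so it can run outside p5.js. The canvas size, the data ranges and the latitude target range are assumptions for illustration; in particular, mapping latitude from height down to 0 (so that north points up) is one possible choice, not necessarily what the original sketch does.

```javascript
// Linear re-mapping written out by hand; it mirrors the arithmetic of p5.js map().
function mapRange(value, inMin, inMax, outMin, outMax) {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// Assumed canvas size, for illustration only.
const canvasWidth = 1200;
const canvasHeight = 600;

// Longitude to x, using the 10 to width / 1.2 range described above.
const cityX = mapRange(0, -180, 180, 10, canvasWidth / 1.2);

// Latitude to y. Canvas y grows downward, so the output range is reversed
// here to keep north at the top (an assumption, see the lead-in).
const cityY = mapRange(45, -90, 90, canvasHeight, 0);

// Elevation to the 0 to 100 scale, with an assumed data range of 0 to 1000 m.
const mappedElevation = mapRange(500, 0, 1000, 0, 100);

console.log(cityX, cityY, mappedElevation);
```

With real data, the input bounds would be the minimum and maximum values found in the previous step rather than the fixed numbers used here.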

Data Representation and User Interactivity 

All the data processed in the first step is used here to graphically represent the data. I included all the code for this step in the draw() function. Although it makes the overall program more processing-intensive, it assists in the next step (which is user interaction). Thus, each line of code in this step is refreshed a certain number of times every second. 

First, the program checks whether the mapped elevation (on its 0 to 100 scale) is higher than 80% or not, thus dividing the dataset into two groups. A different fill() color is then selected for each group: red for the cities above the threshold and green for the rest.
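As a small sketch of this rule, the color choice can be isolated in a helper. The name colorForElevation() and the strict "greater than" comparison are assumptions; the threshold of 80 refers to the mapped 0 to 100 elevation scale.

```javascript
// Hypothetical helper: choose a fill color from the mapped elevation (0 to 100).
function colorForElevation(mappedElevation, threshold = 80) {
  // Cities above the threshold are drawn in red, all others in green.
  return mappedElevation > threshold ? "red" : "green";
}

console.log(colorForElevation(92)); // red
console.log(colorForElevation(35)); // green
```

Inside draw(), the returned string could be passed straight to fill() before the circle for that city is drawn.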

Then, the program checks if the cursor is close to the x and y position of any city (the x position is analogous to longitude and the y position to latitude). If the distance between them is less than 0.5 pixels, the circle size (circle_radius in the code, which circle() treats as a diameter) is increased to 10 and the fill color is changed to purple; otherwise, the size is kept at 5 and the fill color is either green or red.

// Store x and y position for each city using mapped arrays, where longitude = x and latitude = y
let point_x = long_arr_mapped[i];
let point_y = lat_arr_mapped[i];
let circle_radius;


// HOVER EFFECT
// find the distance between cursor and each city
let distance = dist(mouseX, mouseY, point_x, point_y);

// If the distance is less than 0.5, call the hover_info() function, increase the circle radius and use a different fill color
// Otherwise, use the smaller circle radius and keep the fill color chosen earlier (red or green)
if (distance < 0.5)
{
  fill("purple");
  circle_radius = 10;
  circle(long_arr_mapped[i], lat_arr_mapped[i], circle_radius);
  
  hover_info(i);
}
else
{
  circle_radius = 5;
  circle(long_arr_mapped[i], lat_arr_mapped[i], circle_radius);
}

 

 

Afterward, the inbuilt circle() function is called using the aforementioned parameters and a map of the world, with city information, is displayed on the canvas. 

The user interactivity involves a hovering feature on the canvas. If the user wants to receive more information about a city, the cursor can be moved over that city, and as soon as it is close to the desired location, the dot pops up (i.e. its size increases as described earlier) and a text box appears displaying its location and elevation. The functionality is implemented using a function titled ‘hover_info()’, which is called inside the draw() function. It takes the index of a city as a parameter, and that index is used to look up the values (country name and elevation data) stored at the position “index” in the country_arr[] and elevation_arr[] arrays respectively.

function hover_info(index)
{
  // Text Settings
  textSize(20);
  textAlign(LEFT);
  textFont(myFont);
  fill("white");
  stroke(2);
  
  let text_box_pos_x;
  let text_box_pos_y;
  
  // Conditionals to ensure text stays within the canvas.
  // If the cursor is in the left half, place the text near the left edge; otherwise, place it near the right edge
  if (mouseX < (w/2))
  {
    text_box_pos_x = 150;
  }
  else
  {
    text_box_pos_x = w - 350;
  }
  text_box_pos_y = 40;
  

  // Call text() to display the required information
  // Displaying just the name of the country and its elevation
  text("Location: " + country_arr[index] + "\n" + "Elevation: " + elevation_arr[index] + " m", text_box_pos_x, text_box_pos_y, w/4);
    
}

 

Reflection

The project involved straightforward coding; however, the part that took a significant amount of time was finding an appropriate dataset. Since the datasets I found lacked either longitude-latitude information or elevation data, research took a noticeable amount of time. I was initially planning to plot information on at least 30,000 cities, but finding enough data was another nuisance; as a result, I had to settle for approximately 16,000 rows. Thus, one improvement could be finding a fresh dataset that covers more cities.

Similarly, the dataset seems to be concentrated on particular locations. Consequently, while hovering over such a location, different points overlap and the text being displayed seems buggy. Because hover_info() is called from draw(), it also runs 24 times every second, so removing this bug proved to be a task. In a future iteration of this project, the hover_info() function could be reworked, or other modes of interactivity could be used instead, such as a text input area whose value is used to display additional information on a city. For now, the cursor needs to be kept steady, as each circle has a radius of only 5 pixels to avoid clustering of points. With more investment of time, the project can be further polished.
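One possible way to rework the hover logic, sketched under assumptions rather than taken from the project, is to pick only the single city nearest to the cursor on each frame, so that at most one text box is drawn even where points overlap. The helper name nearestCityIndex(), its parameters and the sample coordinates below are hypothetical.

```javascript
// Hypothetical helper: index of the city closest to the cursor, or -1 if no
// city is closer than maxDistance pixels.
function nearestCityIndex(cursorX, cursorY, xs, ys, maxDistance) {
  let best = -1;
  // Compare squared distances to avoid a square root for every city.
  let bestSq = maxDistance * maxDistance;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - cursorX;
    const dy = ys[i] - cursorY;
    const dSq = dx * dx + dy * dy;
    if (dSq < bestSq) {
      best = i;
      bestSq = dSq;
    }
  }
  return best;
}

// Made-up canvas positions: the first two cities almost overlap.
const xs = [100, 103, 300];
const ys = [50, 52, 200];

const hovered = nearestCityIndex(102, 51, xs, ys, 5);
const nothing = nearestCityIndex(200, 200, xs, ys, 5);
console.log(hovered, nothing); // 1 -1
```

In draw(), this could be called once per frame with mouseX, mouseY and the mapped arrays, and hover_info() would then be called only when the result is not -1. A threshold of a few pixels, rather than 0.5, would also make the dots easier to hit.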

For better interactivity, zoom in to differentiate between points and use full-screen mode using this link here.

 

That said, this was an interesting project. I learned a lot of back-end things apart from the actual programming – whether it be going through countless dataset collections or refining my code to make it more interactive. Overall, I am happy with the final outcome, and I look forward to working on my research as well as my coding skills. 

 

One thought on “Assignment 4: Data Visualization of World Cities”

  1. You could try only updating the hover info every second or half second to improve the performance. The hover info wasn’t working for me at all (Chrome / OSX). It would be great to have something in the sketch that lets the user know what is being visualized. You could consider having +/- buttons or some other UI in the sketch to let the user zoom in / out.
