# Chapter 6 Multidimensional scaling

## 6.1 Introduction

In this Session, we will start looking at dimension reduction strategies. In general, these strategies are used to turn a dataset with N dimensions into a dataset with N-x dimensions. Why is this important? A very practical concern is that in our data-rich urban environment of today many of our datasets have many variables. How can we make sense of a dataset where we have dozens and dozens of variables for every single observation? As it turns out, in many cases our variables have certain degrees of correlation: they share certain information. We can use this to express the same amount of information (or variance in statistical terms) with, potentially, fewer dimensions or variables. We will see how this works in practice with two specific techniques. In this first Chapter we will focus on multidimensional scaling (MDS), and in the next we will move on to Principal Components Analysis (PCA).

## 6.2 Multidimensional Scaling of MRT stations

The goal of MDS is relatively straightforward on a conceptual level: given a set of distances between objects, create a map (i.e. find the position of each object) that displays the relative position of each object correctly. In mathematical terms, we are trying to minimize the *stress* between the real or observed distances (\(d\)) and the distances in the MDS map (\(\hat{d}\)): \(\sqrt{ \frac{\sum (d_{ij} - \hat{d}_{ij})^2}{\sum d_{ij}^2} }\).

Let’s put this into practice by loading a dataset consisting of 10 MRT stations in Singapore and the distance (as the crow flies, in meters) between them. Note that the actual location of each station is NOT included in this dataset.

```
tibble::tribble(
station_distance <-~STN_NAME, ~YISHUN_MRT_STATION, ~PASIR_RIS_MRT_STATION, ~JURONG_EAST_MRT_STATION, ~TUAS_LINK_MRT_STATION, ~BENCOOLEN_MRT_STATION, ~PUNGGOL_MRT_STATION, ~MARINA_BAY_MRT_STATION, ~DHOBY_GHAUT_MRT_STATION, ~ROCHOR_MRT_STATION, ~UPPER_CHANGI_MRT_STATION,
"YISHUN_MRT_STATION", "0", "14165.8028309003", "14826.8234223324", "24144.58370996", "14539.8253530485", "7955.17551381608", "17097.0983148766", "14495.1289232166", "14014.7715236133", "17091.9107945143",
"PASIR_RIS_MRT_STATION", "14165.8028309003", "0", "23451.9973340811", "34950.8745491473", "13721.3248153691", "6306.08987347808", "14983.0092441387", "14095.3770091796", "13209.4818715068", "3712.4090725179",
"JURONG_EAST_MRT_STATION", "14826.8234223324", "23451.9973340811", "0", "11758.6400823558", "12613.5429467971", "19455.8643741636", "14047.8304857995", "12175.0907736314", "12676.1241992151", "24411.4671044188",
"TUAS_LINK_MRT_STATION", "24144.58370996", "34950.8745491473", "11758.6400823558", "0", "24202.2573770791", "30351.6324743735", "25305.1676395598", "23749.9738806756", "24325.8635170677", "36124.0377992668",
"BENCOOLEN_MRT_STATION", "14539.8253530485", "13721.3248153691", "12613.5429467971", "24202.2573770791", "0", "13024.8987292155", "2565.66801256045", "461.323513564845", "602.910926685915", "13240.7330980557",
"PUNGGOL_MRT_STATION", "7955.17551381608", "6306.08987347808", "19455.8643741636", "30351.6324743735", "13024.8987292155", "0", "15129.4446906282", "13237.1986125019", "12423.6186258471", "9588.13908018742",
"MARINA_BAY_MRT_STATION", "17097.0983148766", "14983.0092441387", "14047.8304857995", "25305.1676395598", "2565.66801256045", "15129.4446906282", "0", "2695.76728986719", "3084.57512743706", "13883.5893006481",
"DHOBY_GHAUT_MRT_STATION", "14495.1289232166", "14095.3770091796", "12175.0907736314", "23749.9738806756", "461.323513564845", "13237.1986125019", "2695.76728986719", "0", "887.79255969772", "13673.2848886229",
"ROCHOR_MRT_STATION", "14014.7715236133", "13209.4818715068", "12676.1241992151", "24325.8635170677", "602.910926685915", "12423.6186258471", "3084.57512743706", "887.79255969772", "0", "12836.5477607036",
"UPPER_CHANGI_MRT_STATION", "17091.9107945143", "3712.4090725179", "24411.4671044188", "36124.0377992668", "13240.7330980557", "9588.13908018742", "13883.5893006481", "13673.2848886229", "12836.5477607036", "0"
)## tibble creation code created with datapasta: very useful for quickly copying small datasets to/from R.
```

Let’s check what’s in there:

`%>% head() station_distance `

```
## # A tibble: 6 x 11
## STN_NAME YISHUN_MRT_STAT… PASIR_RIS_MRT_S… JURONG_EAST_MRT… TUAS_LINK_MRT_S…
## <chr> <chr> <chr> <chr> <chr>
## 1 YISHUN_… 0 14165.8028309003 14826.8234223324 24144.58370996
## 2 PASIR_R… 14165.8028309003 0 23451.9973340811 34950.8745491473
## 3 JURONG_… 14826.8234223324 23451.9973340811 0 11758.6400823558
## 4 TUAS_LI… 24144.58370996 34950.8745491473 11758.6400823558 0
## 5 BENCOOL… 14539.8253530485 13721.3248153691 12613.5429467971 24202.2573770791
## 6 PUNGGOL… 7955.17551381608 6306.08987347808 19455.8643741636 30351.6324743735
## # … with 6 more variables: BENCOOLEN_MRT_STATION <chr>,
## # PUNGGOL_MRT_STATION <chr>, MARINA_BAY_MRT_STATION <chr>,
## # DHOBY_GHAUT_MRT_STATION <chr>, ROCHOR_MRT_STATION <chr>,
## # UPPER_CHANGI_MRT_STATION <chr>
```

All our variables are of type `chr`

(character), so we will need to change some of them to `numerical`

. Additionally, the first column is the name of the MRT station. We want to use it as a “label” for the row, like we use column names as labels for columns: this is what `column_to_rownames`

does.

Once transformed, we can use the `cmdscale`

function (available in base R) to perform our MDS process. We convert our tibble to matrix first, as that is the format that `cmdscale`

is expecting.

```
# convert to matrix (making sure all distances are numeric not character)
station_distance %>%
station_distance_matrix <- column_to_rownames(var = "STN_NAME") %>%
mutate_all(as.numeric) %>%
as.matrix()
# perform MDS
cmdscale(station_distance_matrix) %>%
mds_station <- as_tibble() %>%
rename(Y1 = V1, Y2 = V2)
# add station name to each position
station_distance %>%
mds_station <- select(STN_NAME) %>%
bind_cols(., mds_station)
mds_station
```

```
## # A tibble: 10 x 3
## STN_NAME Y1 Y2
## <chr> <dbl> <dbl>
## 1 YISHUN_MRT_STATION 46.7 -9930.
## 2 PASIR_RIS_MRT_STATION 12108. -2501.
## 3 JURONG_EAST_MRT_STATION -11242. -317.
## 4 TUAS_LINK_MRT_STATION -22842. -2243.
## 5 BENCOOLEN_MRT_STATION 371. 4606.
## 6 PUNGGOL_MRT_STATION 7213. -6476.
## 7 MARINA_BAY_MRT_STATION 653. 7157.
## 8 DHOBY_GHAUT_MRT_STATION -88.5 4565.
## 9 ROCHOR_MRT_STATION 650. 4072.
## 10 UPPER_CHANGI_MRT_STATION 13130. 1068.
```

Those coordinates are difficult to read! Let’s create a map instead.

```
ggplot(mds_station, aes(x = Y1, y = Y2)) +
geom_text(aes(label = STN_NAME))
```

If you know Singapore a little bit, you immediately recognize that the stations are more or less in the correct position! Not a bad result per se. But what if we didn’t exactly know the ‘real’ location of each station - how would we assess how well our MDS process has performed? One way would be to compare the real distances with the predicted distances. We can do this visually, as well as by calculating a stress metric.

```
cmdscale(station_distance_matrix) %>%
distance_pred <- dist(upper = T) %>%
as.matrix()
station_distance_matrix
distance <-
tibble(
distance = as.vector(distance),
distance_pred = as.vector(distance_pred)
%>%
) ggplot(aes(x = distance, y = distance_pred)) +
geom_point()
```

Visually things seem very close together. We see this reflected in the stress metric as well. The stress metric is *very* close to 0. 0 would be a perfect match, whereas anything much over 0.1 would warrant some closer inspection. Similar to goodness of fit measures for regression, what is ‘good’ ultimately really depends on your research question and dataset.

```
function(d, d_hat) {
mds_stress <-sqrt(
sum((d - d_hat)^2) / sum(d^2)
)
}
mds_stress(distance, distance_pred)
```

`## [1] 2.447984e-15`

## 6.3 MDS with travel distance

The map we created in the previous section might seem quite trivial. After all, if we already know the exact position of each MRT station, why would we need to re-compute this position based on the distances between MRT stations? In fact, the real power of MDS comes from the fact that you can use *any* variable or indicator as the distance metric. For example, we can look at the travel distance by train instead of the geographic distance.

```
tibble::tribble(
station_mrt_distance <-~STN_NAME, ~BENCOOLEN_MRT_STATION, ~DHOBY_GHAUT_MRT_STATION, ~JURONG_EAST_MRT_STATION, ~MARINA_BAY_MRT_STATION, ~PASIR_RIS_MRT_STATION, ~PUNGGOL_MRT_STATION, ~ROCHOR_MRT_STATION, ~TUAS_LINK_MRT_STATION, ~UPPER_CHANGI_MRT_STATION, ~YISHUN_MRT_STATION,
"BENCOOLEN_MRT_STATION", 0, 476, 2077, 660, 2347, 1975, 640, 3312, 1918, 2237,
"DHOBY_GHAUT_MRT_STATION", 476, 0, 2018, 364, 1966, 1484, 287, 3253, 2243, 1837,
"JURONG_EAST_MRT_STATION", 2641, 2001, 0, 1887, 3676, 3389, 2152, 1739, 3699, 2427,
"MARINA_BAY_MRT_STATION", 754, 362, 1880, 0, 2164, 1933, 570, 3115, 2556, 2143,
"PASIR_RIS_MRT_STATION", 2347, 1974, 3679, 2174, 0, 2582, 1825, 4914, 1452, 3755,
"PUNGGOL_MRT_STATION", 1913, 1484, 3455, 1763, 2617, 0, 1475, 4690, 2622, 2222,
"ROCHOR_MRT_STATION", 640, 382, 1937, 487, 1814, 1630, 0, 3172, 2081, 1826,
"TUAS_LINK_MRT_STATION", 3875, 3186, 1738, 3121, 4910, 4574, 3464, 0, 5016, 3678,
"UPPER_CHANGI_MRT_STATION", 1779, 2104, 3632, 2326, 1217, 3726, 1916, 4867, 0, 3814,
"YISHUN_MRT_STATION", 2253, 1804, 2395, 2112, 3714, 2167, 1690, 3641, 4076, 0
)
station_mrt_distance %>%
station_mrt_distance_matrix <- column_to_rownames(var = "STN_NAME") %>%
mutate_all(as.numeric) %>%
as.matrix()
```

### 6.3.1 Exercise

Now apply the same procedure to construct the MDS map for the travel distance, working through the following steps:

- Calculate the positions of each station with
`cmdscale`

. We saved this in`mds_station`

in the previous section: save it in a new object`mds_mrt_station`

. - Bind the station names to the resulting table.
- Plot the table with
`ggplot`

. - Plot a scatterplot of the real distances vs predicted distances and calculate the stress metric. What could be potential reasons for the difference in the metric with the previous section?

To help interpret the differences between travel distance and the geographic distance, you can also add *both* MDS results into a single map. Try to find a way to do this (bonus points for drawing arrows (see `geom_segment`

) between the two sets of points): your end result should look something like this. Can you explain the differences between the two MDS maps? (You can transform the data if you find that the axes are flipped, e.g., North is South or East is West)

```
mds_mrt_station %>%
mds_mrt_station <- mutate(
y1_norm = rescale(Y1),
y2_norm = rescale(Y2)
)
mds_station %>%
mds_station <- mutate(
y1_norm = rescale(Y1),
y2_norm = rescale(Y2)
)
ggplot(mds_station, aes(x = y1_norm, y = y2_norm)) +
geom_label(aes(label = STN_NAME), color = "red", fill = rgb(1, 1, 1, 0.5)) +
geom_label(
data = mds_mrt_station, aes(x = y1_norm, y = y2_norm, label = STN_NAME),
color = "blue", fill = rgb(1, 1, 1, 0.5)
)
```

## 6.4 Multidimensional Scaling of Arbitrary Variables (two dimensions)

In the previous Section, you have explored the application of MDS to distances (both geographics and travel time) between MRT stations. While the travel time distances might be interesting in and of itself, MDS can be applied to any type of distance. Furthermore, until now we have only used two variables as input for our distance calculations so we can’t really speak of dimension *reduction* as of yet! In this Section and the next, we’ll mitigate that by analyzing a dataset of Singapore’s planning areas and their various characteristics.

**You can find the dataset on the lecture notes course repository. Copy it to the data folder of your own project.**

```
read_csv(here::here("data/planning_areas.csv"))
planning_areas <-glimpse(planning_areas)
```

```
## Rows: 28
## Columns: 22
## $ planning_area <chr> "ANG MO KIO", "BEDOK", "BISHAN…
## $ area_hdb <dbl> 0.0382151906, 0.0320619695, 0.…
## $ area_res <dbl> 0.0400515955, 0.0835754816, 0.…
## $ area_other <dbl> 0.05772511, 0.04450232, 0.0797…
## $ schools_secondary <dbl> 0.4303735, 0.4601258, 0.525008…
## $ hotel_rooms <dbl> 0.000000, 18.266994, 0.000000,…
## $ hh_income_lt_2000 <dbl> 0.25920000, 0.20995671, 0.1588…
## $ hh_income_gt_17500 <dbl> 0.1296000, 0.2012987, 0.263537…
## $ ethn_malays <dbl> 0.07472678, 0.15178602, 0.0414…
## $ ethn_indians <dbl> 0.08096355, 0.08666091, 0.0771…
## $ hh_1person <dbl> 0.16613419, 0.13636364, 0.0967…
## $ hh_5person_plus <dbl> 0.1900958, 0.2326840, 0.258064…
## $ age_0_9 <dbl> 0.08268009, 0.08659488, 0.0855…
## $ age_10_19 <dbl> 0.10076100, 0.10933941, 0.1128…
## $ age_gt_65 <dbl> 0.16255650, 0.14550977, 0.1344…
## $ edu_no_qual_primary <dbl> 0.25653370, 0.19679849, 0.1365…
## $ edu_university <dbl> 0.2434663, 0.2961394, 0.372980…
## $ occupation_senior_manager_professional <dbl> 0.3218164, 0.3845642, 0.479757…
## $ occupation_cleaners_labourers_plant <dbl> 0.18756170, 0.13439787, 0.0769…
## $ dwelling_1_2_room <dbl> 0.089456869, 0.043336945, 0.01…
## $ dwelling_condo_landed <dbl> 0.15814696, 0.34236186, 0.2913…
## $ status_unemployed <dbl> 0.02926209, 0.02542373, 0.0249…
```

Most of the variables are relatively self-descriptive. You should know that all variables are relative (i.e. % relative to total population) except for two variables: `hotel_rooms`

and `schools_secondary`

are both expressed in units per square kilometre. This isn’t so important now but it will come back soon so do take note.

We will get started by applying MDS to only two variables in our dataset: the % of household with income greater than $17.5k; and the % of area used by non-HDB residences. Both relate very much to economic class. You already know how to create a custom ‘economic class’ space and position each planning area within that space. This is what a scatterplot does!

```
ggplot(planning_areas, aes(area_res, hh_income_gt_17500)) +
geom_text(aes(label = planning_area))
```

It is clear there is a relationship between the two variables but you can also clearly see some specific ‘areas’ within the chart (e.g. Tanglin + Bukit Timah; Novena + Bishan + Marine Parade + Bedok; Serangoon). Let’s see if we can reproduce this plot through multi-dimensional scaling as well.

To do so, we need to calculate the distances between each point and all other points. As distance calculations in R generally operate on matrices (AFAIK there are no `tidyverse`

compatible libraries yet), we convert to a matrix first, as we have done in the first two sections.

```
planning_areas %>%
pln_area_matrix <- select(area_res, hh_income_gt_17500) %>%
as.matrix()
%>% head() pln_area_matrix
```

```
## area_res hh_income_gt_17500
## [1,] 0.040051596 0.1296000
## [2,] 0.083575482 0.2012987
## [3,] 0.055684550 0.2635379
## [4,] 0.017558208 0.1526196
## [5,] 0.003096563 0.1270417
## [6,] 0.017867289 0.1452785
```

Great! Once we have a matrix, we can calculate the distance with the `distances`

function from the library by the same name. It doesn’t do anything too fancy: it calculates the Euclidean distance between each set of points. If you remember the Pythagorean Theorem, the Euclidean distance is relatively intuitive to calculate. For a two-dimensional (\(x\), \(y\)) space: \(distance(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}\).

```
distances(pln_area_matrix) %>%
pln_area_dist <- as.matrix()
%>% head() pln_area_dist
```

```
## 1 2 3 4 5 6 7
## 1 0.00000000 0.08387510 0.13484714 0.032184686 0.03704348 0.027165369 0.4044788
## 2 0.08387510 0.00000000 0.06820281 0.082023999 0.10950321 0.086347177 0.3274620
## 3 0.13484714 0.06820281 0.00000000 0.117288068 0.14627611 0.124158947 0.2700641
## 4 0.03218469 0.08202400 0.11728807 0.000000000 0.02938308 0.007347643 0.3867945
## 5 0.03704348 0.10950321 0.14627611 0.029383081 0.00000000 0.023468103 0.4152234
## 6 0.02716537 0.08634718 0.12415895 0.007347643 0.02346810 0.000000000 0.3938374
## 8 9 10 11 12 13
## 1 0.02983052 0.05521157 0.009743021 0.02865862 0.039625795 0.03985095
## 2 0.10363638 0.05440185 0.093012674 0.05537553 0.106882993 0.12194128
## 3 0.14290053 0.08280851 0.144576242 0.10829750 0.140872392 0.16525466
## 4 0.02563138 0.03470522 0.038703122 0.03531690 0.025252657 0.04832371
## 5 0.00723530 0.06402710 0.035987427 0.05714278 0.007240508 0.02258667
## 6 0.01883376 0.04139305 0.032559379 0.03631191 0.020656097 0.04120097
## 14 15 16 17 18 19
## 1 0.034893079 0.14937245 0.19204491 0.028481532 0.05751171 0.04020559
## 2 0.087459358 0.07029106 0.12205517 0.090219149 0.06929495 0.11875160
## 3 0.120923080 0.03196319 0.05728708 0.127404216 0.09003238 0.15845698
## 4 0.005650180 0.13751202 0.17305310 0.010132856 0.02965275 0.04118410
## 5 0.025353066 0.16689345 0.20148596 0.019700723 0.05693430 0.01292848
## 6 0.007796717 0.14375890 0.18012398 0.003873727 0.03699904 0.03455143
## 20 21 22 23 24 25 26
## 1 0.03136031 0.02701852 0.036362813 0.16015314 0.03391935 0.3600459 0.018353435
## 2 0.09748907 0.10364471 0.104059460 0.07855645 0.09521800 0.2852168 0.085759667
## 3 0.13401908 0.14483924 0.139231635 0.08933905 0.12984416 0.2252277 0.128198570
## 4 0.01689693 0.02792676 0.022830641 0.16052132 0.01346144 0.3411439 0.015060369
## 5 0.01250847 0.01053080 0.007272423 0.18798908 0.01643606 0.3692545 0.024548091
## 6 0.01115344 0.02075878 0.017721253 0.16489815 0.00959725 0.3482697 0.008927303
## 27 28
## 1 0.05511848 0.05492234
## 2 0.13854146 0.13874887
## 3 0.18313382 0.18524216
## 4 0.06618326 0.06888574
## 5 0.03923593 0.04319390
## 6 0.05908557 0.06165084
```

We can now use these distances in the same way as we used the distances between MRT stations.

```
cmdscale(pln_area_dist) %>%
mds_planning_area <- as_tibble() %>%
rename(Y1 = V1, Y2 = V2)
# add planning area name back
planning_areas %>%
mds_planning_area <- select(planning_area) %>%
bind_cols(., mds_planning_area)
%>% head() mds_planning_area
```

```
## # A tibble: 6 x 3
## planning_area Y1 Y2
## <chr> <dbl> <dbl>
## 1 ANG MO KIO 0.0507 0.0197
## 2 BEDOK -0.0301 0.0424
## 3 BISHAN -0.0825 -0.00119
## 4 BUKIT BATOK 0.0345 -0.00814
## 5 BUKIT MERAH 0.0631 -0.0152
## 6 BUKIT PANJANG 0.0415 -0.00588
```

```
ggplot(mds_planning_area, aes(x = Y2, y = -Y1)) + # you might need to flip the axes
geom_text(aes(label = planning_area))
```

### 6.4.1 Exercise

Is the resulting scatterplot a faithful reproduction of the original? What could be potential reasons for this? Plot the actual distance versus the distance in the MDS plot and calculate the stress metric, like you have done in the previous Section.

## 6.5 Multidimensional Scaling of Arbitrary Variables (more than two dimensions)

In the exact same way, we add additional variables to our MDS analysis. All it takes is adding these variables to our original matrix. For example, to add 3 additional variables related to economic class, you could do:

```
planning_areas %>%
pln_area_matrix <- select(area_res, hh_income_gt_17500, edu_university, occupation_senior_manager_professional, status_unemployed) %>%
as.matrix()
```

Or do include all variables (except `planning_area`

):

```
planning_areas %>%
pln_area_matrix <- select(-planning_area) %>%
as.matrix()
```

`distances`

will calculate the Euclidean distances across as many dimensions as you’d like. For n-dimensional space (\(x\), \(y\), …, \(n\)), Euclidean distance is simply: \(distance(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + ... + (n_i - n_j)^2}\).

```
distances(pln_area_matrix) %>% as.matrix()
pln_area_dist <-
cmdscale(pln_area_dist) %>%
mds_planning_area <- as_tibble() %>%
rename(Y1 = V1, Y2 = V2)
# add planning area name back
planning_areas %>%
mds_planning_area <- select(planning_area) %>%
bind_cols(., mds_planning_area)
ggplot(mds_planning_area, aes(x = Y1, y = Y2)) +
geom_text(aes(label = planning_area))
```

Wow - what happened to the plot and the coordinates of the x-axis? We have one final issue to contend with. Not all of our variables are on the same scale. While most are in relative percentages, the `hotel_room`

variable is measured in the number of rooms per square kilometer. In Outram there are over 3000 rooms per square kilometer! This dramatically affects our distances. Whenever we need to calculate non-geographic distances (and distances are often an input for a variety of machine learning techniques), it is generally good practice to normalize all the variables first. This gets rid of another issue: variables with lower variance will have less impact on the distance as well. There are quite a few ways to normalize a variable. Here we use a built-in feature of `distances`

that normalizes by the variance (also called *studentizing*).

```
distances(pln_area_matrix, normalize = "studentize") %>% as.matrix()
pln_area_dist <-
cmdscale(pln_area_dist) %>%
mds_planning_area <- as_tibble() %>%
rename(Y1 = V1, Y2 = V2)
# add planning area name back
planning_areas %>%
mds_planning_area <- select(planning_area) %>%
bind_cols(., mds_planning_area)
ggplot(mds_planning_area, aes(x = Y1, y = Y2)) +
geom_text(aes(label = planning_area))
```

### 6.5.1 Exercise

Calculate the stress metric and plot the distance as predicted by MDS against the actual distance calculated above. What happened to the stress as we increased the number of variables?

A few more questions to consider:

- Based on your own knowledge of Singapore - does the map of Singapore in ‘socio-economic’ space make sense?
- Can you think of more descriptive labels for the x- and y-axes?

The last question is a difficult one! Multidimensional scaling projects an *n*-dimensional dataset to 2 (or any other number) of dimensions. This can be very useful as it allows you to see similar observations grouped together in space. However, it does not necessarily give ‘meaning’ to the remaining two dimensions. It also does not tell you if perhaps a 3-dimensional representation would capture a much larger portion of the variance (thus decreasing stress) than a 2-dimensional map. To deal with this problem, we have another set of techniques available that we often refer to as *principal component analysis*, which we will continue with in the next Chapter.

N.B. If you are interested in additional techniques focused on reducing a large number of dimensions to two dimensions for easy visualization and potentially spotting ‘clusters’, you can check out two more recent algorithms that are potentially better able to preserve the ‘structure’ of a multi-dimensional dataset: `t-sne`

and `UMAP`

. You can find an easy to use implementation in the smallvis and uwot packages.

```
## Bonus Section
library(uwot)
umap(planning_areas, n_neighbors = 8, scale = TRUE, init = "random") %>%
mds_planning_area <- as_tibble() %>%
rename(Y1 = V1, Y2 = V2)
planning_areas %>%
mds_planning_area <- select(planning_area) %>%
bind_cols(., mds_planning_area)
ggplot(mds_planning_area, aes(x = Y1, y = Y2)) +
geom_text(aes(label = planning_area))
```