Chapter 1 Syllabus

1.1 Class Schedule

January Term 2020
Tue 12.30-2.00pm Think Tank 17 (2.202)
Thu 11.00am-12.30pm Think Tank 17 (2.202)
Office Hours: By appointment

1.2 Course Description

This course focuses on quantitative and computational approaches to urban analytics and data science. It exposes students to new ways of collecting large datasets (“big data”) and innovative methods of analysing such datasets. It draws on conventional methods such as (spatial) statistics and shows how to appropriately use methods from data science and machine learning within an urban context.

1.3 Format

The course is structured around two 1.5-hour classes per week that integrate lecture, discussion and in-class activities and exercises in an interactive manner. The class is further organized into six blocks, each lasting two weeks.

1.4 Expectations

Students are expected to attend and actively participate in every class, as well as on the class online forum (Slack). Before each class, read the assigned readings and come prepared to take part in the discussion and exercises.

You are also expected to produce your own work, whether individually or in groups. Do not copy work from the internet or other published sources without proper citation. Doing so is plagiarism, and any student found plagiarizing will be subject to disciplinary measures, which may include failing the course.

1.5 Assessment

There will be a variety of assessments throughout the semester. Emphasis is on your performance overall, with relatively low weight placed on individual items. Continued participation throughout the semester will enable you to do well in this course.

Assessment Item        Percentage   Period
Class Participation    15%          Throughout term
Assignments (five)     45%          Throughout term
Final Project          40%          Week 14

1.5.1 Assignments

Each block consists of a series of exercises that culminate in an assignment/report, due before the start of the next block (Monday, 23.59).

1.5.2 Final Project

For the final project, you will develop a computational analysis of a topic and dataset of your own choice. This is an opportunity to explore (an aspect of) your final term project in more detail, or to take a deep dive into a topic or dataset that interests you. You can use existing datasets or collect your own, but you must use one or more computational methods to help answer your research question.

In Week 6, you will hand in a research proposal of at most 1000 words. It should discuss your motivation for the project; its objectives and research question; data requirements (does the data already exist? where do you get it from? does it need a lot of post-processing?); and the methods you plan to use.

The remainder of the term is spent collecting and analyzing data. A prototype of your analysis is due at the end of Week 13 and the final version at the end of Week 14. You are required to hand in the code, a written paper and (where appropriate) visualizations as part of your GitHub repository. You can combine these in a single format by using one or more R Markdown documents.

1.6 Deadlines

Deadlines are as noted in the course syllabus or on the specific assignment. If something is due on a specific date, you have until 23.59 on that day to submit it.

1.7 Software

We will use R, RStudio and a series of packages (most of them compatible with the tidyverse). All software used in the course is open-source and freely available.

1.8 Detailed Outline

1.8.1 A math refresher

  1. Reading:
  • Reading mathematics, PDF
  • Handbook for spoken mathematics, PDF

1.8.2 Block 1: Analysis of HDB Resale Prices I

  1. Reproducible Science & Project Management (+ review of tidyverse basics)
  2. Exploratory Data Analysis (Univariate Statistics & Visualization)

1.8.3 Block 2: Analysis of HDB Resale Prices II

  1. Sampling, Bootstrapping & Confidence Intervals
  2. Correlation & Linear Regression

1.8.4 Block 3: Geodemographics of SG Neighborhoods I

  1. Dimension Reduction I: Multidimensional Scaling (MDS)
  2. Dimension Reduction II: Principal Component Analysis (PCA)

1.8.5 Block 4: Geodemographics of SG Neighborhoods II

  1. Clustering (non-spatial)
  2. Clustering (spatial, including refresher on spatial data structures)

1.8.6 Block 5: Spatial Statistics

  1. Spatial Statistics I (Spatial Autocorrelation)
  2. Spatial Statistics II (Spatial Regression Modeling)

1.8.7 Block 6: Towards Machine Learning

  1. Towards Machine Learning (logistic regression & random forests)

1.8.8 Week 14: Final Project Studio