In Class Exercise 2
2.0 Overview
In this exercise, we learn various practices for importing data.
Before we start the exercise, we need to load the necessary R packages. We will use the sf and tidyverse packages.
pacman::p_load(sf, tidyverse)
2.1 Importing data
2.1.1 Dataset
We will use the following datasets in this exercise:
Master Plan 2014 Subzone Boundary (Web) from data.gov.sg
Master Plan 2019 Subzone Boundary (Web) from data.gov.sg
Singapore Residents by Planning Area / Subzone, Age Group, Sex and Type of Dwelling, June 2023 from singstat.gov.sg
2.1.2 Master Plan 2014 Subzone Boundary
This code chunk imports the shapefile.
mpsz14_shp <- st_read(dsn = "data/MPSZ2014/MasterPlan2014SubzoneBoundaryWebSHP/",
                      layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/In-class_Ex/In-class_Ex02/data/MPSZ2014/MasterPlan2014SubzoneBoundaryWebSHP'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
The code chunk below converts the Master Plan 2014 Subzone Boundary shapefile to a KML file.
#| output: false
mpsz14_kml <- st_write(mpsz14_shp,
                       "data/MPSZ2014/MasterPlan2014SubzoneBoundary_WEB_PL.kml",
                       delete_dsn = TRUE)
Deleting source `data/MPSZ2014/MasterPlan2014SubzoneBoundary_WEB_PL.kml' using driver `KML'
Writing layer `MasterPlan2014SubzoneBoundary_WEB_PL' to data source
`data/MPSZ2014/MasterPlan2014SubzoneBoundary_WEB_PL.kml' using driver `KML'
Writing 323 features with 15 fields and geometry type Multi Polygon.
2.1.3 Master Plan 2019 Subzone Boundary
The code chunk below imports the Master Plan 2019 shapefile and reprojects it to EPSG:3414 (SVY21 / Singapore TM):
mpsz19_shp <- st_read(dsn = "data/MPSZ2019",
                      layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/In-class_Ex/In-class_Ex02/data/MPSZ2019'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Refer to https://epsg.io/ for the CRS code when you need to reproject. If the coordinates are in a geographic coordinate system, it may be necessary to convert them to a projected coordinate system, and vice versa. Which one is appropriate depends on the use case, so it is important to check the CRS before analysis.
st_crs() can be used to check the CRS currently in use, e.g. st_crs(mpsz19_shp).
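To make the check-then-reproject step concrete, here is a minimal sketch that does not need the exercise data. The point coordinates are a made-up location in longitude/latitude, and the object names `pt_wgs84` and `pt_svy21` are illustrative assumptions.

```r
library(sf)

# Hypothetical point in longitude/latitude (WGS 84, EPSG:4326)
pt_wgs84 <- st_sfc(st_point(c(103.85, 1.29)), crs = 4326)

# Check the CRS currently attached to the object
st_crs(pt_wgs84)$epsg

# Reproject to SVY21 / Singapore TM (EPSG:3414)
pt_svy21 <- st_transform(pt_wgs84, crs = 3414)
st_crs(pt_svy21)$epsg

# Geographic coordinates are in degrees; projected coordinates are in metres
st_is_longlat(pt_wgs84)
st_is_longlat(pt_svy21)
```

st_transform() recomputes the coordinates, so it is the function to use when the data already carries a correct CRS and only needs to be moved to another one.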
Importing Master Plan 2019 in KML format:
mpsz19_kml <- st_read("data/MPSZ2019/MasterPlan2019SubzoneBoundaryNoSeaKML.kml")
Reading layer `URA_MP19_SUBZONE_NO_SEA_PL' from data source
`/Users/georgiaxng/georgiaxng/is415-handson/In-class_Ex/In-class_Ex02/data/MPSZ2019/MasterPlan2019SubzoneBoundaryNoSeaKML.kml'
using driver `KML'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY, XYZ
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
2.1.4 Population Data
The code below imports the population data.
popdata <- read_csv("data/respopagesextod2023/respopagesextod2023.csv")
The code below aggregates the population by planning area (PA), subzone (SZ) and age group (AG), then pivots the age groups into columns.
popdata2023 <- popdata %>%
  group_by(PA, SZ, AG) %>%
  summarize(`POP` = sum(`Pop`)) %>%
  ungroup() %>%
  pivot_wider(names_from = AG,
              values_from = POP)
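The aggregate-then-pivot pattern can be tried on a small table first. The sketch below uses made-up rows that only mimic the PA / SZ / AG / Pop layout; the `Sex` column and all values are assumptions for illustration, not the real data.

```r
library(dplyr)
library(tidyr)

# Made-up long-format rows: two records per subzone and age group
toy <- tibble(
  PA  = c("A", "A", "A", "A", "A", "A"),
  SZ  = c("A1", "A1", "A1", "A1", "A1", "A1"),
  AG  = c("0_to_4", "0_to_4", "5_to_9", "5_to_9", "10_to_14", "10_to_14"),
  Sex = c("Males", "Females", "Males", "Females", "Males", "Females"),
  Pop = c(10, 20, 30, 40, 50, 60)
)

# Sum Pop within each PA / SZ / AG group, then spread age groups into columns
toy_wide <- toy %>%
  group_by(PA, SZ, AG) %>%
  summarise(POP = sum(Pop), .groups = "drop") %>%
  pivot_wider(names_from = AG, values_from = POP)

toy_wide
```

Each subzone ends up as a single row with one column per age group, which is the shape needed for the row-wise sums that follow.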
colnames(popdata2023)
 [1] "PA"          "SZ"          "0_to_4"      "10_to_14"    "15_to_19"
 [6] "20_to_24"    "25_to_29"    "30_to_34"    "35_to_39"    "40_to_44"
[11] "45_to_49"    "50_to_54"    "55_to_59"    "5_to_9"      "60_to_64"
[16] "65_to_69"    "70_to_74"    "75_to_79"    "80_to_84"    "85_to_89"
[21] "90_and_Over"
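Note that the age-group columns are ordered as text, so 5_to_9 sits in position 14 rather than position 4. Any calculation that picks columns by position depends on this order. As a hedged alternative, the sketch below selects columns by name on a toy table; the `young_cols` vector and all values are illustrative assumptions, not part of the exercise data.

```r
library(dplyr)

# Toy wide table with the same style of column names (values are made up)
toy <- tibble(
  SZ         = c("A1", "A2"),
  `0_to_4`   = c(1, 2),
  `10_to_14` = c(3, 4),
  `25_to_29` = c(10, 20),
  `5_to_9`   = c(5, 6)
)

# Name the columns that make up the group instead of relying on positions
young_cols <- c("0_to_4", "5_to_9", "10_to_14")

toy <- toy %>%
  mutate(YOUNG = rowSums(across(all_of(young_cols))))

toy
```

all_of() raises an error if a named column is missing, so a change in the input layout fails loudly instead of silently summing the wrong columns.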
popdata2023 <- popdata2023 %>%
  mutate(YOUNG = rowSums(.[3:6]) +
           rowSums(.[14])) %>%
  mutate(`ECONOMY ACTIVE` = rowSums(.[7:13]) +
           rowSums(.[15])) %>%
  mutate(`AGED` = rowSums(.[16:21])) %>%
  mutate(`TOTAL` = rowSums(.[3:21])) %>%
  mutate(`DEPENDENCY` = (`YOUNG` + `AGED`)
         / `ECONOMY ACTIVE`) %>%
  select(`PA`, `SZ`, `YOUNG`,
         `ECONOMY ACTIVE`, `AGED`,
         `TOTAL`, `DEPENDENCY`)
2.2 Joining popdata2023 and mpsz19_shp
popdata2023 <- popdata2023 %>%
  mutate(across(c(PA, SZ), toupper))
toupper() converts all text to uppercase so that the values are uniform for comparison, filtering, or joining with other datasets.
mpsz_pop2023 <- left_join(mpsz19_shp, popdata2023, by = c("SUBZONE_N" = "SZ"))
popdata2023_mpsz <- left_join(popdata2023, mpsz19_shp, by = c("SZ" = "SUBZONE_N"))
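Because a left join keeps every row of the left table even when no key matches, it can be useful to check which keys fail to match in each direction. The following is a minimal sketch on made-up toy tables; `zones`, `pop` and their values are hypothetical stand-ins, not the real boundary or population data.

```r
library(dplyr)

# Toy stand-ins for the boundary attributes and the population table
zones <- tibble(SUBZONE_N = c("ALPHA", "BETA", "GAMMA"))
pop   <- tibble(SZ = c("Alpha", "Beta", "Delta"),
                TOTAL = c(100, 200, 300)) %>%
  mutate(SZ = toupper(SZ))  # harmonise case before joining

# Zones that have no population record
zones_unmatched <- anti_join(zones, pop, by = c("SUBZONE_N" = "SZ"))

# Population rows that have no matching zone
pop_unmatched <- anti_join(pop, zones, by = c("SZ" = "SUBZONE_N"))

zones_unmatched
pop_unmatched
```

Rows listed by either anti_join() would carry NA values after the corresponding left_join(), so inspecting them early helps explain gaps in later maps or tables.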