Project Proposal

This is our group’s project proposal on the topic of Geographically Weighted Regression.

Regressors (https://g5-is415-proj.netlify.app/), Singapore Management University (https://www.smu.edu.sg)
2021-11-19

Project Motivation

COVID-19 has become an inescapable part of daily life since the virus spread to most of the world. Some countries have kept the situation under control, while others have suffered far more devastating effects. India and Indonesia are two countries in Asia with the highest COVID-19 mortality and positivity rates (Worldometers, n.d.), especially in their respective capitals, likely due to certain underlying factors common to both countries.

Researchers claim that Indonesia’s capital, Jakarta, could have had as many as 4.7 million people infected by the virus in March 2021 (Sood, 2021). This is alarming because the number constitutes “nearly half” of Jakarta’s population.

Project Objective

Our research will focus on identifying the factors that may have played a part in Jakarta’s high infection rate, how much each factor contributes to that rate, and how much of the infection rate these factors can explain. We will then apply the same model to other countries to verify our observations.

Datasets

Indonesia Monthly Covid Data

Description: The data contain COVID-19 measures at sub-district level.

File Format: Excel

Variables: ID_KEL, Nama_provinsi, nama_kota, nama_kecamatan, nama_kelurahan, POSITIF, Meninggal

Source: https://riwayat-file-covid-19-dki-jakarta-jakartagis.hub.arcgis.com/

Indonesia Population Density / Counts 2020

Description: The datasets contain the population density and the total number of people in Indonesia per grid cell, at a resolution of 30 arc-seconds (approximately 1 km).

File Format: TIF

Source: https://www.worldpop.org/
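The "approximately 1 km" figure quoted for the grid resolution can be checked with a short calculation. The sketch below is illustrative only: it assumes a spherical Earth with about 111.32 km per degree at the equator, and the function name cell_size_km and the latitude used for Jakarta (about 6.2 degrees south) are our own assumptions, not part of the WorldPop documentation.

```python
import math

ARCSEC_PER_DEGREE = 3600
KM_PER_DEGREE_AT_EQUATOR = 111.32  # approximate, spherical-Earth assumption


def cell_size_km(arcsec: float, latitude_deg: float = 0.0) -> tuple[float, float]:
    """Return the approximate (north-south, east-west) cell size in km."""
    degrees = arcsec / ARCSEC_PER_DEGREE
    # North-south extent is treated as constant with latitude in this approximation.
    north_south = degrees * KM_PER_DEGREE_AT_EQUATOR
    # East-west extent shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Assumed latitude for Jakarta: roughly 6.2 degrees south.
ns, ew = cell_size_km(30, latitude_deg=-6.2)
print(f"{ns:.3f} km x {ew:.3f} km")
```

Because Jakarta lies close to the equator, the east-west shrinkage is small and each cell stays slightly under 1 km on both sides.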

Indonesia Birth 2015

Description: The data contain the number of live births per grid square in Indonesia, at a resolution of 30 arc-seconds (approximately 1 km).

File Format: TIF

Source: https://www.worldpop.org/

Batas Desa Provinsi DKI Jakarta

Description: This is the geospatial data of DKI Jakarta.

File Format: Shapefile

Variables: OBJECT_ID, KODE_DESA, DESA, KODE, PROVINSI, KAB_KOTA, KECAMATAN, DESA_KELUR, JUMLAH_PEN

Source: https://www.indonesia-geospasial.com/
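To compare sub-districts of different sizes, the raw POSITIF and Meninggal counts would likely need to be converted into rates. The sketch below shows one possible way to do this. It rests on two assumptions that the proposal does not confirm: that ID_KEL in the COVID data matches KODE_DESA in the boundary data, and that JUMLAH_PEN holds the resident population. The function name rate_per_10k is hypothetical, and the values are made-up toy numbers, not real figures.

```python
def rate_per_10k(covid_rows, boundary_rows):
    """Join case counts to population by village code; return rates per 10,000 residents."""
    # Assumption: JUMLAH_PEN is the resident population of each village.
    population = {row["KODE_DESA"]: row["JUMLAH_PEN"] for row in boundary_rows}
    rates = {}
    for row in covid_rows:
        # Assumption: ID_KEL and KODE_DESA use the same village codes.
        pop = population.get(row["ID_KEL"])
        if not pop:  # skip unmatched codes and zero-population areas
            continue
        rates[row["ID_KEL"]] = {
            "positive_rate": row["POSITIF"] / pop * 10_000,
            "death_rate": row["Meninggal"] / pop * 10_000,
        }
    return rates


# Made-up toy values for illustration only.
covid = [
    {"ID_KEL": "X1", "POSITIF": 50, "Meninggal": 2},
    {"ID_KEL": "X2", "POSITIF": 30, "Meninggal": 0},
    {"ID_KEL": "X3", "POSITIF": 10, "Meninggal": 1},  # no matching boundary record
]
boundaries = [
    {"KODE_DESA": "X1", "JUMLAH_PEN": 20_000},
    {"KODE_DESA": "X2", "JUMLAH_PEN": 10_000},
]
rates = rate_per_10k(covid, boundaries)
```

Records without a matching boundary code are dropped here; in practice they would need to be inspected rather than silently discarded.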

Jakarta Railway

Description: The data contain railways and stations in Jakarta.

File Format: Shapefile

Variables: name, railway

Source: https://data.humdata.org/

Jakarta Education

Description: The data contain kindergartens, schools, colleges, and universities in Jakarta.

File Format: Shapefile

Variables: name, amenity, addrcity

Source: https://data.humdata.org/

Jakarta Airport

Description: The data contain gates, helipads, and terminals in Jakarta.

File Format: Shapefile

Variables: name, aeroway

Source: https://data.humdata.org/

Jakarta Points of Interest

Description: The data contain amenities, shops, and tourist attractions in Jakarta.

File Format: Shapefile

Variables: name, amenity, shop, tourism

Source: https://data.humdata.org/

Jakarta Health Facilities

Description: The data contain healthcare facilities in Jakarta.

File Format: Shapefile

Variables: Name, Address, Type, Class, District, Province, Lat, Long, Hospital

Source: https://data.humdata.org/

Literature Review

Our analysis will focus on infection rates, and we will draw on similar methodology from papers researching death rates.

1. Geographically weighted regression (GWR) analysis on the death incidence by COVID-19 in São Paulo, Brazil

Objective

To gain an understanding of how socio-spatial behaviour causes COVID-19 transmission in the most impacted area in Brazil.

Methodology

Learning Point

Areas for Improvement

2. Geographically varying relationships of COVID-19 mortality with different factors in India

Objective

To understand how the relationships between different driving factors and COVID-19 deaths vary geographically.

Methodology

Learning Point

Areas for Improvement

3. The effect of sociodemographic factors on COVID-19 incidence of 342 cities in China: a geographically weighted regression model analysis

Objective

To gain an understanding of how socio-demographic factors cause COVID-19 transmission in 342 cities in China from a geographic perspective.

Methodology

Learning Point

Areas for Improvement

Methodology

1. Data Preparation

2. Exploratory Data Analysis (EDA)

3. Geographically Weighted Regression Models (GWR)
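As a rough illustration of what the GWR step involves, the sketch below fits a separate weighted least-squares regression at every location, so that the coefficient of a factor can vary across space. It is a minimal sketch under stated assumptions, not the project's implementation: it uses a single explanatory variable, a Gaussian kernel with a fixed bandwidth, and planar coordinates, none of which are specified in this proposal. The names gwr_fit, r_squared and bandwidth are hypothetical, and the data are made-up toy values.

```python
import math


def gwr_fit(coords, x, y, bandwidth):
    """Fit y = a_i + b_i * x locally at each location using Gaussian distance weights.

    Returns a list of (intercept, slope) pairs, one per location.
    """
    coefficients = []
    for ci in coords:
        # Gaussian kernel: observations near location i get weights close to 1.
        w = [math.exp(-0.5 * (math.dist(ci, cj) / bandwidth) ** 2) for cj in coords]
        sw = sum(w)
        sx = sum(wj * xj for wj, xj in zip(w, x))
        sy = sum(wj * yj for wj, yj in zip(w, y))
        sxx = sum(wj * xj * xj for wj, xj in zip(w, x))
        sxy = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
        # Closed-form weighted least squares for a single predictor.
        slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        intercept = (sy - slope * sx) / sw
        coefficients.append((intercept, slope))
    return coefficients


def r_squared(x, y, coefficients):
    """Share of the variance in y explained by the local fits."""
    fitted = [a + b * xi for (a, b), xi in zip(coefficients, x)]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up toy data: ten locations on a line; the true slope is 1 in the
# first five locations and 3 in the last five.
coords = [(float(i), 0.0) for i in range(10)]
x = [1.0, 2.0, 3.0, 4.0, 5.0] * 2
y = [xi * (1.0 if i < 5 else 3.0) for i, xi in enumerate(x)]

coefs = gwr_fit(coords, x, y, bandwidth=1.0)
r2 = r_squared(x, y, coefs)
for i, (a, b) in enumerate(coefs):
    print(f"location {i}: intercept={a:.2f}, slope={b:.2f}")
print(f"R-squared from local fits: {r2:.3f}")
```

A single global regression would report one slope for the whole area, whereas the local fits recover different slopes at the two ends of the line. A real analysis would use several explanatory variables, projected coordinates, and a bandwidth chosen by a criterion such as cross-validation or AICc.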

Storyboard

1. EDA and DataTable

2. GWR

3. Prediction

References