Estimation Of Missing Values In Air Pollution Dataset By Using Various Imputation Methods

Sukatis, Fahren Fazzer and Noor, Norazian Mohamed and Zakaria, Nur Afiqah and Ul-Saufie, Ahmad Zia and Annas, Suwardi (2019) Estimation Of Missing Values In Air Pollution Dataset By Using Various Imputation Methods. International Journal of Conservation Science, 10 (4). pp. 791-804. ISSN 2067-533X

Official URL: http://ijcs.ro/public/IJCS-19-71_Sukatis.pdf

Abstract

The aim of this study is to determine the best imputation method for filling gaps of missing values in air pollution datasets. Ten imputation methods were applied: Series Mean, Linear Interpolation, Mean Nearest Neighbour, Expectation Maximization, Markov Chain Monte Carlo, 12-hour Moving Average, 24-hour Moving Average, and Exponential Smoothing with α = 0.2, 0.5, and 0.8. Annual hourly monitoring data for ambient temperature, wind speed, humidity, SO2, NO2, O3, CO, and PM10 from Petaling Jaya and Shah Alam, covering 2012 to 2016, were used. Missing data were simulated in these datasets under three patterns that differ in the length of the gaps: simple, medium, and complex. Each pattern was simulated at two missing-data percentages, 10% and 20%. The performance of the imputation methods was evaluated using four performance indicators: mean absolute error, root mean squared error, prediction accuracy, and index of agreement. Overall, Expectation Maximization was selected as the best imputation method for the simple, medium, and complex patterns of simulated missing data, while Series Mean was the worst.
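
The evaluation procedure summarised in the abstract (remove known values, impute them, compare against the originals) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The synthetic hourly series, the random single-point gaps, and all function names are assumptions made for illustration; only two of the ten methods are shown, and the index of agreement uses Willmott's common form because the abstract gives no formula. Prediction accuracy is omitted because its definition varies between papers.

```python
import numpy as np

def series_mean(x):
    """Replace every NaN with the mean of the observed values."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out

def linear_interpolation(x):
    """Fill NaNs on a straight line between the nearest observed neighbours."""
    out = x.copy()
    idx = np.arange(len(x))
    missing = np.isnan(x)
    out[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return out

def mae(obs, pred):
    """Mean absolute error."""
    return np.mean(np.abs(pred - obs))

def rmse(obs, pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((pred - obs) ** 2))

def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed form, not taken from the paper)."""
    obar = obs.mean()
    denom = np.sum((np.abs(pred - obar) + np.abs(obs - obar)) ** 2)
    return 1.0 - np.sum((pred - obs) ** 2) / denom

# Hypothetical hourly series: a daily cycle plus noise (not real monitoring data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 30)
truth = 50 + 20 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, hours.size)

# Simulate 10% missing values at random interior positions (an assumed pattern).
n_missing = int(0.10 * truth.size)
missing_idx = rng.choice(np.arange(1, truth.size - 1), size=n_missing, replace=False)
series = truth.copy()
series[missing_idx] = np.nan

# Score each method only on the positions that were removed.
results = {}
for name, method in [("series_mean", series_mean),
                     ("linear_interpolation", linear_interpolation)]:
    filled = method(series)
    obs, pred = truth[missing_idx], filled[missing_idx]
    results[name] = (mae(obs, pred), rmse(obs, pred), index_of_agreement(obs, pred))

for name, (m, r, d) in results.items():
    print(f"{name}: MAE={m:.2f} RMSE={r:.2f} d={d:.3f}")
```

Lower MAE and RMSE and an index of agreement closer to 1 indicate a better imputation. Extending the sketch to longer gaps or a 20% missing rate only requires changing how `missing_idx` is drawn.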

Item Type: Article
Uncontrolled Keywords: Air pollution; Estimation; Missing data; Imputation methods; Simulation; Performance indicators
Subjects: Lecturers' Scholarly Works (KARYA ILMIAH DOSEN)
Universitas Negeri Makassar > Lecturers' Scholarly Works
Divisions: Collection of Scholarly Works of the UNM Library Unit (UPT Perpustakaan UNM) by Faculty > Lecturers' Scholarly Works
Lecturers' Scholarly Works
Depositing User: S.T., M.T. Faruq Ratuhaji
Date Deposited: 30 Dec 2020 14:15
Last Modified: 31 Aug 2021 12:05
URI: http://eprints.unm.ac.id/id/eprint/18852
