EFFECT OF ENCODING CATEGORICAL DATA ON STUDENT'S ACADEMIC PERFORMANCE USING DATA MINING METHODS
Authors: Jawthari Moohanad, Stoffova Veronika
2020 | Volume 1 | DOI: 10.12753/2066-026X-20-068 | Pages: 521-526
Abstract
Educational data mining (EDM) comprises the techniques used to discover knowledge from student data. It is used to improve the performance of both students and teachers.
Since machine learning (ML) models deal with numeric data, preprocessing categorical data is a necessary step to transform it into types that ML models accept. Categorical attributes in the dataset are further divided into nominal and ordinal. The dataset was collected using a learner activity tracker tool called the Experience API (xAPI). The purpose was to monitor student behavior in order to evaluate the features that may affect student performance (Amrieh, Hamtini & Aljarah, 2015). The dataset includes 480 student records and 16 features. The features are either numeric or categorical.
In this paper, we study the effect of encoding some non-ordinal features as one-hot (dummy) variables on the accuracy of student performance prediction. We used ensemble methods, namely Random Forest and gradient boosted trees (GBT), a boosting method, together with support vector machines (SVM). We also compared the performance of Random Forest and gradient boosted trees. We achieved the best result, an accuracy of 81%, using the Random Forest classifier. GBT had approximately the same performance in all cases. SVM accuracy improved when dummy variables were used. In addition, we show the importance of the behavioral features, as they increase the models' accuracy. Random Forest outperformed the other models considered in this study. In conclusion, the encoding methods used as preprocessing steps can affect the accuracy of the models.
Keywords
E-learning, Educational Data Mining, Student Performance, Support Vector Machine, Random Forest
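
The distinction the abstract draws between ordinal and nominal (non-ordinal) attributes can be illustrated with a minimal sketch. It uses plain Python and is not the authors' implementation. The feature names and values (`grade_level`, `topic`) and the helper functions `ordinal_encode` and `one_hot_encode` are hypothetical, chosen only for illustration, and are not taken from the paper's dataset.

```python
# Illustrative sketch only: feature names and values below are hypothetical,
# not taken from the paper's dataset.


def ordinal_encode(values, order):
    """Map each category to its rank in an explicit, meaningful order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Map each category to a 0/1 indicator row, one column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Ordinal feature: the categories have a natural order, so integer ranks
# carry real information.
grade_level = ["low", "high", "middle", "low"]
grade_codes = ordinal_encode(grade_level, order=["low", "middle", "high"])

# Nominal feature: there is no natural order, so each category gets its own
# indicator (dummy) column instead of an arbitrary integer.
topic = ["Math", "Biology", "Math", "History"]
topic_columns, topic_rows = one_hot_encode(topic)

print(grade_codes)
print(topic_columns)
for row in topic_rows:
    print(row)
```

Coding a nominal feature as integers would impose an order that does not exist in the data. One-hot encoding avoids this at the cost of one extra column per category. In practice, a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would normally replace the hand-written helper.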