Computer Science Applications articles list

Comparative analysis of machine learning classification models in predicting cardiovascular disease

Cardiovascular diseases have long been the leading cause of death worldwide. Machine learning has found significant use in the medical field because it can find patterns in data. Classification models can help cardiologists diagnose heart diseases accurately and minimize misdiagnosis. In this paper, we explored a dataset related to heart disease and compared the accuracy of 43 machine learning classification models. The dataset for this research was downloaded from Kaggle. It contained 1190 observations, 11 features (age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiogram results, maximum heart rate achieved, exercise-induced angina, oldpeak, and the slope of the peak exercise ST segment) and a binary target variable (no heart disease or observed cardiovascular disease). For data exploration, preprocessing, training, testing, and predictor importance analysis, we used MATLAB R2024a and its Classification Learner app. Before training the classification models, we divided the dataset into a training set (90% of observations) and a test set (10% of observations). To guard against overfitting during training, we used 10-fold cross-validation. The results showed that the best accuracy was reached with an optimized ensemble classification model (validation accuracy: 0.9262; test accuracy: 0.9580). After calculating the permutation importance of each feature, we observed that the most important of the 11 features was the slope of the peak exercise ST segment.

Ladislav Végh
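
The evaluation protocol described in the abstract above uses a 90/10 holdout split followed by 10-fold cross-validation on the training portion. It can be sketched in Python as follows. This is a minimal, standard-library illustration of the partitioning only, not the authors' MATLAB workflow. The function names `holdout_split` and `kfold_splits` and the fixed random seed are assumptions, and the split here is plain random rather than stratified by class.

```python
import random

def holdout_split(n, test_fraction=0.10, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, test) index lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def kfold_splits(indices, k=10):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for f, fold in enumerate(folds) if f != i for idx in fold]
        yield train, folds[i]

# 1190 observations, as in the heart disease dataset
train_idx, test_idx = holdout_split(1190)
print(len(train_idx), len(test_idx))                    # 1071 119
print([len(val) for _, val in kfold_splits(train_idx)])  # [108, 107, 107, 107, 107, 107, 107, 107, 107, 107]
```

A model would be fitted on each `train` list and scored on the matching validation fold. The held-out test indices are touched only once, after model selection.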

Evaluating optimizable machine learning models for anemia type prediction from complete blood count data

This paper compares different optimizable machine learning classification models for predicting eight types of anemia from complete blood count (CBC) data. For the research, we used a publicly available Kaggle dataset containing 1281 observations, 14 predictors, and the diagnosis as a categorical target variable with nine categories (eight types of anemia and a healthy category). First, we examined the dataset and inspected the histograms of some of the predictors, comparing predictor values for observations with no anemia to those for observations in which any anemia was diagnosed. Next, we used MATLAB R2024a to train and test nine optimizable machine learning classification models: Ensemble, Tree, SVM, Efficient Linear, Neural Network, Kernel, KNN, Naïve Bayes, and Discriminant. Bayesian optimization was used to tune the hyperparameters of all these models. We used 90% of observations for training and 10% for testing, and 10-fold cross-validation was used during training to guard against overfitting. The results showed that the best accuracy was reached with the Ensemble classification model using the bag ensemble method (validation accuracy: 99.22%; test accuracy: 100%). Finally, we inspected the best classification model in more detail, calculating the permutation feature importance to determine the contribution of each predictor to the final model. The results showed six to seven important predictors, the most important being the amount of hemoglobin.

Ladislav Végh
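
Permutation feature importance, used in both of the studies above, measures how much a model's accuracy drops when the values of one predictor are randomly shuffled. The sketch below is a standard-library Python illustration under stated assumptions. The `model` is any callable that maps a feature row to a predicted label. The one-threshold "classifier" and the toy data (a hemoglobin-like predictor plus a noise predictor) are hypothetical and not taken from the CBC dataset. MATLAB's own implementation may differ in details such as the scoring metric and number of repetitions.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; all other features keep their original values
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def model(row):
    """Hypothetical rule: predict 'anemic' when the first feature is below 12."""
    return row[0] < 12.0

# Hypothetical toy data: feature 0 is a hemoglobin-like value (g/dL), feature 1 is noise
X = [[8.0 + 0.4 * i, float(i % 3)] for i in range(20)]
y = [model(row) for row in X]
print(permutation_importance(model, X, y))  # feature 1 scores 0.0; feature 0 scores above 0
```

Because the toy model ignores feature 1, shuffling it never changes a prediction, so its importance is exactly zero. Shuffling feature 0 breaks the link between the values and the labels, so accuracy falls.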

Two-stage RFID approach for localizing objects in smart homes based on gradient boosted decision trees with under- and over-sampling

Developing automated, reasonably priced systems for the long-term care of elders is a promising research direction. Such smart systems are based on realizing activities of daily living (ADLs) to enable aging in place while preserving the quality of life of all inhabitants of smart homes. One research direction is localizing items used by elders in order to monitor their activities in fine-grained detail. In this paper, we shed light on this issue by presenting an approach for localizing items in smart homes. The presented method applies machine learning algorithms to Radio Frequency IDentification (RFID) tag readings. Our approach works in two stages: the first stage detects the room in which the selected object is located, and the second determines the exact position of the object inside that room. We present an efficient approach based on gradient boosted decision trees for detecting the location of the selected object in a real-world smart home, and we employ over- and under-sampling techniques with data clustering to improve the performance of the presented techniques. Many experiments were conducted to evaluate the approach for localizing objects in a real smart home, and the results show that it achieves remarkable performance.

Shadi Abudalfa
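
The two-stage structure described in the abstract above first predicts the room and then predicts the position within that room. It can be sketched as follows. This is a hypothetical illustration only. A 1-nearest-neighbour rule stands in for the gradient boosted decision trees used in the paper, and the over- and under-sampling step is omitted. The class name `TwoStageLocalizer`, the three-antenna signal-strength readings, the room names, and the position labels are invented for the example.

```python
import math

def nearest_label(sample, training):
    """1-nearest-neighbour stand-in classifier; training is a list of (features, label)."""
    return min(training, key=lambda item: math.dist(sample, item[0]))[1]

class TwoStageLocalizer:
    """Stage 1 predicts the room; stage 2 uses only that room's data to predict a position."""

    def __init__(self, position_data):
        # position_data: dict mapping room -> list of (reading_vector, position_label)
        self.position_data = position_data
        # Stage-1 training set: every reading labelled with its room
        self.room_data = [(features, room)
                          for room, items in position_data.items()
                          for features, _ in items]

    def locate(self, reading):
        room = nearest_label(reading, self.room_data)                  # stage 1: which room?
        position = nearest_label(reading, self.position_data[room])    # stage 2: where in it?
        return room, position

# Hypothetical readings (dBm) from three antennas
position_data = {
    "kitchen": [([-40, -80, -90], "counter"), ([-45, -78, -88], "table")],
    "bedroom": [([-85, -42, -70], "bed"), ([-88, -45, -72], "wardrobe")],
}
localizer = TwoStageLocalizer(position_data)
print(localizer.locate([-41, -79, -90]))  # ('kitchen', 'counter')
```

Splitting the task this way means each second-stage model only has to separate positions within one room, which is a smaller and more homogeneous problem than separating every position in the house at once.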

Improving machine learning classification models for anaemia type prediction by oversampling imbalanced complete blood count data with SMOTE-based algorithms

Computer-assisted disease diagnosis is cost-effective and time-saving, increasing accuracy and reducing the need for an additional workforce in medical decision-making. In our prior research, we trained, tested, and compared the accuracies of nine optimizable classification models to diagnose and predict eight anaemia types from Complete Blood Count (CBC) data. This study aimed to improve these classification models by oversampling the original imbalanced dataset with four algorithms related to the Synthetic Minority Over-sampling Technique (SMOTE). The results showed that the validation accuracy increased from 99.22% (Ensemble model) to 99.57% (Tree model), and most importantly, the False Discovery Rate (FDR) for the anaemia type with the highest FDR decreased from 23.1% to 1.5%.

Ladislav Végh
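
The core SMOTE step and the per-class False Discovery Rate reported in the abstract above can be sketched in Python as follows. This is a minimal sketch of the original SMOTE interpolation only: a synthetic point is placed at a random spot on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The abstract does not name the four SMOTE-related variants that were used, so none of them is reproduced here. The function names and toy points are hypothetical. FDR for a class is FP / (TP + FP), the share of predictions of that class that are wrong.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

def false_discovery_rate(y_true, y_pred, cls):
    """FDR = FP / (TP + FP) for one class; 0.0 if the class is never predicted."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    return fp / (tp + fp) if tp + fp else 0.0

# Hypothetical two-feature minority class
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
print(len(smote(minority, n_synthetic=6, k=2)))                                # 6
print(false_discovery_rate(["A", "A", "B", "B"], ["A", "B", "A", "A"], "A"))  # 0.6666666666666666
```

Oversampling only the training data keeps synthetic points out of the evaluation. A lower FDR for a rare class means fewer healthy or other-type cases are wrongly assigned to it.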

Arabic text formality modification: a review and future research directions

Formality transfer seeks to adjust the formality of a text without altering its core meaning, which has substantial implications across diverse domains such as machine translation, dialogue systems, and social media content creation. This study provides an extensive overview of formality transfer in Arabic text, an emerging area within natural language processing. In particular, we carried out a comprehensive review of the literature on text formality transfer published between July 2010 and April 2024. We treat formality transfer in Arabic as akin to a machine translation task and present synthesized insights from this perspective. Despite advances in formality transfer for English and other languages, the distinct linguistic features of Arabic present unique challenges and opportunities. Our investigation uncovers several research gaps that call for future exploration and highlights persistent limitations. Finally, we discuss text formality transfer as a promising avenue for future research in Arabic text processing.

Shadi Abudalfa

Tracking students' progress in introductory C programming courses through Moodle tests with randomized questions

Assessing students' progress in introductory programming courses is crucial for identifying learning gaps and improving teaching methods. This study evaluates the effectiveness of Moodle-based tests with randomized questions in monitoring student progress in C programming courses at J. Selye University during the 2023/24 academic year. Ten tests were administered across two courses, covering essential programming topics such as data types, variables, conditional statements, loops, two- and three-dimensional arrays, recursion, and sorting algorithms. The results revealed significant variations in student performance, with recursion and pretest/posttest loops presenting the greatest challenges. Correlation analysis of the test scores showed strong relationships among related topics, confirming the structured progression of the curriculum. These findings suggest that Moodle-based assessments offer valuable insights into students' learning trajectories, enabling educators to adapt their instructional strategies accordingly. Such insights can help optimize introductory programming curricula, enhancing student engagement and understanding.

Ladislav Végh
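
The correlation analysis mentioned in the abstract above can be illustrated with a Pearson correlation coefficient between the scores of pairs of tests. The abstract does not state which coefficient was used, so Pearson's r is an assumption here. The function names, test names, and scores below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_matrix(scores):
    """Pairwise Pearson r for a dict mapping test name -> per-student scores."""
    names = list(scores)
    return {(a, b): pearson(scores[a], scores[b]) for a in names for b in names}

# Hypothetical scores of five students on three tests (same student order in each list)
scores = {
    "loops":     [4, 7, 5, 9, 6],
    "arrays":    [3, 8, 5, 9, 5],
    "recursion": [2, 6, 3, 8, 4],
}
matrix = correlation_matrix(scores)
print(round(matrix[("loops", "arrays")], 3))
```

Each score list must follow the same student order, since r pairs the i-th entries of both lists. Students who missed a test would need to be dropped or handled separately before computing the matrix.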
