Novel Tiny Textural Motif Pattern-Based RNA Virus Protein Sequence Classification Model
| dc.contributor.author | Erten, Mehmet | |
| dc.contributor.author | Aydemir, Emrah | |
| dc.contributor.author | Barua, Prabal Datta | |
| dc.contributor.author | Baygin, Mehmet | |
| dc.contributor.author | Dogan, Sengul | |
| dc.contributor.author | Tuncer, Turker | |
| dc.contributor.author | Acharya, U. Rajendra | |
| dc.date.accessioned | 2026-03-26T15:02:06Z | |
| dc.date.available | 2026-03-26T15:02:06Z | |
| dc.date.issued | 2024 | |
| dc.description | Erten, Mehmet/0000-0002-6664-4568; Aydemir, Emrah/0000-0002-8380-7891; Hafeez-Baig, Abdul/0000-0003-3848-8008; Dogan, Sengul/0000-0001-9677-5684; | en_US |
| dc.description.abstract | Background: RNA viruses, including severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), are important human pathogens. Sequencing of the proteins produced by RNA viruses is essential for understanding disease pathogenesis and may have diagnostic and therapeutic implications. We aimed to develop an accurate and computationally efficient handcrafted feature engineering model for classifying the protein sequences of six pathogenic RNA viruses: SARS-CoV-2, influenza A, influenza B, influenza C, human respirovirus 3, and human immunodeficiency virus (HIV)-1. The first five cause primary respiratory infections; the last has some functional similarity with SARS-CoV-2, justifying the need for diagnostic differentiation. Materials and method: We downloaded 14,787 protein sequences belonging to the six categories in FASTA format from the open-source National Center for Biotechnology Information database and transformed the sequences into numeric arrays. First, the signal was divided into overlapping blocks representing three amino acids. Tiny textural motif pattern, a new histogram-based feature extractor, was then applied to extract textural features using simple signum, lower, and upper ternary functions. 512 features were extracted for each protein sequence and fed to an iterative neighborhood component analysis function to select a study dataset-specific optimal number (34) of the most discriminative features for downstream classification using a shallow k-nearest neighbor classifier with 10-fold cross-validation. Novelties: An efficient linear time complexity is introduced for data classification, providing a robust classification approach, especially for complex datasets. Notably, this approach extends beyond the traditional binary classification focus, successfully distinguishing up to six distinct classes. Furthermore, a novel handcrafted feature extraction method is developed, significantly enhancing data analysis and yielding more precise results. Results: The model attained 99.71% overall 6-class classification accuracy in a data subset and 99.85% for binary classification of SARS-CoV-2 vs. HIV-1, outperforming a similar published model. Conclusions: Our simple model accurately classified the protein sequences of six pathogenic RNA viruses and can potentially be implemented in diagnostic applications to improve RNA virus disease screening. | en_US |
| dc.identifier.doi | 10.1016/j.eswa.2023.122781 | |
| dc.identifier.issn | 0957-4174 | |
| dc.identifier.issn | 1873-6793 | |
| dc.identifier.scopus | 2-s2.0-85178143929 | |
| dc.identifier.uri | https://doi.org/10.1016/j.eswa.2023.122781 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14901/3524 | |
| dc.language.iso | en | en_US |
| dc.publisher | Pergamon-Elsevier Science Ltd | en_US |
| dc.relation.ispartof | Expert Systems with Applications | en_US |
| dc.rights | info:eu-repo/semantics/openAccess | en_US |
| dc.subject | Protein Sequence Classification | en_US |
| dc.subject | SARS-CoV-2 | en_US |
| dc.subject | Bioinformatics | en_US |
| dc.title | Novel Tiny Textural Motif Pattern-Based RNA Virus Protein Sequence Classification Model | en_US |
| dc.type | Article | en_US |
| dspace.entity.type | Publication | |
| gdc.author.id | Erten, Mehmet/0000-0002-6664-4568 | |
| gdc.author.id | Aydemir, Emrah/0000-0002-8380-7891 | |
| gdc.author.id | Hafeez-Baig, Abdul/0000-0003-3848-8008 | |
| gdc.author.id | Dogan, Sengul/0000-0001-9677-5684 | |
| gdc.author.scopusid | 57204756278 | |
| gdc.author.scopusid | 57210571135 | |
| gdc.author.scopusid | 36993665100 | |
| gdc.author.scopusid | 55293658600 | |
| gdc.author.scopusid | 25653093400 | |
| gdc.author.scopusid | 37062172100 | |
| gdc.author.scopusid | 24461808100 | |
| gdc.author.wosid | Erten, Mehmet/W-8578-2018 | |
| gdc.author.wosid | Aydemir, Emrah/Aav-6372-2021 | |
| gdc.author.wosid | Dogan, Sengul/W-4854-2018 | |
| gdc.author.wosid | Baygin, Mehmet/Aat-5720-2021 | |
| gdc.author.wosid | Tuncer, Turker/W-4846-2018 | |
| gdc.author.wosid | Tan, Ru San/Hji-5085-2023 | |
| gdc.author.wosid | Acharya, Rajendra/E-3791-2010 | |
| gdc.description.department | Erzurum Technical University | en_US |
| gdc.description.departmenttemp | [Erten, Mehmet] Fethi Sekin City Hosp, Lab Med Biochem, TR-23100 Elazig, Turkiye; [Aydemir, Emrah] Sakarya Univ, Coll Management, Dept Management Informat, Sakarya, Turkiye; [Barua, Prabal Datta] Univ Southern Queensland, Sch Business Informat Syst, Darling Hts, Australia; [Baygin, Mehmet] Erzurum Tech Univ, Fac Engn & Architecture, Dept Comp Engn, Erzurum, Turkiye; [Dogan, Sengul; Tuncer, Turker] Firat Univ, Coll Technol, Dept Digital Forens Engn, Elazig, Turkiye; [Tan, Ru-San] Natl Heart Ctr Singapore, Dept Cardiol, Singapore, Singapore; [Tan, Ru-San] Duke NUS Med Sch, Singapore, Singapore; [Hafeez-Baig, Abdul] Univ Southern Queensland, Sch Management & Enterprise, Toowoomba, Qld, Australia; [Acharya, U. Rajendra] Univ Southern Queensland, Sch Math Phys & Comp, Springfield, Australia; [Acharya, U. Rajendra] Kumamoto Univ, Int Res Org Adv Sci & Technol IROAST, Kumamoto 8608555, Japan | en_US |
| gdc.description.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US |
| gdc.description.scopusquality | N/A | |
| gdc.description.volume | 242 | en_US |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.description.wosquality | Q1 | |
| gdc.identifier.wos | WOS:001132935100001 | |
| gdc.index.type | Scopus | |
| gdc.virtual.author | Bayğın, Mehmet | |
| relation.isAuthorOfPublication | 131a2dd2-0bc0-4048-a02f-13336fbc84f6 | |
| relation.isAuthorOfPublication.latestForDiscovery | 131a2dd2-0bc0-4048-a02f-13336fbc84f6 |
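
The feature extraction outlined in the abstract (numeric encoding, overlapping blocks of three amino acids, signum and lower/upper ternary functions, histogram features) can be illustrated with a minimal sketch. This is not the authors' tiny textural motif pattern: the integer coding in `CODE`, the pairwise comparisons inside each block, the `threshold` value, and the 24-bin layout are all assumptions made for illustration, and the sketch does not reproduce the 512-feature vector reported in the abstract.

```python
# Illustrative sketch only; every name and parameter here is hypothetical.

# Assumed coding: the 20 standard amino-acid letters map to 1..20.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Map a protein string to integers; unknown letters map to 0."""
    return [CODE.get(ch, 0) for ch in sequence.upper()]


def overlapping_blocks(values, size=3):
    """Return every window of `size` consecutive values (stride 1)."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]


def signum(a, b):
    """Binary comparison: 1 if a >= b, otherwise 0."""
    return 1 if a - b >= 0 else 0


def ternary(a, b, threshold):
    """Three-level comparison: +1, 0, or -1 relative to a threshold."""
    d = a - b
    if d > threshold:
        return 1
    if d < -threshold:
        return -1
    return 0


def block_codes(block, threshold=1):
    """Build 3-bit signum, upper, and lower codes from a 3-value block.

    Assumed comparison pairs: (0, 1), (0, 2), (1, 2).
    """
    pairs = [(block[0], block[1]), (block[0], block[2]), (block[1], block[2])]
    s = u = l = 0
    for a, b in pairs:
        t = ternary(a, b, threshold)
        s = (s << 1) | signum(a, b)
        u = (u << 1) | (1 if t == 1 else 0)   # upper bit: ternary value is +1
        l = (l << 1) | (1 if t == -1 else 0)  # lower bit: ternary value is -1
    return s, u, l


def histogram_features(sequence, threshold=1):
    """Concatenate three 8-bin histograms (signum, upper, lower)."""
    hist = [0] * 24
    for block in overlapping_blocks(encode(sequence)):
        s, u, l = block_codes(block, threshold)
        hist[s] += 1
        hist[8 + u] += 1
        hist[16 + l] += 1
    return hist
```

Calling `histogram_features("ACDE")` returns a 24-element count vector. The sketch makes a single pass over the sequence, so its cost grows linearly with sequence length; the feature selection and k-nearest neighbor stages mentioned in the abstract are not covered here.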
