Novel Tiny Textural Motif Pattern-Based RNA Virus Protein Sequence Classification Model

dc.contributor.author Erten, Mehmet
dc.contributor.author Aydemir, Emrah
dc.contributor.author Barua, Prabal Datta
dc.contributor.author Baygin, Mehmet
dc.contributor.author Dogan, Sengul
dc.contributor.author Tuncer, Turker
dc.contributor.author Acharya, U. Rajendra
dc.date.accessioned 2026-03-26T15:02:06Z
dc.date.available 2026-03-26T15:02:06Z
dc.date.issued 2024
dc.description Erten, Mehmet/0000-0002-6664-4568; Aydemir, Emrah/0000-0002-8380-7891; Hafeez-Baig, Abdul/0000-0003-3848-8008; Dogan, Sengul/0000-0001-9677-5684 en_US
dc.description.abstract Background: RNA viruses, including severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), are important human pathogens. Sequencing of the proteins produced by RNA viruses is essential for understanding disease pathogenesis and may have diagnostic and therapeutic implications. We aimed to develop an accurate and computationally efficient handcrafted feature engineering model for classifying the protein sequences of six pathogenic RNA viruses: SARS-CoV-2, influenza A, influenza B, influenza C, human respirovirus 3, and human immunodeficiency virus (HIV)-1. The first five cause primary respiratory infections; the last has some functional similarity with SARS-CoV-2, justifying the need for diagnostic differentiation. Materials and method: We downloaded 14,787 protein sequences belonging to the six categories in FASTA format from the open-source National Center for Biotechnology Information database and transformed the sequences into numeric arrays. First, the signal was divided into overlapping blocks representing three amino acids. Tiny textural motif pattern, a new histogram-based feature extractor, was then applied to extract textural features using simple signum, lower, and upper ternary functions. 512 features were extracted for each protein sequence and fed to an iterative neighborhood component analysis function to select a study dataset-specific optimal number (34) of the most discriminative features for downstream classification using a shallow k-nearest neighbor classifier with 10-fold cross-validation. Novelties: An efficient linear time complexity is introduced for data classification, providing a robust classification approach, especially for complex datasets. Notably, this approach extends beyond the traditional binary classification focus, successfully distinguishing up to six distinct classes. Furthermore, a novel handcrafted feature extraction method is developed, significantly enhancing data analysis and yielding more precise results. Results: The model attained 99.71% overall 6-class classification accuracy in a data subset and 99.85% for binary classification of SARS-CoV-2 vs. HIV-1, outperforming a similar published model. Conclusions: Our simple model accurately classified the protein sequences of six pathogenic RNA viruses and can potentially be implemented in diagnostic applications to improve RNA virus disease screening. en_US
dc.identifier.doi 10.1016/j.eswa.2023.122781
dc.identifier.issn 0957-4174
dc.identifier.issn 1873-6793
dc.identifier.scopus 2-s2.0-85178143929
dc.identifier.uri https://doi.org/10.1016/j.eswa.2023.122781
dc.identifier.uri https://hdl.handle.net/20.500.14901/3524
dc.language.iso en en_US
dc.publisher Pergamon-Elsevier Science Ltd en_US
dc.relation.ispartof Expert Systems with Applications en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Protein Sequence Classification en_US
dc.subject SARS-CoV-2 en_US
dc.subject Bioinformatics en_US
dc.title Novel Tiny Textural Motif Pattern-Based RNA Virus Protein Sequence Classification Model en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.id Erten, Mehmet/0000-0002-6664-4568
gdc.author.id Aydemir, Emrah/0000-0002-8380-7891
gdc.author.id Hafeez-Baig, Abdul/0000-0003-3848-8008
gdc.author.id Dogan, Sengul/0000-0001-9677-5684
gdc.author.scopusid 57204756278
gdc.author.scopusid 57210571135
gdc.author.scopusid 36993665100
gdc.author.scopusid 55293658600
gdc.author.scopusid 25653093400
gdc.author.scopusid 37062172100
gdc.author.scopusid 24461808100
gdc.author.wosid Erten, Mehmet/W-8578-2018
gdc.author.wosid Aydemir, Emrah/Aav-6372-2021
gdc.author.wosid Dogan, Sengul/W-4854-2018
gdc.author.wosid Baygin, Mehmet/Aat-5720-2021
gdc.author.wosid Tuncer, Turker/W-4846-2018
gdc.author.wosid Tan, Ru San/Hji-5085-2023
gdc.author.wosid Acharya, Rajendra/E-3791-2010
gdc.description.department Erzurum Technical University en_US
gdc.description.departmenttemp [Erten, Mehmet] Fethi Sekin City Hosp, Lab Med Biochem, TR-23100 Elazig, Turkiye; [Aydemir, Emrah] Sakarya Univ, Coll Management, Dept Management Informat, Sakarya, Turkiye; [Barua, Prabal Datta] Univ Southern Queensland, Sch Business Informat Syst, Darling Hts, Australia; [Baygin, Mehmet] Erzurum Tech Univ, Fac Engn & Architecture, Dept Comp Engn, Erzurum, Turkiye; [Dogan, Sengul; Tuncer, Turker] Firat Univ, Coll Technol, Dept Digital Forens Engn, Elazig, Turkiye; [Tan, Ru-San] Natl Heart Ctr Singapore, Dept Cardiol, Singapore, Singapore; [Tan, Ru-San] Duke NUS Med Sch, Singapore, Singapore; [Hafeez-Baig, Abdul] Univ Southern Queensland, Sch Management & Enterprise, Toowoomba, Qld, Australia; [Acharya, U. Rajendra] Univ Southern Queensland, Sch Math Phys & Comp, Springfield, Australia; [Acharya, U. Rajendra] Kumamoto Univ, Int Res Org Adv Sci & Technol IROAST, Kumamoto 8608555, Japan en_US
gdc.description.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.volume 242 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q1
gdc.identifier.wos WOS:001132935100001
gdc.index.type Scopus
gdc.virtual.author Bayğın, Mehmet
relation.isAuthorOfPublication 131a2dd2-0bc0-4048-a02f-13336fbc84f6
relation.isAuthorOfPublication.latestForDiscovery 131a2dd2-0bc0-4048-a02f-13336fbc84f6
