Feature selection and extraction for class prediction in dysphonia measures analysis: A case study on Parkinson’s disease speech rehabilitation Article


  • BACKGROUND: Speech disorders such as dysphonia and dysarthria are an early and common manifestation of Parkinson’s disease. Class prediction is an essential task in automatic speech treatment, particularly for Parkinson’s disease. Many classification experiments have focused on automatically distinguishing Parkinson’s disease patients from healthy speakers, but the results remain unsatisfactory. A major obstacle is the high dimensionality of speech data. OBJECTIVE: This work considers Principal Component Analysis (PCA)-based modeling as a dimensionality reduction and data smoothing tool for multiclass target expression data. METHODS: Building on the proposed PCA-based modeling, the class prediction power of logistic regression (LR) and C5.0 on numeric data is investigated on the publicly available Lee Silverman Voice Treatment (LSVT) Parkinson’s disease dataset, with the aim of developing an advanced classification model. RESULTS: The main advantage of our model is the effective reduction of the number of factors from p = 309 to k = 32 for the LSVT Voice Rehabilitation dataset, with classification accuracies of 100% and 99.92% for PCA-LR and PCA-C5.0, respectively. In addition, using only 9 dysphonia features, classification accuracy was 99.20% and 99.11% for PCA-LR and PCA-C5.0, respectively. CONCLUSIONS: Our combined dimension reduction and data smoothing approaches have significant potential to minimize the number of features, increase classification accuracy, and automatically classify subjects as Parkinson’s disease patients or healthy speakers.
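The PCA-LR pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic random numbers (only p = 309 and k = 32 come from the abstract; the sample count n = 120 and the train/test split are arbitrary), the logistic regression is a plain gradient-descent stand-in, the function names are hypothetical, and the C5.0 branch is not shown.

```python
import numpy as np

def pca_fit(X, k):
    """Return (mean, components) for the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project X onto the fitted principal directions."""
    return (X - mean) @ components.T

def fit_logistic(Z, y, lr=0.1, n_iter=500):
    """Fit binary logistic regression by batch gradient descent."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = prob - y
        w -= lr * (Z.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(Z, w, b):
    """Predict class 1 where the linear score is positive."""
    return (Z @ w + b > 0).astype(int)

# Synthetic stand-in data (assumption): NOT the LSVT dataset.
rng = np.random.default_rng(0)
n, p, k = 120, 309, 32           # p and k from the abstract; n is arbitrary
X = rng.normal(size=(n, p))
y = (rng.random(n) < 0.5).astype(int)
X[y == 1, :5] += 2.0             # inject some class signal into 5 features

# Fit PCA on the training rows only, then project both splits.
train = np.arange(n) < 90
mean, comps = pca_fit(X[train], k)
Z_train = pca_transform(X[train], mean, comps)
Z_test = pca_transform(X[~train], mean, comps)

w, b = fit_logistic(Z_train, y[train])
y_pred = predict(Z_test, w, b)
accuracy = (y_pred == y[~train]).mean()
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Fitting the PCA on the training rows only, and reusing the same mean and components for the held-out rows, keeps test information out of the projection. The accuracy printed here reflects the synthetic data only and says nothing about the figures reported in the abstract.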


published in

number of pages

  • 15

start page

  • 693

end page

  • 708


  • 25


  • 4