Optimize and fine-tune the parameters of an existing neural network (the
code is already written) to improve frame-level speech recognition. The
model takes MFCC (Mel-Frequency Cepstral Coefficient) features as input
and classifies the phoneme in each audio frame. The predictions are then
submitted to Kaggle as a .csv file.
The target accuracy is 86-87%; my model currently reaches 84%. The model must be an MLP (Multilayer Perceptron), and its total number of parameters must not exceed 20 million.
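
To make the 20 million parameter limit concrete, here is a minimal sketch in plain Python that counts the trainable parameters of a candidate MLP before training it. The feature size (28 MFCCs per frame), the context window (20 frames on each side), the number of phoneme classes (42), and the hidden layer widths are all hypothetical placeholders, not values taken from the notebook; replace them with the real ones. The helper name mlp_param_count is also made up for this illustration.

```python
def mlp_param_count(input_dim, hidden_widths, num_classes, batchnorm=False):
    """Count trainable parameters of a fully connected MLP.

    Each linear layer contributes (inputs * outputs) weights plus one
    bias per output. If batchnorm is True, each hidden layer also gets
    a learnable scale and shift per unit.
    """
    total = 0
    prev = input_dim
    for width in hidden_widths:
        total += prev * width + width   # linear weights + biases
        if batchnorm:
            total += 2 * width          # batch-norm scale and shift
        prev = width
    total += prev * num_classes + num_classes  # output layer
    return total


# Hypothetical values: adjust to match the actual data and notebook.
N_MFCC = 28          # assumed number of MFCC coefficients per frame
CONTEXT = 20         # assumed number of context frames on each side
NUM_CLASSES = 42     # assumed number of phoneme classes
PARAM_BUDGET = 20_000_000

# Input is the centre frame plus CONTEXT frames on each side, flattened.
input_dim = (2 * CONTEXT + 1) * N_MFCC
hidden = [2048, 2048, 2048, 2048]   # assumed layer widths

count = mlp_param_count(input_dim, hidden, NUM_CLASSES, batchnorm=True)
print(f"parameters: {count:,}  within budget: {count <= PARAM_BUDGET}")
```

Running a check like this for each candidate architecture shows how much of the budget is still unused, which helps when deciding whether to widen the layers, add depth, or enlarge the context window.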
The submission file should follow the same format as the attached submission(2).csv.
My code is in HW1P2_F24_Starter_Notebook-Copy1(1).ipynb.
The required data/corpus may be found at https://www.openslr.org/12.