Nepali Speaker Recognition
Speaker identification is the task of identifying a person from a recording of their speech. Traditional audio-processing models were long the standard approach, but convolutional neural networks have recently shown that they can also produce strong results. I have always wanted to train a model on my mother tongue, so I collected Nepali audio from YouTube of 34 politicians, both male and female, speaking in diverse settings. I took care to avoid noisy recordings, and the average clip is approximately 5 minutes long.
In this study, I implemented a Siamese network trained with contrastive loss, which yielded high-quality results.
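To make the training objective concrete, here is a minimal sketch of the contrastive loss commonly paired with Siamese networks, written in plain Python for a single pair of embeddings. It is an illustration and not the project's actual code: the function names, the margin of 1.0, the 0.5 scaling factor, and the use of Euclidean distance are all assumptions.

```python
import math


def euclidean_distance(emb_a, emb_b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)))


def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Contrastive loss for one pair of embeddings (hypothetical helper).

    same_speaker=True  -> penalize any distance between the embeddings.
    same_speaker=False -> penalize only pairs closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_speaker:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Two clips from different speakers whose embeddings sit too close together
# incur a loss; once they are at least `margin` apart the loss is zero.
close_pair = contrastive_loss([0.0, 0.0], [0.6, 0.0], same_speaker=False)
far_pair = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_speaker=False)
```

In a Siamese setup, both embeddings would come from the same network with shared weights, so minimizing this loss pulls clips of the same speaker together and pushes clips of different speakers at least a margin apart.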
I have written a Medium article that describes in detail the steps I took to solve this problem: