Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

Community

Thanks for this, it's really awesome. Would it be possible to fine-tune this model to listen for a particular sound (like a frog call)? I have done this with the wav2vec model and had fairly good results, but I'm always looking to improve.

Cheers,

Liam liam.bolitho@gmail.com

Hey!

Did you figure it out? It seems quite interesting!

Using this code, I find that ASR has improved, but LID has degraded. I want to fine-tune ASR and LID at the same time. How can I do it?

Why do you need LID? Isn't that a different task? If your dataset is multilingual, you can set language="auto" 🤔

Hey, I am new to fine-tuning. Can you tell me how to prepare a dataset for another language, let's say Persian?

Just change every hi in the above code to fa. There are other datasets, like FLEURS, that you can use for Farsi too.

Can't load tokenizer for 'amanjain96/whisper-small-hi'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'amanjain96/whisper-small-hi' is the correct path to a directory containing all relevant files for a WhisperTokenizer tokenizer.

I used the above steps to fine-tune it. When I try to use the model, it gives the above error.

https://www.kaggle.com/code/amanjain114/notebook8f89392d9d

Getting the same error. Did you manage to solve this?

Excellent explanation! I loved the post; it has been helping me a lot. I'm from Brazil.

I get the error: Dataset scripts are no longer supported, but found common_voice_11_0.py

What's your suggestion for solving that? I'm a beginner and don't yet know my way around. Your assistance is highly appreciated.

This is a version problem. You can just downgrade the datasets library, and it will work.

Install the older version with %pip install datasets==3.6.0
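
To check whether your environment is affected before downgrading, here is a minimal sketch using only the standard library. It assumes that support for dataset loading scripts was removed starting with datasets 4.0; that cutoff and the helper name supports_dataset_scripts are assumptions for illustration, not something stated in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_dataset_scripts(version_string: str) -> bool:
    """Return True if this datasets version is expected to run loading scripts.

    Assumption: script-based loading was dropped in the 4.0 release,
    so any 3.x version (such as the suggested 3.6.0) should still work.
    """
    major = int(version_string.split(".")[0])
    return major < 4


try:
    installed = version("datasets")
    if supports_dataset_scripts(installed):
        print(f"datasets {installed}: loading scripts should still be supported")
    else:
        print(f"datasets {installed}: consider pinning datasets==3.6.0")
except PackageNotFoundError:
    print("datasets is not installed in this environment")
```

After changing the installed version in a notebook, restart the kernel so the new version is actually imported.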

When I train, I get the following error: RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Hi, have you solved the problem? I'm encountering the same issue.

A few questions:

Can we do PEFT using LoRA?
If I want to fine-tune on multiple datasets, do I need to fine-tune on each separately and then combine the weights, or do I need to concatenate the datasets and then shuffle?

To concatenate datasets, you can use "from datasets import concatenate_datasets".

If I want to use local data for training, how should I proceed?

My val_wer never goes below 40 when training with the same data, even with the same settings on an A10 GPU. Is anyone facing the same?
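
For reference when reading a number like 40: assuming val_wer is the word error rate expressed as a percentage, it counts word substitutions, deletions, and insertions relative to the number of reference words. Below is a minimal pure-Python sketch of that calculation. The function name word_error_rate is hypothetical, and the sketch applies no text normalization, so differences in casing or punctuation count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref_words)


# One substitution and one deletion against five reference words.
print(word_error_rate("a b c d e", "a x c d"))
```

Because unnormalized text inflates the score, it can be worth comparing the metric with and without lowercasing and punctuation stripping before concluding that training has plateaued.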
