PaliGemma 2 Mix - New Instruction Vision Language Models by Google

Hey there, can from transformers import AutoProcessor, AutoModelForVision2Seq be used for all VLMs, or do we have to go with from transformers import PaliGemmaProcessor, PaliGemmaForConditionalGeneration for PaliGemma and from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor for QwenVL? My goal is to have a unified script for all VLMs on the Hub, given a model_name as an argument.

Article author

Feb 20, 2025

Looking at the transformers mapping, I am pretty sure the Auto class would be able to load both PaliGemma and Qwen2_5.
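For what it's worth, here is a minimal sketch of what such a unified script could look like. It is not tested against every VLM on the Hub. It assumes the installed transformers version maps the checkpoint's architecture to one of the Auto classes listed in CANDIDATE_AUTO_CLASSES. AutoModelForVision2Seq is the class named in the question; AutoModelForImageTextToText is added as a fallback on the assumption that newer releases prefer it. The helper names (pick_auto_class, load_vlm, main) are hypothetical, made up for this example.

```python
import argparse

# Candidate Auto classes, tried in order. The first is the one named in the
# question; the second is an assumed fallback for newer transformers releases.
CANDIDATE_AUTO_CLASSES = ("AutoModelForVision2Seq", "AutoModelForImageTextToText")


def pick_auto_class(module, candidates=CANDIDATE_AUTO_CLASSES):
    """Return the first candidate class that `module` actually exposes."""
    for name in candidates:
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise ImportError(f"none of {candidates} found in {module!r}")


def load_vlm(model_name):
    """Load a processor and a model for `model_name` via the Auto classes."""
    import transformers  # imported lazily so pick_auto_class has no hard dependency

    processor = transformers.AutoProcessor.from_pretrained(model_name)
    model_cls = pick_auto_class(transformers)
    # Raises if this architecture is not registered in the chosen Auto mapping.
    model = model_cls.from_pretrained(model_name)
    return processor, model


def main(argv=None):
    parser = argparse.ArgumentParser(description="Load any supported VLM by name.")
    parser.add_argument("model_name", help="Hub repo id of the checkpoint")
    args = parser.parse_args(argv)
    processor, model = load_vlm(args.model_name)
    print(type(processor).__name__, type(model).__name__)
```

To use it as a script, call main() under the usual if __name__ == "__main__": guard and pass the Hub repo id as the only argument. If a checkpoint's architecture is not in the Auto mapping of your installed version, from_pretrained should raise an error, and the model-specific class is the fallback.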
