GPT-4 Vision is an amazing model, and if you know its capabilities and limits, you can use it to unlock new applications and new user experiences.
I found CogVLM to be the best-performing of the existing multimodal models.
Also, did you get the chance to try Apple's recently open-sourced Ferret? https://github.com/apple/ml-ferret
I haven't played around with Ferret. Is it good?