OpenAI is rolling out multimodal ChatGPT to all Plus users, a feature first announced at the GPT-4 unveiling last year.
This lets Plus users upload all kinds of data to give ChatGPT more context in a chat. For example, I can upload a photo of a leaky faucet, draw a circle around the part I want ChatGPT to focus on, and ask, “how should I fix this?”
Kagi Assistant is already sort of multimodal in that I can attach a YouTube video, web article, podcast recording, text file, PDF, and more, but it can’t interpret the visual content of images the way ChatGPT does.
Assistant could implement this using open-source vision-language models, such as CLIP for matching images against text descriptions, or an image-captioning model like BLIP that generates a textual description of an image. (Stable Diffusion goes the other direction, generating images from text, so it wouldn’t help here.)
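As a rough sketch of what that could look like, here’s how an open-source captioning model could turn an uploaded photo into text the assistant can reason over. This is just my own illustration, not anything Kagi has announced; the model choice (BLIP via Hugging Face transformers) and the file name are assumptions:

```python
# A minimal sketch: caption an uploaded image with an open-source model,
# then feed the caption into the chat as extra text context.
# Assumes `pip install transformers torch pillow`; the file name is made up.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("leaky_faucet.jpg").convert("RGB")  # hypothetical upload
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(caption_ids[0], skip_special_tokens=True)

# The caption becomes ordinary text context, the same way an attached
# article or PDF gets injected into the conversation.
print(f"Image context: {caption}")
```

A caption is obviously a lossy substitute for the pixel-level understanding GPT-4’s vision model has, but it would get Assistant partway there without needing a proprietary model.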