What model were you using?
If you are using Ki (our agent assistant model in beta) it should be able to handle this. For "regular" models (e.g. Kagi Assistant with Claude 3.7) it's not really possible because of the heterogeneity in how the zoo of models handles those edge cases.
Even for image input (e.g. dragging and dropping an image into the input box), for many of those models (the ones that aren't vision + text multimodal like mistral-small), we have to ship the work off to a sub-model and report the image back to the main model as a text description. A rough sketch of that fallback is below.
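To illustrate (this is not Kagi's actual code, just a minimal sketch of the pattern): when the selected model can't accept images, a vision-capable sub-model first turns the image into a text description, and that description is what the text-only model actually sees. The model names and the `chat` helper here are hypothetical placeholders.

```python
def chat(model: str, messages: list[dict]) -> str:
    """Placeholder for a call to some LLM chat-completion API."""
    raise NotImplementedError


def answer(model: str, prompt: str, image_bytes: bytes | None,
           model_is_multimodal: bool) -> str:
    if image_bytes is None or model_is_multimodal:
        # Vision + text models get the image directly.
        return chat(model, [{"role": "user", "content": prompt,
                             "image": image_bytes}])

    # Text-only models: ship the image off to a captioning sub-model...
    description = chat("vision-sub-model",
                       [{"role": "user",
                         "content": "Describe this image in detail.",
                         "image": image_bytes}])

    # ...and report it back as plain text alongside the user's prompt.
    combined = f"{prompt}\n\n[Attached image, described]: {description}"
    return chat(model, [{"role": "user", "content": combined}])
```

The trade-off is that the main model only ever sees the sub-model's description, not the pixels, which is why behavior varies so much across the non-multimodal models.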
If you're an Ultimate subscriber you can use Ki (in beta, launched very recently) by clicking this link.