It would be nice to have a way to configure sampling parameters such as top-p, top-k, and temperature for the LLMs behind custom assistants.
Model output quality can vary with these sampler settings depending on the requested task; for example, a temperature of 0.7 can lead to better results than a temperature of 1 on some tasks, since lower temperatures make the output more focused and deterministic while higher ones make it more varied. Exposing these settings would let users get more out of the assistant.
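For context, here is a minimal sketch of what these three settings actually do to a model's next-token distribution. This is purely illustrative (plain numpy, not tied to any platform's actual API), but it shows why they are useful knobs to expose:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Illustrative sampler: temperature scaling, then top-k, then top-p."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature: <1 sharpens the distribution, >1 flattens it.
    logits = logits / max(temperature, 1e-8)

    # Top-k: keep only the k highest-scoring tokens (0 = disabled).
    if top_k > 0:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)

    # Softmax to probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative
    # probability reaches p, then renormalize over that set.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        # Include each token whose preceding cumulative mass is below p.
        keep = (cumulative - probs[order]) < top_p
        mask = np.zeros_like(probs, dtype=bool)
        mask[order[keep]] = True
        probs = np.where(mask, probs, 0.0)
        probs /= probs.sum()

    return int(rng.choice(len(probs), p=probs))

# Example: the 0.7-vs-1.0 comparison mentioned above, with nucleus sampling.
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```

Even a simple per-assistant config exposing these three values (with the current behavior as the default) would cover most use cases.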