Agreed. Longer inputs with examples enable better results, as does being able to toggle temperature and top_p. Depending on the use case, different models respond better or worse to few-shot examples, structured inputs, and particular sampler settings.
I understand limiting max output tokens for cost reasons, but at least exposing these settings, and allowing different assistants to be created with different settings, fits exactly with what made Kagi the only platform that managed to get me to move off Google.
There's a common misconception that temp=0, for example, is the best way to use LLMs for code. But well-structured, longer inputs and higher entropy (temperature) constrained by nucleus sampling (top_p) are really important for getting good outputs, and no two models have the exact same optimal values (though temperatures between 0.5 and 1 with a top_p of 0.95 tend to work best with thinking models, and often higher with non-thinking ones).
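For concreteness, this is roughly what exposing those knobs looks like at the API level. A minimal sketch against an OpenAI-compatible chat completions endpoint; the model name and the exact values are illustrative starting points, not recommendations for any specific model:

```python
from openai import OpenAI

client = OpenAI()  # OpenAI-compatible endpoint; swap base_url/api_key for other providers

# Illustrative values only: ~0.5-1.0 temperature with top_p ~0.95 is a common
# starting point for "thinking" models; non-thinking models often tolerate or
# benefit from higher temperatures. Tune per model and per task.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Refactor this recursive function into an iterative one: ..."},
    ],
    temperature=0.7,  # entropy of the token distribution
    top_p=0.95,       # nucleus sampling: restrict choices to the top 95% probability mass
    max_tokens=1024,  # output-length cap (the cost lever)
)
print(response.choices[0].message.content)
```

Letting each saved assistant carry its own values for these parameters is all that's being asked for here.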
I understand limiting the main assistant screens to defaults, but not babying advanced users here would make me recommend this even more to people who still use Perplexity.