There is also the AI Benchmark by Kagi, which elaborates (at least a bit) on what the different models are more capable of. At a minimum, it shows accuracy and distinguishes general from reasoning LLMs.
But I think extending that information, e.g. to coding or other purposes, would help.