Fine-tuned 3B Beats Haiku on Agent Task
A fine-tuned 3B model outperforms Claude Haiku on constrained generation. A full scaling curve from 0.5B to 72B parameters shows where the quality cliff is.