Recent Posts

Fine-tuned 3B Beats Haiku on Agent Task

4 minute read

A fine-tuned 3B model outperforms Claude Haiku on constrained generation. Full scaling curve from 0.5B to 72B shows where the quality cliff is.