Campus, Varian 355
Zoom info: https://stanford.zoom.us/j/92249714551?pwd=baDhMAb6iCoUW8UjAOB2ag4CbrqR…
The newest large language reasoning models are, for the first time, powerful enough to perform graduate-level mathematical reasoning in theoretical physics. In the mathematics community, datasets such as FrontierMath are being used to drive progress and evaluate models, but theoretical physics has so far received less attention. In this talk I will present our dataset TPBench (arxiv:2502.15815, tpbench.