BDH interpretability explorer

A small from-scratch language model whose neurons are sparse and often stand for a single concept. Poke at it below.
