BDH interpretability explorer

A small from-scratch language model whose neurons are sparse and often stand for a single concept. Poke at it below.

prompt

characters to add (20 to 400)

temperature (0.1 to 1.5)
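
This page does not show how the temperature setting is applied. A common approach, assumed here, is to divide the model's output logits by the temperature before the softmax, so low values make the most likely character dominate and high values flatten the distribution. The sketch below illustrates that idea only; the function name `sample_with_temperature` and the example logits are hypothetical and are not taken from the explorer's code.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Sample one index from `logits` after temperature scaling.

    Assumption: temperature divides the logits before a softmax.
    Returns (sampled_index, probabilities).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Scale the logits: small temperatures exaggerate the gaps between them.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the maximum first for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Draw a single index according to the resulting distribution.
    index = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return index, probs


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0]  # made-up scores for three characters
    for t in (0.1, 1.5):  # the two ends of the slider range
        _, probs = sample_with_temperature(example_logits, t)
        print(t, [round(p, 3) for p in probs])
```

Under this assumption, the low end of the slider puts nearly all probability on the highest-scoring character, while the high end spreads it across the alternatives, which usually makes the generated text more varied and less coherent.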

generated