Back to writing

Tag

interpretability

1 published post under this topic.

When AI Finds Emotion Concepts Inside a Model

A product-builder reading of model interpretability work: if internal concepts can be found and steered, the UI should stop pretending behavior is only prompt text.