Tag
1 published post under this topic.
A product-builder reading of model interpretability work: if internal concepts can be found and steered, the UI should stop pretending behavior is only prompt text.