North Norwegian has a contrast between /s/ and /ʂ/ that is neutralized in word-initial position before a consonant, and an optional process of Expressive Sibilant Retraction (ESR), which changes /s/ to [ʂ] in precisely the environment where the contrast is neutralized (Broch 1927). ESR appears ambiguous between a word formation process and a spoken gesture (Okrent 2002; Perlman et al. 2015). On the one hand, ESR exploits givens of phonological structure. On the other, treating it as a morphological process entails claiming either that the spell-out of certain (“expressive”) morphemes may take place after phonological processes have applied, or that the realization of these morphemes takes precedence over phonological constraints. I argue that ESR is a communicative (i.e. non-linguistic, or post-linguistic) spoken gesture that nonetheless exploits the suspension of phonological generalizations in a way that directs attention to its iconic function. I describe the varied interpretations that ESR has depending on whether it indexes an action/event, object, or state/property, and propose that these share a common semantic core. This gesture-based account of ESR is offered as a possible model for “expressive phonology” (e.g. Diffloth 1979) in other languages.