Covert speech is accompanied by a subjective multisensory experience with auditory and kinaesthetic components. An influential hypothesis states that these sensory percepts result from a simulation of the corresponding motor action that relies on the same internal models recruited for the control of overt speech. This simulationist view raises the question of how it is possible to imagine speech without executing it. In this perspective, we discuss the possible role(s) played by motor inhibition during covert speech production. We suggest that considering covert speech as an inhibited form of overt speech maps naturally to the purported progressive internalisation of overt speech during childhood. We further argue that the role of motor inhibition may differ widely across different forms of covert speech (e.g., condensed vs. expanded covert speech) and that considering this variety helps reconciling seemingly contradictory findings from the neuroimaging literature.