This paper explores the importance of preconceptual meaning in speech and music, stressing the role of affective vocalizations as a common ancestral instrument in communicative interactions. Speech and music are sensory-rich stimuli at the levels of both production and perception, and they involve different body channels, mainly the face and the voice. This bimodal approach, however, has been challenged as too restrictive. A broader conception proposes an action-oriented, embodied approach that stresses the reciprocity between multisensory processing and articulatory-motor routines. Language and music nevertheless differ in one important respect: music is largely unable to function referentially. Language is centrifugal, directing the receiver's attention away from the text or speech proper. Music, by contrast, is centripetal, directing the listener's attention to the auditory material itself. Sound can therefore be considered the meeting point between speech and music. This raises the question of which components are shared in the interpretation of sound in the two domains. To address this question, this paper elaborates on the following topics: (i) the relationship between speech and music, with a special focus on early vocalizations in humans and non-human primates; (ii) the transition from sound to meaning in speech and music; (iii) the role of emotion and affect in early sound processing; (iv) vocalizations and nonverbal affect bursts in communicative sound comprehension; and (v) the acoustic features of affective sound, with special emphasis on temporal and spectrographic cues as part of speech prosody and musical expressiveness.