Finding Your Voice: What is "voicing", and how can we use it in sociolinguistic research?
- Derrek Powell
- Dec 13, 2024
Updated: Dec 19, 2024
When you think of a "voice", what comes to mind? Most of us might immediately think of that unique acoustic resonance produced by vocal fold vibration. Others might view the voice more interactionally as the way we share our thoughts, desires, worries, etc. The symbolism of the voice and the perils of being voiceless have a long history, particularly in literature and critical social theory. Some personal favorites of mine include Disney's (re)adaptation of Hans Christian Andersen's The Little Mermaid and Sylvia Wynter's Unsettling the Coloniality of Being/Power/Truth/Freedom. But my goal here is not to review the lineage of voice symbolism or closely examine its use in any particular story. Instead, we'll explore the voice as an analytical framework and its potential to guide sociolinguistic research from a discursive perspective.
Have you ever changed your voice or the way you talk when imitating the speech of someone else? Of course you have! There is an entire industry built around speakers who are experts at adopting another voice for entertainment purposes. If we start here, we can think of voicing as "giving a voice" to different characters, persons, stances, situations, etc., by stylizing our linguistic output in variable ways. Maybe we change the pitch of our voice, or we use certain words we wouldn't typically use ourselves. When we adopt or represent another voice, we are doing what experts call "voicing contrast entextualization," because we are contrasting another voice against our own within a linguistic text, whether it be spoken, written, signed, or sung.
I believe this concept is more digestible when using examples of reported speech. Let's map out voicing contrasts diagrammatically using reggaetón lyrics. The outer box labeled "representing voice" is the voice of the person physically producing the utterance. The inner box labeled "represented voice" is the voice of a past speech act brought into the present through entextualization. In (1) below, Bad Bunny reports the message of his Titi, representing her voice with his own utterance. The voicing contrast, therefore, is between Bad Bunny's voice (orange) and that of his Titi (blue).

These contrasts, which occur in everyday talk, have a gradient transparency, meaning the ease with which we can recognize the presence of different voices depends on the linguistic resources used to create the text. In his outline of enregisterment (the process by which we typify linguistic behavior), the British linguistic anthropologist Asif Agha conceptualized this transparency as the co-occurrence of four linguistic cues. As a general rule, transparency is greatest when all four cues are specified and lessens with the absence of one or more cues.

Figure 1: Transparency of voicing contrasts adapted from Agha (2005). Specification need not occur as outlined here.
The first cue is the use of some variable aspect of a linguistic signal at any level of linguistic organization. This can be the use of different sounds or voice qualities, pronunciations of words, the actual words themselves, sentence structure, or discourse markers. If someone were to voice a "valley-girl", they might make stylistic choices regarding the use of vocal fry, quotative "like," or iconic expressions like "as if" or "that's hot" (looking at you, Paris 👀). The inherent variability of language offers infinite combinations that speakers and listeners can orient towards in distinguishing and typifying voices. Importantly, this first cue is a universal requirement, as voices cannot be contrasted against one another if there is no notable difference in textual output.
The second cue is the use of syntax to introduce a segment specifically as the languaging of someone else. In (1) above, the voiced material "si tengo muchas novias" is introduced with a [reporting verb + subordination] syntactic frame using "preguntar" with the conjunction "si". The expression "X be like" is a very popular, contemporary example. If you were to say "Gamers be like: 'One more level then I'll go to bed'", you would be using syntax to contrast your own voice against that of "gamers". When we use syntax to frame voiced material, it helps our conversation partners clearly identify the presence of a contrast.
The third cue deals with deixis, which is an area of semantics that focuses on words whose literal meanings are relative to an established reference point called an "indexical origo". Deictic expressions include personal pronouns, demonstrative adjectives, locative adverbs, etc. For example, when I use the word "I", I am referring to myself. But when you use this word, you are not generally referring to me, but to yourself. This is because in languages like English and Spanish, the origo is often the speaker. Deictic expressions are important for triggering evaluations of sameness or difference of participants, places, and time in reported speech. Let's use another reggaetón example to illustrate this. In (2) below, Ivy Queen is reporting a message spoken to her by an unnamed male participant, contrasting her own voice (orange) against his (blue).

In this example, we see what is called a "deictic incongruency": the origo of the representing voice and the origo of the represented voice differ in terms of person. This means the "yo" in line a and the "yo" in line b do not refer to the same person, which implies two different speech centers: one where the representing voice is the origo, and another where the origo is the original author of the message being represented. The de-centering of the representing voice as the origo of the represented material emphasizes the voicing contrast. This is because the deictic incongruency forces an interpretation of the represented material as belonging to a voice other than the one making the report. This cue is key to distinguishing voiced material that has been directly reproduced (i.e., quoted) like in (2) from that which is indirectly represented (i.e., paraphrased) like in (1) earlier.
The final cue is the explicit naming of voicing participants. Naming (either by use of a proper name or a personal pronoun) binds voiced material to the voice that has been named. In (1), Bad Bunny names his Titi. It is the co-occurrence of the syntactic frame [reporting verb + subordination] and the naming of Titi that renders the voiced material "si tengo muchas novias" uninterpretable as Bad Bunny's own message. Similarly in (2), it is the co-occurrence of the reporting verb "jurar", the naming of the represented voice with the second-person "tú", and the deictic incongruency between segments that makes the voicing contrast and voicing participants legible. If we were to compare these examples in terms of transparency, (2) features the most visible contrast as all four cues are specified (variable output, syntactic frame, deictic incongruency, participant naming). (1) would still feature a highly visible contrast, though Agha suggests it is less transparent because the voiced material is paraphrased rather than directly reproduced.
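To make the comparison between (1) and (2) concrete, here is a minimal sketch in Python that records which of the four cues each example specifies and counts them. This is only an illustration: the class name `VoicingContrast`, its field names, and the idea of counting cues are my own assumptions, and a simple count is a crude stand-in for what the framework treats as a gradient, context-dependent judgment. The coding of (1) as lacking deictic incongruency follows the reading above that it is paraphrased rather than quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicingContrast:
    """Hypothetical record of which cues a voicing contrast specifies."""
    label: str
    variable_output: bool = True        # cue 1: some variable linguistic form
    syntactic_frame: bool = False       # cue 2: e.g. [reporting verb + subordination]
    deictic_incongruency: bool = False  # cue 3: representing/represented origo differ
    participant_naming: bool = False    # cue 4: proper name or personal pronoun

    def __post_init__(self):
        # Cue 1 is described as a universal requirement: without some
        # difference in textual output there is no contrast to record.
        if not self.variable_output:
            raise ValueError("a voicing contrast requires variable output")

    def cues_specified(self) -> int:
        """Count the specified cues (a rough proxy for transparency)."""
        return sum([
            self.variable_output,
            self.syntactic_frame,
            self.deictic_incongruency,
            self.participant_naming,
        ])


# Coding of the two examples as they are described in the text.
example_1 = VoicingContrast(
    "(1) Bad Bunny reporting his Titi (paraphrase)",
    syntactic_frame=True,
    participant_naming=True,
)
example_2 = VoicingContrast(
    "(2) Ivy Queen quoting an unnamed man (direct quote)",
    syntactic_frame=True,
    deictic_incongruency=True,
    participant_naming=True,
)

# Order the examples from most to fewest cues specified.
ranked = sorted([example_1, example_2],
                key=lambda c: c.cues_specified(), reverse=True)
for contrast in ranked:
    print(f"{contrast.cues_specified()} of 4 cues: {contrast.label}")
```

A record like this is mainly useful as a coding sheet: it forces you to state, for every token of voiced material, which cues you believe are present before you interpret the contrast.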
While reports of speech are helpful for conceptualizing voicing contrasts, real-time language use is far more complicated and variable. Several voices may be present and overlapping in any utterance, occurring without a syntactic frame, deictic incongruency, or any explicit mention of named participants. Our ability to recognize when these less-transparent contrasts take place depends on our socialization and cultural knowledge surrounding which linguistic signs warrant attention under differing conditions, as well as the voices associated with those signs. While we cannot capture in writing how in (2) Ivy Queen drops the pitch of her voice when producing the voiced material, it is nevertheless the combination of the lexical, syntactic, semantic, and phonetic form of the utterance that marks the voicing contrast. Whether or not this stylistic move is recognized by listeners depends on whether they have been socialized to associate low-pitched voices with caricatures of men. In other words, the transparency of a voicing contrast hinges not only on the form of its entextualization, but also on listeners’ knowledge of the specific signs used to signal oppositional identities and their meanings. Let's look at another example.

In (3) above, Villano Antillano voices the desires of participants named "muchos" and "la gente" in lines a and c. But there is no syntactic frame encapsulating the voiced material like there is in line b (where she represents her own voice), and the indexical origo remains consistent throughout the entire segment. Of course, any speaker of Spanish will be able to identify the contrasts between Villana's voice and those of the interactants. However, if the listener does not know this example is a reproduction of an Ivy Queen song by the same name released in 1997, they will be unaware that Villana is re-voicing a display originally authored and animated with Ivy Queen's voice (orange). The outer dotted box (purple) thereby represents another representation of an existing entextualized contrast, therein allowing Villana to temporarily take on Ivy Queen's voice as her own.
Therefore, recognizing and responding to voicing contrasts depends not only on the linguistic form or listeners' knowledge of the metapragmatic meanings of various signs: it intimately depends on language users' histories with voice typification and the various voices already populating their social realities that are available for construal when presented with a contrast. At times, the listeners' recognition of a voice may be in line with the contrast intended by the speaker, while at others the two may be at complete odds with one another. It is also possible for listeners to hear a voice that was not intentionally entextualized by the speaker!
Taking all of this into consideration, we might find ourselves asking probing questions about how certain voices become nationally, transnationally, or even globally recognizable as identitarian facts. Agha's research on enregisterment is a great place to start, but that's not the purpose of this particular post. Here, I am musing about the ways voicing can help orient sociolinguistic research on style.
When sociolinguists talk about style, they are ultimately posing the question "what does it mean to speak like X?" This question can be approached using a wide array of methods and data, focusing on sound patterns, morphosyntax, discourse structure, or any other variable aspect of language. To place style in conversation with voicing, we might theorize style as the iconographic means through which we use language to represent different versions of our own voice or to contrast our voice against those of others. For example, I do not use the same voice when taking on the image of a knowledgeable lecturer in front of a class of undergrads as I do when projecting a comical character when reminiscing about fond memories with my best friends. If we adopt a theory of style as a creative presentation of a bricolage of voices (many of which we claim as our own, others from which we wish to create distance), we recruit a view of language use that is more fluid and agentive than has been used in the past.
But how do we use voicing as a method? For us to accomplish this, there are certain elements we must pay attention to that I have outlined below. I discuss each step using an example from my own research here: the figure of the Caribbean Diva.
| Step | Rationale |
| --- | --- |
| 1 | Identify the voicing figure by solving for X in the question: "What does it mean to speak like X?" |
| 2 | Identify the kind of linguistic data you want to work with, ideally one where this figure is already typified or is highly likely to appear. This can be thought of as a fishing expedition or a form of ethnography. |
| 3 | Inquire about the qualities, stances, settings, activities, and person-types associated with this characterological figure. This will help you gain an understanding of what this figure is/represents, which will help deepen your interpretation of your results. |
| 4 | Identify the type of linguistic sign you will focus on (phonetic/phonological variables, lexemes, morphosyntax, discourse markers, etc.). Make sure to avoid essentialist and absolutist claims, as any linguistic sign can take on any meaning or function language users can think of, which at times can be at significant odds with what is established in the literature. Moreover, hold space for the fact that more than one sign is contributing to the entextualization of these characterological figures. |
The first step is to solve for X in the question "What does it mean to speak like X?". This will give you a clear idea about what exactly you are looking for. The second step is to identify the types of discursive artifacts where this figure is already typified or likely to be voiced. These can include face-to-face interactions, public utterances, etiquette manuals, television and film, cartoons, music, literary works; really any case where language is being used. Identifying both the figure you are interested in studying and where this figure may make an appearance can help tailor your methodology during the data collection and coding steps. So, if I am interested in studying the linguistic stylization of a 'Caribbean Diva', I will want to look to sites where a Diva persona is notably present. Reggaetón is an ideal site for this, as this character quintessentially appears in the performances of some of the most widely known performers like Ivy Queen (who often calls herself La Diva and released a 2004 album titled as such).
Once you have identified the figure you are interested in and the type of data you want to work with, you will need to adopt a qualitative, ethnographic approach before you conduct any formal linguistic analysis. This is because speakers often do not voice these figures directly: they use them as situated emblems of personhood influenced by their associated qualities. Ivy Queen does not project herself as a Caribbean Diva because this figure is in itself inherently anything: she does so because of what the Diva embodies and represents (authority, agency, sensuality, empowerment, etc.). If you take the time to emphasize the qualities indexed by the voicing figure, you begin to tap into the ideological connections language users make between linguistic form, persona figures, and presentations of the self. This is essential to understanding why speakers choose the stylizing moves they do. It can also open your analysis up to the connections and intersections this figure has with other voices. In my own work, I have found significant overlap between the enactment of the Caribbean Diva figure and a male-centered figure that I am tentatively calling "the Big Boss". I would not have noticed this overlap if I had not taken the time to consider the ideological and identity work underscoring the presentation of the Diva.
Once you have identified your figure and data source and taken the time to familiarize yourself with the qualities, stances, activities, situations, spaces, etc. associated with that figure, you will need to make some executive decisions surrounding the linguistic signs you want to study. Here, I think it's worth mentioning that sometimes it is the variable linguistic behavior that catches our attention before we can name the figure we wish to study. I personally was more drawn to Puerto Rican /s/ reduction and /ɾ/ lateralization well before I was able to articulate that I wanted to study the linguistic enactment of the Caribbean Diva. It was through my research on /s/ and /ɾ/ that I realized the Diva figure uses a distinct /ɾ/ realization that does not appear elsewhere in Ivy Queen's performances. Whether you are first drawn to the voicing figure or to the linguistic behavior is irrelevant, as both roads lead to the same destination (unless you fall prey to the second-wave variationist trap of simply quantifying variable occurrence across a data set).
Regardless of the starting point, you must hold space for the reality that these discursive figures are voiced using a simultaneous combination of multiple signs at various levels of linguistic organization. For this reason, I personally recommend orienting first towards the figure(s) and then inquiring about which linguistic behaviors accompany that figure, as this approach allows for more nuance and offers a more holistic view of linguistic behavior as a practice rather than the patterning of any particular linguistic sign.
Having identified your figure, data set, associated qualities, and linguistic sign(s) used in its entextualization, you can start to use the wide range of already available sociolinguistic and applied linguistic methods to study voicing as a form of linguistic praxis. Personally, I prefer a multi-disciplinary approach pairing variationist statistical modeling with qualitative content analyses. This helps me not only identify the contexts in which the Diva figure is being invoked, but also document the linguistic output used in its entextualization. By using both qualitative and quantitative methods, you offer an analysis that can account for some of the shortcomings of only adopting one approach. Furthermore, this kind of analysis can make your research more appealing to a wide range of scholars beyond those with whom you are already in conversation in your field.
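As a rough illustration of how the qualitative and quantitative sides can meet, the sketch below takes tokens that have already been coded qualitatively for context (whether the figure is being invoked) and tallies how often a given variant occurs in each context. Everything here is an assumption made for illustration: the `tokens` list is invented placeholder data and not a finding, the labels `diva_invoked`, `other`, `variant_A`, and `variant_B` are hypothetical, and a per-context proportion is a much simpler stand-in for the statistical modeling a variationist study would actually use.

```python
from collections import Counter

# HYPOTHETICAL placeholder tokens, invented for illustration only.
# Each token pairs a qualitative context code with the variant observed.
tokens = [
    ("diva_invoked", "variant_A"),
    ("diva_invoked", "variant_A"),
    ("diva_invoked", "variant_A"),
    ("diva_invoked", "variant_B"),
    ("other", "variant_A"),
    ("other", "variant_B"),
    ("other", "variant_B"),
    ("other", "variant_B"),
]


def variant_rates(coded_tokens, variant):
    """Return, per context, the proportion of tokens realized as `variant`."""
    totals = Counter(context for context, _ in coded_tokens)
    hits = Counter(context for context, observed in coded_tokens
                   if observed == variant)
    return {context: hits[context] / totals[context] for context in totals}


rates = variant_rates(tokens, "variant_A")
for context, rate in rates.items():
    print(f"{context}: {rate:.2f} of tokens are variant_A")
```

The point of the design is the order of operations: the context codes come from the qualitative content analysis, and only then are the variants counted against them, so the numbers describe the figure's enactment rather than variation across the data set as a whole.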
I believe the voice offers a unique perspective on both sociolinguistic variation and style that considers not only the agency of the speakers themselves, but the shared cultural knowledge of speakers and listeners organized by patterns of socialization. Moreover, as voicing can be literally anything (e.g., emotional language, honorifics, artistic performance speech, disidentifications and other forms of subjectivity formation, etc.), this approach allows us to push the limits of linguistic research. If you can imagine a situation where a voicing contrast may appear, even a highly opaque or obscure one, you can craft a study that allows you to examine the linguistic behavior that makes such a contrast possible and legible.
So get creative, and see what's out there!
Recommended Readings
Agha, A. (2003). The social life of cultural value. Language & Communication, 23, 231–273.
Agha, A. (2004). Registers of Language. In A. Duranti (Ed.), A Companion to Linguistic Anthropology (pp. 23–45). John Wiley & Sons, Inc.
Agha, A. (2005). Voice, footing, enregisterment. Journal of Linguistic Anthropology, 15(1), 38–59.
Bakhtin, M. (1981). Discourse in the novel. (C. Emerson & M. Holquist, Trans.). In M. Holquist (Ed.), The Dialogic Imagination (pp. 259–422). University of Texas Press.
Bakhtin, M. (1986). The problem of speech genres. (V. W. McGee, Trans.). In C. Emerson & M. Holquist (Eds.), Speech Genres and Other Late Essays (pp. 60–102). University of Texas Press.
Irvine, J. T. (2001). “Style” as distinctiveness: The culture and ideology of linguistic differentiation. In P. Eckert & J. R. Rickford (Eds.), Style and Sociolinguistic Variation (pp. 21–43). Cambridge University Press.
Coupland, N. (2001). Language, situation, and the relational self: Theorizing dialect-style in sociolinguistics. In P. Eckert & J. R. Rickford (Eds.), Style and Sociolinguistic Variation (pp. 185–210). Cambridge University Press.
Reyes, A. (2016). The voicing of Asian American figures: Korean linguistic styles at an Asian American cram school. In H. S. Alim, J. R. Rickford, & A. F. Ball (Eds.), Raciolinguistics: How language shapes our ideas about race (pp. 256–272). Oxford University Press.


