The Importance of Collocates

The following post was written by Mark Davies, a professor of Linguistics and English Language and one of the Humanities Center’s Faculty Fellows.

Corpora (large, highly searchable collections of texts) are obviously useful for linguistic analysis. But as I’ll try to show in this short blog post, they are also potentially very useful for research on culture and society.

One of the useful things that researchers can do with the right kind of corpora (like the BYU corpora) is to look at collocates (“nearby words”), which provide great insight into the meaning and usage of words and phrases. As the saying goes, “you can learn a lot about words by the words that they hang out with”.

At the most basic level, we can create simple “sketches” for words like brooding (collocates = dark, eyes, silence, presence, sky, heavy, gray, and mysterious) or sprawl (suburban, growth, traffic, pollution, and congestion). Notice how collocates move far beyond a simple dictionary definition to give us insight into societal issues. Getting a bit more complicated, we can also look at “semantic prosody”, or the tendency for certain words to collocate with primarily negative or positive words, such as the verb cause (problems, damage, pain, trouble, cancer, disease).
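For readers who want to see what "looking at nearby words" means mechanically, here is a minimal sketch in Python. It is not how the BYU corpora work internally; it simply counts the words that fall within a few tokens of a target word in a plain string. The function name `collocates`, the `span` parameter, the tiny stopword set, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter


def collocates(text, node, span=4, stopwords=frozenset()):
    """Count the words found within `span` tokens on either side of `node`.

    Hypothetical helper for illustration only. It uses a crude tokenizer
    and lets windows run across sentence boundaries.
    """
    # Lowercase the text and keep runs of letters, apostrophes, and hyphens.
    tokens = re.findall(r"[a-z'-]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Words to the left and to the right of this occurrence of the node.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(
            w for w in window if w != node and w not in stopwords
        )
    return counts


# Invented sample text, not taken from any real corpus.
sample = "Suburban sprawl brings traffic. Urban sprawl brings traffic and pollution."
sketch = collocates(sample, "sprawl", span=2, stopwords=frozenset({"brings", "and"}))
print(sketch.most_common())
```

A real corpus interface would also rank collocates by a statistical association score rather than by raw counts, but the windowing idea is the same.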

Where things get really interesting is when we start looking at variation. For example, collocates of strong that are more common in fiction are fingers, shoulders, sun, and coffee, whereas in academic texts they are students, research, association, and programs, showing that the word has quite different “meanings” in these two genres.
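One simple way to find collocates that are "more common" in one genre than another is to compare their relative frequencies. The sketch below assumes we already have collocate counts for each genre, and it ranks words by a smoothed ratio of rates. The function name `distinctive_collocates`, the `min_freq` and `smoothing` parameters, and the toy counts are all hypothetical; they are not real corpus figures, and this is not necessarily the comparison that any particular corpus interface performs.

```python
from collections import Counter


def distinctive_collocates(counts_a, counts_b, min_freq=2, smoothing=0.5):
    """Rank words in `counts_a` by how much more frequent they are than in `counts_b`.

    Hypothetical helper: compares smoothed relative frequencies so that
    words missing from `counts_b` do not cause a division by zero.
    """
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    scores = {}
    for word, freq in counts_a.items():
        if freq < min_freq:
            continue  # ignore words too rare to say anything about
        rate_a = (freq + smoothing) / total_a
        rate_b = (counts_b.get(word, 0) + smoothing) / total_b
        scores[word] = rate_a / rate_b
    # Highest ratio first: most distinctive of genre A.
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy counts for collocates of "strong" in two genres.
fiction = Counter({"fingers": 5, "coffee": 4, "support": 1})
academic = Counter({"association": 6, "support": 3, "coffee": 1})

print(distinctive_collocates(fiction, academic))
print(distinctive_collocates(academic, fiction))
```

The same comparison works for any two sub-corpora, whether they are genres, countries, or decades.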

Collocates can also point out differences between cultures. For example, collocates of belief that are more common in “developing” countries are Hindu, sectarian, corrupt, and heretical (more religious), while in the “developed” countries they are contradictory, liberal, apparent, and deepest (potentially more secular). Another example is the collocates of wife, which in Africa and Asia, where there is more Muslim influence than in Europe or North America, include chaste, obedient, good, and virtuous, as well as senior, temporary, and permanent.

Perhaps the most interesting use of collocates is to look at the changing view of different concepts over time. For example, collocates of women that were much more common in American English in the 1800s than today include strong-minded, noble, helpless, defenseless, delicate, and virtuous, which definitely have a sexist ring to them (at least according to our current sensibilities). We can also see the changing way in which we view religion, with collocates like divine, truth, Christian, and doctrines in the 1800s, and collocates like Eastern, traditional, old-time, and humanism since the 1960s. We could likewise use corpora to look at changing views on any other topic over time – the environment, sports, drug use, politics, or law. (And the BYU corpora have in fact been used by the US Supreme Court to look at the changing usage of words over time.)

We can even search for changes in LDS discourse from the 1850s to the 2010s. For example, we can see changes in the collocates of words like gospel, missionary, eternal, or prayer over time. Even more in-depth research allows us to see what is being said about topics like marriage, which in the 1800s included collocates like patriarchal, plural, single, or monogamic (i.e. theological justifications for plural marriage), whereas more recently they refer to good, successful, happy, and ideal marriages (a more “warm and fuzzy” view of marriage).

This handful of examples hopefully shows how we can gain insight into culture and society (in different countries, or across time) by looking at collocates (nearby words). Because the BYU corpora are almost unique in the way that they allow users to carry out such research, it is little wonder that they are used for this and similar purposes by nearly 200,000 researchers from throughout the world every month.

