In this post, I am going to consider how corpus linguistics as a method for analysing language can be beneficial to research on online grooming communication. By ‘online grooming communication’, I mean the use of language in the context of adults targeting and manipulating children online in order to sexually abuse them. This issue is the focus of our work on DRAGON-S, where we are developing tools that will help child-safeguarding practitioners to detect online grooming content and understand groomers’ and children’s communicative behaviour when online grooming is taking place.
Broadly speaking, corpus linguistics refers to a research methodology that uses a corpus or corpora (a body or bodies of texts too large for manual analysis alone to be practical) to analyse how language is used. It involves using computer software to identify linguistic patterns – based on frequency or statistical calculations – that can tell us something interesting about a corpus and the use of language it represents; for example, about how minority groups are represented in the press or how people talk about their healthcare experiences online. To do this, a corpus should ideally be representative and balanced. This means it should contain the full range of variations, types of speaker, situations of use, etc., of a language or text type, in proportions similar to how these occur in the ‘real world’. If it does, then we as researchers can have a certain amount of confidence that our findings are generalisable to the language use the corpus represents.
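To make the idea of 'frequency or statistical calculations' a little more concrete, here is a minimal sketch in Python of one common corpus technique: comparing word frequencies in a target text against a reference text and ranking words by log-likelihood 'keyness'. It is only an illustration, not the software used on DRAGON-S. The function names (`tokenize`, `log_likelihood`, `keywords`) and the two toy texts are my own assumptions, and the tokenisation is far cruder than anything a real corpus tool would use.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [t for t in tokens if t]


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-corpus log-likelihood score for a single word.

    Compares the observed frequency in each corpus with the frequency
    expected if the word were spread evenly across both.
    """
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(target_text, reference_text, top_n=5):
    """Rank words that are relatively more frequent in the target text."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    size_target = sum(target.values())
    size_ref = sum(reference.values())
    scored = []
    for word, freq in target.items():
        # Keep only words over-represented in the target (hypothetical choice).
        if freq / size_target > reference[word] / size_ref:
            score = log_likelihood(freq, size_target, reference[word], size_ref)
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy texts, invented purely for illustration.
target_text = "the doctor saw me and the doctor listened"
reference_text = "the cat sat on the mat and the dog slept"
for word, score in keywords(target_text, reference_text):
    print(f"{word}\t{score:.2f}")
```

With texts this small the scores mean very little. In practice, the same calculation would be run over corpora of many thousands or millions of words, and the resulting keywords would be a starting point for closer analysis rather than a finding in themselves.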
Being able to produce generalisable findings is particularly valuable when researching the use of language in online grooming. One reason is that the communicative modus operandi of groomers varies considerably, both from one groomer to another and within an individual groomer’s own communication: for example, between being aggressive or friendly, sexually explicit or implicit, opportunistic or gradually tactical, and so on. Therefore, any research that uses small datasets, as often occurs in studies that are purely qualitative, is likely to produce a partial and unrepresentative account of the language of online grooming. Using corpus linguistic methods with a balanced and representative corpus, on the other hand, means that results can be based on the full range of varieties of this use of language. (Of course, building a balanced and representative corpus using online grooming data comes with its own set of challenges, which is a topic for another day / blog post.)
Another argument in support of corpus linguistics as a method is that it helps reduce the influence of bias. That does not mean it can ever eliminate it completely, and researchers using corpus linguistics have plenty of choices to make where bias will inevitably creep in. However, using quantitative means to generate initial patterns for further analysis does reduce the risk of researchers being overly selective when picking which features or aspects of their corpus to focus on.
Confirmation bias – i.e., when researchers are drawn to samples or features that confirm what they want / expect to find in the data – is a particular risk with research involving well-established ideas about certain practices. This is potentially the case with online grooming, where its abusive and predatory nature may create the expectation that child-targets of grooming are always passive victims of bullying, controlling groomers. However, many child-targets will in fact actively participate in and drive forward conversations with groomers, without realising they are being groomed. This does not make them any less victims, but rather victims of groomers who may be using very subtle means of communicative manipulation.
With corpus linguistics, the use of frequency or statistical measures to generate results not only helps reduce the influence of what the researcher may already think about the data, but it can also reveal patterns not obvious to the naked eye, and lead to new and unexpected discoveries. As I’ve just suggested, groomers can be very subtle in the way they manipulate the children they target. This includes using language in a way that, on the surface, may make them seem like other children, such as when talking about hobbies and interests, relationships with parents, etc. At the same time, though, groomers will be working towards their primary goal of trying to engage children in sexual activity, and this may be evident in underlying patterns in their talk that the use of corpus linguistic methods can help reveal.
Revealing non-obvious meaning or behaviour, and creating opportunities for serendipitous findings, are qualities of corpus linguistics that appeal not only for what they may contribute to knowledge about online grooming, but also for how they may help practitioners in child safeguarding roles tackle this problem. Disguise is a hallmark of groomer behaviour, and corpus linguistics has great potential to reveal what groomers may try to hide. In this way, it is a method (or perhaps more accurately, a set of methods) that is well-suited to a project like DRAGON-S and its aim to produce research-evidenced interventions for countering online grooming.
Another benefit of corpus linguistics is that it is very adaptable to mixing methods in different ways. While there are certainly patterns in the talk of groomers that can be highlighted using quantitative measures, online grooming represents two-way interaction between groomers and the children they target. Therefore, it will often involve complex context-specific behaviour that warrants a close qualitative analysis. Corpus linguistics includes a variety of well-established procedures for combining quantitative and qualitative methods, which operate at various levels or dimensions of language – from character to sentence – and always in context. It allows researchers to identify the overall features of how language is used in a particular context, while also being able to zoom in and look at qualitative examples in detail. In this way, researchers are never left unable to see the wood for the trees or the trees for the wood, and can get a complete picture of how people use language.
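One of the simplest ways corpus tools support this 'zooming in' is the concordance, sometimes called a key-word-in-context (KWIC) display, which lists every occurrence of a word with a few words of context on either side so that each example can be read qualitatively. The sketch below is a minimal Python illustration under my own assumptions: the function name `concordance`, the `window` parameter and the toy sentence are all hypothetical, and real concordancers offer much more (sorting, wildcard searches, links back to the full text).

```python
def concordance(tokens, node, window=4):
    """Return (left context, node word, right context) for each hit.

    `tokens` is a list of words, `node` is the word being searched for,
    and `window` is how many words of context to keep on each side.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Toy sentence, invented purely for illustration.
tokens = "the nurse was kind and the nurse listened".split()
for left, node, right in concordance(tokens, "nurse", window=2):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>10}  [{node}]  {right}")
```

A frequency or keyness list tells you which words are worth looking at; reading the concordance lines for those words is where the close, context-sensitive analysis begins.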
Thank you for reading this blog post. If you are interested in finding out more about using corpus linguistics to study language use or about how corpus-based methods have been used to research online grooming, here are a few suggestions (by no means a definitive list, just a starting point for anyone new to the topic):
Corpus linguistics and discourse:
Baker, P. (2006). Using corpora in discourse analysis. London: Bloomsbury.
Baker, P. and McEnery, T. (2015). Introduction. In P. Baker and T. McEnery (Eds.), Corpora and discourse studies: Integrating discourse and corpora (pp. 1–19). Basingstoke: Palgrave Macmillan.
Marchi, A. and Taylor, C. (2018). Introduction: Partiality and reflexivity. In A. Marchi and C. Taylor (Eds.), Corpus approaches to discourse: A critical review (pp. 1–15). London: Routledge.
Partington, A., Duguid, A. and Taylor, C. (2013). Patterns and meanings in discourse: Theory and practice in corpus-assisted discourse studies (CADS). Amsterdam: John Benjamins.
Corpus approaches to online grooming:
Lorenzo-Dus, N., & Kinzel, A. (2019). ‘So is your mom as cute as you?’: Examining patterns of language use by online sexual groomers. Journal of Corpora and Discourse Studies, 4-39.
Lorenzo-Dus, N., Kinzel, A., & Di Cristofaro, M. (2020). The communicative modus operandi of online child sexual groomers: Recurring patterns in their language use. Journal of Pragmatics, 155, 15-27.
Lorenzo-Dus, N., & Kinzel, A. (2021). ‘We’ll watch TV and do other stuff’: A corpus-assisted discourse study of vague language use in online child sexual grooming. In M. Fuster-Márquez et al. (Eds.), Exploring discourse and ideology through corpora (pp. 189-210). Bern: Peter Lang.
Schneevogt, D., Chiang, E., & Grant, T. (2018). Do Perverted Justice chat logs contain examples of overt persuasion and sexual extortion? A research note responding to Chiang and Grant (2017, 2018). Language and Law/Linguagem e Direito, 5(1), 97-102.
Dr Craig Evans is a corpus linguist working as Research Associate within Project DRAGON-S.