International Journal of English for Academic Purposes: Research and Practice

A review of CLIL outcomes: Calling for fuller use of effect sizes and less reliance upon p

International Journal of English for Academic Purposes: Research and Practice 2021, 205–212.

Abstract

This article reviews empirical quantitative studies concerning the outcomes of Content and Language Integrated Learning (CLIL) from the perspective of methodological rigour. This perspective advocates fuller use of effect sizes (ESs) and less reliance upon statistical significance (i.e., the p value). Regarding making fuller use of ESs, it proposes 1) reporting ESs for both statistically significant and non-significant results, 2) specifying which ES benchmark system is used for interpretation, 3) utilizing a topic-specific ES benchmark system, 4) valuing ES over p, and 5) interpreting ESs adequately. Regarding relying less on p, this article underscores 1) reporting exact p values and 2) using ‘statistically’ to modify ‘(non-)significant(ly)’. This review contributes to a better understanding of the inconsistent findings concerning the outcomes of CLIL. It also provides some suggestions for EAP practitioners (e.g., emphasizing the reporting and interpretation of ESs in their EAP teaching).

Introduction

In Content and Language Integrated Learning (CLIL), an educational approach characterized by Coyle (2007, p. 545) as one ‘where both language and content are conceptualized on a continuum without an implied preference for either’, subject content is taught through a foreign language as the medium of instruction (Coyle et al., 2010; Lasagabaster & Sierra, 2010). The last two decades have witnessed an increasing number of CLIL programmes across Europe and beyond, from primary and secondary education to universities, as a response to the demand for improvement in foreign language teaching (Dafouz et al., 2014; Merino & Lasagabaster, 2015; Wei & Feng, 2015; Yang, 2015). Along with the boom in CLIL practice, many studies concerning its outcomes have been conducted. Unfortunately, those studies appear to have been outpaced by the rapid expansion of CLIL (Merino & Lasagabaster, 2015), and the inconsistent results emerging from the accumulating studies have not been fully examined. Some scholars (e.g., Dalton-Puffer, 2011; Merino & Lasagabaster, 2015) suggest that CLIL programmes bring benefits to learners, but the results are not always optimistic (Yang, 2015). Inconsistent outcomes concerning the effectiveness of CLIL have been found in both linguistic (e.g., listening, writing) and non-linguistic (e.g., motivation, content achievement) areas.

Against this background, the present article aims to critically review CLIL outcomes in previous quantitative research from the perspective of methodological rigour, which advocates fuller use of effect sizes (ESs) and less reliance upon statistical significance (viz. the p value). This article attempts to contribute to a better understanding of the inconsistent findings concerning the outcomes of CLIL. In quantitative research, the magnitude of a difference or the strength of a correlation, which is reflected by the ES rather than by statistical significance, should be the focus (Cohen, 1990). The importance of ES over the p value has also been highlighted in recent years. For example, the use of p resulting from the null hypothesis significance testing procedure has been banned by the editors of Basic and Applied Social Psychology, who ‘require strong descriptive statistics, including effect sizes’ (Trafimow & Marks, 2015, p. 1). However, the importance of ES has not been widely recognized and put into practice in the field of applied linguistics in general (Kong & Wei, 2019; Wei et al., 2019) or in CLIL in particular. Accordingly, the present article will illustrate the importance of ESs with some typical studies of CLIL outcomes which unfortunately fail to make fuller use of ESs and/or rely too much on p. Such examination of methodological rigour is of significance to CLIL research because, as the review below will show, misunderstanding of ESs and p values can lead to misinterpretation of CLIL outcomes. This article also aims to provide pedagogical suggestions for EAP practitioners. The following review of CLIL outcomes is structured around two themes: encouraging fuller use (i.e., reporting and interpreting) of ESs and less reliance upon p.

Research on CLIL outcomes and ES

In connection with reporting, ESs should be reported both when results are statistically significant and when they are not (Larson-Hall & Plonsky, 2015; Sun et al., 2010). It is not uncommon to see scholars report only p values and omit ESs in CLIL research (e.g., Nieto Moreno de Diezmas & Matthew, 2019). Likewise, some researchers (e.g., Pérez Cañado & Lancaster, 2017) report ESs only when the results are statistically significant.

In terms of ES interpretation, four points merit attention. First, scholars should always specify which ES benchmark system they draw upon to avoid ambiguity. Studies on CLIL outcomes which fail to specify the benchmark system include Prieto-Arranz et al. (2015) and Roquet et al. (2016).

Second, it is more advisable to draw on a topic-specific ES benchmark system. For instance, Cohen’s (1988) benchmarks (e.g., for Pearson’s r, .10 as small, .30 as medium, and .50 as large), used ‘by too many researchers as iron-clad criteria’ (Wei et al., 2019, p. 2), are based on the ESs usually found in studies in the behavioural sciences, so they ‘do not have absolute meaning and are only relative to typical findings in these areas’ (Leech et al., 2005, p. 56). An ES benchmark most relevant to the focal research topic should therefore be prioritized. For example, Martínez Agudo (2021) examined the strength of association between motivational variables and learning competence in natural science among 156 CLIL learners and their 162 non-CLIL counterparts. Motivational variables were measured with Pelechano’s (1994) motivation instrument, and performance in natural science with final grades obtained from the schools. Based on the correlational strength (r = .222 and -.212, respectively), the researcher claimed to find a ‘weak degree of correlation’ (p. 236) between two of the motivational variables (i.e., self-demand and lack of interest) and CLIL students’ content achievement. Considering that this is a survey-based study, Wei and Hu’s (2019) ES interpretation system is recommended (Wei et al., 2019). In this more topic-specific system, the small, medium, large, and very large cut-offs for the ES index R2 are .005, .01, .02, and .09 respectively; when R2 is un-squared (r), the corresponding cut-offs are (roughly) .07, .10, .14, and .30. According to this system, the r = .222 and r = -.212 from Martínez Agudo (2021) both fall between the large and very large benchmarks. Therefore, the correlation between the two motivational variables and content-subject learning performance can be reinterpreted as large. This reinterpretation is very different from several of the original conclusions, such as ‘a low correlation between the motivational variables scrutinized and content achievement levels in CLIL settings’ (p. 237).
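The benchmark logic just described can be sketched in a few lines of code. The cut-offs below are those summarized above for Wei and Hu’s (2019) survey-based system (|r| cut-offs of roughly .07, .10, .14, and .30); the function name and verbal labels are illustrative, not part of any published software.

```python
# Sketch: classifying a Pearson r against a topic-specific benchmark system.
# Cut-offs are Wei and Hu's (2019) survey-based values as given in the text;
# the function and label names are hypothetical.

def interpret_r(r: float) -> str:
    """Return a verbal label for the magnitude of a correlation coefficient r."""
    magnitude = abs(r)
    if magnitude >= 0.30:
        return "very large"
    if magnitude >= 0.14:
        return "large"
    if magnitude >= 0.10:
        return "medium"
    if magnitude >= 0.07:
        return "small"
    return "negligible"

# The two correlations reported by Martinez Agudo (2021):
print(interpret_r(0.222))   # -> large
print(interpret_r(-0.212))  # -> large
```

Under this system, both coefficients land between the large (.14) and very large (.30) cut-offs, supporting the reinterpretation above.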

Third, scholars should value ESs over ps when both are reported. Apart from judging whether or not a result is statistically significant according to p, researchers should rely more on ES benchmark systems to provide interpretations, because ES is much more important than p (Larson-Hall, 2016; Wei et al., 2019). Hence, researchers’ failure to interpret ESs can potentially diminish the importance of their findings (Wei & Hu, 2019). For instance, comparing the CLIL effect on overall English proficiency between the CLIL- group (208 high school students who attended 3.4 CLIL sessions weekly on average) and the CLIL+ group (108 high school students who attended 8.4 CLIL sessions weekly on average) after one year, Merino & Lasagabaster (2017) concluded that ‘the evolution of CLIL+ students was significantly higher than that of their CLIL- counterparts (with a small r effect size: r = 0.27)’ (p. 8) (p < .001) and that ‘the CLIL- participants progressed less than the CLIL+ participants’ (p. 9). It appears that the authors relied only on p to reach this conclusion. If the ‘small’ effect size is taken into consideration, a different interpretation emerges: more exposure to CLIL resulted in only small progress, which might be negligible in practice.

Similarly, Pavón Vázquez (2018), without drawing on the reported ES (d = -.978), claimed that there were no statistically significant differences (p = .051) in English pronunciation between thirty rural students and five urban students in the sixth grade of Spanish primary education who received CLIL. In the words of renowned statisticians, ‘Surely, God loves the 0.06 nearly as much as the 0.05’ (Rosnow & Rosenthal, 1989, p. 1277). Researchers should not simply dismiss a difference of this magnitude because p was slightly higher than the conventional cut-off (.05). A more reliable approach is to interpret the results based on ESs. When Cohen’s (1988) system is adopted, the above inter-group difference can be regarded as large because |d| (.978) exceeds Cohen’s ‘large’ benchmark (.80). Pavón Vázquez (2018) argued that pronunciation was one of the very few linguistic sub-aspects in which the urban and rural participants levelled out; however, the rather large d supports the opposite reinterpretation: the urban students in fact outperformed their rural counterparts in this aspect of foreign language competence.
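To make the contrast between p and d concrete, the following sketch computes Cohen’s d from group summary statistics and classifies it against Cohen’s (1988) benchmarks. The group statistics in the first usage example are invented for illustration and are not taken from Pavón Vázquez (2018); only the d of -.978 comes from that study.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using a pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def interpret_d(d):
    """Cohen's (1988) conventional benchmarks: .20 small, .50 medium, .80 large."""
    magnitude = abs(d)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "medium"
    if magnitude >= 0.20:
        return "small"
    return "negligible"

# Hypothetical pronunciation scores (illustrative numbers, not the original data):
d = cohens_d(6.1, 1.2, 30, 7.3, 1.1, 5)
print(interpret_d(d))

# The d actually reported in the study:
print(interpret_d(-0.978))  # -> large: |d| exceeds the .80 benchmark
```

The sign of d only encodes which group scored higher; interpretation against the benchmarks uses its absolute value, which is why a d of -.978 still counts as a large effect.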

Fourth, whenever possible, researchers should adequately interpret ESs. One fruitful way is to compare the magnitudes of their ESs with those from the previous research literature (Larson-Hall & Plonsky, 2015). When interpreting the difference between the CLIL and non-CLIL groups regarding oral comprehension at the second post-test phase, Pérez Cañado & Lancaster (2017) claimed to find no inter-group difference based on p (.069). However, this large p was probably attributable to the very small sample size (only twelve students per group). The recalculated ES (d = .78) indicates that the CLIL cohort outperformed its counterpart in oral comprehension; when interpreted according to Cohen’s (1988) system, this d reveals a large effect because, after rounding, it reaches the ‘large’ benchmark (.80). For adequate interpretation of the ES, the authors could have compared it with those in other studies examining differences in listening competence between CLIL and non-CLIL students (e.g., d = 1.09 after recalculation in Lasagabaster (2008), one of the studies reviewed in Pérez Cañado & Lancaster (2017)), which may trigger questions regarding which factors cause the differences in CLIL outcomes across related studies. Such comparison is good practice for interpreting ESs, helping inform the practical meaning of ESs (Sun et al., 2010).

Research on CLIL outcomes and p

In addition to making fuller use of ESs, less reliance on p should be highlighted. First, researchers are expected to report exact p values rather than ‘p > .05’ and the like, because the latter reporting practice subscribes to the dichotomous thinking that a result is either statistically significant or not (Larson-Hall & Plonsky, 2015; Sun et al., 2010). Such dichotomous thinking compromises ‘the richness of information’ (Sun et al., 2010, p. 1001) in p: the smaller the p value, the less probable the observed data are under the null hypothesis. Furthermore, this thinking can result in underestimating the importance of ESs (Larson-Hall & Plonsky, 2015). The p value is easily influenced by sample size: it is common to see large (statistically non-significant) p values associated with small samples and small (statistically significant) p values with large samples. For instance, in Merino & Lasagabaster (2015), the difference in reading comprehension between twenty-four CLIL students and thirty-two non-CLIL students was not statistically significant (p > .05, see their Table 8); this could simply be attributed to the small sample size, in addition to the reasons listed by the authors. In Várkuti’s (2010) study involving 816 CLIL students and 631 non-CLIL students, it is not surprising that all the reported p values were ‘p = 0.000’, as the sample size was rather large. As p is highly dependent on sample size, interpretation needs to be based more on ESs and less on ps.
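The dependence of p on sample size can be illustrated with a quick sketch: holding a ‘small’ standardized mean difference constant and varying only the group size, a large-sample z approximation flips the verdict from ‘non-significant’ to ‘highly significant’. The numbers below are illustrative and are not drawn from the studies discussed above.

```python
import math
from statistics import NormalDist

def two_sided_p(d, n_per_group):
    """Approximate two-sided p value for a standardized mean difference d
    between two equal-sized groups, via a large-sample z approximation
    (z = d * sqrt(n/2))."""
    z = d * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

effect = 0.2  # a 'small' effect size, held constant throughout

print(two_sided_p(effect, 20))   # n = 20 per group:  p > .05, 'non-significant'
print(two_sided_p(effect, 800))  # n = 800 per group: p < .001, 'significant'
```

The effect is identical in both calls; only the sample size changes, which is precisely why interpretation should lean on the ES rather than on p.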

The second suggestion for reducing reliance upon p is that the word ‘statistically’ should be used to modify ‘(non-)significant(ly)’. ‘[T]he tendency to drop the word statistically and use significant difference instead of statistically significant difference in research reports’ can reflect the most common misconception, namely that ‘statistically significant means important’ (Nassaji, 2012, p. 95). Some scholars have even proposed that ‘the term significance should be removed from the statistics vocabulary’ (Nassaji, 2012, p. 96). For instance, Kline (2004) (also cited in Larson-Hall, 2016) suggested using ‘statistical’ to denote a ‘statistically significant’ result and reserving ‘significant’ for its original meaning of ‘important’. There are often cases in which researchers mistake ‘(non-)significant’ for ‘practically (un)important’ (e.g., Nieto Moreno de Diezmas, 2018; Nieto Moreno de Diezmas & Matthew, 2019; Prieto-Arranz et al., 2015; Várkuti, 2010). For example, Nieto Moreno de Diezmas (2018) reported that the overall score for digital competence and the scores on most of its constituent learning standards were ‘significantly higher’ (p. 81) for CLIL participants ‘since p = .000’ (p. 79), and reached the conclusion that ‘the CLIL programme was more productive (than the mainstream programme) for learning digital competence’ (p. 81).

Conclusion and implications

From the perspective of methodological rigour, this article has revealed several problems in the existing research on CLIL outcomes and proposed some suggestions for future CLIL research. Due to inadequate use of ESs and/or over-reliance upon p, some research outcomes (e.g., differences between CLIL and non-CLIL students) have unfortunately been misinterpreted. As CLIL is burgeoning across Europe and beyond, more methodologically rigorous research needs to be carried out in order to paint a more holistic and reliable picture of the effectiveness of CLIL.

This article also provides some practical suggestions for EAP practitioners, who have the power to contribute to improving research practices. EAP practitioners often serve as trainers (e.g., EAP teachers teaching publication skills, see Li & Cargill, 2021) of (novice) researchers aspiring to publish their work in peer-reviewed publications. Given the importance of ESs in quantitative research discussed above, EAP practitioners can engage their postgraduate students and trainees (e.g., the above-mentioned researchers) in making use of ESs in their teaching practices in order to enhance those students’ and trainees’ awareness of methodological rigour and improve their reporting practices. Improved reporting practices will, in turn, increase their chances of publishing their quantitative research in peer-reviewed publications. Furthermore, explicit guidelines for reporting and interpreting ESs can be included in EAP teaching materials. The development of such materials might take time, however; in their absence, EAP practitioners should consider emphasizing ESs when teaching their postgraduate students and novice researchers.

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.

Coyle, D. (2007). Content and Language Integrated Learning: Towards a connected research agenda for CLIL pedagogies. International Journal of Bilingual Education and Bilingualism, 10(5), 543-562. https://doi.org/10.2167/beb459.0

Coyle, D., Hood, P., & Marsh, D. (2010). CLIL: Content and Language Integrated Learning. Cambridge University Press.

Dafouz, E., Camacho, M., & Urquia, E. (2014). “Surely they can’t do as well”: A comparison of business students’ academic performance in English-medium and Spanish-as-first-language-medium programmes. Language and Education, 28(3), 223-236. https://doi.org/10.1080/09500782.2013.808661

Dalton-Puffer, C. (2011). Content and Language Integrated Learning: From practice to principles? Annual Review of Applied Linguistics, 31, 182-204. https://doi.org/10.1017/S0267190511000092

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. American Psychological Association.

Kong, M., & Wei, R. (2019). EFL learners’ attitudes toward English-medium instruction in China: The influence of sociobiographical variables. Linguistics and Education, 52, 44-51. https://doi.org/10.1016/j.linged.2019.03.005

Larson-Hall, J. (2016). A guide to doing statistics in second language research using SPSS and R (2nd ed.). Routledge.

Larson-Hall, J., & Plonsky, L. (2015). Reporting and interpreting quantitative research findings: What gets reported and recommendations for the field. Language Learning, 65(S1), 127-159. https://doi.org/10.1111/lang.12115

Lasagabaster, D. (2008). Foreign language competence in Content and Language Integrated Courses. The Open Applied Linguistics Journal, 1(1), 30-41. https://doi.org/10.2174/1874913500801010030

Lasagabaster, D., & Sierra, J. M. (2010). Immersion and CLIL in English: More differences than similarities. ELT Journal, 64(4), 367-375. https://doi.org/10.1093/elt/ccp082

Leech, N. L., Barrett, K. C., & Morgan, G. A. (2005). SPSS for intermediate statistics: Use and interpretation (2nd ed.). Lawrence Erlbaum.

Li, Y., & Cargill, M. (2021). Teaching English for research publication purposes to novice Chinese scientist authors: An interview with Margaret Cargill. International Journal of English for Academic Purposes: Research and Practice, 1(1), 81-94. https://doi.org/10.3828/ijeap.2021.6

Martínez Agudo, J. de D. (2021). To what extent do affective variables correlate with content learning achievement in CLIL programmes? Language and Education, 35(3), 226-240. https://doi.org/10.1080/09500782.2020.1833910

Merino, J. A., & Lasagabaster, D. (2015). CLIL as a way to multilingualism. International Journal of Bilingual Education and Bilingualism, 21, 1-14. https://doi.org/10.1080/13670050.2015.1128386

Merino, J. A., & Lasagabaster, D. (2017). The effect of content and language integrated learning programmes’ intensity on English proficiency: A longitudinal study. International Journal of Applied Linguistics, 28(1), 1-13. https://doi.org/10.1111/ijal.12177

Nassaji, H. (2012). Statistical significance tests and result generalizability: Issues, misconceptions, and a case for replication. In G. Porte (Ed.), Replication research in applied linguistics (pp. 92-115). Cambridge University Press.

Nieto Moreno de Diezmas, E. (2018). Exploring CLIL contribution towards the acquisition of cross-curricular competences: A comparative study on digital competence development in CLIL. Revista de Lingüística y Lenguas Aplicadas, 13(1), 75-85. https://doi.org/10.4995/rlyla.2018.9023

Nieto Moreno de Diezmas, E., & Matthew, T. (2019). Social science learning and gender-based differences in CLIL. A preliminary study. Estudios de Lingüística Inglesa Aplicada, 19, 177-204. https://doi.org/10.12795/elia.2019.i19.08

Pavón Vázquez, V. (2018). Learning outcomes in CLIL programmes: A comparison of results between urban and rural environments. Porta Linguarum Revista Interuniversitaria de Didáctica de Las Lenguas Extranjeras, 9-28. https://doi.org/10.30827/Digibug.54020

Pelechano, V. (1994). Prueba MA [MA test]. Análisis y Modificación de La Conducta [Analysis and Modification of Behaviour], 20, 71-72.

Pérez Cañado, M. L., & Lancaster, N. K. (2017). The effects of CLIL on oral comprehension and production: A longitudinal case study. Language, Culture and Curriculum, 30(3), 300-316. https://doi.org/10.1080/07908318.2017.1338717

Prieto-Arranz, J. I., Rallo Fabra, L., Calafat-Ripoll, C., & Catrain-González, M. (2015). Testing progress on receptive skills in CLIL and non-CLIL contexts. In M. Juan-Garau & J. Salazar-Noguera (Eds.), Content-based language learning in multilingual educational environments (Vol. 23, pp. 123-137). Springer International Publishing. https://doi.org/10.1007/978-3-319-11496-5_8

Roquet, H., Llopis, J., & Pérez-Vidal, C. (2016). Does gender have an impact on the potential benefits learners may achieve in two contexts compared: Formal instruction and formal instruction + content and language integrated learning? International Journal of Bilingual Education and Bilingualism, 19(4), 370-386. https://doi.org/10.1080/13670050.2014.992389

Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44(10), 1276-1284.

Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102(4), 989-1004. https://doi.org/10.1037/a0019507

Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1-2. https://doi.org/10.1080/01973533.2015.1012991

Várkuti, A. (2010). Linguistic benefits of the CLIL approach. International CLIL Research Journal, 1(3), 67-79.

Wei, R., & Feng, J. (2015). Implementing CLIL for young learners in an EFL context beyond Europe: Grassroots support and language policy in China. English Today, 31(1), 55-60. https://doi.org/10.1017/S0266078414000558

Wei, R., & Hu, Y. (2019). Exploring the relationship between multilingualism and tolerance of ambiguity: A survey study from an EFL context. Bilingualism: Language and Cognition, 22(5), 1209-1219. https://doi.org/10.1017/S1366728918000998

Wei, R., Hu, Y., & Xiong, J. (2019). Effect size reporting practices in applied linguistics research: A study of one major journal. Sage Open, 9(2), 1-11. https://doi.org/10.1177/2158244019850035

Yang, W. (2015). Content and language integrated learning next in Asia: Evidence of learners’ achievement in CLIL education from a Taiwan tertiary degree programme. International Journal of Bilingual Education and Bilingualism, 18(4), 361-382. https://doi.org/10.1080/13670050.2014.904840

Details

Author details

Li, Xinyu