Introduction
The requirement of transparency, under which contracts must be drafted in plain and intelligible language and, if written, be legible, plays a key role in the governance of consumer contracts.Footnote 1 Businesses are incentivised to ensure plain and intelligible language in two ways. First, through protection from scrutiny of core terms expressed, inter alia, in plain and intelligible language.Footnote 2 Secondly, by the possibility of regulatory action (including an injunction) when a term or contract is not expressed in plain and intelligible language,Footnote 3 whether or not the terms are substantively unfair. Therefore, it is necessary to have a clear understanding of the meaning of plain and intelligible language, and for businesses and regulators to operationalise the concept in order to assess whether contracts satisfy regulatory requirements. Businesses can then draft compliant contracts, and regulators can take enforcement action against non-compliant ones.
Reading scores are an increasingly popular method for assessing contractual language and identifying contracts that are too difficult for the average consumer to understand. Advocates of the use of reading scores identify them as a ‘simple, inexpensive way to measure the comprehensibility of legal language’,Footnote 4 something that is particularly useful when enforcement budgets are strained.Footnote 5 Legislators, particularly in the United States, have used reading scores when seeking to reduce the complexity of legal documents, providing a clear target standard that documents must reach.Footnote 6 Compliance with such standards can, in theory, be assessed with the application of a simple computer program.
This paper seeks to explore plain and intelligible language requirements in UK consumer contract law, and consider whether reading scores can (or cannot) provide a simple mechanism for determining whether a clause, or a document, is compliant with these requirements. It begins by considering in Part 1 the requirement that contracts be expressed in plain and intelligible language in its regulatory context, before turning to examine the development of reading scores in Part 2, examining some of the most common formulae used to determine readability. The use of reading scores to assess legal language is considered in Part 3, before Part 4 reports on an empirical test of the utility of reading scores in determining whether a contract is expressed in plain and intelligible language. Part 5 draws on the results of this empirical examination to make policy recommendations about the proper role of reading scores in the process of determining whether contract language is plain and intelligible.
1. Consumer contracts and plain and intelligible language
Standard form contracts dominate the relationship between consumers and traders. Consumers are not in a position to negotiate the terms of the contracts, and such contracts are often made on a ‘take it or leave it’ basis, giving businesses a high level of control of the terms of those contracts.Footnote 7 The power imbalance has resulted in protection for consumers from the effects of the traders’ ability to impose contract terms determined by them.Footnote 8 This regulation of contract terms provides the background to the requirement that consumer contracts be expressed in plain and intelligible language.
(a) From contractual freedom to consumer contractual regulation
At common law there were few doctrines that allowed courts to interfere with the terms of a contract.Footnote 9 Exclusion and limitation clauses, which eliminated or reduced the innocent party's entitlement to damages, were seen as particularly likely to give rise to unfairness.Footnote 10 However, courts would not engage in the rewriting of contracts and would not refuse to enforce certain terms to prevent unfairness.Footnote 11 Interpretive rules were used to ameliorate harshness,Footnote 12 interpreting clauses in the manner most favourable to the party against whom they were used.Footnote 13 These contra proferentem rules were perhaps the closest common law precursor to the requirement to draft contracts in plain and intelligible language. Drafters were incentivised to ensure that the effect of their terms was clear in order to ensure that the terms would operate in the way they intended. Beyond interpretation, arguments were directed against the incorporation of such terms,Footnote 14 with emphasis placed on the prominence of a term, which could not be incorporated into the contract unless the party had notice of it.Footnote 15
However, the common law was not thought to be sufficiently protective.Footnote 16 In order to provide a legislative safeguard the Unfair Contract Terms Act 1977 (UCTA) was passed. This applied to both business-to-business and business-to-consumer contracts. Under UCTA courts could, in closely defined circumstances,Footnote 17 adjudicate whether exclusion and limitation clausesFootnote 18 were ‘reasonable’. UCTA did not separately regulate contractual wording, but the intelligibility of a term could be taken into account in assessing whether a term was reasonable.Footnote 19
Further protection was provided for consumers by the Unfair Terms in Consumer Contracts Regulations 1994 (UTCCR 1994), which implemented the Unfair Terms Directive.Footnote 20 The UTCCR 1994 were repealed and replaced by the Unfair Terms in Consumer Contracts Regulations 1999 (UTCCR 1999). The primary function of the Unfair Terms Directive was to enable courts to decide whether terms were ‘fair’.Footnote 21 This statutory intervention provided significantly improved protection for consumers, widening the scope of protection beyond exclusion and limitation clauses and allowing substantive issues to be canvassed. As a subsidiary matter, regulators and courts were required to assess whether core terms were expressed in plain and intelligible language when determining their jurisdiction, and given a power to assess all terms for linguistic compliance.
As part of the recent attempt to consolidate consumer contract law,Footnote 22 rendering it less complex, less fragmented and clearer,Footnote 23 the Consumer Rights Act 2015 (CRA 2015) repealed and replaced UCTA and the UTCCR 1999 with respect to consumer contracts.Footnote 24 UCTA continues to govern exclusion and limitation clauses in business-to-business contracts. Part 3 of the CRA 2015 implements the Unfair Terms Directive in a similar manner to UTCCR 1999. However, as the Directive requires only minimum harmonisation,Footnote 25 the UK has chosen to go beyond the minimum standards prescribed by providing that the CRA 2015 applies to individually negotiated contracts, imposing a requirement of prominence and legibility if a trader seeks to exempt a core term from scrutiny and applying the provisions to notices as well as contracts.
Under the CRA 2015 most of the terms in a consumer contract are subject to a test of fairness. The test is set out in CRA 2015, s 62(4). This provides that ‘[a] term is unfair if, contrary to the requirement of good faith, it causes a significant imbalance in the parties’ rights and obligations under the contract to the detriment of the consumer’.Footnote 26 If a term is unfair it does not bind the consumer in his or her dealings with the trader.Footnote 27 The rest of the contract will continue, ‘so far as practicable, to have effect in every other respect’.Footnote 28
The fairness test is not explored in detail in this paper. Instead, this paper focuses on the requirement that contractual terms be expressed in plain and intelligible language.Footnote 29 This requirement plays two roles in the scheme of the CRA 2015. First, (along with the requirements of prominenceFootnote 30 and legibilityFootnote 31) it governs the subject matter scope of the s 62 fairness jurisdiction. If a term which relates to the main subject matter of the contract or the price payable is not expressed in plain and intelligible language then it may be subjected to scrutiny to assess whether it is fair. If such a term is expressed in plain and intelligible language, is legible and is prominent then it will be exempt from the test of fairness (‘the core terms exception’).Footnote 32 The core terms exception seeks to encourage clarity in contract drafting.Footnote 33 Second, the requirement that a contract is expressed in plain and intelligible language is a standalone ground for regulatory action, which applies whether or not a contract term is fair.Footnote 34 This focuses on the drafting of the boilerplate terms of the contract rather than on the core terms.
(b) Plain and intelligible language
The requirement of plain and intelligible language is an important tool for ensuring that consumers are aware of the terms of trade.Footnote 35 Once a consumer is aware of the terms then he or she can make the choice not to engage with the trader on those terms,Footnote 36 forcing the trader to either change the term(s) or to exit the market.Footnote 37 Plain and intelligible terms are therefore an aid to market discipline,Footnote 38 and the requirement can be seen as a less interventionist approach to governance of unfair terms, as it empowers the consumer,Footnote 39 rather than interferes with the contract.Footnote 40 The linkage of the core terms ‘exemption’ and ‘plain and intelligible language’ allows consumers to understand those terms that they must read in order to make an informed choice,Footnote 41 with the fairness test policing those terms that the average consumer does not read.Footnote 42
The Unfair Terms Directive requires ‘plain intelligible language’ in Art 4(2) and ‘plain, intelligible language’ in Art 5. This presents a slight contrast to CRA 2015, s 64(3), which requires ‘plain and intelligible’ language (authors’ emphasis). It is unclear whether the textual difference, which seems to make ‘plain’ and ‘intelligible’ separate criteria to be satisfied, will make any difference to the interpretation of the concept by the UK courts. This question arises particularly post-Brexit, where it appears that recourse to the CJEU to ensure consistency of interpretation may not be possible.Footnote 43 However, it does raise the possibility that separate tests and methodologies may be appropriate for determining whether a contract is plain and whether it is intelligible.
The concept of plain and intelligible language has been subject to a variety of interpretations. The CJEU has held that the requirement should be interpreted broadly, particularly because of the role it plays in excluding core terms from scrutiny.Footnote 44 Therefore, linguistic assessment ‘cannot be reduced merely to [a contract] being formally and grammatically intelligible’,Footnote 45 although a contract which fails to meet this test may not be seen as sufficiently ‘plain’. The assessment of language must also take into account whether the average consumer would be able to understand ‘potentially significant economic consequences’ of the term of the contract.Footnote 46 The requirement of plain and intelligible language therefore requires drafters to ensure that the effect of the term is communicated to the consumer. If the drafter fails to carry out this task the core term can be subject to scrutiny using the fairness test.Footnote 47
The mechanism for testing contracts for plain and intelligible language is underdeveloped. Courts have tended to make evaluative decisions based on the reading of the terms in their contractual context. In Office of Fair Trading v Foxtons,Footnote 48 it was held that ‘just because a highly skilled lawyer can find (or contrive) some equivocation in a word, that does not make the language lacking in plainness or intelligibility’Footnote 49 and that the assessment ‘does not require an absolute and pedantic rigour’.Footnote 50 However, a contract would not comply where it uses ‘broad terms of uncertain meaning… [w]ithout some form of definition’.Footnote 51 This guidance leaves the determination to the judge, without fully fleshing out what is meant by plain and intelligible, particularly for the average consumer. In Office of Fair Trading v Ashbourne Management Services a term is seen as plain and intelligible if a consumer who reads the agreements ‘reasonably carefully’ can understand the effect of the terms.Footnote 52
In Office of Fair Trading v Abbey National Andrew Smith J decided that terms must be ‘sufficiently clear to enable the typical consumer to have a proper understanding of them for sensible and practical purposes’.Footnote 53 This does not explain how the assessment of this test is to be carried out. It is clear that non-contractual documentation available prior to the conclusion of the contract can be taken into account in determining whether a term was intelligible.Footnote 54 This has particular importance in complex financial transactions. Further, layout can be relevant to the assessment, with ‘useful headings and appropriate use of bold print’ making a contribution to the intelligibility of a document.Footnote 55
The role of the average consumer in the assessment of plain and intelligible language is under-explored.Footnote 56 Whilst the average consumer is specifically mentioned in the test of prominence,Footnote 57 it is not referred to in either the Unfair Terms Directive or the CRA 2015 when assessing plain and intelligible language.Footnote 58 At first instance in Abbey National, Andrew Smith J held ‘whether terms are in plain intelligible language is to be considered from the point of view of the … average consumer. The… “average consumer … who is reasonably well informed and reasonably observant and circumspect” … provides an appropriate yardstick guide to whether a term is in plain intelligible language’.Footnote 59
However, the identity of the average consumer is not explored in the cases, with little attention given to the characteristics of such a consumer in the unfair terms context. In Foxtons there was little ‘evidence or other material to assist [the judge] in determining the mindset, thinking or attributes of a typical consumer’ and therefore assessment would be made ‘on an analogous footing to that on which the court approaches the attributes of the reasonable man in other realms’.Footnote 60 It is likely that the average consumer for whom language is assessed will be the average consumer in the targeted group. So if an insurance contract is targeted at a particular group of individuals, when assessing plain and intelligible language the average consumer will be the average consumer of that group.Footnote 61 That the targeted standard applies in unfair terms cases is made clear in Ashbourne Management Services, where Kitchin J held that ‘[t]he question whether a particular term is expressed in plain intelligible language must be considered from the perspective of an average consumer. Here such a consumer is a member of the public interested in using a gym club which is not a high end facility and who may be attracted by the relatively low monthly subscriptions’.Footnote 62
Even if the average consumer is the benchmark against which plain and intelligible language is assessed, in contrast to the Consumer Protection from Unfair Trading Regulations 2008 (CPUTR 2008), no consideration is given to the proper consumer standard if a clearly identifiable group of consumers that is particularly vulnerable is foreseeably likely to enter into the contract.Footnote 63 The CPUTR 2008 provide that where a vulnerable group is ‘particularly vulnerable to the practice or the underlying product… in a way which the trader could reasonably be expected to foresee’ then ‘reference to the average consumer shall be read as referring to the average member of that group’.Footnote 64 The unfair terms case law does not seem to allow consumer vulnerabilities to be taken into account where the vulnerable group is not specifically targeted. The cases’ repeated references to the ‘average consumer’Footnote 65 or the ‘typical consumer’Footnote 66 seem to suggest that a standardised approach will be adopted, without taking into account the reading skills of vulnerable consumers, even where they are foreseeably likely to enter into contracts on those terms.Footnote 67
In the light of the failure of the UK courts to flesh out the concept, the Competition and Markets Authority has offered an account of the characteristics of a plain and intelligible document.Footnote 68 This guidance is, amongst other things, intended to assist business in drafting contracts, protecting their core terms from scrutiny and ensuring consumers can make an informed choice.Footnote 69 First, the document should be jargon free, and should ‘as far as possible use ordinary words in their normal sense’. This will not prevent the use of technical language where the meaning of the language is clear to the consumer.Footnote 70 However, such statements raise the question of who the consumer is, what is clear to them, and whether some technical language can ever be clear. Secondly, the document should be unambiguous, meaning ‘clear and not open to misinterpretation or differing interpretations’; thirdly, it should be reader-friendly, including ‘organised so as to be easily understood (using, for example, short sentences and subheadings)’; fourthly, it must be comprehensible with ‘the meaning of the words or concepts used, as well as the reasons for them… explained if they are not capable of being readily understood by consumers’; fifthly, it should be informative, so that ‘a consumer should, on the basis of the information provided – if necessary in pre-contractual literature – be able to foresee and evaluate the consequences of all wording used’;Footnote 71 and sixthly, accompanied by pre-contractual literature as necessary, for example ‘if, for instance, the contract is complex or lengthy’.
Whilst this typology is useful, and its multifactorial approach allows decision-makers to take into account various different matters in judging whether language is plain and/or intelligible, the concepts used in the typology suffer from a lack of theoretical or empirical grounding. They do not, by themselves, provide a simple way of measuring whether a clause or a contract is transparent, but instead provide a series of matters that a decision-maker may take into account in making an evaluative judgment about whether a contract is expressed in plain and intelligible language. It is accepted that providing a simple measure is a difficult task, but it is one that should be undertaken in an attempt to ensure that the important concept of plain and intelligible language is sufficiently certain. If not, there may be uncertainty for businesses, who cannot easily judge whether their contracts are compliant, and difficulty for consumers, who are unable to easily decide whether the core terms are challengeable or not, particularly in circumstances where they are not legally advised. Further, it is more expensive to conduct a multifactorial assessment than to apply a simple metric that can determine whether a contract is expressed in plain and intelligible language. Reading scores, examined in Parts 2 to 4, are an attempt to simplify the assessment of contractual language and reduce the time and expense of a multifactorial approach.
The problems caused by this broad multifactorial approach are exacerbated by the general requirement of plain and intelligible language imposed in the CRA 2015. When assessing the applicability of the core terms exception, the linguistic examination is limited to terms governing price and subject matter. However, s 68 provides that a trader must ensure that a ‘written term of a consumer contract… is transparent’. Therefore, all terms in consumer contracts must be expressed in plain and intelligible language. It is questionable whether the concept of plain and intelligible language is subject to the same test under s 68 as that set out in core terms exemption. Boilerplate terms may not have ‘significant economic consequences’,Footnote 72 but may significantly affect the rights and obligations of the parties to the contract.Footnote 73 It is, therefore, necessary to examine whether the term as written allows the contracting parties to understand this.
Where a contract is not transparent, regulators may take action under CRA 2015, Sch 3 and apply for an injunction (or accept an undertaking) that prevents a trader using that term. The injunction jurisdiction is necessary as ‘one cannot think of a more expensive or frustrating course than to seek to regulate… “contract” quality through repeated lawsuits against inventive “wrongdoers”’,Footnote 74 and consumers either cannot or will not take private law actions.Footnote 75 By giving power to regulators to take enforcement action to prevent the use of unclear contracts, regulation seeks to provide a better mechanism to improve the quality of contracts compared to consumer civil actions.Footnote 76 However, in times of austerity,Footnote 77 the enforcement of the linguistic requirements is likely to be limited. Regulators will only take action in high impact cases.Footnote 78 A compliance-focused, responsive regulation approachFootnote 79 is likely to be adopted, with regulators engaging co-operatively with the trader to secure revised terms that are plain and intelligible. If a simple test could be developed, this would allow regulators to engage with lack of compliance on a more regular basis, providing a simple starting point to negotiated changes to the terms used by businesses.Footnote 80
In an attempt to develop a better understanding of the concept of plain and intelligible language, we considered whether reading scores could provide an accurate test of whether a document was expressed in plain and intelligible language. If this could be determined by a reading score it would be extremely helpful to consumers, regulators and traders, as documents could be scrutinised in an efficient and cost-effective manner using a simple computer program.Footnote 81 We chose to examine the exclusions in consumer insurance contracts. Whilst these exclusions would initially appear to be subject to a fairness test and a s 68 assessment, in consumer insurance contracts these clauses are core terms as they define the subject matter (the scope of the insurers’ risk) and the price (as they contribute to ‘calculating the premium paid by the consumer’).Footnote 82 Therefore, in the rest of this paper we are considering the contribution that reading scores can make to the assessment of core terms, although, where necessary, we discuss the potential of reading scores in assessing whether boilerplate is expressed in plain and intelligible language. Our findings are set out in the following sections.
2. The development of reading scores
Reading scores attempt to quantify the ease with which readers are able to read and comprehend written texts.Footnote 83 Such scores have long been used to assess a variety of texts. They seek to provide a simple measure of how readable a piece of text is, in order that a writer can assess and amend the text to make it understandable to readers.Footnote 84
In one of the earliest attempts, Thorndike compiled a list of 10,000 words occurring in general literature by frequency of use, suggesting that the readability of written texts could be determined mathematically.Footnote 85 Thorndike's list served as partial basis for one of the earliest readability formulae, published by Vogel and Washburne, known as the Winnetka formula.Footnote 86 It considered factors such as the number of different words per 1000 words, the number of uncommon words (words not on Thorndike's list) per 1000 words, the number of prepositions per 1000 words, and the number of simple sentences in 75 successive sentences. Readability scores were computed for passages from 700 books and validated against children's paragraph-meaning scores (a measure of reading comprehension of those passages).
Later work conducted by Waples and TylerFootnote 87 and OjemannFootnote 88 explored other factors beyond word frequency that may influence readability, and the following three decades saw the development of a number of readability measures whose subsequent revisions are still in widespread use today. Flesch developed the Flesch reading-ease score (FRES),Footnote 89 which assigns texts a numerical score between 0 and 100, with lower scores indicating more difficult texts. The formula uses the ratio between the total number of words and the total number of sentences, and the ratio between the total number of syllables and the total number of words. It is calculated as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_eqnU1.gif?pub-status=live)
The formula was subsequently revised with different weights to produce a score interpretable as the target US grade level of the text, or the number of years of formal schooling required to understand its content.Footnote 90 The Flesch-Kincaid (F-K) grade level is computed as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_eqnU2.gif?pub-status=live)
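By way of illustration only, the two Flesch measures can be sketched in a few lines of Python. The coefficients below are those commonly published for the two formulae and are assumed to match the equations reproduced above. The function names, the sentence splitter and the vowel-group syllable counter are hypothetical simplifications and are not the tools used in this study.

```python
import re


def count_syllables(word):
    """Crude assumption: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text):
    """Return (words, sentences, syllables) using simple regular expressions."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(text):
    """Higher scores indicate easier text."""
    words, sentences, syllables = text_counts(text)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(text):
    """Approximate US school grade needed to follow the text."""
    words, sentences, syllables = text_counts(text)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    clause = "We will not pay for any loss caused by war. This includes invasion."
    print(round(flesch_reading_ease(clause), 1))
    print(round(flesch_kincaid_grade(clause), 1))
```

Both measures rest on the same two ratios (words per sentence and syllables per word), so a text that scores poorly on one will generally score poorly on the other.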
Another popular formula, the Gunning FOG Index, was created by Robert Gunning,Footnote 91 and is computed as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_eqnU3.gif?pub-status=live)
where ‘complex words’ are defined as words containing three or more syllables. McLaughlin subsequently formulated an alternative to the FOG Index,Footnote 92 called the SMOG Index, which produced a grade level estimate using the following formula:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_eqnU4.gif?pub-status=live)
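The FOG and SMOG calculations can be sketched in the same way. The coefficients are those commonly published and are assumed to correspond to the equations above. The helper names and the vowel-group syllable heuristic are hypothetical, and the sketch ignores refinements sometimes applied in practice (for example, excluding proper nouns from the count of complex words, or sampling 30 sentences for SMOG).

```python
import math
import re


def count_syllables(word):
    """Crude assumption: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def counts(text):
    """Return (words, sentences, complex words), where complex means 3+ syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return len(words), len(sentences), len(complex_words)


def gunning_fog(text):
    """Average sentence length plus percentage of complex words, scaled by 0.4."""
    words, sentences, complex_words = counts(text)
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_index(text):
    """Polysyllable count normalised to a 30-sentence sample."""
    _, sentences, polysyllables = counts(text)
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    clause = "We exclude any liability arising from hazardous activities."
    print(round(gunning_fog(clause), 1), round(smog_index(clause), 1))
```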
The measures described thus far include the number of syllables as a factor. However, the syllable structure of English is quite complex and varied, making automation of syllable counting problematic. To address this, two measures were developed to expedite the automation of readability computations, and were based on the number of characters (letters) per word instead of syllables. These are the Automated Readability Index (ARI)Footnote 93 and the Coleman-Liau Index (CLI)Footnote 94:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_eqnU5.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_eqnU6.gif?pub-status=live)
where L is the average number of characters per 100 words, and S the average number of sentences per 100 words.
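A minimal sketch of the two character-based measures follows, again using the commonly published coefficients, which are assumed to match the equations above. The function names and tokenising rules are hypothetical. Note that no syllable counter is needed, which is the practical attraction of these measures.

```python
import re


def counts(text):
    """Return (characters, words, sentences) using simple regular expressions.

    Assumption: characters are the letters and digits inside word tokens;
    sentences end at '.', '!' or '?', so decimals such as '1.5' would be
    split incorrectly by this simple rule.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)
    characters = sum(ch.isalnum() for w in words for ch in w)
    return characters, len(words), len(sentences)


def automated_readability_index(text):
    characters, words, sentences = counts(text)
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(text):
    characters, words, sentences = counts(text)
    L = 100 * characters / words   # characters per 100 words
    S = 100 * sentences / words    # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8


if __name__ == "__main__":
    clause = "We will not pay for claims arising from war or terrorism."
    print(round(automated_readability_index(clause), 1))
    print(round(coleman_liau_index(clause), 1))
```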
Considerable debate exists as to the relative merits of formulae based on characters versus syllables, but ARI and CLI owe much of their popularity to the relative ease and reliability with which computer programs can compute the number of characters of English words as opposed to the number of syllables they contain. Syllable-based measures computed by different analytical tools can produce more or less significant discrepancies depending on the particular syllable parsers (computer programs that identify the syllables in the words assessed, in order that the number of syllables can take their place in the formula) employed and the specific assumptions these make. For example, some syllable parsers might treat ‘fine-tuning’ as two words, and others as one. These differences during parsing can have a noticeable impact on the grade level estimate. Marchand, Adsett and Damper provide a useful overview and comparison of different rule-based and data-driven syllabification algorithms.Footnote 95 It is important to note that there is no uniform methodology for parsing, and therefore a clear risk that different parsers come to different outcomes.
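The sensitivity to parsing choices can be illustrated with a short sketch. It applies the commonly published Flesch-Kincaid formula to a single invented sentence under two hypothetical tokenising rules, one keeping hyphenated compounds as single words and one splitting them. The syllable heuristic is a crude assumption and neither rule is claimed to reflect any particular tool.

```python
import re


def count_syllables(word):
    """Crude assumption: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(words, n_sentences):
    """Flesch-Kincaid grade from a list of word tokens and a sentence count."""
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / n_sentences) + 11.8 * (syllables / len(words)) - 15.59


text = "The fine-tuning of long-term cover is time-consuming."

# Rule A: a hyphenated compound counts as one word.
joined = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
# Rule B: every run of letters counts as a separate word.
split = re.findall(r"[A-Za-z]+", text)

grade_joined = fk_grade(joined, 1)
grade_split = fk_grade(split, 1)

if __name__ == "__main__":
    print(len(joined), round(grade_joined, 1))
    print(len(split), round(grade_split, 1))
```

Under these assumptions the total number of syllables is unchanged, but the word count differs, so the same sentence is assigned a markedly higher grade level when compounds are kept whole.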
Beyond those examined in this section, hundreds of different readability measures have been formulated (for English and for other languages), and a review of all of them is beyond the purview of this paper. Despite the large number of readability measures, all the formulae described in this section remain in widespread use. One of the uses to which reading scores have been put is the examination of legal documents. Before using these formulae to examine sample contracts, it is useful to consider the academic and legislative uses of reading scores to scrutinise legal language, and particularly contracts.
3. The use of reading scores in assessing legal language
Reading scores are increasingly being used in legal scholarship and legal policy, often as an aid to challenges to ‘legalese’, which is seen as too arcane and difficult to understand.Footnote 96 The linguistic requirement in the CRA 2015 seeks to challenge such language. Therefore, it is natural to consider whether reading scores can help the requirement of plain and intelligible language achieve its policy goal.
Academically, a number of studies have examined legal, or quasi-legal, documents for readability. Reading scores have been used to examine tax legislation in AustraliaFootnote 97 and New Zealand.Footnote 98 Sutherland used the Flesch Reading Ease and Flesch-Kincaid Grade Level formulae to examine collective bargained agreements in Australia,Footnote 99 using the results to assess whether the policy goal of ensuring ‘simple agreements’ had been achieved. Rogers et al have used reading scores to analyse the readability of MirandaFootnote 100 warnings given to suspects in the USA.Footnote 101
In the UK, reading scores have been used to examine consumer contracts, including those on the internet.Footnote 102 Linsley and Lawrence examined the risk disclosure section of annual reports of Public Listed Companies using Flesch Reading Ease scores, and found the disclosures were difficult or very difficult to understand.Footnote 103 On the policy front, in their examination of children's digital lives, the Growing Up Digital Taskforce used a Flesch-Kincaid reading score to evaluate the terms and conditions of Instagram.Footnote 104 The terms were found to be ‘difficult to read’ with ‘language and sentence structure that only a postgraduate could be expected to understand’.Footnote 105 Consumer groups have used reading scores in an attempt to encourage traders to redraft their contracts. For example, Fairer Finance found that ‘the average insurance document is only accessible to someone in the last year of Sixth Form College’.Footnote 106
The use of reading scores is embedded in some US legislation,Footnote 107 requiring contracts (including insurance contracts) to have a particular level of readability.Footnote 108 For example, in Texas a consumer banking contract which is not concluded on model contract provisions must meet prescribed Flesch-Kincaid reading scores, as calculated by Microsoft Word.Footnote 109 Similarly, the South Carolina code requires that loan contracts have a Flesch-Kincaid score of ‘no higher than seventh grade’.Footnote 110 The Montana code provides that an insurance policy cannot be issued in Montana unless ‘the text achieves a minimum score of 40 on the Flesch reading ease test’.Footnote 111 There have been some moves towards the use of reading scores in financial documentation in Canada, but legislative action has been stalled because French language reading scores are not felt to be sufficiently developed to provide an appropriate legislative benchmark.Footnote 112 Sirico is critical of these developments, arguing that the legislation requiring the use of reading scores does not protect consumers.Footnote 113 He further argues that the calculation of Flesch-Kincaid scores on Microsoft Word is not in accordance with the standard formula, as it fails to count syllables, and instead uses number of characters as a substitute.Footnote 114 This has particular implications for the Texas statute discussed above.
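The appeal of such statutory thresholds is that, once a score has been computed, compliance reduces to a single comparison. The sketch below uses only the two thresholds quoted above (a minimum Flesch reading ease score of 40 and a maximum Flesch-Kincaid grade of seven). The function names are hypothetical, and the sketch does not attempt to reproduce any other condition that the statutes may impose.

```python
def meets_minimum_reading_ease(reading_ease, minimum=40.0):
    """True if a Flesch reading ease score reaches the stated minimum."""
    return reading_ease >= minimum


def meets_maximum_grade(grade_level, maximum=7.0):
    """True if a Flesch-Kincaid grade level is no higher than the stated maximum."""
    return grade_level <= maximum


if __name__ == "__main__":
    # Illustrative scores only; these are not results from this study.
    print(meets_minimum_reading_ease(42.3))
    print(meets_maximum_grade(9.1))
```

The comparison itself is trivial. As the criticism above indicates, the difficulty lies in how the score fed into it is calculated.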
Given the trend towards using reading scores to assess contracts, it is timely to consider whether such scores could, and should, be used in the assessment of plain and intelligible language. In the next part we apply the reading scores considered in Part 2 to consumer insurance contracts in order to consider their utility in assessing the linguistic compliance of contracts.
4. Testing insurance contracts
Textual extracts of general exclusion clauses from seven different consumer travel insurance documents were scrutinised.Footnote 115 These seven insurance policies were selected randomly from a population of consumer travel insurance contracts selected by the authors using a price comparison website. The authors made four separate searches and harvested the policy wordings returned in response.Footnote 116 The four searches were as follows: a two-week break in Europe;Footnote 117 a two-week break outside Europe (excluding US and Canada);Footnote 118 a two-week break in the US;Footnote 119 and an annual travel insurance policy.Footnote 120 A number of policy wordings were returned multiple times in response to each of the searches, and where the wording of the exclusions was the same these were removed from the population. Each exclusion wording was assigned a number and seven were randomly selected. The policies analysed were all issued by different companies. The analysed sections ranged between 568 and 1747 words in length. Where necessary, texts were formatted to remove numbered/alphabetised lists and bullet points, but leaving the sentences otherwise intact (including numbers when embedded in sentences).
The goal of this examination was to achieve a better understanding of the utility of reading scores for assessing the concept of plain and intelligible language. We also tested whether the reading scores tracked consumer understanding by asking consumers questions about the effect of the terms contained in the contracts. Consumers were provided with the contracts and were given a series of 28 vignettes relating to losses incurred during a holiday,Footnote 121 and asked whether the insurance responded to the risk. Two example vignettes are set out in Figure 1. The answers given were then coded as correct or incorrect, with incorrect answers indicating consumer inability to understand the effects of the term. The findings are used to draw conclusions about the usefulness of reading scores in assessing whether language is plain and intelligible, and whether they should be used by courts, regulators and traders.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_fig1g.gif?pub-status=live)
Figure 1. Example vignettes
Consumer insurance contracts were selected as the subjects of the study for a number of reasons. First, plain and intelligible language, and the protection from scrutiny under the CRA 2015 it affords, is particularly important in the insurance industry as both insuring clauses and exclusions are ‘core terms’.Footnote 122 Collins suggests that this broad conception of core terms ‘threatens to exempt insurance contracts from control by the back door’.Footnote 123 This means that the conceptualisation of plain and intelligible language is particularly important for consumers and insurers, as large parts of insurance contracts can be protected from substantive scrutiny by virtue of the core terms exception.Footnote 124 Therefore, a well-developed concept of plain and intelligible language is particularly necessary in this area.
Second, consumer insurance contracts are pervasive. Most consumers will hold insurance against some risks.Footnote 125 The wording of insurance contracts can have important implications for consumers’ entitlements. The harshness of the contractual position, that consumers are not entitled to an indemnity in the event that they fall within exclusions, has been to some extent ameliorated by the availability of the Financial Ombudsman Service,Footnote 126 which has shown a willingness to uphold some complaints about the application of an exclusion where a matter appears to fall strictly within the policy wording.Footnote 127 However, it is still the case that consumers can often find themselves without compensation after suffering injury due to the terms of an exclusion in an insurance contract. Further, travel insurance is likely to be the most complex financial product that consumers buy on a year-to-year basis, and therefore provides a useful case study for considering reading scores.
We subjected each extract to analysis using a number of different reading score formulae. We computed the readability measures described above (F-K, FOG, SMOG, ARI, and CLI) for our seven extracts. The readability measures were computed for each text using six different analytical tools/calculators. These were the koRpus R package for text analytics (version 0.06-5),Footnote 128 used with the English version of the TreeTagger parser (version 3.2.1);Footnote 129 an online implementation of koRpus; and four other freely available online tools capable of computing the required readability measures.
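By way of illustration, the five measures can be computed from simple counts of sentences, words, letters and syllables. The following Python is a minimal sketch using the standard published formulae; it is not koRpus or any of the calculators used in this study. The tokenisation rules and the vowel-group syllable counter are crude assumptions made for the example, the function names are hypothetical, and the sample sentence is invented rather than taken from any policy examined.

```python
import math
import re


def count_syllables(word):
    """Rough assumption: one syllable per run of vowels (real tools use better rules)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_counts(text):
    """Count sentences, words, letters, syllables and words of 3+ syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)  # assumption: digits are ignored
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": sum(len(w) for w in words),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }


def readability_grades(c):
    """Grade-level estimates from the counts, using the standard formulae."""
    words_per_sentence = c["words"] / c["sentences"]
    letters_per_100 = 100 * c["letters"] / c["words"]
    sentences_per_100 = 100 * c["sentences"] / c["words"]
    return {
        "F-K": 0.39 * words_per_sentence + 11.8 * c["syllables"] / c["words"] - 15.59,
        "FOG": 0.4 * (words_per_sentence + 100 * c["polysyllables"] / c["words"]),
        # SMOG was designed for 30-sentence samples; the count is scaled to 30 here.
        "SMOG": 1.043 * math.sqrt(c["polysyllables"] * 30 / c["sentences"]) + 3.1291,
        # ARI normally counts letters and digits; only letters are counted here.
        "ARI": 4.71 * c["letters"] / c["words"] + 0.5 * words_per_sentence - 21.43,
        "CLI": 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8,
    }


# Invented sample text, for illustration only.
sample = ("We will not pay for loss caused by war. "
          "This exclusion applies to every section of the policy.")
for name, grade in readability_grades(text_counts(sample)).items():
    print(name, round(grade, 2))
```

Even in this small sketch the syllable counter is a design choice: it treats ‘caused’ as two syllables, and a calculator using a different rule would return different F-K, FOG and SMOG values for the same text.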
There are differences in the reading scores returned by the different indicators. This can be seen in Table 1, which shows the different scores returned by the different indicators using each different method of calculating the scores for one of the insurance documents examined during the project. The reading scores calculated vary from around 13 years of education using CLI to almost 20 years of education using FOG. As seen above in Part 2, each indicator is examining different characteristics of the contract, and therefore each provides potentially valuable information on readability. By choosing one indicator, the complexity measured by other indicators is lost, and therefore documents that may be challenging in a way captured by a particular score will not be identified if a different reading score is chosen as the measure of plain and intelligible language.Footnote 130
Table 1. Reading score in years of education by formulae and calculator for travel insurance contract 1
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_tab1.gif?pub-status=live)
Further, it was noticed that there were differences between the scores calculated by different calculators. For example, the calculations of FOG span a range of 4.59 years of education. This is due to potential (and not necessarily transparent) differences between the algorithms used by different calculators to parse syllables. This is potentially problematic for regulators, businesses and consumers, as although a reading score may be perceived to be fixed, it clearly is not. This means that the legality of contract terms, or a decision to take enforcement action, may be dependent on which calculator is used. This is particularly problematic if a stakeholder uses a reading score as a proxy for plain and intelligible language, where the same document can appear to be transparent when using one calculator, but not if another is used. One solution is to be prescriptive as to the methodology by which the score will be calculated, favouring one calculator and/or method of calculation.Footnote 131 However, such a prescription has the potential to embed problematic calculation errors within the concept of plain and intelligible language, with the score produced by a chosen calculator leading to a conclusion that a document does or does not reach a set reading score threshold that is used to determine that it is plain and/or intelligible, and therefore is or is not transparent.
Therefore, it is argued that a better approach would be to pool grade-level estimates produced by different measures and calculators together to produce a consensus as to the readability of each text under analysis. Pooling these estimates would produce a single Grand Weighted Mean grade level for each text under investigation. We suggest that such an approach has the potential to eliminate some of the measurement issues that arise in the legislative and regulatory use of reading scores. Such an approach has not been used by stakeholders, who often rely on one formula or one calculator or both. In order to produce the Grand Weighted Mean, each calculator was used to produce all five of the readability measures (ARI, CLI, F-K, FOG, SMOG) for each text. Then, for each text, grand means were computed for each measure across the different calculators, and for each calculator across different measures. Greater weight was assigned to those measures and calculators providing more consistent results and less weight to those yielding greater variance. The process was repeated to obtain a single Grand Weighted Mean grade-level estimate of readability for each text. This is a novel approach, taking publicly available tools and utilising them, whilst attempting to resolve inconsistencies in calculation. This approach has high utility for users such as regulators, business and consumers, who are not equipped to engage in ad hoc calculation of reading scores or to create bespoke ‘better’ reading score methodologies,Footnote 132 but who wish to utilise widely available tools to assess contractual language.
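One way in which such pooling might be implemented is sketched below in Python. The exact weighting scheme is not set out above, so the sketch assumes inverse-variance weighting: each measure (and each calculator) is weighted by the reciprocal of the variance of its scores, and the two resulting weighted means are averaged. That choice, the small constant `eps`, the function names and the scores are all assumptions made for illustration; the figures are invented and are not those in Table 1.

```python
from statistics import mean, pvariance


def inverse_variance_mean(groups, eps=1e-6):
    """Weighted mean of group means; consistent (low-variance) groups count for more.

    eps is an assumed small constant that avoids division by zero when a group
    of scores is perfectly consistent.
    """
    means = [mean(g) for g in groups]
    weights = [1.0 / (pvariance(g) + eps) for g in groups]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


def grand_weighted_mean(scores):
    """Pool scores[measure][calculator] into one grade-level estimate for a text."""
    measures = list(scores)
    calculators = list(next(iter(scores.values())))
    # One group per measure (scores across calculators) ...
    by_measure = [[scores[m][c] for c in calculators] for m in measures]
    # ... and one group per calculator (scores across measures).
    by_calculator = [[scores[m][c] for m in measures] for c in calculators]
    return mean([inverse_variance_mean(by_measure),
                 inverse_variance_mean(by_calculator)])


# Invented grade levels for one hypothetical text, three measures, two calculators.
example = {
    "F-K": {"calc_a": 15.2, "calc_b": 15.6},
    "FOG": {"calc_a": 17.1, "calc_b": 19.8},
    "CLI": {"calc_a": 13.0, "calc_b": 13.1},
}
print(round(grand_weighted_mean(example), 2))
```

Under this assumed scheme a measure on which the calculators disagree sharply contributes little to the pooled estimate, whereas a simple unweighted average would give it the same influence as the others.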
The Grand Weighted Means are set out in Table 2. Testing indicates that more than 19 years of formal education (beyond a Master's Degree) is required to understand Legal Text 2, while Legal Text 5 needs just under 14 years (second year of university). Despite Text 5 achieving the lowest reading score, none of our texts appear to be particularly plain and intelligible. Unless the average consumer (if involved in the linguistic analysis) were conceptualised as having the reading ability of at least a second year university student, it would appear that, in a system that assessed compliance by utilising reading scores alone, all of our insurance texts would be seen as lacking transparency, and therefore subject to scrutiny under the fairness test. Rather than Collins’ concern of ‘exempt[ing] insurance contracts from control’,Footnote 133 such a test of plain and intelligible language has the potential to bring exclusions within the fairness regime, seemingly contrary to the intention of the drafters of the Unfair Terms Directive.Footnote 134 It may be that analysis at clause level would produce some clauses that achieve acceptable reading scores, whilst others do not, but such a close focus is likely to reduce any efficiency gains of using reading scores. Therefore, it appears that reading scores may not provide an appropriate method for assessing plain and intelligible language, at least for core terms in insurance contracts, as they may bring terms within the fairness test which were intended to be excluded. However, it may be that exclusions drafted like those in our sample should be subject to the test for fairness, in order to encourage more transparent drafting and protect the consumer.
Table 2. Mean years of education required to comprehend each contract
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20190812153057241-0280:S0261387518000259:S0261387518000259_tab2.gif?pub-status=live)
Following the application of the reading scores it was necessary to consider whether reading scores reflect the intelligibility of the contracts. One way of doing this is to consider whether consumers understand the terms of contracts better when the reading score is more favourable. A study that looks at intelligibility is Davis.Footnote 135 In one experiment he examined the ability of consumers to comprehend a contract. The contract was either an unchanged version of a consumer credit contract or was a redrafted version, simplified (reducing unnecessary clauses to reduce the possibility of information overloadFootnote 136) and amended for readability. The second group, who read the redrafted contract, scored 26% better on the test of their understanding. Vulnerable consumers (young and/or poor and/or African-American and/or inexperienced) showed the greatest improvement in their score when using the redrafted contract. Further experimentation showed that both simplification and redrafting for readability were necessary to achieve the increased understanding.
Further studies on comprehension have found that contractual simplification has a positive effect for consumers. Masson and WaldronFootnote 137 redrafted a contract by removing redundant or archaic terms, simplifying words and sentence structures and defining or simplifying legal terms. Comprehension, as measured by paraphrasing and question-answering tasks, was reliably and significantly enhanced by the use of simplified words and sentence structure. However, absolute levels of comprehension were still low. Further, defining and/or simplifying legal terms did not have a significant impact on comprehension.Footnote 138
Using the vignettes to assess comprehension of the consequences of the terms, we found that lower reading scores did not necessarily translate to improved understanding. There was no significant correlation between reading score and correct applications of the contractual provisions to the vignettes. In all the contracts, our consumers did not, in general, understand the effect of the terms that they read. This may suggest that reading scores are not appropriate for operationalising ‘intelligibility’. However, a caveat must be advanced. None of the contracts examined had a particularly low reading level. All required a post-16 educational level. Therefore, it may be the case that none of the contracts were sufficiently intelligible, and if a lower reading score were achieved then an increase in intelligibility would be detected.Footnote 139
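The kind of comparison described above can be sketched as follows. The Python below computes a Pearson correlation between a grade-level estimate for each text and the proportion of vignettes answered correctly, and attaches an exact permutation p-value. This is an assumption about one reasonable way of testing for a relationship, not a description of the statistical procedure used in this study; the function names are hypothetical and every number is invented for illustration.

```python
from itertools import permutations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y):
    """Two-sided exact p-value: share of reorderings of y with |r| at least as large."""
    observed = abs(pearson_r(x, y))
    hits = 0
    total = 0
    for shuffled in permutations(y):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Invented values: grade level of seven hypothetical texts and the proportion
# of vignettes that readers of each text answered correctly.
grade_levels = [19.2, 16.4, 15.1, 17.8, 13.9, 16.0, 14.7]
share_correct = [0.41, 0.38, 0.47, 0.44, 0.40, 0.36, 0.45]
print(round(pearson_r(grade_levels, share_correct), 3))
print(round(permutation_p_value(grade_levels, share_correct), 3))
```

With only seven texts every reordering can be enumerated (5,040 in total), so no distributional assumption is needed; equally, so small a sample leaves little power to detect anything but a strong relationship, which is consistent with the caveat noted above.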
Conclusion: are reading scores useful for assessing contractual language?
Reading scores are an attractive tool for assessing contractual readability. They are easy to operate and cheap, and the process of assessment can be automated. Reading scores have the potential to function as a regulatory tool, and can be used both by businesses in drafting contracts and regulators in assessing contracts that have been drafted. However, they have both strengths and weaknesses, and it is important to bear these in mind when considering the potential utility of reading scores in assessing contractual language.
(a) The strengths of reading scores
Our empirical work suggests that reading scores can play some role in assessing plain and intelligible language, but cannot play a complete role. Our examination showed, perhaps unsurprisingly, that the general exclusions clauses of travel insurance contracts needed a high level of education to comprehend. This might suggest that the clauses contained in the general exclusions are vulnerable to fairness assessment as they are not expressed in sufficiently plain and intelligible language. Reading scores may, therefore, have a role to play in assessing contractual language.
The ease of conducting a reading scores assessment is an important factor in their favour. Reading score calculators are freely available, and computer programs can be written to enable the assessment of large numbers of contracts quickly. Evidence of readability is provided in an easily comparable format, and the results can be understood relatively easily.
However, when reading scores are used each different score formula will produce a different score for each contract. This is because of the different inputs that are taken into account. A trader may therefore be able to manipulate the reading score assigned to their contract through the choice of methodology. Therefore, if it were thought useful to use reading scores for the assessment of plain and intelligible language, it would be necessary to choose a particular reading score methodology. As has been seen above, this has been done in the USA, with Flesch or Flesch-Kincaid scores favoured,Footnote 140 particularly because these are embedded within Microsoft Word.Footnote 141
If reading scores are to be used, a combination of scores should be used, rather than relying on a single score and a single calculator, which appears to be the case in some of the US jurisdictions. One such method is explored above, and produces the Grand Weighted Mean used in this paper. Combining scores has the benefit of smoothing some of the inconsistencies of the different methods of calculating reading scores, by taking all scores into account in making the calculation and producing a consensus as to the readability of each text under analysis. This approach is more likely to produce an operationalised measure of plain and intelligible language that assesses all dimensions of the concept.
(b) The weaknesses of reading scores
Nevertheless, further important caveats must be considered. First, as mentioned earlier, the measures that we applied to the extracts have been devised and adopted for various applications.Footnote 142 While the ones used in the present analysis tend to be the most widely used, and indeed, are often the indicators used in the assessment of contracts, that is not, in and of itself, evidence that they are necessarily the best predictors of ease of text processing in the context of contracts. A typical limitation of grade-level readability estimates is that they were designed and intended for the analysis of school-age texts.Footnote 143 For this reason, they might be less sensitive to differences in complexity between highly technical texts and less appropriate for the analysis of the textual content aimed at adults to which they are routinely applied. Therefore, reading scores may be providing information about readability that does not conceptualise plain and intelligible language for adult contracting parties.
Secondly, readability measures largely rely on assumptions based on surface-level features of texts, such as the number of different words or the frequency of rare words in a text, as well as the length and complexity of words and sentences. In so doing, they tend to ignore other factors such as overall text coherence,Footnote 144 and, perhaps more significantly, the content of the text itself.Footnote 145 For instance, a low-frequency or polysyllabic word will affect readability scores even though its meaning might be clearly and explicitly explained in the text, for example through a clear definition to which the reader is clearly signposted, and not pose a processing difficulty for the reader. Given this finding, reading scores appear to be better at assessing whether the contractual document is plain, rather than assessing whether it is intelligible. A document that has a low reading score may not explain the ‘potentially significant economic consequences’ of the term.Footnote 146 On the other hand, a term which initially appears difficult may be understandable because of the way that it is presented, taking into account factors such as definitions and layout.Footnote 147 Further, Pau and others, when examining the New Zealand tax code, found that changes from colons to semi-colons affected readability scores,Footnote 148 a change that is unlikely to affect intelligibility. This suggests that reading scores could be manipulated without any alteration to the text that benefits the consumer. Therefore, if reading scores are to be taken into account, they should be seen as one measure of the difficulty of the text, but this should be balanced against measures that assess whether a consumer could understand the concepts expressed by the contract.
This reflects Sirico's critique of use of reading scores in State-level legislation in the US.Footnote 149 He argues that the use of reading scores in the contractual context leads to overreliance on technology, with the score providing the determination that a contract is ‘fair’ and/or ‘plain and intelligible’ (to adopt the language of the CRA 2015) if it achieves a certain readability score. He argues that ‘common sense tells us that sometimes a sentence with few words and syllables can be difficult to read and a sentence with many words and syllables can be quite comprehensible’Footnote 150 and reliance solely on readability scores to protect consumers can therefore fail to achieve its goal. This reflects findings that reading scores are poor predictors of information gain,Footnote 151 meaning that they are not well equipped to assist with the determination of intelligibility.
Furthermore, readability measures computed on these bases conceptually divorce the reader's individual characteristics from the comprehension process and paint a relatively simplified picture of what is an otherwise highly interactive activity.Footnote 152 The average consumer does not feature in reading scores, and the vulnerable consumer certainly does not.Footnote 153 If the average consumer is to be the yardstick against which the language of terms is judged then simple reading scores are insufficient. It is necessary to, at least, consider the reading level of an average consumer. The Growing Up Digital Taskforce, looking at the understanding of terms and conditions by children, resolved this challenge by choosing a reading score matched to the level of education of an average 12–13 year old.Footnote 154 This choice is not fully explained, as the report mentions users of Instagram in the 8–11 years old range, although Instagram's terms require users to be over 13.Footnote 155 This option will not be available where the consumer is vulnerable for reasons other than age, as the expected comprehension level of the vulnerable consumer will not be so easily calculable.
Further, it is necessary to explore what being a reasonably well-informed, reasonably observant and reasonably circumspect consumer means in the contractual context. Even if a baseline of reading comprehension could be set, it is unlikely that a simple reading score can capture whether a reasonable consumer could understand a contract. Intelligibility, and being able to calculate the economic effects of a contract, goes beyond simply being able to read a contract. Indeed, it is possible that a contract, whilst understandable to a very young audience (as demonstrated by a low reading score), would not accurately communicate the effects of a contract. In some cases some level of complexity (although not excessive levels) may contribute to understanding of economic effects, despite increasing the reading score of a particular contract.
Further, reading scores cannot account for behavioural effects. A consumer may perceive the economic effects of a contract differently depending on how the effects are framed. The consumer's understanding of the effects of the contract will vary with the presentation of the terms, despite the contracts having the same or similar reading scores. Similarly, reading scores cannot account for the effects of layout on the understanding of consumers.
It seems, therefore, that reading scores cannot provide determinative evidence that consumers can understand the effects of a contract. It may be very good evidence that a consumer cannot understand the effects. A reading score cannot be the sole evidence that a contract is not in plain, intelligible language. However, it may play a role in a multifactorial approach.
(c) What role for reading scores?
The weaknesses identified above do not mean that readability measures have no place in the process of composing and revising a text to maximise readability and comprehension; they do suggest that such measures should only represent one tool in the arsenal of a regulator, trader or lawyer. In other words, while readability measures can alert the writer of a text as to its relative complexity, they cannot identify specific processing and comprehension bottlenecks. They may be able to identify whether a contract is expressed in plain language, but they cannot assess its intelligibility. A high reading score may function as good evidence that a document does not meet the plain and intelligible language requirements of the CRA 2015, but should not be the only way that one assesses compliance.
Reading scores can, when used as part of a more comprehensive evaluation of a text, play a role in assessing plain and intelligible language. In particular, they may help businesses when redrafting contracts. If reading scores are to be used, the tendency for different measures and different calculators to rely on various assumptions points to the need for an analysis approach that can at least partly offset these biases. The use of a Grand Weighted Mean, as used in this paper, is to be encouraged. However, the issues identified in this paper suggest that reading scores alone should not be relied upon by businesses, regulators or courts in determining that contracts are transparent, and particularly determining that they can be understood by consumers. Assessing whether a contractual clause is expressed in plain and intelligible language requires the synergistic use of other methodologies that can more directly probe the processing and understanding of texts on the part of the reader. In this sense, a typical paradigm might involve the use of methods to evaluate the ways that consumers read textsFootnote 156 and comprehension questions to assess understanding and retention of information.Footnote 157 Measures of other factors included in the CMA 2015 guidance may also be taken into account. Such an approach is unlikely to provide the simple, cheap answers that reading scores might, but is more likely to appropriately gauge both the plainness and intelligibility of text by being able to consider both how the average consumer reads, and how a particular consumer reads.
Author ORCIDs
Richard Hyde http://orcid.org/0000-0002-2576-3448; Kathy Conklin http://orcid.org/0000-0003-2347-8018; Fabio Parente http://orcid.org/0000-0003-4789-9511.