Introduction
Information is essential to inform patient choice and can aid in improving patients’ experiences. Patients require information about their diagnosis, treatment options and other issues surrounding their treatment if choices are to be informed and guidance followed. Health professionals have a responsibility to provide this information, which can take a number of different forms. A common medium used within the National Health Service (NHS) and other institutions to supplement verbal information is the printed information leaflet, available within departments and, increasingly, as online material via the internet. The internet has seen a significant increase in its use as a source of information, with over 80% of the adult population using the internet in 2016 compared with ~35% in 2006.1 Internet use differs at different life stages; however, one of its main uses is finding healthcare information,1,2 which the employed (45%) and retired (39%) user groups engage in more than students (32%).2
Written information offers several advantages over verbal communication, as patients are frequently distressed and may not fully comprehend or remember information provided in a face-to-face meeting. It can also be reviewed after it has been provided, allowing the document to be used as a reference source throughout treatment, giving patients a greater awareness of what to expect and allowing them to make more informed choices. Because written information is a key element for the patient, it is important that it is easy to understand. The NHS Information Standard principles state that ‘each product is in plain language’.3 The National Cancer Patient Experience Survey4 reported that, of patients who received information about the type of cancer that they had, 28% did not find the information easy to understand. This approximately corresponds with the findings of both the Organisation for Economic Co-operation and Development (OECD) report5 and the Skills for Life survey 2015.6 The OECD report stated that ~9 million working-aged adults in England (over a quarter of adults aged 16–65) have low literacy or numeracy skills, or both. In terms of literacy, this means that they would have difficulty with simple written information. The Skills for Life survey reported that half of businesses were aware of problems in basic literacy among some of their employees, and that more than 40% of businesses had, in the past year, provided remedial training in basic skills for at least some of their adult employees. In most countries, younger people (16–24 years of age) have better literacy levels than those nearing retirement; in the UK, however, figures for both groups are very similar.5 Because of this and other factors that affect reading, the UK Government recommends a reading age of 9 for web-based materials.7
Several key issues need to be considered when reviewing whether written information is effective. First, authors need to know what information is required by the patient group, and whether everything relevant is included and correct. Once the information has been quality assured, it is essential that the document is constructed in such a way that it is easily understood by the intended patient group. How effective any written communication will be depends on a number of factors, including how legible and readable the target patient group finds the document.
Legibility
Legibility is a measure of how well text can be viewed or read. It does not consider how well the reader can interpret the information, but merely how apparent the written information is. Legibility is affected by the structure and design of the written information, including the font, the contrast between the text and the background, the size of the font, letter and word spacing, and how far away the screen is; it is also affected by the reader’s age.8
Readability
Readability can be defined as how easy the text is to comprehend, given the style of writing. A common way of measuring readability is to use readability formulae, first developed in the USA in the early 20th century. Readability formulae are simple algorithms that aid the objective comparison of text. Most readability formulae consider the complexity of the words used within the document (semantic) and the sentence length (syntactic) elements of the work.9,10
The formulae have the advantage of being a quick method of predicting the approximate readability of a document, with evidence suggesting that they are related to the speed of reading, the probability that the document will be read and the knowledge of content after the article has been read.11 They are, however, not without fault and have been subject to much criticism. Although there are validation studies, most readability formulae are not based on any particular theory of reading, but rather on observed correlations.12 The main criticism, however, is that they fail to account for many other factors known to affect readability, such as whether the material makes sense, the style of writing, the prior knowledge of the reader, the appropriateness of the vocabulary or design features that may hinder or help the reader.13,14
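As an illustration of how the semantic and syntactic elements combine, the minimal Python sketch below computes the published Flesch–Kincaid grade and Flesch Reading Ease formulae. It is illustrative only: the vowel-group syllable counter is a simplification, and commercial packages such as Readability Studio use dictionary-based syllable counts and exception rules.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count groups of consecutive vowels.
    Readability software typically uses dictionary-based counts."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch-Kincaid grade, Flesch Reading Ease) using the published formulae:
    grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    ease  = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / sentences   # average sentence length (syntactic element)
    asw = syllables / len(words)   # average syllables per word (semantic element)
    grade = 0.39 * asl + 11.8 * asw - 15.59
    ease = 206.835 - 1.015 * asl - 84.6 * asw
    return grade, ease

grade, ease = readability("Radiotherapy uses high energy X-rays to treat cancer. "
                          "The treatment itself is painless and takes a few minutes.")
print(f"grade {grade:.1f}, reading ease {ease:.1f}")
```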
The aim of this study was to examine literature aimed at patients undergoing radiotherapy, specifically information about treatment and advice on side-effects. Because of the large variation between cancers, the study was limited to Web pages offering advice on prostate and breast cancer, aiding direct comparison of the documents. The rationale behind using these two cancers was that they represent the largest patient groups and that the evidence collected could be compared with a similar, earlier study.15
Method
An online search was conducted using Google to source patient information pages linked to radiotherapy treatment of either the prostate or breast. General search terms were used, such as ‘breast’, ‘prostate’, ‘radiotherapy’, ‘treatment’ and ‘information’. Data collection ended in January 2016. Once suitable internet pages had been sourced, they were copied into Microsoft Word files and the resulting documents were cleaned, a process that involved removing headings, headers and footers, titles, copyright information and contact details. Any punctuation within words, such as the hyphen in ‘thirty-two’ or the full stops in ‘C.T.’, was also removed. This was done to improve the consistency of the readability statistics generated for each document, which may otherwise vary by up to two reading grades because of the way each formula treats punctuation.16
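A minimal sketch of this kind of cleaning is shown below. It is illustrative only and is not the authors’ actual procedure (the removal of headings, footers and contact details was a manual step), but it shows how within-word punctuation can be stripped programmatically so that it is not miscounted as extra words or sentence breaks.

```python
import re

def clean_for_readability(text: str) -> str:
    """Sketch of the within-word punctuation cleaning described above
    (illustrative only; not the authors' actual procedure)."""
    # collapse dotted abbreviations such as 'C.T.' to 'CT'
    text = re.sub(r"\b(?:[A-Za-z]\.){2,}", lambda m: m.group(0).replace(".", ""), text)
    # treat hyphenated words such as 'thirty-two' as two separate words
    text = re.sub(r"(?<=\w)-(?=\w)", " ", text)
    return re.sub(r"\s{2,}", " ", text).strip()

print(clean_for_readability("The C.T. scan lasts thirty-two minutes."))
# -> 'The CT scan lasts thirty two minutes.'
```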
Files were then imported into Readability Studio 2015 (Oleander Software Ltd, Vandalia, OH, USA) for readability analysis. This study utilised well-established formulae that authors had previously used to calculate readability within healthcare settings. The four formulae were the Flesch–Kincaid, which is the most commonly used test in healthcare documents (57·42%),17 the Simple Measure of Gobbledygook (SMOG), which has been reported as the measure best suited to healthcare applications,16 and the New Dale–Chall and Gunning FOG measures.
Readability Studio produces both the reading age and the reading grade for all four tests. This article will use the reading grade, which corresponds to the US grade level of education, as this is the figure more commonly quoted in publications. Conversion between reading grade and reading age is relatively straightforward, with the reading age equalling the reading grade level plus 5;18 for example, the average age of students in the 8th grade is 13 years. Details of the measures each test uses to work out a readability score can be found in Table 1. The fifth formula utilised, the Flesch Reading Ease scale, does not report a reading age or grade, but rather grades text on a 0–100 scale, with scores of 0–30 corresponding to very difficult; 30–50, difficult; 50–60, fairly difficult; 60–70, standard; 70–80, fairly easy; 80–90, easy; and 90–100, very easy. The Flesch Reading Ease scale was included because it is also in common use and widely available, being included in many word-processing packages such as Microsoft Word.
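The short sketch below encodes the grade-to-age conversion and the Flesch Reading Ease bands quoted above; the handling of scores falling exactly on a band boundary is an assumption, as the cut-offs are quoted only as ranges.

```python
def grade_to_reading_age(grade: float) -> float:
    """Convert a US reading grade to an approximate reading age (grade + 5)."""
    return grade + 5

def flesch_band(score: float) -> str:
    """Map a Flesch Reading Ease score (0-100) to the descriptive bands quoted above."""
    for upper, label in [(30, "very difficult"), (50, "difficult"),
                         (60, "fairly difficult"), (70, "standard"),
                         (80, "fairly easy"), (90, "easy")]:
        if score < upper:
            return label
    return "very easy"

print(grade_to_reading_age(9.39))   # ~14.4 years: the mean grade found in this study
print(flesch_band(59.9))            # 'fairly difficult'
```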
Table 1 Semantic and syntactic measures used in readability formulae

a Note: The Flesch–Kincaid score is a conversion of the Flesch Reading Ease score.
Abbreviation: SMOG, Simple Measure of Gobbledygook.
Some studies have shown that excessive use of passive voice constructions can be associated with reduced readability. The passive voice places the focus of a sentence on the action rather than on who or what performs the action, and it is recommended that its use be kept below 10%.19 The amount of passive voice used in the text is also calculated by the Readability Studio software, and this metric was recorded for each document.
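Readability Studio’s own detection rules are not reproduced here; the sketch below is a deliberately naive heuristic (a form of ‘to be’ followed by a word ending in -ed or -en) that simply illustrates how a passive-voice percentage of this kind can be estimated.

```python
import re

# Deliberately naive passive-voice heuristic: a form of 'to be' followed by a
# word ending in -ed or -en. Readability Studio's rules are more sophisticated.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b", re.IGNORECASE)

def passive_percentage(text: str) -> float:
    """Percentage of sentences flagged as containing a passive construction."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    flagged = sum(1 for s in sentences if PASSIVE.search(s))
    return 100 * flagged / max(1, len(sentences))

print(passive_percentage("The dose is calculated by a physicist. You will see your doctor each week."))
# -> 50.0 (one of the two sentences is flagged)
```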
Statistical analysis was performed using IBM SPSS 22; the significance level was set at 0·05.
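For readers wishing to run the same kinds of tests outside SPSS, the sketch below performs a Pearson correlation and an independent-samples t-test (assumed here to be the analogue of the 2006 versus 2016 comparison reported later) in Python with SciPy. The input arrays are randomly generated, illustrative data only, not the study’s measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative data only (not the study's measurements): reading grades for the
# 2016 and 2006 samples and passive-voice percentages for the 2016 documents.
grades_2016 = rng.normal(9.4, 0.5, 48)
grades_2006 = rng.normal(10.0, 0.6, 85)
passive_pct = rng.normal(12.4, 4.5, 48)

# Pearson correlation, as used for passive voice versus reading grade
r, p = stats.pearsonr(passive_pct, grades_2016)
print(f"r = {r:.3f}, p = {p:.3f}")

# Independent-samples t-test, as used for the 2006 versus 2016 comparison
t, p = stats.ttest_ind(grades_2006, grades_2016)
print(f"t = {t:.3f}, p = {p:.3f}")
```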
Results
Readability
In total, 48 separate information sources were found and analysed, divided equally between breast and prostate information documents. The vast majority of the information Websites, 43 (89·6%), were from hospitals posting advice online; the remaining 5 (10·4%) came from support groups such as Cancer Research UK.
The Flesch Reading Ease scale rated 26 (54%) of the articles as standard, 20 (42%) as difficult and 2 (4%) as very difficult, with an average score across all documents of 59·9 (95% confidence interval=58·2–61·6). When averaged across the four readability scales, the 48 internet pages were found to be written at a mean reading grade of 9·39, standard deviation (SD) 0·52, with very comparable results for prostate documents (mean 9·84, SD 0·75) and breast documents (mean 9·83, SD 0·67). Individual documents had average reading grades of between 8·4 and 11·5. Figure 1 shows the article scores.

Figure 1 Reading grade of web documents downloaded.
There was considerable variation in the length of the information, the mean length being 2,240 words with a range of 535–5,108. Document length showed no correlation with the readability of the text (r=−0·041, p=0·713).
Passive voice
Overall, the mean use of passive voice was 12·4% (SD 4·5), ranging from 2·78 to 20·23%, with only 15 documents (31·3%) using the passive voice less than 10% of the time (Figure 2). There was a moderate positive correlation between the use of passive voice in the documents and their readability score (r=0·389, p=0·006) (Figure 3).

Figure 2 Passive voice usage in web documents downloaded.

Figure 3 Relationship between passive voice use and readability grade.
Comparison with data from 10 years ago
A sample of 85 patient information leaflets was collected in 2006 from 17 radiotherapy departments for readability analysis; the results were presented at the Radiotherapy in Practice conference and published in 2008.15 The sample examined here included 20 breast and 15 prostate information leaflets. These leaflets were re-examined using Readability Studio with the same parameters as those used for the more recent documents, to establish whether the readability of the documents had changed over time. As with the documents sourced in the more recent data collection, there was a considerable range in document length, from 332 to 4,264 words, with a mean of 1,287 words.
The overall average readability score was 9·98 in 2006 compared with 9·39 in 2016; this change was significant (t=3·063, p=0·03). An improvement in the use of passive voice was also observed, decreasing from 15·01% in 2006 to 12·64% in 2016, although this difference did not reach significance (t=1·629, p=0·107) (Figures 4 and 5).

Figure 4 Readability of 2006 documents compared with 2016 documents.

Figure 5 Comparison in the use of passive voice in 2006 and 2016.
Discussion
The internet provides anonymous, convenient access to a wealth of healthcare information, and there is an ever-increasing reliance on the web for health information.20,21 Over 70% of patients report that the information they collect from the internet influences their treatment decisions.22 To ensure that information is readable by the target audience, the American Medical Association and the National Institutes of Health recommend that patient education resources should be written at no higher than the 6th-grade level. The International Patient Decision Aid Standards Collaboration23 recommends that health documents be written at a grade 8 equivalent or lower, and even goes so far as to recommend which readability tests be used (SMOG or Fry).24 The average reading grade of the documents in this study was 9·39, which is slightly higher than that recommended by the above institutions and well above the target reading age of 9 (grade 4) suggested by the UK Government.7
No document was written at a reading grade of less than 8, although four documents (8·3%) had a reading grade between 8 and 9. The reading grade reported here, although higher than recommended, is better than that found in many other studies. One study looking at patient education materials available on the European Society of Radiology Website found articles to have a mean grade level of 13·0±1·6, with a range from 10·8 to 17·2.25 Similarly, three other studies26–28 found that compliance with the guidelines was poor. Fitzsimmons26 found that 60–89% of online Parkinson’s disease information Web pages were written above the 12th-grade level, and Trivedi27 reported that medicinal labels had an average reading age of a 16 year old (reading grade 11). Finally, Weiss,28 who looked at online lung cancer information, found an average reading grade of 11·2. No material from any of these studies complied with the American maximum recommended 6th-grade level. A further study, looking at the readability of laryngeal cancer patient information leaflets,29 reported a Flesch Reading Ease score of 48·2 (difficult). The results in the current study were again better, the mean score of 59·9 (fairly difficult) being very close to the cut-off of 60 at which documents are classed as standard, or plain English.
There was no significant relationship between document length and readability, suggesting that reading difficulty is independent of the word count.
Although the online information did not reach the recommended reading grade, it must be acknowledged that there has been a significant improvement in the readability of patient information compared with the articles examined in 2006.15
One issue with readability formulae is that they ignore vocabulary and tend to assume a strong negative correlation between word length and readability. For example, ‘iff’, meaning ‘if and only if’, has only one syllable and therefore receives a low semantic score in many of the readability scales outlined in Table 1, despite many people probably not knowing what the word means, whereas ‘wheelbarrow’ receives a larger semantic score despite most people being able to read and understand it (see the sketch below). Our analysis utilised four readability scales to determine the readability of the text, to try to overcome the limitations of any one formula. A number of different formulae could have been used, Readability Studio being able to produce outputs for over 30 such tests; the justification outlined earlier was the common use of the chosen tests in other readability publications in healthcare settings. A different selection of tests would, however, have resulted in a slightly different readability score. Finally, writing that utilises technical language does tend to have a slightly raised level of difficulty,30 and technical words such as ‘radiotherapy’ and ‘physiotherapy’ could raise the reading grade of a document even though it is difficult to think of simpler alternatives. By contrast, ‘supraclavicular’ and ‘telangiectasia’, terms present in seven documents examined as part of this study, are definite cases where alternative, simpler terms could and should have been used.
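To illustrate the point, the sketch below applies the same naive vowel-group syllable count used in the earlier sketch to these example words; the counts are only approximate, but they show how a short, obscure word scores as easy while longer, familiar or unavoidable technical words score as hard.

```python
import re

def count_syllables(word: str) -> int:
    # Same naive vowel-group estimate as in the earlier sketch.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

for word in ("iff", "wheelbarrow", "supraclavicular", "telangiectasia"):
    print(word, count_syllables(word))
# 'iff' scores as an easy one-syllable word even though few readers know it;
# the longer but familiar 'wheelbarrow' and the technical terms score as harder.
```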
The average use of passive voice within the documents was 12·4%, which was in keeping with that found by Harwood and Harrison31 (12·8%), but higher than that found by Pothier,32 who reported an average passive voice figure of 8·31%. The figure is also slightly higher than the 10% recommended by the Plain English Campaign. As can be seen in Figure 2, there is considerable variance in the amount of passive voice used, with many documents falling into the acceptable category (48%). The information on these Web pages will probably have been written by degree-educated healthcare professionals, and the use of the third person and the passive voice is promoted in many forms of academic written work. Moderating the passive voice could therefore lead to a further improvement in the readability of many of the documents; comparison with the 2006 data did show a marked, though non-significant, reduction in the use of passive voice, so perhaps this is an aspect of writing style that is improving.
Conclusions
The results of this study revealed differences among Websites in both reading level and use of the passive voice, with most online material being written above the proposed target reading grade of 8; however, the results were noticeably better than those obtained in 2006 and those reported in other publications examining online material. Some authors could improve their writing style, and hence the readability of the documents they produce, through more careful selection of terms and by restricting their use of the passive voice, which was very variable across the documents. The correlation between the two indices, reading grade and passive voice, probably reflects the increased syntactic difficulty that arises from the longer sentence structures typical of passive writing; a benefit in terms of making documents simpler to read can therefore be gained from attending to this aspect of the writing.
Acknowledgements
None.