Readers' models of text structures: the case of academic articles
This item is not the definitive copy. Please use the following citation when referencing this material: Dillon, A. (1991) Readers' models of text structures. International Journal of Man-Machine Studies, 35, 913-925.
Hypertext is often described as a liberating technology, freeing readers and authors from the constraints of "linear" paper document formats. However, there is little evidence to support such a claim, and theoretical work in the text analysis domain suggests that readers form a mental representation of a paper document's structure that facilitates non-serial reading. The present paper examines this concept empirically for academic articles with a view to making recommendations for the design of a hypertext database. The results show that experienced journal readers do indeed possess such a generic representation and can use this to organise isolated pieces of text into a more meaningful whole. This representation holds for text presented on screens. Implications for hypertext document design are discussed.
With the advent of hypertext, issues associated with organising and structuring information presentation are being raised with respect to devising electronic documents. It is now possible to embody alternative structures for texts that have traditionally conformed to a relatively standard paper format. Typically, advocates of the "new structures" approach dismiss paper as limiting since it demands a linear format for presentation and consumption, contrasting this with the supposedly liberating characteristics of hypertext (see e.g. Beeman et al 1987).
It is debatable whether this is a fair representation of paper (or hypertext). Though paper texts may be said to have a physical linear format there is little evidence to suggest that readers are constrained by this or only read such texts in a straightforward start-to-finish manner. For example, Dillon et al (1989) identified three reading strategies in readers of academic journals, only one of which could be described as linear, and one has only to think of one's own interactions with a newspaper to demolish arguments of constrained linear access.
Apart from the putative constraints of paper texts or browser-friendly attributes of hypertexts, it seems certain that readers possess some form of mental representation for a document type that provides information on the likely structure and organisation of key elements within it. For example, when we pick up a book we immediately have access to a whole host of information about the likely contents, its size, subject matter and so forth. When we open it, we have expectations about what we will find inside the front cover, such as details of where and when it was published, perhaps a dedication, and then a Contents page. We know, for example, that contents listings describe the layout of the book in terms of chapters, proceeding from the front to the back. Chapters are organised around themes and an index at the back of the book, organised alphabetically, provides more specific information on where topics are located in the body of the text. Experienced readers know all this before even opening the text. It would strike us as odd if such structures were absent or their positions within the text were altered e.g., the contents page was at the back or in the middle, there were no chapter divisions or the index was not arranged alphabetically.
The same might be said of a newspaper. Typically we might expect a section on the previous day's political news at home, foreign coverage, market developments and so forth. News of sport will be grouped together in a distinct section and there will also be a section covering that evening's television and radio schedules. If this can be said to hold true for all established text forms, then developers of hypertext systems need to consider carefully their designs in terms of whether they support or violate such assumptions.
The relationship of structure to comprehension of prose has been examined empirically by van Dijk and Kintsch (1983). They proposed a model of discourse comprehension that involves readers analysing the propositions of a text and forming a macropropositional hierarchy. According to this theory, readers acquire (through experience) schemata, which van Dijk and Kintsch term 'superstructures', that facilitate comprehension of material by allowing readers to predict the likely ordering and grouping of constituent elements of a body of text. To quote van Dijk (1980):
Apart from categories and functional rules, van Dijk adds that a superstructure must be socioculturally accepted, learned, used and commented upon by most adult language users of a speech community.
They have applied this theory to several text types. For example, with respect to newspaper articles they describe a schema consisting of headlines and leads (which together provide a summary), major event categories each of which is placed within a context (actual or historical), and consequences. Depending on the type of newspaper (e.g., weekly as opposed to daily) we might expect elaborated commentaries and evaluations. Experiments by Kintsch and Yarborough (1982) showed that articles written in a way that adhered to this schema resulted in better grasp of the main ideas and subject matter (as assessed by written question answering) than ones which were re-organised to make them less schema conforming. However, when given a cloze test of the articles no difference was observed. The authors suggest that schematic structures are not particularly relevant as far as ability to remember specific details such as words is concerned (as measured by a cloze test) but have major importance at the (macropropositional) level of comprehension.
Such work has direct relevance to the human factors practitioner addressing issues in the design of hypertexts or multi-media documentation. If readers possess such representations and employ them in the manner outlined by van Dijk and Kintsch then it is important that we identify and use them to inform the design of hypertext versions of existing texts and to facilitate suitable superstructural abstractions for innovative hypermedia documents. The present paper describes two investigations carried out at the specification stage of the design of a hypertext database of academic journal articles that attempted to shed light on such issues for this text type.
2. Designing hypertext versions of academic articles
Theoretically, a full-text searchable database of academic articles would be a boon to researchers and academics. It should facilitate rapid access and retrieval of information as well as supporting literature searches and manipulations that would prove difficult or extremely time-consuming with paper. Previous investigations of the possibilities for electronic journals have identified the constraints of the technology such as the limited document structure, poor image quality and restricted manipulation facilities as obstacles (see e.g. the BLEND system, Shackel 1987). Such factors are now known to be vital to user acceptance of such systems (see e.g. Gould et al 1987a,b). A hypertext system, being implemented on better technology, should not only overcome the low level ergonomic problems inherent in older technology but also support innovative formats that paper could not.
At HUSAT (HUman Sciences and Advanced Technology, part of Loughborough University of Technology in the UK) we have been attempting to build an exemplar system for such users. Designed to run on an Apple Macintosh using available hypertext applications software, this system is currently in prototype form. The present studies were carried out by the author to inform the design team on the role of document structure in such a system.
3. Overview of the Experiments
The aims of this experimental work were to identify the extent to which readers possessed a superstructural representation or model of a typical academic article and to examine how it might be affected by screen presentation.
3.1 Experiment 1
If readers possess a model of how typical articles are structured then they should be able to use this to form whole articles out of isolated chunks of text. They might still be able to perform this without such a mental representation if headings and other cues in the text such as referential continuity are present. The present study examined this suggestion by presenting subjects with cut-up articles and requiring them to piece the articles together. To limit the influence of referential continuity cues, every second paragraph was removed and subjects performed this task on texts with and without the presence of headings.
Twelve subjects participated in this experiment (6 male, 6 female). Ages ranged from 21 to 35 (mean=29) years. All were professional researchers in the domain of human factors whose first degrees were in computer science, psychology, sociology or ergonomics. All were experienced in the use of academic articles.
Two articles were selected from one journal in the field of relevance to the researchers. These were matched approximately for size and number of paragraphs, presence of figures and tables, and conformance to the single experiment report style. Though roughly in the area of interest to the researchers they were also selected so as to be unlikely to have been read by these researchers. This was subsequently confirmed during the trial.
The rules for removing paragraphs were not formal. Where possible, every second paragraph was removed but if this left only very large or very small paragraphs (i.e. greater than 20 lines or less than 5 lines respectively) some adjustments were made and experimenter discretion was employed to retain comparability between texts. Every second table and figure was removed from each text. Selected paragraphs, headings, tables and figures were pasted to pieces of card to aid physical manipulation.
A repeated measures design was employed such that each subject assembled both texts, one with headings, one without. Order of texts and presence/absence of headings per text were counter-balanced to avoid any systematic order effects.
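The counterbalancing described above can be sketched in a few lines. The paper does not report the exact assignment scheme, so the text labels, the function name `counterbalance` and the simple rotation through the four combinations are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical labels: the paper does not name the two articles.
TEXTS = ("A", "B")

def counterbalance(n_subjects):
    """Assign each subject a text order and a headings condition per trial.

    Illustrative scheme (an assumption, not the author's documented
    procedure): rotate through the four combinations of
    (which text comes first) x (whether the first trial has headings).
    """
    combos = list(product(TEXTS, (True, False)))
    schedule = []
    for subject in range(n_subjects):
        first, headings_first = combos[subject % len(combos)]
        second = TEXTS[1] if first == TEXTS[0] else TEXTS[0]
        schedule.append([
            {"text": first, "headings": headings_first},
            {"text": second, "headings": not headings_first},
        ])
    return schedule

schedule = counterbalance(12)
```

With twelve subjects each of the four combinations is used three times, so neither text order nor the pairing of a text with headings is confounded with trial position.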
Subjects were run individually in an experimental room at HUSAT. The experimenter explained the task and answered any questions from the subjects. They were told to avoid reading every word in the text if possible and to concentrate on assembling an ordered article as quickly as they could. The text was presented in a jumbled order on the desktop.
After the first text had been assembled to the subject's satisfaction, subjects were asked to move to another desk and write down a brief summary of what they thought the article was about. This enabled the experimenter to score their performance on the first assembly task and prepare the second text. The instructions were then repeated and the subject proceeded to assemble this article. After completion the subject again went to the other desk and wrote a brief summary of their impressions of the article's content. Upon completion a brief discussion of the experiment ensued covering any points the subject wished to raise.
3.2 Results of Experiment 1
In the first instance data were scored by noting the relative position of each text chunk in a subject's assembled text and comparing it with its correct position. This gave a measure of the absolute accuracy of assembly. Not surprisingly, no subject manifested a high degree of absolute accuracy; the mean rate was 16.7%, i.e. approximately five correct placements per 30-paragraph task. A repeated measures t-test revealed no significant effect for headings (t=0.31, df=11, p>.7).
Despite the low levels of absolute accuracy it was clear that subjects were imposing a structure on the article of the form Introduction/Method/Results/Discussion (hereafter referred to as the IMRD format). Indeed all subjects assembled the article around this format. Analysing their assemblies in these terms it was clear that much higher general accuracy levels were present (mean accuracy rate=82.58%). Table 1 presents the individual error scores for this broader analysis.
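One plausible reading of this absolute scoring is a position-by-position comparison of the assembled order with the original order. The sketch below assumes that reading; the function name and the chunk orderings are hypothetical and are not the experimental data.

```python
def absolute_accuracy(assembled, correct):
    """Proportion of chunks placed at exactly their original position."""
    if len(assembled) != len(correct):
        raise ValueError("both orderings must contain the same number of chunks")
    hits = sum(1 for a, c in zip(assembled, correct) if a == c)
    return hits / len(correct)

# Hypothetical 30-chunk task: chunk ids in their original order.
correct_order = list(range(30))
# A made-up assembly in which only the first five chunks are in place;
# the remaining 25 are rotated by one position so that none of them match.
subject_order = correct_order[:5] + correct_order[6:] + [correct_order[5]]

score = absolute_accuracy(subject_order, correct_order)
```

Five exact placements out of 30 correspond to the mean rate of roughly 16.7% reported above. A measure this strict penalises a chunk that is in the right section but one position out, which is why the broader section-level analysis is the more informative one.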
Table 1. Error scores per subject in broader classification
These data indicate that subjects can predict location of isolated paragraphs of text in their correct general sections with high levels of accuracy. Once more, the effect of headings was assessed using a related samples t-test and this revealed no significant difference (t=1.6, df=11, p >.1).
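The broader classification can be illustrated by scoring each chunk only on whether it landed in the correct major section. This is a minimal sketch under that assumption; the mappings below are invented for illustration and do not reproduce Table 1.

```python
SECTIONS = ("I", "M", "R", "D")  # Introduction, Method, Results, Discussion

def section_accuracy(placed, actual):
    """Proportion of chunks placed in the correct major section.

    `placed` and `actual` map a chunk id to one of the SECTIONS codes.
    """
    if placed.keys() != actual.keys():
        raise ValueError("both mappings must cover the same chunks")
    unknown = (set(placed.values()) | set(actual.values())) - set(SECTIONS)
    if unknown:
        raise ValueError(f"unknown section codes: {sorted(unknown)}")
    correct = sum(1 for chunk, section in actual.items() if placed[chunk] == section)
    return correct / len(actual)

# Hypothetical data: eight chunks, two per section.
actual = {1: "I", 2: "I", 3: "M", 4: "M", 5: "R", 6: "R", 7: "D", 8: "D"}
# A made-up response containing one Introduction/Discussion confusion.
placed = dict(actual)
placed[2] = "D"

accuracy = section_accuracy(placed, actual)
```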
All times to completion were recorded in seconds. They are presented in table 2. Times in the no headings condition were slightly faster than those in the headings present condition. Both distributions were slightly negatively skewed (no headings=-0.22; headings=-0.97) indicating that the majority of extreme scores are below the mean in both conditions (i.e. some individuals were much faster than the majority). A related samples t-test between conditions showed an almost significant difference at the 5 per cent level (t=2.07, df=11, p<.065). However, such a difference was expected as the no headings condition always involved slightly fewer pieces of text than the headings present condition.
Error types
Apart from the general accuracy levels observed, it was interesting to note the type of mistakes made by subjects in this task. Three basic errors can be identified in the present data:
The most obvious problem occurred with the secondary headings. Primary headings (Introduction etc) were easily placed, as these are relatively standard, whereas secondary headings tend to be unique to the article, reflecting the author's views of a section's contents. For example, a heading such as "The Effect of Display Size" might fit logically into the results section when read in context but taken as an isolated piece of text could easily be a heading in the Introduction or Discussion sections of an academic article.
Figures and tables posed problems in terms of absolute accuracy too, although subjects usually placed these in the correct section. This is not too difficult to explain; their occurrence in articles of this form is rare outside of the results section. Non-graph/numerical types might pose more of a problem but even they are unlikely to occur in Introduction/Discussion sections.
A common error was the confusion of Introduction and Discussion paragraphs. All subjects made this mistake at least once. In terms of the type of text usually found in these sections this is understandable. Both contain general text with references to other related work, a form atypical of other sections. Thus while it is easy to identify isolated paragraphs as belonging to these sections, it is less easy to distinguish correctly between them.
Awareness of text's contents
All subjects were required to describe briefly the contents of the text they had just assembled. Of the 12 subjects, 10 remarked that they had little memory of the text and had not read it for comprehension. As a result they claimed not to be able to write very much. While it is interesting that they could assemble the text without reading it for comprehension purposes, all subjects were capable of providing a rough sketch of the article. Typically they accurately reported the subject matter, that it was an experimental paper, the design or analysis, and its broad aims. In some cases parts of the results or their implications were grasped.
There were inaccuracies however and most of the written reports were in the form of keywords or short phrases suggesting little attempt to grasp the development of the argument within the text. This supports the claims of subjects not to have read for comprehension.
Conclusions from Experiment 1
It is clear from these findings that readers of academic articles possess some form of mental representation for the text's typical structure that allows them to accurately predict a paragraph's location. In the present case this representation seems to be of the form IMRD and a quickly read paragraph can be placed in this framework with approximately 80% accuracy. Problems occur with secondary headings, absolute placement of items within the framework and distinguishing between introduction and discussion text.
3.3 Experiment 2
It is clear from experiment 1 that readers do possess a model or mental representation of a text's structure independent of its semantic content. However, much work in this area of electronic text has shown that when text is presented on screen many of the findings from the paper domain cease to hold. The present study therefore set out to examine the ability of readers to predict location by applying the superstructural representation of an article to information on screen.
Eight subjects (4 male/4 female) participated in this study. Ages ranged from 21 to 41 (mean=32) years. As before, all were experienced users of academic articles. Three of the subjects had participated in the previous study but given the seven week break between the studies and the use of different texts and experimental procedure this was not seen as a source of contamination. All were habitual users of Apple Macintosh computers.
As before two similar articles conforming to the criteria described above were selected from a relevant journal. This time only text was selected (i.e., no figures, tables or headings were used), five paragraphs from each major section resulting in 20 paragraphs per text. These were presented in randomised order which was consistent between media.
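A randomised order that stays consistent between media can be obtained by fixing the sequence once and reusing it for both presentations. The sketch below assumes a seeded shuffle and random selection within each section; the paper states neither how the five paragraphs per section were chosen nor how the order was generated, and the paragraph labels are placeholders.

```python
import random

SECTIONS = ("Introduction", "Method", "Results", "Discussion")

def build_stimulus_order(paragraphs_by_section, per_section=5, seed=0):
    """Pick paragraphs from each section and return one fixed randomised order."""
    rng = random.Random(seed)  # seeded so the same order can be regenerated
    chosen = []
    for section in SECTIONS:
        pool = paragraphs_by_section[section]
        chosen.extend((section, p) for p in rng.sample(pool, per_section))
    rng.shuffle(chosen)
    return chosen

# Hypothetical article with eight candidate paragraphs in each section.
article = {s: [f"{s[0]}{i}" for i in range(1, 9)] for s in SECTIONS}

order = build_stimulus_order(article)
paper_order = list(order)   # the sequence printed on the stapled sheets
screen_order = list(order)  # the same sequence shown card by card on screen
```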
A repeated measures design was employed with order of presentation (paper and screen) and text counterbalanced to avoid any systematic ordering effects.
Subjects were run in an experimental room at HUSAT. The experimenter explained that they had to read two series of 20 paragraphs and identify the probable location of each in terms of the major sections Introduction/Method/Results/Discussion. To do this they marked I, M, R, or D on an answering sheet provided. They were told to perform this task as fast as they could.
In the screen condition paragraphs were presented mid-screen as black text on a totally white background using HyperCard on an Apple Macintosh Plus. The only other information present was the number of the paragraph (1 to 20) in the top right corner and a "button" in the lower centre of screen facilitating movement to the next card. In the paper condition paragraphs were presented on 20 sheets of paper printed from this HyperCard stack, of similar size to the screen and stapled together in the top left corner. They contained identical information except for the "button".
Subjects were allowed to familiarise themselves with the task and the software (usage of which only required them to press the mouse button) using example texts and the experiment commenced when they expressed confidence with both. A rest period of approximately two minutes occurred between the two trials.
3.4 Results of Experiment 2
Time taken to complete each trial was recorded in seconds and these are shown in table 3:
Table 3. Time to complete tasks per condition
As this demonstrates, mean performance time with paper was faster than with screen presented text. A related samples t-test indicated that this difference was significant at the 2 per cent level (t=3.16, df=7, p<.02).
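For readers unfamiliar with the related samples t-test used throughout, the statistic is the mean of the paired differences divided by its standard error, with n-1 degrees of freedom. The sketch below uses only the Python standard library; the completion times are invented for illustration and are not the values in table 3.

```python
from math import sqrt
from statistics import mean, stdev

def related_samples_t(x, y):
    """Return (t, df) for a related-samples (paired) t-test of x against y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    if se == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / se, len(diffs) - 1

# Made-up completion times in seconds for eight subjects (NOT the data in table 3).
screen = [310, 295, 340, 360, 300, 325, 315, 350]
paper = [280, 290, 300, 330, 285, 300, 310, 320]

t, df = related_samples_t(screen, paper)
```

The resulting t would then be compared against the critical value for df=7 at the chosen significance level.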
The number of errors made during each trial by each subject is shown in table 4. This demonstrates that the mean number of errors per subject is similar for each presentation medium although there is greater but non-significant variance among the scores in the screen condition (F=0.11, df=15, p>.7). Interestingly, six of the eight subjects performed better or as well with the electronic text suggesting a possible speed/accuracy trade-off. A related samples t-test showed no significant difference however (t=0.32, df=7, p>.7).
Table 4. Number of errors made by subjects per condition
Overall accuracy levels are similar to those in experiment 1: 81.55% for combined conditions, 80.6% for paper alone and 82.5% for screen alone, confirming the earlier finding that the ability to predict location on the basis of limited information is highly developed for experienced readers of this text type.
The absence of headings and figures/tables in the present study made quantitative analysis of the error types easier and qualitative analysis less informative. Twelve possible errors could be made (No. of categories X No. of incorrect categories per item). In total, 59 errors were made. These are summarised in table 5.
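These percentages can be checked against the design: eight subjects each classified 20 paragraphs in each medium, giving 160 judgements per medium and 320 overall. The total of 59 errors is the figure given in the error analysis that follows. The per-medium counts of 31 and 28 are not stated in the text; they are inferred here from the reported percentages and should be read as an assumption.

```python
SUBJECTS = 8
PARAGRAPHS = 20
TOTAL_ERRORS = 59  # reported in the error analysis

# Per-medium error counts are NOT reported in the text; 31 and 28 are the
# values implied by the stated 80.6% and 82.5% accuracy (an inference).
errors = {"paper": 31, "screen": 28}

judgements_per_medium = SUBJECTS * PARAGRAPHS  # 160

accuracy = {
    medium: 100 * (1 - n / judgements_per_medium) for medium, n in errors.items()
}
combined = 100 * (1 - TOTAL_ERRORS / (2 * judgements_per_medium))
```

Under these assumptions the computed values agree with the reported figures to within rounding.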
Table 5. Error type and frequency expressed as a % of total errors
As before the greatest difficulty subjects had was distinguishing between the Introduction and Discussion sections, these accounting for almost 40% of errors. Inability to distinguish between the Results and Discussion sections accounted for 30% of errors while the Method and Results distinction proved the stumbling block in 17% of cases.
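The twelve possible error types are the ordered pairs of a true section and a different chosen section, while the confusions discussed above treat each pair of sections as unordered. A minimal sketch of both counts follows; the response list is invented for illustration and is not the data summarised in table 5.

```python
from collections import Counter
from itertools import permutations

SECTIONS = ("I", "M", "R", "D")

# Every (true section, chosen section) pair with true != chosen: 4 x 3 = 12.
ERROR_TYPES = list(permutations(SECTIONS, 2))

def confusion_percentages(responses):
    """Percentage of all errors falling on each unordered pair of sections.

    `responses` is an iterable of (true_section, chosen_section) tuples.
    """
    errors = [frozenset(r) for r in responses if r[0] != r[1]]
    if not errors:
        return {}
    counts = Counter(errors)
    return {pair: 100 * n / len(errors) for pair, n in counts.items()}

# Made-up responses (NOT the data summarised in table 5).
responses = [("I", "I"), ("I", "D"), ("D", "I"), ("R", "D"), ("M", "M"), ("M", "R")]
shares = confusion_percentages(responses)
```

Collapsing the twelve ordered types into six unordered pairs treats an Introduction paragraph marked as Discussion and a Discussion paragraph marked as Introduction as the same confusion.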
Conclusions from Experiment 2
Readers' models of this text type's structure allow them to predict accurately the general location of paragraphs even when presented on screen. Though significantly faster with paper there were no differences between the media in terms of accuracy. As before, greatest difficulties occurred in distinguishing between Introduction and Discussion sections.
4. General Discussion
It is clear from these findings that readers who are experienced in the use of a certain text type possess a superstructure or model of that text which enables them to predict with high levels of accuracy where information is located. In the case of the text type analysed here this superstructure is of the form: Introduction, Method, Results, Discussion and readers can place paragraphs correctly within this framework with approximately 80% accuracy under time pressure.
The existence of this superstructure probably results from the relatively standard form of such articles. There are few published accounts of experimental work in this (and other) disciplines that do not conform to this type. Obviously, frequent readers of this text type would acquire an awareness of such a form over time.
However, it is also worth noting that the classic IMRD structure acts as a framework for or model of the scientific process itself. Research usually takes the form of examining the current literature to formulate a hypothesis for investigation, designing an experimental procedure to test this hypothesis, gathering and analysing data, and finally examining the results in the light of other work. Each of these activities has its parallel in the resulting description i.e., the experimental report. Generations of undergraduates are taught this model of investigation and reportage so it is not surprising to find superstructures for this emerging. In a very real sense therefore, text structures can reflect conventions and standards of behaviour and cognition as argued by van Dijk (1980) and van Dijk and Kintsch (1983).
It must be recognised however that an alternative interpretation cannot be entirely ruled out. It is possible that the control on referential continuity was not strong enough and that the successful piecing together of articles in experiment 1 may have been helped by cues that could not be completely removed by using only every second paragraph. It is the author's contention that such factors did not play a significant role in subjects' performances, not least because of the subjects' reports that they did not read with the intention of seeking reference cues between paragraphs and the fact that when reading isolated paragraphs one at a time in a randomised order in experiment 2 (thus guarding against any reasonable use of such cues), general accuracy levels were equally high. However, further credence in the theory of superstructures would be gained if a control group of non-experienced article readers were examined and found to manifest significantly less accurate scores. Such a study would also overcome any possible demand effect that may have been present as a result of any subject's possible knowledge of van Dijk and Kintsch's work. This is a potential weakness of the present design that should be addressed in any future work.
Furthermore, the number of subjects used in these studies was relatively small. It is possible that some of the non-significant differences, particularly those concerning the effect of headings on ability to piece together an article (experiment 1) and the possible existence of a speed/accuracy trade-off (experiment 2) may hide real issues that would have been uncovered had more subjects been used. The reason for not using greater sample sizes in the present work stems from the demands on the design team for a prototype system and the subsequent pressing need for quick answers to questions on document structure. In an ideal world such constraints would not operate and large sample sizes could always be employed. While this is an area for improvement in subsequent research, the basic findings of the present studies do appear robust.
Regardless of any hypothetical cognitive representations underlying text usage, what is interesting from a human factors perspective is the high degree of accuracy shown by all subjects in these experiments. From a rapid scan of the available text they can deduce the most likely location of that part in the whole and by extension, what is likely to precede, accompany and follow it. The results of experiment 2 clearly demonstrate that this representation holds also for screen presented text.
If we are to consider seriously alternative structures for electronic or hypertext versions then we would need to overcome this acquired processing tendency of experienced readers. This is an all too unlikely occurrence given the embedded nature of this representational structure in the minds of readers, the teaching of the scientific process and the communication format of scientists.
Thus a hypertext journal article would need to retain the broad structure of the paper version if it is to be immediately usable. The superstructure or model should be used to enhance the reader's ability to navigate, reportedly a major problem for many hypertext users (see e.g. Edwards and Hardman 1989). For example, keeping the major headings and their standard order as the "backbone" of the text would facilitate rapid exploration of required sections and narrow the search space for information location. Combined with the rapid access facilities of hypertext, such a format could result in the development of an electronic text that would ideally suit several of the reading tasks common to this text type identified by Dillon et al (1989) such as rapid scanning of particular sections to get a feel for the article's contents or directly accessing specific details such as analysis method or experimental hypotheses that the reader knows are usually located in certain sections. Though it is entirely possible to present material in unique and innovative electronic forms it would seem on the basis of these results to make little sense to do so for academic journal articles. Experienced readers of this text type know its broad structure, where particular details will be found and how different sections relate to each other. Rendering such knowledge redundant in some misguided quest for new forms seems less a liberation than an abuse of technological advancement that should be avoided.
One other interesting aspect of these studies that warrants discussion is the significant speed deficit of 17% for screen-presented text found in experiment 2. The issue of differences between reading from paper and screen has been examined for several years and it is clear that no one factor can account for the observed differences. Current research emphasises the role of perceptual, physical and cognitive factors in reading and it is clear that the reading task is also a major variable (McKnight et al 1991). Within the context of the present experiments two factors seem likely sources of the speed difference.
Firstly, when using the HyperCard stack, subjects manipulated the text by positioning the mouse on the "Next" button and pressing once. It is possible that between manipulating text in this manner and writing the answer by hand on paper more demands were being placed on the subject than when using paper. Given the usual habit of subjects to control a mouse with their preferred (i.e. writing) hand this might have slowed them down. Against this however it must be remembered that subjects needed only to position the mouse on the first card, after which each button depression left the cursor positioned correctly on the next card.
A second explanation is what might be termed the image quality hypothesis. According to current research, reading from screens is approximately 20-30% slower than reading from paper due to the poorer image quality of screens (Gould et al 1987a). The only screens that seem capable of matching the image quality of paper are very high resolution with black text on white backgrounds using anti-aliased characters (see e.g. Gould et al 1987b). In the present situation, the screen was a standard one, the text, though black on white, was not presented in an optimal screen font such as Geneva but in New York (10 point), which is a screen optimised version of a paper optimised font, Times. It is possible therefore that image quality was responsible for the speed deficit which in this case showed screens to be almost 20% slower. Against this argument however, it must be stated that the amount of text being read was very limited, and image quality effects should have been very subtle. It is probable that both explanations are contributory factors to the observed significant difference.
It is perhaps to be expected that the format of IMRD is very familiar to readers of academic articles. However, the main point of these studies was not to confirm this fact but to examine the extent to which the perception of structure influenced readers' organisation of the text. The ease and speed with which these subjects arranged the material or predicted its location suggests that for this text type at least it is a very potent aid to organisation. Other text types are likely to have less clear superstructures and in these cases, alternative structures for hypertext versions should be investigated. What seems likely though is that readers do acquire some knowledge of structure for all texts, and that it increases with experience in using that text type. In use, it is likely to combine with spatial memory for layout (Rothkopf 1971) to form a mental map of the specific text being read, facilitating searching and browsing of the material. Such issues must be addressed by the designer of any text presentation system if usability is to be ensured.
In the present context these results convinced the design team that radically altering the structure of academic articles in the hypertext database was not a good idea. Instead it was decided to utilise the observed superstructure as a skeleton for grouping and presenting the articles on screen. The current prototype, implemented in GUIDE (TM), presents the user with a set of primary headings as closely related to IMRD as is relevant to the article in question. From here the user can "unfold" sections of interest, jump to particular sections and so forth, rapidly performing several of the reading behaviours identified in Dillon et al (1989) while also having the ability to "unfold" the complete text and have, if so desired, a complete "linear" version of the text. The database also supports the direct accessing of related material by following linked references. Each time a relevant reference is selected in the text of an article a full text copy of that article is presented according to the same IMRD framework of headings which in turn can be "unfolded" and manipulated, itself supporting further reference linking and so forth. This database is currently undergoing usability evaluation at HUSAT. Further details of the database and its design process can be found in McKnight et al (1991).
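The behaviour described here (a skeleton of primary headings, sections that unfold on demand, a fully unfolded "linear" view, and references that open further articles under the same skeleton) can be modelled with a small data structure. The sketch below is hypothetical: it does not reflect how GUIDE represents documents or how the prototype was actually built, and every class, method and key name is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    heading: str
    body: str
    unfolded: bool = False

@dataclass
class Article:
    title: str
    sections: list = field(default_factory=list)
    references: dict = field(default_factory=dict)  # reference key -> Article

    def view(self):
        """Headings only for folded sections, heading plus body for unfolded ones."""
        lines = []
        for s in self.sections:
            lines.append(s.heading)
            if s.unfolded:
                lines.append(s.body)
        return lines

    def unfold(self, heading):
        for s in self.sections:
            if s.heading == heading:
                s.unfolded = True

    def unfold_all(self):
        """Produce the complete 'linear' version of the text."""
        for s in self.sections:
            s.unfolded = True

    def follow(self, key):
        """Return the linked article, itself organised under the same skeleton."""
        return self.references[key]

def make_article(title):
    headings = ("Introduction", "Method", "Results", "Discussion")
    return Article(title, [Section(h, f"<{h} text of {title}>") for h in headings])

cited = make_article("Cited article")
article = make_article("Current article")
article.references["ref-1"] = cited  # hypothetical reference key
article.unfold("Method")
```

Calling `article.view()` at this point lists the four headings with only the Method text expanded, while `article.follow("ref-1").view()` shows the cited article folded to its headings, mirroring the reference linking described above.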
This work was funded by the British Library Research and Development Dept. as part of Project Quartet. The author is grateful for comments and criticisms of the studies and subsequent manuscript from Prof. Brian Shackel, Dr. Cliff McKnight and John Richardson at HUSAT. The author would also like to thank two anonymous referees whose insightful comments led to clear improvements in the discussion section of this paper.
Beeman, W., Anderson, K., Bader, G., Larkin, J., McClard, A., McQuillan, M. and Shields, M. (1987) Hypertext and Pluralism: From lineal to non-lineal thinking. In: Proceedings of Hypertext'87. University of North Carolina. 67-88.
Dillon, A., Richardson, J. and McKnight, C. (1989) The human factors of journal usage and the design of electronic text, Interacting with Computers, 1,2, 183-189.
Edwards, D. and Hardman, L. (1989) "Lost in Hyperspace": Cognitive Mapping and Navigation in a Hypertext Environment. In R. McAleese (ed.) Hypertext: Theory into Practice, Oxford: Intellect.
Gould, J.D., Alfaro, L., Barnes, V., Finn, R., Grischkowsky, N. and Minuto, A. (1987a) Reading is slower from CRT displays than from paper: Attempts to isolate a single variable explanation. Human Factors, 29(3), 269-299.
Gould, J.D., Alfaro, L., Finn, R., Haupt, B. and Minuto, A. (1987b) Reading from CRT displays can be as fast as reading from paper. Human Factors, 29(5), 497-517.
Kintsch, W. and Yarborough, J. (1982) The role of rhetorical structure in text comprehension. Journal of Educational Psychology, 74, 828-834.
McKnight, C., Dillon, A. and Richardson, J. (1991) Hypertext in Context. Cambridge: Cambridge University Press.
Rothkopf, E. Z. (1971) Incidental memory for location of information in text. Journal of Verbal Learning and Verbal Behaviour, 10, 608-613.
Shackel, B. (1987) An overview of research on electronic journals. In: G. Salvendy (ed.) Cognitive Engineering in the Design of Human Computer Interaction and Expert Systems, pp. 193-206. Elsevier, Amsterdam.
van Dijk, T.A. (1980) Macrostructures. Hillsdale, NJ: Lawrence Erlbaum Associates.
van Dijk, T.A. and Kintsch, W. (1983) Strategies of Discourse Comprehension. London: Academic Press.
 A cloze test is a traditional comprehension test for readers that requires them to fill in the blanks within sentences taken from the text they have just read.