Visitor Studies examine the fit between different segments of visitor groups and museum offerings (Klingler & Graft, 2012). These studies were consolidated in the 1980s through the work of psychologists such as Chan Screven, Harris Shettel, and Ross Loomis (Bitgood, 2011). They form part of a broader user-centered design approach (Abras, Maloney-Krichmar, & Preece, 2004) and study visitor behavior at exhibits at multiple levels of analysis (Asensio & Asenjo, 2011; Heath & Lehn, 2010). Understanding how visitors engage with exhibits requires evaluating the exhibit on display, which in turn makes it possible to predict visitor behavior and to design future exhibits and new activities (Falk & Dierking, 2013).
Classical museums rely on exhibits whose elements give precedence to a disciplinary discourse, usually encoded at a high conceptual level and with few communicative aims (Carbonell, 2012; Knell et al., 2011). New exhibits, supposedly built on more powerful communicative mediators, have raised the bar for the concepts users are expected to comprehend (MacDonald, 2006; Weil, 2002). Visitor Studies are increasingly necessary (Daignault, 2011); evaluation and planning studies aim to receive at least 5% of the total project budget (Harlow, 2014).
The Serrell Study
Despite the large number of visitor studies, few are cumulative or comparative. Museum evaluation is a field with limited publication. Primary studies with their original data are rarely available, because comprehensive evaluations yield both positive and negative results; these affect the institution's image and are often bound by confidentiality agreements. However, Serrell (1998) published a study, Paying Attention: Visitors and Museum Exhibitions, for which the databases of more than one hundred studies were obtained directly from the evaluators who conducted them. It offers a unique source of quantitative references against which to compare the results of new studies and to set quantitative goals for interventions in the field.
Our study takes as its starting point Serrell's (1998) report. That study assessed five variables in order to compare time spent and stops made in exhibition spaces across a broad set of United States museums. Two indexes were generated for comparing data between studies. The first, the Sweep Rate Index (SRI), is the median total time spent by visitors divided by the square footage of the exhibit; because it divides time by space, it corresponds to an inverse of velocity (Iv). The second, the Percentage of Diligent Visitors (%DV), is the percentage of visitors who stop at half or more of the exhibit features. The data were analyzed with descriptive statistics, standard ANOVA models, and correlations.
Advantages of reanalyzing data with meta-analytic techniques
The term meta-analysis was coined by Glass (1976) to refer to a methodology for statistically analyzing and synthesizing the results of a sample of studies with a certain degree of homogeneity. The foundations of meta-analytic methodology were laid in Glass's pioneering 1976 work (Botella & Sánchez-Meca, 2015), but the main advances in statistical methods came from the contributions of Hedges and Olkin (1985).
Meta-analyses work with effect size (ES) values. When an ES is calculated for each study, the measures are transformed into standardized measures that allow direct comparison between studies. The ES quantifies the magnitude of an effect obtained within a particular field of study, helping to clarify whether a statistically significant effect is actually relevant. For each variable of interest, a combined estimate of the ES is calculated; where applicable, this allows the effect of potentially moderating variables to be assessed.
Goals of the present study
Our general purpose is to reanalyze the database collected by Serrell using meta-analytic techniques. This involves three specific goals. First, to propose ES indices suitable for the kind of studies synthesized, which involves adapting the formulas for the variances of those indices. Second, to obtain weighted combinations of the values in the database. Third, to analyze the variability observed in those values, offering explanations by fitting models that include moderator variables.
Having estimates of these indices will be significant progress for visitor studies. First, the results of new studies can be compared with reference values; second, the design of new exhibitions can take into account the actual behavior of real visitors, as estimated from a large sample of studies. Furthermore, the analysis of the moderating variables allows both the reference values and the design estimates to be adjusted to the characteristics of each new exhibition.
Methods
Studies included
The source of studies is the compilation published in Serrell's (1998) report. It includes data on exhibition space usage in 110 exhibits in 62 different museums. The report includes studies of exhibits with a concrete expository message, selected by criteria of suitability and accessibility so as to obtain a substantial sample of museums. Each study has a sample of 40 or more randomly selected visitors. Data were gathered by non-participant observation, recording time spent and stops made throughout the exhibit space (Asensio & Pol, 2005).
Eight of the studies in the original report (numbers 19, 25, 32, 47, 56, 60, 92, and 99) were excluded from our analyses because there was not enough information to calculate the statistics and indices employed. As described below, Serrell's (1998) report provided histograms of the two observed variables for each primary study, along with the mean of each variable; for the eight excluded studies the histogram was missing. Study 41 was also excluded because of a major discrepancy between the mean time spent by visitors as calculated from the histogram and the value given in the text of the report (see below). The final set therefore comprises 101 exhibits.
Variables in the report
For the primary studies, the Serrell report included the following information: the visitor sample size, six variables describing characteristics of the exhibition (four coded and two quantitative), and data on two outcome variables. The four coded variables and their distributions appear in table 1, which also includes descriptive statistics for the sample size and the two quantitative variables. The two observed variables were time spent and number of stops. For nearly every study, the report included histograms of each of those variables, built from the visitor data, together with their means.
Table 1. Summary of moderator variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170620144440-37773-mediumThumb-S1138741616000391_tab1.jpg?pub-status=live)
We adopted the reported values of the six moderators directly. However, because we needed the variance of each outcome variable in each study, the complete frequency distributions of both variables were reconstructed from the histograms (measured with a millimeter ruler). These distributions allowed us to calculate both the mean and the variance of each study on both variables. Because the report provided arithmetic means, agreement between our calculated means and the reported ones served to validate this process. Thus, our final database included the mean and the variance of both observed variables for each primary study in the final set.
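The reconstruction step can be illustrated with a minimal Python sketch. It assumes that every visitor in a histogram bin is placed at the bin midpoint (the usual grouped-data approximation) and that the variance uses the n - 1 denominator; neither choice is stated in the text. The function names, the example counts, and the 10% tolerance of the validation check are hypothetical.

```python
import math


def grouped_mean_variance(bin_edges, counts):
    """Approximate n, mean and sample variance from a histogram.

    Every observation in a bin is placed at the bin midpoint (grouped-data
    approximation). bin_edges must have one more entry than counts.
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("bin_edges must have len(counts) + 1 entries")
    midpoints = [(lo + hi) / 2 for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    n = sum(counts)
    mean = sum(f * m for f, m in zip(counts, midpoints)) / n
    # Sample variance with an n - 1 denominator (assumed, not stated in the text).
    sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(counts, midpoints))
    return n, mean, sum_sq / (n - 1)


def agrees_with_report(calculated_mean, reported_mean, rel_tol=0.10):
    """Validation check against the mean printed in the report (hypothetical tolerance)."""
    return math.isclose(calculated_mean, reported_mean, rel_tol=rel_tol)


# Hypothetical visit-time histogram: 5-minute bins, 40 visitors.
n, mean_time, var_time = grouped_mean_variance([0, 5, 10, 15, 20], [10, 20, 8, 2])
print(n, mean_time, round(var_time, 2))
print(agrees_with_report(mean_time, reported_mean=7.9))
```

A study whose histogram mean falls outside the tolerance would be flagged for inspection, as happened with study 41.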
Effect Size Indexes
Meta-analytic techniques are applied to the values of an effect size (ES) index that reflects information related to a question of interest (Kelley & Preacher, 2012). While the best-known indexes are suitable for most meta-analyses (Borenstein, Hedges, Higgins, & Rothstein, 2009; Cooper, Hedges, & Valentine, 2009; Ellis, 2010), sometimes the meta-analyst must define specific indexes that better reflect the requirements of a particular context, as is the case here. Below we define the four ES indexes analyzed, also specifying how their variances were calculated, since an estimate of each variance is necessary for inverse-variance weighting.
Average Time per Feature (ATF)
It is defined as the average time, in minutes, spent on each item in the exhibit during the visit. The total visit time is taken from the histograms. However, these totals reflect the trivial fact that visitors spend more time in larger exhibits, which is why we prefer to use ATF. If $\bar T_j$ is the average time of the sample in exhibit j, which is made up of $F_j$ features, its ATF is calculated as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn1.gif?pub-status=live)
The variance of $ATF_j$ is calculated with the following formula (Botella, Suero, & Ximénez, 2012), in which $S_j^2$ is the variance of the visit time in the sample and $N_j$ is the number of visitors in the sample:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn2.gif?pub-status=live)
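A minimal Python sketch of formulas [1] and [2] follows. Because formula [2] is available only as an image, we assume it takes the usual form for a sample mean divided by a fixed constant, $S_j^2 / (N_j F_j^2)$; the function names and example values are hypothetical.

```python
def atf(mean_time, n_features):
    """Average Time per Feature, formula [1]: ATF_j = T_bar_j / F_j (minutes per feature)."""
    return mean_time / n_features


def atf_variance(time_variance, n_visitors, n_features):
    """Sampling variance of ATF_j, assuming formula [2] has the usual form
    for a sample mean divided by a fixed constant: S_j^2 / (N_j * F_j^2)."""
    return time_variance / (n_visitors * n_features ** 2)


# Hypothetical exhibit: 50 features, 40 visitors, mean visit 20 min, variance 100 min^2.
print(atf(20.0, 50), atf_variance(100.0, 40, 50))
```

Treating $F_j$ as a known constant means the only source of sampling error is the mean visit time.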
Percentage of Diligent Visitors (dv)
It is the percentage of visitors in the sample who stopped at 50% or more of the exhibit features. The original review used %DV. To analyze those data, we first divided the %DV values by 100 and then took their logit to mitigate distributional problems (Newcombe, 2012). The values analyzed were therefore calculated as:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn3.gif?pub-status=live)
The variance of logit(dv) is given by (Newcombe, 2012):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn4.gif?pub-status=live)
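The transformation, its variance, and the back-transformation used later for the pooled estimate can be sketched in Python. We assume formula [4] is the standard delta-method variance of a logit, $1/(N_j \, dv_j (1 - dv_j))$; the function names and the example study are hypothetical.

```python
import math


def logit_dv(percent_dv):
    """Formula [3]: logit of the proportion of diligent visitors, dv = %DV / 100."""
    dv = percent_dv / 100
    if not 0 < dv < 1:
        # The logit is undefined at 0% and 100%; a continuity correction
        # would be one option, but none is described in the text.
        raise ValueError("%DV must lie strictly between 0 and 100")
    return math.log(dv / (1 - dv))


def logit_dv_variance(percent_dv, n_visitors):
    """Sampling variance of logit(dv), assuming formula [4] is the standard
    delta-method result 1 / (N_j * dv * (1 - dv))."""
    dv = percent_dv / 100
    return 1 / (n_visitors * dv * (1 - dv))


def inverse_logit(x):
    """Back-transform a (pooled) logit to a proportion."""
    return 1 / (1 + math.exp(-x))


# Hypothetical study: 30% diligent visitors among 50 observed visitors.
y = logit_dv(30)
v = logit_dv_variance(30, 50)
print(round(y, 4), round(v, 4), round(inverse_logit(y), 2))
```

Pooling happens on the logit scale; only the final combined value and its confidence limits are mapped back to proportions.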
Inverse of Velocity (Iv)
It is the mean time spent by visitors per hundred square meters (m²) of exhibit space. Because the original report gave the exhibit area in square feet (ft²), the values were converted to m² before the analysis (m² = ft² × 0.0929). We then calculated the SU value, which represents the surface area in units of 100 m².
The Iv index and its variance were calculated as follows ($\bar T_j$ and $N_j$ as defined above):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn5.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn6.gif?pub-status=live)
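A Python sketch of the unit conversion and of formulas [5] and [6]. The conversion factor comes from the text; we assume that formula [5] is $\bar T_j / SU_j$ and that formula [6] treats $SU_j$ as a fixed constant, giving $S_j^2 / (N_j \, SU_j^2)$. Function names and example values are hypothetical.

```python
FT2_TO_M2 = 0.0929  # conversion factor used in the text


def surface_units(area_ft2):
    """Exhibit area in units of 100 m^2 (the SU value)."""
    return area_ft2 * FT2_TO_M2 / 100


def iv(mean_time, area_ft2):
    """Inverse of Velocity, assumed formula [5]: minutes per 100 m^2, T_bar_j / SU_j."""
    return mean_time / surface_units(area_ft2)


def iv_variance(time_variance, n_visitors, area_ft2):
    """Sampling variance of Iv_j, assuming formula [6] has the form
    S_j^2 / (N_j * SU_j^2), i.e. SU_j is treated as a fixed constant."""
    return time_variance / (n_visitors * surface_units(area_ft2) ** 2)


# Hypothetical exhibit of 5,000 ft^2 (about 464.5 m^2, so SU is about 4.645).
print(surface_units(5000), iv(20.0, 5000), iv_variance(100.0, 40, 5000))
```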
Stops per Feature (PF)
It is the average number of stops made by visitors per feature in the exhibit. First, we calculated $\bar P_j$, the mean number of stops made by the sample (from the histograms). However, for the same reason given for ATF, we were more interested in the average number of stops per feature, that is:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn7.gif?pub-status=live)
The variance for this index is as follows:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn8.gif?pub-status=live)
where $S_P^2$ is the sample variance of the number of stops in study j.
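Formulas [7] and [8] can be sketched in Python with a single helper, since the index divides a visitor-level mean by the number of features. We assume formula [8] has the form $S_P^2 / (N_j F_j^2)$, with $F_j$ fixed; the helper name and the example values are hypothetical.

```python
def per_feature_index(mean_value, variance, n_visitors, n_features):
    """Mean per feature and its sampling variance.

    For Stops per Feature: P_bar_j / F_j (formula [7]); the variance is
    assumed to be S_P^2 / (N_j * F_j^2) (formula [8]), with F_j fixed.
    """
    index = mean_value / n_features
    index_variance = variance / (n_visitors * n_features ** 2)
    return index, index_variance


# Hypothetical exhibit: 40 features, 40 visitors, 15 stops on average (variance 64).
sf, sf_var = per_feature_index(15.0, 64.0, 40, 40)
print(sf, sf_var)
```

Because ATF has the same structure, the same helper would also return ATF and its variance if given mean visit times instead of mean stop counts.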
Meta-analysis Procedures
Statistical analyses were performed with the procedures developed by Hedges and Olkin (1985), weighting the studies by the inverse of the variance. We assumed a random effects (RE) model for all analyses because it is more conservative than a fixed effect (FE) model and allows the results to be generalized beyond the specific set of studies analyzed (Borenstein, Hedges, Higgins, & Rothstein, 2010; Hedges & Vevea, 1998; Raudenbush, 2009). The between-study variance was estimated by the maximum likelihood method (DerSimonian & Laird, 1986). Assuming an RE model means that the set of studies estimates a distribution of ES in the population, with $\theta_i \sim N(\mu_\theta, \tau^2)$. Each individual ES estimates a population ES, and the error variability for that study ($\sigma_j^2$) is assumed to be due to sampling. The estimated between-study variance ($\tau^2$) is then added to the variance of each study. The RE model therefore applies inverse-variance weights such that:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170620125951517-0908:S1138741616000391:S1138741616000391_eqn9.gif?pub-status=live)
Because the variances are unknown, corresponding estimators are used. We have already defined the sampling variances of our four ES indexes in formulas [2], [4], [6], and [8].
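The random-effects weighting can be illustrated with a minimal Python sketch. It estimates $\tau^2$ with the DerSimonian-Laird moment estimator, the method associated with the reference cited above. The text describes the estimation as maximum likelihood, and metafor's `rma()` supports both (`method = "DL"` or `"ML"`), so this is only one possible choice. The sketch then pools the values with weights $w_j = 1/(\sigma_j^2 + \tau^2)$. Function names and data are hypothetical.

```python
import math


def dersimonian_laird_tau2(effects, variances):
    """Moment estimator of the between-study variance tau^2 (DerSimonian & Laird)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max(0.0, (q - (len(effects) - 1)) / c)


def random_effects_pool(effects, variances, tau2=None, z=1.96):
    """Inverse-variance pooling with weights w_j = 1 / (sigma_j^2 + tau^2).

    Returns the pooled estimate, a normal-theory 95% CI, and tau^2.
    """
    if tau2 is None:
        tau2 = dersimonian_laird_tau2(effects, variances)
    w = [1 / (v + tau2) for v in variances]
    sum_w = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    se = math.sqrt(1 / sum_w)
    return mu, (mu - z * se, mu + z * se), tau2


# Hypothetical ATF values (minutes per feature) and their sampling variances.
atf_values = [0.25, 0.40, 0.55, 0.70]
atf_vars = [0.002, 0.004, 0.003, 0.005]
print(random_effects_pool(atf_values, atf_vars))
```

Adding $\tau^2$ to every study's variance makes the weights more similar to each other, which is why the RE model is more conservative than the FE model.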
Calculations, including asymmetry tests and funnel plots, were done with the metafor package for R (Viechtbauer, 2010). We also performed sensitivity analyses focused on outliers (Higgins & Green, 2008); these confirmed that the results hardly change when the outliers are excluded. Missing values received no special treatment, because they belong to the studies excluded from the meta-analysis, as explained in the Studies included section.
To assess the heterogeneity of the estimates we used the Q and $I^2$ statistics. Q provides a significance test of whether the observed variability exceeds that expected under a fixed effect model. The $I^2$ statistic quantifies the degree of variability beyond that dichotomous decision. There are good reasons to use both indexes in combination (Huedo-Medina, Sánchez-Meca, Marín-Martínez, & Botella, 2006).
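Both statistics can be sketched in Python from fixed-effect weights. The p-value of Q comes from a chi-square distribution with k - 1 degrees of freedom (for example `scipy.stats.chi2.sf(q, df)` if SciPy is available; it is left out so the block runs on the standard library alone). This sketch uses the Higgins-Thompson form $I^2 = (Q - df)/Q$. For random-effects models metafor instead computes $I^2$ from $\tau^2$ and a typical within-study variance, so the two versions can differ. The function name and data are hypothetical.

```python
def q_and_i2(effects, variances):
    """Cochran's Q (fixed-effect weights) and the Higgins-Thompson I^2 in percent."""
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Truncate at zero: when Q < df, no variability beyond sampling error is attributed.
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


q, df, i2 = q_and_i2([0.0, 4.0], [1.0, 1.0])
print(q, df, i2)
```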
Moderating Variables
The six study characteristics provided by the original report were treated as potential moderating variables, and their associations with the values of the four ES indices were analyzed (table 1). As noted above, we directly adopted the codes given in the Serrell report. The only exception was the conversion of square feet (ft²) to the more international square meters (m²), described in the Inverse of Velocity subsection.
Results
Average time per feature (ATF)
The combined weighted estimate of the population value (ATF●) is 0.43 minutes per feature (CI: 0.37; 0.49). The values show a high degree of heterogeneity [$Q_W(100) = 4826.69$, p < .001; $I^2$ = 99.53% (CI: 98.93; 100.13)]. The estimated between-study variance ($\tau^2$) equals 0.0866 (SE = 0.0125). Table 2 summarizes the results of the meta-regression models fitted with the two quantitative moderators and of the models fitted with the four categorical moderators.
Table 2. Effect size index: ATF. Summary of results for the moderator variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170620144440-15919-mediumThumb-S1138741616000391_tab2.jpg?pub-status=live)
The number of features moderator explains a significant percentage of the variance in the ES values, although this percentage is low in absolute terms. The negative slope means that the greater the number of features, the lower the time per feature spent by visitors. The surface area moderator is not significant.
Only one of the categorical moderators, exhibition age, was significant. The combined values for its categories are ATF●(new) = 0.45 and ATF●(old) = 0.19. Thus, in new exhibits visitors devote an average of 0.26 minutes (about 15.6 seconds) more per feature than in old exhibits. We fitted a combined model with the two significant moderators (number of features and exhibition age). It explains a significant portion (16.39%) of the observed heterogeneity [$Q_M(2) = 17.9456$, p < .001]; the regression equation is:
$ATF'_i = 0.0036 - 0.0037 \cdot \text{Number of features}_i + 0.2903 \cdot \text{Age}_i$
According to this model, each additional feature in the exhibit reduces the mean time per feature spent by a typical visitor by 0.0037 minutes (about 0.2 seconds). By contrast, the mean time per feature increases by 0.2903 minutes (about 17.4 seconds) for a new exhibit compared with an old one.
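The arithmetic in the paragraph above can be checked with a small Python sketch of the printed equation. Two assumptions are ours: Age is coded 1 for a new exhibit and 0 for an old one, which is consistent with the coefficient applying to new exhibits, and the number of features enters uncentered. The text does not report how the predictors were coded or centered, so the sketch reports only differences between predictions, which depend on the slopes alone. The function name is hypothetical.

```python
# Coefficients of the ATF meta-regression as printed in the text.
INTERCEPT, B_FEATURES, B_AGE = 0.0036, -0.0037, 0.2903


def predicted_atf(n_features, is_new):
    """ATF'_i in minutes per feature; Age is assumed coded 1 = new, 0 = old."""
    return INTERCEPT + B_FEATURES * n_features + B_AGE * (1 if is_new else 0)


# Marginal effects in seconds; these do not depend on the intercept.
one_more_feature = (predicted_atf(31, True) - predicted_atf(30, True)) * 60
new_vs_old = (predicted_atf(30, True) - predicted_atf(30, False)) * 60
print(round(one_more_feature, 3), round(new_vs_old, 3))
```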
Percentage of Diligent Visitors (dv)
The combined weighted estimate (dv●), after back-transformation with the inverse logit, is 0.30 (CI: 0.23; 0.39), indicating that 30% of visitors stop at 50% or more of the exhibit features. The values show a high degree of heterogeneity [$Q_W(100) = 1096.75$, p < .001; $I^2$ = 93.80% (CI: 91.40; 96.20)]. The estimated between-study variance ($\tau^2$) equals 1.5190 (SE = 0.2408). We did not observe any significant association between dv and the moderating variables (table 3).
Table 3. Effect size index: dv. Summary of results for the moderator variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170620144440-09751-mediumThumb-S1138741616000391_tab3.jpg?pub-status=live)
Inverse of Velocity (Iv)
The combined weighted estimate of the population value (Iv●) is 4.07 minutes per 100 m² (CI: 3.59; 4.55). The values show a high degree of heterogeneity [$Q_W(100) = 8191.11$, p < .001; $I^2$ = 99.68% (CI: 94.98; 104.38)]. The estimated between-study variance ($\tau^2$) equals 5.8446 (SE = 0.8509).
The number of features moderator explains a significant proportion of the variance, although this proportion is low in absolute terms. As the slope is positive, the greater the number of features, the slower the speed (and the higher its inverse). Likewise, the surface area moderator explains a significant, though still low, proportion of the variance (table 4). In this case the slope is negative, meaning that the larger the exhibit, the more quickly visitors move through it.
Table 4. Effect size index: Iv. Summary of results for the moderator variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170620144440-77743-mediumThumb-S1138741616000391_tab4.jpg?pub-status=live)
Regarding the age variable, there is an enormous difference between categories (4.28 versus 0.70): visitors spend an average of 3.58 minutes more per 100 m² in new exhibits than in old ones.
We fitted a combined model including number of features, surface area, and age, which explains 41.46% of the variance [$Q_M(3) = 51.5794$, p < .001]. The equation is:
$Iv'_i = 1.41 + 0.030 \cdot \text{Number of features}_i - 0.464 \cdot \text{Surface area}_i + 1.821 \cdot \text{Age}_i$
The time/surface area ratio is greater with more features: the more features, the more slowly visitors move. By contrast, the time/surface area ratio is smaller with larger surface areas: the greater the surface area covered by visitors, the faster they move through the exhibit. The age variable adds 1.821 when the exhibit is new. Other models including further moderating variables were also explored, but they did not significantly increase the model's predictive capacity.
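As a rough illustration, the combined Iv equation can be turned into a Python prediction sketch. The codings are our assumptions, not stated in the text: surface area is taken in SU units (100 m²), matching the definition of Iv, and Age is coded 1 for new and 0 for old. Since Iv is time divided by SU, multiplying the predicted Iv by SU gives a crude estimate of total visit time. Predictions far outside the range of the original exhibits can become implausible, even negative. Function names and the example exhibit are hypothetical.

```python
# Coefficients of the combined Iv meta-regression as printed in the text.
B0, B_FEATURES, B_SURFACE, B_AGE = 1.41, 0.030, -0.464, 1.821


def predicted_iv(n_features, surface_units, is_new):
    """Predicted minutes per 100 m^2. Surface is assumed to be in SU units
    (100 m^2) and Age coded 1 = new, 0 = old; both codings are assumptions."""
    age = 1 if is_new else 0
    return B0 + B_FEATURES * n_features + B_SURFACE * surface_units + B_AGE * age


def predicted_total_minutes(n_features, surface_units, is_new):
    """Since Iv = time / SU, a crude total visit time follows as Iv * SU."""
    return predicted_iv(n_features, surface_units, is_new) * surface_units


# Hypothetical new exhibit: 50 features over 500 m^2 (SU = 5).
print(predicted_iv(50, 5, True), predicted_total_minutes(50, 5, True))
```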
Stops per Feature (SF)
The combined weighted estimate of the population value (SF●) is 0.35 stops per feature (CI: 0.33; 0.38). The values show a high degree of heterogeneity [$Q_W(100) = 21816.10$, p < .001; $I^2$ = 99.05% (CI: 98.78; 99.32)]. The estimated between-study variance ($\tau^2$) equals 0.0192 (SE = 0.0028).
The number of features variable explains a low but significant amount of variance (table 5). The slope is negative, which means that the greater the number of features, the lower the mean number of stops per feature made by visitors.
Table 5. Effect size index: SF. Summary of results for the moderator variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170620144440-49300-mediumThumb-S1138741616000391_tab5.jpg?pub-status=live)
Assessing the asymmetry of the distributions
Conventionally, a meta-analysis includes analyses aimed at detecting possible publication bias in its field of study (Rothstein, Sutton, & Borenstein, 2005). The most studied source of bias favors studies with statistically significant results. This source of bias is not relevant here, however, because the original studies synthesized did not perform significance tests. Furthermore, the reports were not published in peer-reviewed scientific journals.
Nevertheless, we created funnel plots (Light & Pillemer, 1984) for the four ES indexes (figure 1). For three of the four indicators, the values are better (higher) when the visitor sample used for the estimate is smaller: ATF increases (visitors spend longer per feature), Iv increases (they move more slowly), and SF increases (they stop more often per feature). By contrast, the dv indicator shows a different and more complex pattern.
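The text mentions asymmetry tests without naming them. One common choice is Egger's regression test, sketched below in Python in its classic form. The standardized effect $y_j / se_j$ is regressed on the precision $1 / se_j$ by ordinary least squares, and an intercept far from zero indicates funnel-plot asymmetry, that is, small studies with systematically larger or smaller values. This is a swapped-in illustration and not necessarily the test the authors ran; metafor's `regtest()` implements related regression tests. The sketch returns the intercept and its standard error; their ratio is a t statistic with k - 2 degrees of freedom. The function name and data are hypothetical.

```python
import math


def egger_regression(effects, variances):
    """Classic Egger test: OLS of y_j / se_j on 1 / se_j.

    Returns (intercept, slope, standard error of the intercept).
    """
    se = [math.sqrt(v) for v in variances]
    x = [1 / s for s in se]                       # precision
    z = [y / s for y, s in zip(effects, se)]      # standardized effect
    k = len(x)
    mean_x, mean_z = sum(x) / k, sum(z) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    residuals = [zi - (intercept + slope * xi) for xi, zi in zip(x, z)]
    s2 = sum(r * r for r in residuals) / (k - 2)  # residual variance, k - 2 df
    se_intercept = math.sqrt(s2 * (1 / k + mean_x ** 2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical values that grow with the standard error (a small-study pattern).
variances = [0.01, 0.04, 0.09, 0.16]
effects = [0.3 + 0.5 * math.sqrt(v) for v in variances]
print(egger_regression(effects, variances))
```

In this constructed example the values rise exactly in step with the standard error, so the intercept recovers that small-study slope (0.5) and the slope recovers the common value (0.3).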
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170620144440-34474-mediumThumb-S1138741616000391_fig1g.jpg?pub-status=live)
Figure 1. Funnel plot for the four ES indexes.
Discussion
The results of our analysis show that the more features an exhibit has, the less time is spent per feature; that, on average, more time is spent in new exhibits than in old ones; that visitors move more slowly when there are more features; that in exhibits with larger surface areas visitors move more rapidly; and that time spent per feature is not related to surface area. In general terms, our conclusions agree with those of Serrell.
However, our conclusions are reached by means of statistical tests whose assumptions are satisfied by the models used. Furthermore, the estimates obtained with meta-analytic procedures are more reliable, as they are unbiased, consistent, and of maximum efficiency. Because ES indexes place the values from different studies on a shared metric, those values can be compared directly. As a consequence, our combined point estimates can serve as a reference for evaluation and research in visitor studies. In summary: ATF● = 25.8 seconds (0.43 minutes) per feature; SF● = 0.35 stops per feature; dv● = 30% diligent visitors; Iv● = 4.07 minutes per 100 m². Analyses of complex variables are well received among researchers and professionals, as these measures allow effects to be quantified and problems to be measured in a tangible way, offering more powerful tools for evaluation.
We chose Serrell's (1998) Paying Attention report because it is a unique effort in the field of Museum Studies, a field in which visitor studies are confidential and not accessible to the general public. Still, our study goes further, as we have proposed new statistical indices and applied techniques borrowed from meta-analytic methodology. These techniques allow efficient combined estimates to be calculated and the role of potentially relevant moderators to be tested.
Our results show that visitor behavior can be partially explained by factors such as the type of museum, the exhibit set-up, the characteristics of the expository features, and their layout. This indicates which potentially moderating variables may be of great importance in future meta-analyses. However, in studying the funnel plots we also found unexpectedly strong relationships between several effect size indices and the sample size of the study. We assume that sample size is directly related to the number of visitors to the exhibit, so that studies of more popular exhibits are based on larger samples. Under that assumption, the funnel plots and asymmetry tests show that when the visitor sample is small, it is also more (self-)selected. As a consequence, in exhibits with smaller samples visitors spend more time, walk more slowly, and make more stops per feature. These effects are probably related to each visitor's specialty or personal interest. By contrast, when the general public visits a museum en masse, visitors spend less time, walk faster, and stop less.
A major contribution of our study is that the reanalysis of the database published by Serrell (1998) provides combined estimates of four indices that point to trends that can serve as a reference. We know that the greater the number of elements, the less time visitors spend per item and the fewer stops per item they make. Therefore, overloaded exhibitions induce superficial visits, in which people do not devote enough time to the items and the acquisition of new knowledge is limited. We also know that visitors devote more time to new exhibits, so redesigning our exhibitions can foster deeper visits. Our analysis also revealed that visitors move more slowly through exhibitions with more elements, probably in order to pay attention to all the displays. However, an exhibition in which the visitor is confronted with countless (usually very similar) pieces can lead to a loss of interest (and care). To design a more fulfilling and meaningful visitor experience, it may be more useful to exhibit only the most representative and important pieces, on which visitors can focus their full attention without being overloaded. Our estimates make it possible to predict the average behavior of a sample of visitors with respect to the time spent per item, given the number of elements in the exhibition and whether it is new or old. We can also estimate the average speed at which a sample of visitors moves through the exhibition, taking into account its surface area, its number of items, and its age. All this information can help in planning exhibitions, objectively estimating how much time the average visitor will spend in the visit, and making decisions when planning the museum. The values of the four effect size indices can also serve as benchmarks for the values obtained in evaluations of our own exhibitions. The goal, then, is to improve these values in new exhibitions, using these data as references when adapting to our users.