Accuracy Measures (accuracy + measure)

Selected Abstracts

Accuracy of technetium-99m SPECT-CT hybrid images in predicting the precise intraoperative anatomical location of parathyroid adenomas

Luke Harris MD
Abstract Background. This study evaluated the accuracy of single photon emission computed tomography (SPECT)-CT imaging for the preoperative localization of parathyroid adenomas. Methods. The study included both a quantitative and a qualitative accuracy measure. The quantitative measure was the distance between the location of the adenoma on the SPECT-CT scan and its location intraoperatively. For the qualitative measure, surgeons were asked whether the adenoma was in the exact location predicted by the SPECT-CT scan. The time from initial incision to identification of the parathyroid was recorded. Patients referred to London Health Sciences Centre for a suspected parathyroid adenoma were eligible for this study. Results. Twenty-three patients participated in this study. Eighteen (78.3%) had a single adenoma, 2 (8.7%) had double adenomas, and 3 (13.0%) had multiglandular hyperplasia. SPECT-CT correctly detected and localized 16 of 18 (88.9%) single parathyroid adenomas. The mean distance between the location of the adenoma on the SPECT-CT scan and its location intraoperatively was 16.3 mm (upper 95% confidence limit, 19.0 mm). For single adenomas, the median time from skin incision to identification was 14 minutes (range, 8 to 40 minutes). The preoperative detection and localization of a single focus of sestamibi uptake yielded a parathyroid adenoma in the specified location in 80.0% of cases (95% CI, 66.5% to 97.4%). Conclusions. SPECT-CT predicted the intraoperative location of a single parathyroid adenoma within 19.0 mm with 95% confidence. The correct detection and localization of multiglandular disease remains difficult. © 2007 Wiley Periodicals, Inc. Head Neck, 2008
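The abstract reports detection rates with 95% confidence intervals but does not say how those intervals were computed. As a minimal sketch, assuming a Wilson score interval (one common choice for a binomial proportion, not necessarily the authors' method), the interval for 16 of 18 correctly localized single adenomas can be computed as follows. The function name `wilson_interval` is hypothetical, and its output need not match the intervals reported above.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half_width, centre + half_width


# 16 of 18 single adenomas correctly detected and localized (counts from the abstract).
p, lo, hi = wilson_interval(16, 18)
print(f"{p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1], which matters for small samples with proportions near 100% such as this one.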

Is Teaching Experience Necessary for Reliable Scoring of Extended English Questions?

Lucy Royal-Dawson
Hundreds of thousands of raters are recruited internationally to score examinations, but little research has been conducted on the selection criteria for these raters. Many countries insist upon teaching experience as a selection criterion, and this has frequently become embedded in the cultural expectations surrounding the tests. Shortages of raters for some of England's national examinations have led to non-teachers being hired to score a small minority of items, and changes in technology have fostered this approach. For a National Curriculum test in English taken at age 14, this study investigated whether teaching experience was a necessary selection criterion for all aspects of the examination. Fifty-seven raters with different backgrounds were trained in the usual way and scored the work of the same 97 students. Accuracy was investigated using a cross-classified multilevel model of absolute score differences, with accuracy measures at level 1 and raters crossed with candidates at level 2. By comparing the scoring accuracy of graduates with a degree in English, teacher trainees, experienced teachers and experienced raters, this study found that teaching experience was not a necessary selection criterion. A rudimentary model for allocating raters to different question types is proposed, and further research is suggested to investigate the limits of the qualifications needed for scoring.
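The level-1 accuracy measure described above is an absolute score difference for each rater-candidate pair. The sketch below computes only descriptive group means of those differences; it does not fit the cross-classified multilevel model, which would need a dedicated mixed-model package. It assumes each candidate has a reference mark (for example, one agreed by senior examiners), and the function name, data layout, and example marks are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_difference_by_group(scores, reference, rater_group):
    """Mean |awarded mark - reference mark| for each rater background.

    scores      : iterable of (rater_id, candidate_id, awarded_mark)
    reference   : dict candidate_id -> reference mark (assumed to exist)
    rater_group : dict rater_id -> background label
    """
    diffs = defaultdict(list)
    for rater, candidate, mark in scores:
        # One absolute difference per rater-candidate pair (the level-1 unit).
        diffs[rater_group[rater]].append(abs(mark - reference[candidate]))
    return {group: mean(values) for group, values in diffs.items()}


# Hypothetical marks for two candidates and two raters.
reference = {"c1": 20, "c2": 14}
rater_group = {"r1": "English graduate", "r2": "experienced teacher"}
scores = [("r1", "c1", 18), ("r1", "c2", 15), ("r2", "c1", 21), ("r2", "c2", 14)]
summary = mean_abs_difference_by_group(scores, reference, rater_group)
```

Raw group means ignore the crossed structure (some candidates are harder to mark than others), which is the reason the study used a multilevel model rather than simple averages.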

Evaluation of best system performance: Human, automated, and hybrid inspection systems

Xiaochun Jiang
Recently, 100% inspection with automated systems has been applied more frequently than traditional sampling inspection with human inspectors. Nevertheless, humans still outperform machines in most attribute inspection tasks. Because neither humans nor automation alone can achieve superior inspection performance, hybrid inspection systems, in which humans work cooperatively with machines, merit study. This research therefore evaluated three inspection systems: (1) a human inspection system, (2) a computer search/human decision-making inspection system, and (3) a human/computer shared search/decision-making inspection system. The human/computer shared search/decision-making system achieved the best performance, suggesting that humans and computers should be used together in inspection tasks rather than either alone. The study also examined the interaction between human inspectors and computers, specifically the effect of system response bias on inspection quality; the risky system performed best on the accuracy measures. Although this study demonstrated how recent advances in computer technology have modified previously prescribed notions about function allocation alternatives in a hybrid inspection environment, it again demonstrated the adaptability of humans, indicating that they will continue to play a vital role in future hybrid systems. © 2003 Wiley Periodicals, Inc. Hum Factors Man 13: 137-152, 2003.
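Inspection accuracy and response bias are often analysed with signal detection theory. The abstract does not name its accuracy measures, so the following is only a sketch assuming the standard equal-variance measures: sensitivity d' = z(H) - z(F) and criterion c = -(z(H) + z(F))/2, where H is the hit rate, F the false-alarm rate, and z the inverse standard normal CDF. The function name and the 0.5 log-linear correction are illustrative choices, not the study's procedure. A negative c indicates a liberal criterion (more "defect" calls), which is one common reading of a "risky" bias.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return hit rate H, false-alarm rate F, sensitivity d' and criterion c."""
    # Log-linear correction: adding 0.5 to each count keeps H and F strictly
    # between 0 and 1, where the inverse normal CDF is defined.
    h = (hits + 0.5) / (hits + misses + 1)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(h) - z(f)
    # c < 0: liberal criterion (more "defect" calls); c > 0: conservative.
    c = -(z(h) + z(f)) / 2
    return h, f, d_prime, c


# Hypothetical counts: 50 defective and 50 good items inspected.
print(sdt_measures(hits=45, misses=5, false_alarms=10, correct_rejections=40))
```

Separating d' from c lets a comparison distinguish a system that is genuinely better at discriminating defects from one that simply reports "defect" more often.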

Clustering technique for risk classification and prediction of claim costs in the automobile insurance industry

Ai Cheo Yeo
This paper considers the problem of predicting claim costs in the automobile insurance industry. The first stage classifies policyholders according to their perceived risk; the second models the claim costs within each risk group. Two methods are compared for the risk classification stage: a data-driven approach based on hierarchical clustering, and a previously published heuristic method that groups policyholders according to predefined factors. Regression is used to model the expected claim costs within each risk group. A case study using real data compares both risk classification methods on a variety of accuracy measures, and its results show the benefits of the data-driven approach. © 2001 John Wiley & Sons, Ltd.
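The two-stage procedure (cluster policyholders, then regress claim costs within each cluster) can be sketched in plain Python. The abstract does not specify the linkage rule, the clustering features, or the regression form, so this sketch assumes centroid-linkage agglomerative clustering on numeric risk features and a one-variable least-squares fit per cluster; all names and example values are hypothetical. A production analysis would more likely use a library routine such as `scipy.cluster.hierarchy.linkage`.

```python
from math import dist


def agglomerate(points, k):
    """Centroid-linkage agglomerative clustering: repeatedly merge the two
    clusters whose centroids are closest until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return tuple(sum(points[i][d] for i in members) / len(members)
                     for d in range(len(points[0])))

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so cluster a keeps its index
    return clusters


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


# Hypothetical policyholders: two scaled risk features, exposure (years), claim cost.
features = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.80), (0.95, 0.85)]
exposure = [1.0, 2.0, 1.0, 3.0]
cost = [300.0, 500.0, 900.0, 1700.0]

for members in agglomerate(features, k=2):
    intercept, slope = fit_line([exposure[i] for i in members], [cost[i] for i in members])
    print(sorted(members), round(intercept, 1), round(slope, 1))
```

Fitting a separate regression per cluster lets the cost-exposure relationship differ between risk groups, which a single pooled regression cannot capture.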

Evaluating forecasts: a look at aggregate bias and accuracy measures

Benito E. Flores
Abstract This paper investigates the properties and use of two aggregate measures of forecast bias and accuracy. These metrics are used in business to calculate aggregate forecasting performance for a family (group) of products. We find that the aggregate measures are not particularly informative if some of the one-step-ahead forecasts are biased. This is likely to be the case in practice when frequently employed forecasting methods are used to generate a large number of individual forecasts. Constructed examples illustrate some potential problems in the use of the metrics. We propose a simple graphical display of forecast bias and accuracy to supplement the information yielded by the accuracy measures; the display includes boxplots of measures of individual forecasting success. The tool is simple but helpful, because it can reveal forecast deterioration that one or both of the aggregate metrics may mask. The procedures are illustrated with data representing sales of food items. Copyright © 2005 John Wiley & Sons, Ltd.
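The masking effect described above is easy to reproduce. The abstract does not name the two aggregate metrics, so this sketch assumes the simplest pair: a mean error (bias) and a mean absolute error pooled over a product family. Two hypothetical items with opposite, persistent biases give a pooled bias of zero even though every individual forecast is off by 10 units; the per-item values are what a boxplot display would summarize. The function name and data are illustrative only.

```python
from statistics import mean


def forecast_summary(actuals, forecasts):
    """Return (pooled bias, pooled MAE, per-item stats) for a product family.

    actuals, forecasts: dict item -> list of values of equal length
    """
    pooled, per_item = [], {}
    for item, observed in actuals.items():
        errors = [f - a for a, f in zip(observed, forecasts[item])]
        pooled.extend(errors)
        per_item[item] = {"bias": mean(errors), "mae": mean(abs(e) for e in errors)}
    return mean(pooled), mean(abs(e) for e in pooled), per_item


# Hypothetical weekly sales: bread is over-forecast, milk under-forecast.
actuals = {"bread": [100, 100, 100], "milk": [50, 50, 50]}
forecasts = {"bread": [110, 110, 110], "milk": [40, 40, 40]}
bias, mae, items = forecast_summary(actuals, forecasts)
```

Here the pooled `bias` is 0 while `items["bread"]["bias"]` is 10 and `items["milk"]["bias"]` is -10, so only the per-item view exposes the systematic errors.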

Can cointegration-based forecasting outperform univariate models? An application to Asian exchange rates

Abstract Conventional wisdom holds that restrictions on low-frequency dynamics among cointegrated variables should provide more accurate short- to medium-term forecasts than univariate techniques that contain no such information, even though, on standard accuracy measures, the information may not improve long-term forecasting. However, the empirical evidence is inconclusive, and it is complicated by confusion about an appropriate accuracy criterion and about the role of integration and cointegration in forecasting accuracy. We evaluate the short- and medium-term forecasting accuracy of univariate Box-Jenkins type ARIMA techniques, which imply only integration, against multivariate cointegration models, which contain both integration and cointegration, for a system of five cointegrated Asian exchange rate time series. We use a rolling-window technique to make multiple out-of-sample forecasts from one to forty steps ahead. Relative forecasting accuracy for individual exchange rates appears to be sensitive to the behaviour of the exchange rate series and to the length of the forecast horizon. Over short horizons, ARIMA model forecasts are more accurate for series with moving-average terms of order >1. Error-correction models (ECMs) perform better over medium-term horizons for series with no moving-average terms. The results suggest a need to distinguish between 'sequential' and 'synchronous' forecasting ability in such comparisons. Copyright © 2002 John Wiley & Sons, Ltd.
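The rolling-window evaluation can be sketched as follows: a fixed-length estimation window slides forward one observation at a time, each window produces forecasts 1 to H steps ahead, and squared errors are pooled by horizon. This is a sketch under stated assumptions: the paper's ARIMA and ECM models are not reproduced, a naive no-change forecast stands in for them, RMSE is assumed as the accuracy criterion, and all function names are hypothetical. For the study's design, `max_horizon` would be 40.

```python
from math import sqrt


def rolling_origin_rmse(series, window, max_horizon, forecaster):
    """RMSE at each horizon 1..max_horizon from a fixed-length rolling window.

    forecaster(history, h) must return a list of forecasts for steps 1..h.
    """
    n_origins = len(series) - window - max_horizon + 1
    if window < 1 or max_horizon < 1 or n_origins < 1:
        raise ValueError("series too short for this window and horizon")
    squared = {h: [] for h in range(1, max_horizon + 1)}
    for start in range(n_origins):
        history = series[start:start + window]  # estimation window
        preds = forecaster(history, max_horizon)
        for h in range(1, max_horizon + 1):
            actual = series[start + window + h - 1]  # h steps after the window ends
            squared[h].append((preds[h - 1] - actual) ** 2)
    return {h: sqrt(sum(errs) / len(errs)) for h, errs in squared.items()}


def random_walk(history, h):
    """No-change forecast: a hypothetical stand-in for the ARIMA and ECM models."""
    return [history[-1]] * h


rmse = rolling_origin_rmse(list(range(100)), window=20, max_horizon=5, forecaster=random_walk)
```

With the linear series `list(range(100))`, the no-change forecast misses by exactly h at horizon h, so the RMSE at each horizon equals h. Every horizon is scored on the same set of forecast origins, which keeps accuracy comparisons across horizons on a common sample.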

Effects of stochastic parametrizations in the Lorenz '96 system

Daniel S. Wilks
Abstract Stochastic parametrization of the effects of unresolved variables is studied in the context of the Lorenz '96 system. These parametrizations clearly improve the correspondence between the model and 'true' climatologies; they similarly provide clear improvements in all ensemble forecast verification measures investigated, including accuracy of ensemble means and ensemble probability estimation, and including measures operating on both scalar (each resolved forecast variable evaluated individually) and vector (all forecast variables evaluated simultaneously) predictands. Scalar accuracy measures for non-ensemble (i.e. single integration) forecasts are, however, degraded. The results depend very strongly on both the amplitude (standard deviation) and time-scale of the stochastic forcing, but only weakly on its spatial scale. In general, there does not appear to be a single clear optimum combination of time-scale and amplitude; rather, a range of combinations produces similar results. Copyright © 2005 Royal Meteorological Society.
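For readers unfamiliar with the system, a minimal sketch of a one-scale Lorenz '96 model with an additive stochastic term follows. It assumes red (AR(1)) noise whose stationary standard deviation `sigma` sets the amplitude and whose lag-1 autocorrelation `phi` sets the time-scale, the two properties the abstract reports as most influential. It does not include the unresolved variables or any deterministic parametrization of them, it holds the noise fixed within each RK4 step (a simplification, not a rigorous stochastic integrator), and all function names and parameter values are hypothetical.

```python
import random


def l96_tendency(x, forcing):
    """dX_k/dt = X_{k-1} (X_{k+1} - X_{k-2}) - X_k + F, with cyclic indices."""
    n = len(x)
    # Python's negative indexing wraps k-1 and k-2 below zero.
    return [x[k - 1] * (x[(k + 1) % n] - x[k - 2]) - x[k] + forcing for k in range(n)]


def rk4_step(x, forcing, noise, dt):
    """One RK4 step of the tendency plus a noise term held constant over the step."""
    def f(state):
        return [t + e for t, e in zip(l96_tendency(state, forcing), noise)]

    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(x0, forcing, dt, steps, phi, sigma, seed=0):
    """Integrate with AR(1) noise e_k(t+dt) = phi * e_k(t) + sigma * sqrt(1 - phi^2) * z.

    The noise starts at zero and approaches its stationary spread sigma over time.
    """
    rng = random.Random(seed)
    x, noise = list(x0), [0.0] * len(x0)
    scale = sigma * (1 - phi ** 2) ** 0.5
    trajectory = [x]
    for _ in range(steps):
        noise = [phi * e + scale * rng.gauss(0.0, 1.0) for e in noise]
        x = rk4_step(x, forcing, noise, dt)
        trajectory.append(x)
    return trajectory


# Hypothetical setup: 8 resolved variables, F = 8, small perturbation of the rest state.
x0 = [8.0] * 8
x0[0] += 0.01
traj = integrate(x0, forcing=8.0, dt=0.01, steps=500, phi=0.95, sigma=0.5)
```

Varying `phi` and `sigma` independently in this sketch mirrors the time-scale and amplitude dimensions the abstract explores; setting `sigma=0` recovers the deterministic model.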