
Kinds of Raters

  • different rater
  • experience rater
  • expert rater
  • independent rater
  • trained rater

Selected Abstracts


    This study investigated (a) the relative importance of a number of biographic (e.g., age, race, gender) and contextual (e.g., span of control, functional area) variables and their interactions on self-other agreement and (b) the relationship between self-other agreement and outcome variables such as performance and compensation. Usable data were collected from 3,217 managers and their multi-source raters in 527 organizations. Multivariate regression procedures (as opposed to categorization procedures) were used to determine the sources of rating disagreement. Results indicated that a significant portion of variance in self-other ratings was accounted for by the set of background/context variables. Self-other agreement was also related to performance, compensation, and organizational level, though rating patterns differed. [source]

    Rater and occasion impacts on the reliability of pre-admission assessments

    MEDICAL EDUCATION, Issue 12 2009
    Rick D Axelson
    Context: Some medical schools have recently replaced the medical school pre-admission interview (MSPI) with the multiple mini-interview (MMI), which utilises objective structured clinical examination (OSCE)-style measurement techniques. Their motivation for doing so stems from the superior reliabilities obtained with the OSCE-style measures. Other institutions, however, are hesitant to embrace the MMI format because of the time and costs involved in restructuring recruitment and admission procedures. Objectives: To shed light on the aetiology of the MMI's increased reliability and to explore the potential of an alternative, lower-cost interview format, this study examined the relative contributions of two facets (raters, occasions) to interview score reliability. Methods: Institutional review board approval was obtained to conduct a study of all students who completed one or more MSPIs at a large Midwestern medical college during 2003-2007. Within this dataset, we identified 168 applicants who were interviewed twice in consecutive years and thus provided the requisite data for generalisability (G) and decision (D) studies examining these issues. Results: Increasing the number of interview occasions contributed much more to score reliability than did increasing the number of raters. Conclusions: Replicating a number of interviews, each with one rater, is likely to be superior to the often recommended panel interview approach and may offer a practical, low-cost method for enhancing MSPI reliability. Whether such a method will ultimately enhance MSPI validity warrants further investigation. [source]
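
    The D-study logic in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming a fully crossed person x rater x occasion random-effects design (the study's actual design is not stated in the abstract) and using hypothetical variance components that are not taken from the paper. The function name and all numbers are illustrative only.

```python
# Hypothetical variance components (NOT from the study), chosen so that
# person-by-occasion variance exceeds person-by-rater variance.
VAR_PERSON = 0.30          # true-score variance among applicants
VAR_PERSON_RATER = 0.10    # person x rater interaction
VAR_PERSON_OCCASION = 0.40 # person x occasion interaction
VAR_RESIDUAL = 0.20        # person x rater x occasion interaction plus error


def g_coefficient(n_raters: int, n_occasions: int) -> float:
    """Relative G coefficient for a crossed p x r x o design.

    Each error component is divided by the number of conditions
    averaged over in the D study.
    """
    relative_error = (
        VAR_PERSON_RATER / n_raters
        + VAR_PERSON_OCCASION / n_occasions
        + VAR_RESIDUAL / (n_raters * n_occasions)
    )
    return VAR_PERSON / (VAR_PERSON + relative_error)


if __name__ == "__main__":
    # Compare adding raters with adding occasions.
    for n_r, n_o in [(1, 1), (3, 1), (1, 3)]:
        print(f"raters={n_r}, occasions={n_o}: G={g_coefficient(n_r, n_o):.3f}")
```

    Under these assumed components, tripling the number of occasions raises the coefficient more than tripling the number of raters, which mirrors the direction of the reported result without reproducing its magnitudes.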

    Reliability: on the reproducibility of assessment data

    MEDICAL EDUCATION, Issue 9 2004
    Steven M Downing
    Context: All assessment data, like other scientific experimental data, must be reproducible in order to be meaningfully interpreted. Purpose: The purpose of this paper is to discuss applications of reliability to the most common assessment methods in medical education. Typical methods of estimating reliability are discussed intuitively and non-mathematically. Summary: Reliability refers to the consistency of assessment outcomes. The exact type of consistency of greatest interest depends on the type of assessment, its purpose and the consequential use of the data. Written tests of cognitive achievement look to internal test consistency, using estimation methods derived from the test-retest design. Rater-based assessment data, such as ratings of clinical performance on the wards, require interrater consistency or agreement. Objective structured clinical examinations, simulated patient examinations and other performance-type assessments generally require generalisability theory analysis to account for various sources of measurement error in complex designs and to estimate the consistency of the generalisations to a universe or domain of skills. Conclusions: Reliability is a major source of validity evidence for assessments. Low reliability indicates that large variations in scores can be expected upon retesting. Inconsistent assessment scores are difficult or impossible to interpret meaningfully and thus reduce validity evidence. Reliability coefficients allow the quantification and estimation of the random errors of measurement in assessments, such that overall assessment can be improved. [source]

    Generalizability of Cognitive Interview-Based Measures Across Cultural Groups

    Guillermo Solano-Flores
    We addressed the challenge of scoring cognitive interviews in research involving multiple cultural groups. We interviewed 123 fourth- and fifth-grade students from three cultural groups to probe how they related a mathematics item to their personal lives. Item meaningfulness (the tendency of students to relate the content and/or context of an item to activities in which they are actors) was scored from interview transcriptions with a procedure similar to the scoring of constructed-response tasks. Generalizability theory analyses revealed a small amount of score variation due to the main and interaction effect of rater but a sizeable magnitude of measurement error due to the interaction of person and question (context). Students from different groups tended to draw on different sets of contexts of their personal lives to make sense of the item. In spite of individual and potential cultural communication style differences, cognitive interviews can be reliably scored by well-trained raters with the same kind of rigor used in the scoring of constructed-response tasks. However, to make valid generalizations of cognitive interview-based measures, a considerable number of interview questions may be needed. Information obtained with cognitive interviews for a given cultural group may not be generalizable to other groups. [source]

    The Quality of Content Analyses of State Student Achievement Tests and Content Standards

    Andrew C. Porter
    This article examines the reliability of content analyses of state student achievement tests and state content standards. We use data from two states in three grades in mathematics and English language arts and reading to explore differences by state, content area, grade level, and document type. Using a generalizability framework, we find that reliabilities for four coders are generally greater than .80. For the two problematic reliabilities, they are partly explained by an odd rater out. We conclude that the content analysis procedures, when used with at least five raters, provide reliable information to researchers, policymakers, and practitioners about the content of assessments and standards. [source]

    Alcohol consumption increases attractiveness ratings of opposite-sex faces: a possible third route to risky sex

    ADDICTION, Issue 8 2003
    Barry T. Jones
    Aims: To measure the effect of moderate alcohol consumption on males' and females' attractiveness ratings of unfamiliar male and female faces. Participants: Eighty undergraduate volunteers were used in each of three experiments. Design: Participants' ratings on a 1-7 scale were the dependent variable. A three-factor mixed design was used. For experiments 1 and 2: one within-factor, sex-of-face to be rated (male/female); two between-factors, sex-of-rater (male/female) and alcohol status of rater (0 UK units/1-6 UK units). For experiment 3, the two levels of sex-of-face were replaced by two levels of a non-face object. In experiment 1, the faces were rated for attractiveness; in experiment 2, the faces were rated for distinctiveness and in experiment 3, the non-face objects were rated for attractiveness. Setting: Quiet, prepared corners of bars and licensed eating areas on a civic university campus. Method: For each experiment, 118 full-colour photographic images were presented randomly on a laptop computer screen, each remaining until a rating response was made. Findings: There was a significant alcohol consumption enhancement effect only for attractiveness ratings of opposite-sex faces in experiment 1. This indicates that the opposite-sex enhancement effect is not due simply to alcohol consumption causing the use of higher points of ratings scales, in general. Conclusion: Since Agocha & Cooper have shown that the likelihood of intentions to engage in risky sex increases as the facial attractiveness of the potential sexual partner increases, through the opposite-sex enhancement effect we identify a new possible link between risky sex and alcohol consumption. [source]

    The Influence of Gender, Ethnicity, and Individual Differences on Perceptions of Career Progression in Public Accounting

    D. Jordan Lowe
    Prior research examining gender and diversity issues has generally lacked supporting theory and experimental investigation. This study provides theory-based experimental evidence regarding the effects of gender, ethnicity, and other individual differences on performance evaluations of audit seniors. We utilized organizational socialization theory in examining the accounting profession's view of diversity issues. The process model of performance evaluation provided guidance in the selection of ratee, rater, and contextual characteristics as factors to analyze. An experiment was conducted with 95 audit seniors from one of the Big 5 public accounting firms. Results indicate that gender and ethnic heritage are important factors in the career prospects of audit seniors. The demeanor of an auditor was also important as an interactive factor and influences judgments differently depending on the gender or ethnic origin of the auditor evaluated. These results suggest that diversity is a very complex issue. Examining single factors without considering the interactions of a variety of factors may lead to incorrect conclusions. [source]

    Comparing clock tests for dementia screening: naïve judgments vs formal systems - what is optimal?

    James M. Scanlan
    Background: Clock drawing tests (CDTs) vary in format, scoring, and complexity. Herein, we compared the dementia screening performance of seven CDT scoring systems and the judgements of untrained raters. Methods: 80 clock drawings by subjects of known dementia status were selected, 20 from each of four categories (Consortium to Establish a Registry for Alzheimer's disease [CERAD] defined normal, mild, moderate, and severe abnormality). An expert rater scored all clocks using published criteria for seven systems. Additionally, 20 naïve raters judged clocks as either normal or abnormal, without formal instructions. Clocks were then classified by drawers' dementia status for comparison of dementia detection across systems. Results: Naïve and formal CDT systems showed 90-100% agreement in CERAD normal, moderate and severe categories, but poor agreement (mean = 39%) for mildly impaired clocks. When CDT systems were compared for accurate dementia classification, the Mendez and CERAD systems correctly identified the greatest proportion of subjects (84-85%), and Wolf-Klein the smallest (58%). The better systems correctly identified > 70% of mildly demented individuals (CDR = 1). In contrast, medical records from patients' personal physicians correctly identified only 24% of the mildly demented. Strikingly, naïve raters' CDT judgements were as effective as five of the seven CDT systems in dementia identification. Conclusions: While the Mendez system was the most accurate overall, it was not significantly better than CERAD, which had simpler scoring rules. Untrained raters discriminated normal from abnormal clocks with acceptable accuracy for community screening purposes. Results suggest that, if used, most CDT systems would improve personal physicians' dementia recognition in difficult to detect mildly demented subjects. Copyright © 2002 John Wiley & Sons, Ltd. [source]

    Real-Time Feedback on Rater Drift in Constructed-Response Items: An Example From the Golden State Examination

    Machteld Hoskens
    In this study, patterns of variation over time in the severities of a group of raters, so-called "rater drift", were examined when raters scored an essay written under examination conditions. At the same time feedback was given to rater leaders (called "table leaders") who then interpreted the feedback and reported to the raters. Rater severities in five successive periods were estimated using a modified linear logistic test model (LLTM, Fischer, 1973) approach. It was found that the raters did indeed drift towards the mean, but a planned comparison of the feedback with a control condition was not successful; it was believed that this was due to contamination at the table leader level. A series of models designed to detect other types of rater effects beyond severity was also estimated: a tendency to use extreme scores and a tendency to prefer certain categories. The models for these effects showed significant improvement in fit, implying that these effects were indeed present, although they were difficult to detect in relatively short time periods. [source]

    Use of Shade Guides for Color Measurement in Tooth-Bleaching Studies

    ABSTRACT: Several different methods are used to measure tooth color in bleaching studies. The ADA Acceptance Program Guidelines for Home Use Tooth Whitening Products specify the use of a value-oriented shade guide and/or electronic color measurement devices. Since people perceive color differently, shade guides are a subjective measure. Differences between raters and by the same rater are well documented in the dental literature. The purposes of this article are to discuss the advantages and disadvantages of using shade guides to measure color change related to tooth whitening, and to evaluate the correlation of data gathered from the use of shade guides to electronic color measurement devices. Using an order published by the manufacturer, both the TRUBYTE® Bioform and Vita Classical guides can be arranged by value. A study by O'Brien demonstrated, however, that the order is flawed and the change in brightness from tab to tab varies greatly. Despite these disadvantages, a review of data from several clinical trials demonstrates that Vita Classical shade guide data is consistent with data gathered using electronic color measurements. Furthermore, the O'Brien data can be used to make both these guides better measurement systems. The ADA Certification program standards define the degree of overall color change that should be considered clinically important. This issue is as critical as the measurement system used. Reporting color changes that are neither detectable to the human eye nor considered by the public to be important offers the profession little usable information. Given that any standard for color change during bleaching must relate to the abilities of the human eye, it is the conclusion of the author that shade guides should remain a critical element of any bleaching study. CLINICAL SIGNIFICANCE: Clinicians are frequently exposed to reports of bleaching agents that have been shown to result in a change of 6, 7, 8, etc., tabs. Without understanding the limitations of the shade guide used, reports of a specific shade tab change are of little use and may actually be misleading. [source]

    The comparison of Nail Psoriasis Severity Index with a less time-consuming qualitative system

    N Kaçar
    Objective: Reliable assessment of severity in nail psoriasis is essential to document treatment responses in clinical trials and routine clinical usage. In this study the correlation between Nail Psoriasis Severity Index (NAPSI) and Cannavo's scoring system was assessed, and the inter-rater correlation of NAPSI scores was evaluated. Materials and Methods: Forty-five patients with nail psoriasis were included. Target nails were selected and graded by the first dermatologist with both scoring systems. The nails were reevaluated by the second dermatologist with NAPSI. Results: The two systems were highly correlated (P < 0.001). For NAPSI, inter-rater correlation was also significant (P < 0.001). Conclusion: Our results showed that the qualitative and quantitative evaluations of the same rater were similar. Although Cannavo's qualitative scoring system is less time-consuming than NAPSI, its inter-rater correlation should be evaluated before the system can be recommended. [source]

    Evaluation of an AIF correction algorithm for dynamic susceptibility contrast-enhanced perfusion MRI

    Peter Brunecker
    Abstract For longitudinal studies in patients suffering from cerebrovascular diseases the poor reproducibility of perfusion measurements via dynamic susceptibility-weighted contrast-enhanced MRI (DSC-MRI) is a relevant concern. We evaluate a novel algorithm capable of overcoming limitations in DSC-MRI caused by partial volume and saturation issues in the arterial input function (AIF) by a blood flow stimulation-study. In 21 subjects, perfusion parameters before and after administration of blood flow stimulating L-arginine were calculated utilizing a block-circulant singular value decomposition (cSVD). A total of two different raters and three different rater conditions were employed to select AIFs: besides (1) an AIF selection by an experienced rater, a beginner rater applied a steady state-oriented strategy, returning (2) raw and (3) corrected AIFs. Highly significant changes in regional cerebral blood flow (rCBF) by 9.0% (P < 0.01) could only be found when the AIF correction was performed. To further test for improved reproducibility, in a subgroup of seven subjects the baseline measurement was repeated 6 weeks after the first examination. In this group as well, using the correction algorithm decreased the SD of the difference between the two baseline measurements by 42%. Magn Reson Med 60:102-110, 2008. © 2008 Wiley-Liss, Inc. [source]

    Generalisability in unbalanced, uncrossed and fully nested studies

    MEDICAL EDUCATION, Issue 4 2010
    Ajit Narayanan
    Medical Education 2010: 44: 367-378 Objectives: There is growing interest in multi-source, multi-level feedback for measuring the performance of health care professionals. However, data are often unbalanced (e.g. there are different numbers of raters for each doctor), uncrossed (e.g. raters rate the doctor on only one occasion) and fully nested (e.g. raters for a doctor are unique to that doctor). Estimating the true score variance among doctors under these circumstances is proving a challenge. Methods: Extensions to reliability and generalisability (G) formulae are introduced to handle unbalanced, uncrossed and fully nested data to produce coefficients that take into account variances among raters, ratees and questionnaire items at different levels of analysis. Decision (D) formulae are developed to handle predictions of minimum numbers of raters for unbalanced studies. An artificial dataset and two real-world datasets consisting of colleague and patient evaluations of doctors are analysed to demonstrate the feasibility and relevance of the formulae. Another independent dataset is used for validating D predictions of G coefficients for varying numbers of raters against actual G coefficients. A combined G coefficient formula is introduced for estimating multi-sourced reliability. Results: The results from the formulae indicate that it is possible to estimate reliability and generalisability in unbalanced, fully nested and uncrossed studies, and to identify extraneous variance that can be removed to estimate true score variance among doctors. The validation results show that it is possible to predict the minimum numbers of raters even if the study is unbalanced. Discussion: Calculating G and D coefficients for psychometric data based on feedback on doctor performance is possible even when the data are unbalanced, uncrossed and fully nested, provided that: (i) variances are separated at the rater and ratee levels, and (ii) the average number of raters per ratee is used in calculations for deriving these coefficients. [source]

    The effect of differential rater function over time (DRIFT) on objective structured clinical examination ratings

    MEDICAL EDUCATION, Issue 10 2009
    Kevin McLaughlin
    Context: Despite the impartiality implied in its title, the objective structured clinical examination (OSCE) is vulnerable to systematic biases, particularly those affecting raters' performance. In this study our aim was to examine OSCE ratings for evidence of differential rater function over time (DRIFT), and to explore potential causes of DRIFT. Methods: We studied ratings for 14 internal medicine resident doctors over the course of a single formative OSCE, comprising 10 12-minute stations, each with a single rater. We evaluated the association between time-slot and rating for a station. We also explored a possible interaction between time-slot and station difficulty, which would support the hypothesis that rater fatigue causes DRIFT, and considered 'warm-up' as an alternative explanation for DRIFT by repeating our analysis after excluding the first two OSCE stations. Results: Time-slot was positively associated with rating on a station (regression coefficient 0.88, 95% confidence interval [CI] 0.38-1.38; P = 0.001). There was an interaction between time-slot and station difficulty: for the more difficult stations the regression coefficient for time-slot was 1.24 (95% CI 0.55-1.93; P = 0.001) compared with 0.52 (95% CI -0.08 to 1.13; P = 0.09) for the less difficult stations. Removing the first two stations from our analyses did not correct DRIFT. Conclusions: Systematic biases, such as DRIFT, may compromise internal validity in an OSCE. Further work is needed to confirm this finding and to explore whether DRIFT also affects ratings on summative OSCEs. If confirmed, the factors contributing to DRIFT, and ways to reduce these, should then be explored. [source]

    Reliability of the Clinical Teaching Effectiveness Instrument

    MEDICAL EDUCATION, Issue 9 2005
    H H Van Der Hem-Stokroos
    Introduction: The Clinical Teaching Effectiveness Instrument (CTEI) was developed to evaluate the quality of the clinical teaching of educators. Its authors reported evidence supporting content and criterion validity and found favourable reliability findings. We tested the validity and reliability of this instrument in a European context and investigated its reliability as an instrument to evaluate the quality of clinical teaching at group level rather than at the level of the individual teacher. Methods: Students participating in a surgical clerkship were asked to fill in a questionnaire reflecting a student-teacher encounter with a staff member or a resident. We calculated variance components using the urGENOVA program. For individual score interpretation of the quality of clinical teaching the standard error of estimate was calculated. For group interpretation we calculated the root mean square error. Results: The results did not differ statistically between staff and residents. The average score was 3.42. The largest variance component was associated with rater variance. For individual score interpretation a reliability of > 0.80 was reached with 7 ratings or more. To reach reliable outcomes at group level, 15 educators or more were needed with a single rater per educator. Discussion: The required sample size for appraisal of individual teaching is easily achievable. Reliable findings can also be obtained at group level with a feasible sample size. The results provide additional evidence of the reliability of the CTEI in undergraduate medical education in a European setting. The results also showed that the instrument can be used to measure the quality of teaching at group level. [source]
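
    The question of how many ratings are needed to reach a target reliability can be illustrated with the Spearman-Brown prophecy formula. This is a simpler stand-in for the generalisability analysis the authors ran, not their method, and the single-rating reliability of 0.37 below is a hypothetical value picked for illustration, not a figure reported in the abstract.

```python
def spearman_brown(single_rating_reliability: float, n_ratings: int) -> float:
    """Reliability of the mean of n_ratings parallel ratings."""
    r = single_rating_reliability
    return n_ratings * r / (1.0 + (n_ratings - 1) * r)


def min_ratings(single_rating_reliability: float, target: float,
                max_ratings: int = 1000) -> int:
    """Smallest number of ratings whose mean reaches the target reliability."""
    if not 0.0 < single_rating_reliability < 1.0:
        raise ValueError("single-rating reliability must lie strictly between 0 and 1")
    for n in range(1, max_ratings + 1):
        if spearman_brown(single_rating_reliability, n) >= target:
            return n
    raise ValueError("target not reached within max_ratings")


if __name__ == "__main__":
    # 0.37 is a hypothetical single-rating reliability, not a reported result.
    print(min_ratings(0.37, 0.80))
```

    The search loop is used instead of the closed-form solution to avoid rounding surprises near integer boundaries.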

    Investigating the use of sampling for maximising the efficiency of student-generated faculty teaching evaluations

    MEDICAL EDUCATION, Issue 2 2005
    Clarence D Kreiter
    Purpose: Surveys of medical students are widely used to evaluate course content and faculty teaching within the medical school. Gathering information that accurately reflects student perceptions requires that students buy into the evaluation process and be willing to provide thoughtful responses to the teaching evaluation. To maintain student commitment, it is important that medical students are not overburdened with poorly planned evaluations. Sampling might decrease the number of evaluations required of students and might also reduce the proportion of non-responses and other forms of inattentive response biases. Methods: A sampling technique employed within a large medical lecture is described and evaluated. A generalisability study of the teacher evaluations is conducted. Results: A high response rate and high levels of reliability were obtained by sampling a small proportion of the total class. The largest source of error was related to rater and utilising sufficient numbers of student-raters is critical to achieving reliable results. Conclusion: Sampling can reduce evaluation demands placed on students, and preserve reliability and increase the validity of mean evaluation scores. With computer presentation, efficient sampling techniques become practical and should be part of software packages used to present teacher evaluations. [source]

    Evaluating surgeons' informed decision making skills: pilot test using a videoconferenced standardised patient

    MEDICAL EDUCATION, Issue 12 2003
    Sarah L Clever
    Background: Standardised patients (SPs) are effective in evaluating communication skills, but not every training site may have the resources to develop and maintain SP programmes. Objectives: To test whether videoconferencing technology (VT) could enable an interaction between an SP and an orthopaedic surgeon that would allow the SP to accurately evaluate the surgeon's informed decision making (IDM) skills. We also assessed whether this sort of interaction was acceptable to orthopaedic surgeons as a means of learning IDM skills. Methods: We trained an SP to represent a 75-year-old woman considering hip replacement surgery. Orthopaedic surgeons in Chicago individually consulted with the SP in Philadelphia; each participant could see and hear the other on large television screens. The SP evaluated the surgeons' advice using a 23-item checklist of IDM elements, and gave each surgeon verbal and written feedback on his IDM skills. The surgeons then gave their evaluations of the exercise. Results: Twenty-two surgeons completed the project. The SP was ≥ 80% accurate in classifying 20 of the 23 IDM skills when compared to a clinician rater. Although 12 (55%) of the orthopaedic surgeons felt that some aspects of the technology were distracting, most were pleased with it, and 19 of 22 (86%) would recommend the videoconferenced SP interaction to their colleagues as a means of learning IDM skills. Conclusions: These results suggest that VT allows accurate evaluation of IDM skills in a format that is acceptable to orthopaedic surgeons. Videoconferencing technology may be useful in long-distance SP communication assessment for a variety of learners. [source]

    Effect of vagal nerve stimulation in a case of Tourette's syndrome and complex partial epilepsy

    MOVEMENT DISORDERS, Issue 8 2006
    Alan Diamond DO
    Abstract We report on a 30-year-old man with Tourette's syndrome (TS) and medication-refractory epilepsy whose tics improved after implantation of a vagal nerve stimulator (VNS). To verify the patient's observation, we performed a blinded video assessment using the modified Rush video-based tic rating scale. The patient underwent two separate video recordings (VNS on and VNS off). A rater, blinded to patient's VNS status, evaluated the videos with the modified Rush video-based tic rating scale. There were improvements in total tic score and motor and phonic tic frequency. If verified by controlled clinical trials, this observation may provide insights into the pathophysiology of tics and may lead to a novel therapy for patients with severe TS. © 2006 Movement Disorder Society [source]

    Comparing the Psychometric Properties of the Checklist of Nonverbal Pain Behaviors (CNPI) and the Pain Assessment in Advanced Dementia (PAIN-AD) Instruments

    PAIN MEDICINE, Issue 3 2010
    Mary Ersek PhD, FAAN
    Objective: To examine and compare the psychometric properties of two common observational pain assessment tools used in persons with dementia. Design: In a cross-sectional descriptive study nursing home (NH) residents were videotaped at rest and during a structured movement procedure. Following one training session and one practice session, two trained graduate nursing research assistants independently scored the tapes using the two pain observation tools. Setting: Fourteen NHs in Western Washington State participating in a randomized controlled trial of an intervention to enhance pain assessment and management. Participants: Sixty participants with moderate to severe pain were identified by nursing staff or chosen based on the pain items from the most recent Minimum Data Set assessment. Measures: Checklist of Nonverbal Pain Indicators (CNPI) and the Pain Assessment in Advanced Dementia (PAINAD), demographic and pain-related data (Minimum Data Set), nursing assistant reports of participants' usual pain intensity, and Pittsburgh Agitation Scale. Results: Internal consistency for both tools was good except for the CNPI at rest for one rater. Inter-rater reliability for pain presence was fair (K = 0.25 for CNPI with movement; K = 0.31 for PAINAD at rest) to moderate (K = 0.43 for CNPI at rest; K = 0.54 for PAINAD with movement). There were significant differences in mean CNPI and PAINAD scores at rest and during movement, providing support for construct validity. However, both tools demonstrated marked floor effects, particularly when participants were at rest. Conclusions: Despite earlier studies supporting the reliability and validity of the CNPI and the PAINAD, findings from the current study indicate that these measures warrant further study with clinical users, should be used cautiously both in research and clinical settings and only as part of a comprehensive approach to pain assessment. [source]
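
    For readers unfamiliar with the agreement statistic reported above, the sketch below computes Cohen's kappa for two raters, on the assumption that K in the abstract denotes Cohen's kappa. The ten pain-present (1) and pain-absent (0) judgements are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented example judgements: 1 = pain present, 0 = pain absent.
    rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
    rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
    print(round(cohen_kappa(rater_a, rater_b), 3))
```

    In this invented example the raters agree on 8 of 10 items, yet kappa is noticeably lower than 0.8 because agreement expected by chance is subtracted out.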

    A cognitive aid for neonatal resuscitation: a randomized controlled trial

    M.D. Bould
    Introduction: Anaesthetists are among several health care practitioners responsible for neonatal resuscitation in Canada. The Neonatal Resuscitation Program (NRP) courses are the North American educational standard. NRP has been shown to be an effective way of learning skills and knowledge but retention has been found to be problematic [1]. The use of cognitive aids is mandatory in industries such as aviation, to avoid dependence on memory when making decisions in critical situations. Visual cognitive aids have been studied retrospectively in resuscitation and performance was found to correlate with the frequency of use of the aid [2]. Cognitive aids have been found to be of benefit in an unblinded prospective study [3]. We aimed to conduct the first blinded study on the effect of a cognitive aid on the performance of simulated resuscitation. Methods: We conducted a single-blind randomized controlled trial to investigate whether the presence of a cognitive aid improved performance in a simulated neonatal resuscitation. After ethics board approval we recruited 32 anaesthesia residents who had previously passed the NRP. Subjects were randomized to an intervention group that had a poster detailing the NRP algorithm and a control group without the poster. The cognitive aid was positioned so that it could not be seen on the video recordings of the simulation that were used to assess performance. The scenario was piloted to confirm adequate blinding. Both groups had their performance in a simulated neonatal resuscitation recorded and subsequently analyzed by a peer, an expert anaesthetist and an expert neonatologist, using a previously validated checklist. A further rater observed the scenario in real time to examine frequency of use of the cognitive aid. Results: The inter-rater reliability of the checklist was excellent with an intraclass correlation coefficient of 0.88. Consequently the mean of the scores assigned by all three raters was used for analysis. 
The median checklist score in the control group, 18.2 [15.0-20.5 (10.7-25.3)], was not significantly different from that in the intervention group, 20.3 [18.3-21.3 (15.0-24.3)] (P = 0.08). Retention of NRP skills and knowledge was poor: when evaluated by the neonatologist, none of the subjects correctly performed all life-saving interventions necessary to pass the checklist. Although only one subject in the intervention group did not use the aid at all, only 26.7% used the aid frequently and none used it extensively. Discussion: Retention of skills after NRP training was poor. Our study confirms previous findings of poor retention of skills after NRP training: Kaczorowski et al. investigated family medicine trainees and found that none of 44 residents who were retested 6-8 months after an NRP course would have passed the course due to errors in life-saving interventions [1]. Previous research has shown that the presence of a cognitive aid can improve performance in the simulated management of a rare, high-stakes scenario: malignant hyperthermia [3]. Our negative findings contrast with this and another previous study [2]. A potential reason for this discrepancy is that the raters in the previous studies were not blinded to group allocation, nor were the rating scales used validated. The infrequent use of the cognitive aid may be the reason that it did not improve performance. Further research is required to investigate whether cognitive aids can be useful if their use is incorporated into NRP training. Conclusion: A randomized single-blinded trial found that a cognitive aid did not improve performance at simulated resuscitation, in contrast to previous retrospective and unblinded studies. Retention of skills and knowledge after resuscitation training remains an ongoing challenge for medical educators. [source]

    Good, bad, or in-between: How does the daily behavior report card rate?

    Sandra M. Chafouleas
    Our purpose here was to define and review the daily behavior report card (DBRC) as a monitoring and/or intervention technique. We considered a measure to be a DBRC if a specified behavior was rated at least daily and that information was shared with someone other than the rater. In general, it has been suggested that DBRCs may be feasible, acceptable, effective in promoting a positive student, and a way to increase parent/teacher communication. In addition, DBRCs are highly adaptive in that they represent a broad array of both monitoring and intervention possibilities rather than having a single, scripted purpose. All of these characteristics make the DBRC appealing for use in applied settings. However, an extensive, methodologically sound literature base does not yet exist. Despite the appeal of using DBRCs, widespread endorsement cannot be made without caution. We conclude with implications for use in practice and highlight areas in need of further investigation. © 2002 Wiley Periodicals, Inc. [source]

    Multisource feedback in the assessment of physician competencies

    Jocelyn Lockyer PhD, Director. Article first published online: 22 APR 200
    Abstract Multisource feedback (MSF), or 360-degree employee evaluation, is a questionnaire-based assessment method in which ratees are evaluated by peers, patients, and coworkers on key performance behaviors. Although widely used in industrial settings to assess performance, the method is gaining acceptance as a quality improvement method in health systems. This article describes MSF, identifies the key aspects of MSF program design, summarizes some of the salient empirical research in medicine, and discusses possible limitations for MSF as an assessment tool in health care. In industry and in health care, experience suggests that MSF is most likely to succeed and result in changes in performance when attention is paid to structural and psychometric aspects of program design and implementation. A carefully selected steering committee ensures that the behaviors examined are appropriate, the communication package is clear, and the threats posed to individuals are minimized. The instruments that are developed must be tested to ensure that they are reliable, achieve a generalizability coefficient of Eρ² = .70, have face and content validity, and examine variance in performance ratings to understand whether ratings are attributable to how the physician performs and not to factors beyond the physician's control (e.g., gender, age, or setting). Research shows that reliable data can be generated with a reasonable number of respondents, and physicians will use the feedback to contemplate and initiate changes in practice. Performance may be affected by familiarity between rater and ratee and sociodemographic and continuing medical education characteristics; however, little of the variance in performance is explained by factors outside the physician's control. MSF is not a replacement for audit when clinical outcomes need to be assessed. 
However, when interpersonal, communication, professionalism, or teamwork behaviors need to be assessed and guidance given, it is one of the better tools that may be adopted and implemented to provide feedback and guide performance. [source]

    Rapid tryptophan depletion as a treatment for acute mania: a double-blind, pilot-controlled study

    BIPOLAR DISORDERS, Issue 8 2007
    Julia Applebaum
    Objectives: Rapid reduction of up to 80% in plasma tryptophan level can be accomplished by administering an oral tryptophan-free amino acid solution, which induces hepatic protein synthesis and thereby depletes available plasma tryptophan. Rapid tryptophan depletion (RTD) has been shown to induce transient depressive symptoms in patients with remitted major depression. The effect of RTD in acutely manic patients has not been studied. Methods: We carried out a double-blind, placebo-controlled pilot study of RTD in acutely manic patients. Patients were randomized to the treatment groups. Sodium valproate treatment was started at a dose of 1000 mg/day and continued throughout the 7-day study. On days 1-7, patients received a daily tryptophan-free amino acid drink or a placebo drink. The tryptophan-free amino acid drink contained a mix of amino acids without tryptophan. The placebo drink contained the additives and constituents of the real mixture to provide identical flavor and texture without the amino acids. Ratings were administered at baseline and then repeated on days 3, 5, and 7. All ratings were carried out by an experienced rater who was blind to the group assignment of patients. Results: A total of 23 patients entered the study and 17 completed the 7-day treatment protocol. The patients who received the amino acid drink showed greater improvement in mania ratings. The differences in Young Mania Rating Scale (YMRS) and Clinical Global Impression (CGI) scores were significant. However, the intolerance rate was high (23%) and the findings in this pilot study are based only on results from those patients who were able to tolerate the drink. Conclusions: Rapid tryptophan depletion may have an antimanic effect. [source]

    A study examining inter-rater and intrarater reliability of a novel instrument for assessment of psoriasis: the Copenhagen Psoriasis Severity Index

    J. Berth-Jones
    Summary Background: There is a perceived need for a better method for clinical assessment of the severity of psoriasis vulgaris. The most frequently used system is the Psoriasis Area and Severity Index (PASI), which has significant disadvantages, including the requirement for assessment of the percentage of skin affected, an inability to separate milder cases, and a lack of linearity. The Copenhagen Psoriasis Severity Index (CoPSI) is a novel approach which comprises assessment of three signs: erythema, plaque thickness and scaling, each on a four-point scale (0, none; 1, mild; 2, moderate; 3, severe), at each of 10 sites: face, scalp, upper limbs (excluding hands and wrists), hands and wrists, chest and abdomen, back, buttocks and sacral area, genitalia, lower limbs (excluding feet and ankles), feet and ankles. Objectives: To evaluate the inter-rater and intrarater reliability of the CoPSI and to provide comparative data from the PASI and a Physician's Global Assessment (PGA) used in recent clinical trials on psoriasis vulgaris. Methods: On the day before the study, 14 dermatologists (raters) with an interest in psoriasis participated in a detailed training session and discussion (2·5 h) on use of the scales. On the study day, each rater evaluated 16 adults with chronic plaque psoriasis in the morning and again in the afternoon. Raters were randomly assigned to assess subjects using the scales in a specific sequence, either PGA, CoPSI, PASI or PGA, PASI, CoPSI. Each rater used one sequence in the morning and the other in the afternoon. The primary endpoint was the inter-rater and intrarater reliability as determined by intraclass correlation coefficients (ICCs). Results: All three scales demonstrated 'substantial' (a priori defined as ICC > 80%) intrarater reliability. The inter-rater reliability for each of the CoPSI and PASI was also 'substantial' and for the PGA was 'moderate' (ICC 61%). The CoPSI was better at distinguishing between milder cases. 
Conclusions: The CoPSI and the PASI both provided reproducible psoriasis severity assessments. In terms of both intrarater and inter-rater reliability values, the CoPSI and the PASI are superior to the PGA. The CoPSI may overcome several of the problems associated with the PASI. In particular, the CoPSI avoids the need to estimate a percentage of skin involved, is able to separate milder cases where the PASI lacks sensitivity, and is also more linear and simpler. The CoPSI also incorporates more meaningful weighting of different anatomical areas. [source]

    A study examining inter- and intrarater reliability of three scales for measuring severity of psoriasis: Psoriasis Area and Severity Index, Physician's Global Assessment and Lattice System Physician's Global Assessment

    J. Berth-Jones
    Summary Background: There is a lack of consensus as to the best way of monitoring psoriasis severity in clinical trials. The Psoriasis Area and Severity Index (PASI) is the most frequently used system and the Physician's Global Assessment (PGA) is also often used. However, both instruments have some drawbacks and neither has been fully evaluated in terms of 'validity' and 'reliability' as a psoriasis rating scale. The Lattice System Physician's Global Assessment (LS-PGA) scale has recently been developed to address some disadvantages of the PASI and PGA. Objectives: To evaluate the inter-rater and intrarater reliability of the PASI, PGA and LS-PGA. Methods: On the day before the study, 14 dermatologists (raters), with varied experience of assessing psoriasis, received detailed training (2·5 h) on use of the scales. On the study day, each rater evaluated 16 adults with chronic plaque psoriasis in the morning and again in the afternoon. Raters were randomly assigned to assess subjects using the scales in a specific sequence, either PGA, LS-PGA, PASI or PGA, PASI, LS-PGA. Each rater used one sequence in the morning and the other in the afternoon. The primary endpoint was the inter-rater and intrarater reliability as determined by intraclass correlation coefficients (ICCs). Results: All three scales demonstrated 'substantial' (a priori defined as ICC > 80%) intrarater reliability. The inter-rater reliability for each of the PASI and LS-PGA was also 'substantial' and for the PGA was 'moderate' (ICC 75%). Conclusions: Each one of the three scales provided reproducible psoriasis severity assessments. In terms of both intrarater and inter-rater reliability values, the three scales can be ranked from highest to lowest as follows: PASI, LS-PGA and PGA. [source]
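Both psoriasis studies above report reliability as intraclass correlation coefficients but do not say which ICC form was used. As a hedged illustration only, the sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater coefficient usually written ICC(2,1), from a subjects-by-raters table. The function name `icc2_1` is hypothetical, and the example scores are the small six-subject, four-rater table widely reproduced from Shrout and Fleiss's ICC paper, not psoriasis data.

```python
def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` is a list of rows, one row per subject, one column per rater.
    Assumes a complete table (every rater scored every subject).
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    ms_subjects = ss_subjects / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Example table: 6 subjects (rows) scored by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2_1(example), 2))
```

Because this form penalises systematic differences between raters (some raters scoring consistently higher than others), it can be considerably lower than a consistency-type ICC on the same data. That is one reason the specific ICC form matters when comparing figures such as the 61%, 75% and >80% values quoted in these abstracts.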

    Psychopathy and offence severity in sexually aggressive and violent youth

    Amber Fougere
    Background: A large proportion of violent crimes are committed by youths. Youths with psychopathic traits may have a higher risk for recidivism and violence. Aims/hypotheses: Our aim was to compare sexually aggressive with violent young men on offence severity and psychopathy. Three hypotheses were proposed: first, young men with previous offences would display a progressive increase in seriousness of offence during their criminal career; secondly, the sexually aggressive and violent young men would not differ in scores on the Hare Psychopathy Checklist: Youth Version (PCL:YV); but, thirdly, PCL:YV scores would be positively correlated with the severity of the index crime, as measured by the Cormier-Lang System for Quantifying Criminal History. Methods: Information was collected from the files of 40 young men in conflict with the law, and the PCL:YV was rated from this information by trained raters. Results: The offences of these young men became more serious over time, but we found no association between PCL:YV scores and offence type or seriousness. Conclusions and implications: This exploratory research suggests the importance of understanding the progression in offending careers, but a limited role for the PCL:YV in doing so. Given the small sample size, however, and the limit on access to information about details of age, the findings need replication. Copyright © 2009 John Wiley & Sons, Ltd. [source]

    Reliability and validity of a structured interview guide for the Hamilton Anxiety Rating Scale (SIGH-A)

    M. Katherine Shear M.D.
    Abstract The Hamilton Anxiety Rating Scale, a widely used clinical interview assessment tool, lacks instructions for administration and clear anchor points for the assignment of severity ratings. We developed a Structured Interview Guide for the Hamilton Anxiety Scale (SIGH-A) and report on a study comparing this version to the traditional form of this scale. Experienced interviewers from three Anxiety Disorders research sites conducted videotaped interviews using both traditional and structured instruments in 89 participants. A subset of the tapes was co-rated by all raters. Participants completed self-report symptom questionnaires. We observed high inter-rater and test-retest reliability using both formats. The structured format produced similar but consistently higher (+4.2) scores. Correlation with a self-report measure of overall anxiety was also high and virtually identical for the two versions. We conclude that in settings where extensive training is not practical, the structured scale is an acceptable alternative to the traditional Hamilton Anxiety instrument. Depression and Anxiety 13:166-178, 2001. © 2001 Wiley-Liss, Inc. [source]

    Making Scents: Improvement of Olfactory Profile after Botulinum Toxin-A Treatment in Healthy Individuals

    BACKGROUND The axilla is particularly associated with body odor and putative pheromone production in humans. Although botulinum toxin type A (BT-A) is injected increasingly into the axillary skin to stop excessive sweating, its potential to control body odor is largely unexplored. OBJECTIVE The objective was to measure the impact of BT-A on human axillary odor in an objective and reproducible fashion. METHODS This study was a randomized, double-blind, placebo-controlled trial with 51 healthy volunteers receiving 50 U of BOTOX (Allergan, Inc.) in one axilla and placebo in the other. Odor quality was assessed by treated subjects (questionnaire) as well as by independent raters who were exposed to blinded T-shirt samples. RESULTS No major side effects occurred, and no subject withdrew from the study for medical reasons. Samples from the BT-A-treated side smelled less intense (p<.001) and better (p<.001) according to self-assessments. Likewise, independent raters found the BT-A-treated samples to smell less intense and better (p<.001). They preferred "to work together with the respective person" and found the odor "more erotic" (p<.001). CONCLUSION Side-by-side comparison of odor samples (T-shirt sniff test) by independent raters showed that axillary odor in healthy individuals is significantly more appealing after BT-A injection. [source]

    Prospective comparison of course of disability in antipsychotic-treated and untreated schizophrenia patients

    J. Thirthalli
    Objective: To compare the course of disability in schizophrenia patients receiving antipsychotics and those remaining untreated in a rural community. Method: Of 215 schizophrenia patients identified in a rural south Indian community, 58% were not receiving antipsychotics. Trained raters assessed the disability in 190 of these at baseline and after 1 year. The course of disability in those who remained untreated was compared with that in those who received antipsychotics. Results: Mean disability scores remained virtually unchanged in those who remained untreated, but showed a significant decline (indicating decrement in disability) in those who continued to receive antipsychotics and in those in whom antipsychotic treatment was initiated (P < 0.001; group × occasion effect). The proportion of patients classified as 'disabled' declined significantly in the treated group (P < 0.01), but remained the same in the untreated group. Conclusion: Disability in untreated schizophrenia patients remains unchanged over time. Treatment with antipsychotics in the community results in a considerable reduction in disability. [source]