The CDC's LMBP "A-6 Cycle" systematic review methods for evaluating quality improvement practices were used to conduct this review. The methodology, reported in detail elsewhere (24), is derived from previously validated methods and is designed to assess the results of studies of practice effectiveness and to translate them into evidence-based best-practice recommendations. Using this method, a review coordinator (author Mark T. LaRocco) and individuals trained to apply the LMBP methods (authors Alice S. Weissfeld and Elizabeth K. Leibach) conducted the systematic review with guidance from an expert panel. The expert panelists (authors Nancy E. Cornish, Colleen S. Kraft, Vickie Baselski, Robert L. Sautter, Edward J. Peterson, and Debra Rodahl) were chosen based on their breadth of experience and perspective in clinical microbiology and laboratory management. A description of their scientific credentials and professional affiliations can be found in the author biography section. Lastly, the team was supported by a statistician with expertise in evidence review methodologies and meta-analysis (author Jacob Franek). The expert panel reviewed the results of the evidence review and drafted the evidence-based best-practice recommendations. The recommendations were then approved by the LMBP Workgroup, consisting of 13 invited members with broad expertise in laboratory medicine, clinical practice, health services research, and health policy, as well as one ex officio representative from the Centers for Medicare and Medicaid Services. A list of the members of the LMBP Workgroup is provided in Appendix 1.
Review Question, Analytical Framework, and Search Strategy
The review question addressed by this analytical review was as follows: "Are there preanalytic practices related to the collection, preservation, transport, and storage of urine for microbiological culture that improve the diagnosis and management of patients with urinary tract infection?" Components of the preanalytic phase of urine culture were studied in the context of an analytical framework for factors affecting specimen contamination and diagnostic accuracy, depicted in Fig. 1. The population, intervention, comparison, and outcome (PICO) elements are as follows.
• "Population" is any patient who has a urine culture collected.
• "Intervention" is the preanalytic practice (collection, preservation, transport, or storage method) being evaluated.
• "Comparison" is made of
  ◦ immediate versus delayed processing of urine held at room temperature,
  ◦ immediate versus delayed processing of refrigerated urine or urine preserved in boric acid,
  ◦ midstream clean-catch collection of urine without cleansing versus with cleansing (men and women), and
  ◦ midstream clean-catch collection of urine without cleansing versus with cleansing versus collection with a sterile urine bag versus diaper collection for infants and children.
• "Outcomes" are the contamination rate and the diagnostic accuracy of urine culture.
Specific practices involving the preanalytic phase of urine culture covered in this evidence-based review were addressed by asking the following eight clinical questions.
1. What is the difference in colony counts when comparing immediate versus delayed processing of fresh urine stored at room temperature after collection?
2. What is the difference in colony counts when comparing immediate versus delayed processing of urine kept refrigerated or preserved in boric acid?
3. What is the difference in contamination rates between midstream urine collected with cleansing versus without cleansing in women being tested for a UTI?
4. What is the diagnostic accuracy of midstream urine collected with or without cleansing compared to bladder catheterization for the diagnosis of UTI in women?
5. What is the difference in contamination rates between midstream urine collection, with or without cleansing, and first-void collection in men?
6. What is the diagnostic accuracy of midstream urine collected, with or without cleansing, compared to that of bladder catheterization or suprapubic aspiration for the diagnosis of UTI in men?
7. What are the differences in contamination rates between midstream collection with cleansing, midstream collection without cleansing, and sterile urine bag or diaper collection in children?
8. What is the diagnostic accuracy of midstream clean-catch, sterile urine bag, or diaper collection compared with that of suprapubic aspiration or catheterization for the diagnosis of UTI in children?
The search for studies of practice effectiveness was designed to identify those reporting measurable outcomes with sufficient rigor to meet the review requirements. With input from the expert panel and the assistance of a research librarian at the Jesse Jones Library at the Texas Medical Center in Houston, TX, a literature search strategy and set of search terms were developed. A search of three electronic bibliographic databases (PubMed, SCOPUS, and CINAHL) for English-language articles published between 1965 and 2014 was conducted. In addition, hand searching of bibliographies from relevant information sources was performed. All search results were catalogued and maintained using a Web-based, commercial reference software package (RefWorks; ProQuest LLC, Ann Arbor, MI). Finally, solicitation of unpublished quality improvement studies was attempted by posting requests for data on the Laboratory Medicine Best Practices website (https://wwwn.cdc.gov/futurelabmedicine/) and on two listservs supported by the American Society for Microbiology: clinmicronet (http://www.asm.org/index.php/online-community-groups/listservs) and DivCNet (http://www.asm.org/division/c/divcnet.htm).
The search contained the following medical subject headings (MeSH) and key text words: "urinary tract infections" (MeSH) OR UTI (text word) OR urinary tract infect* (text word); "urine/analysis" (major) OR "urine/microbiology" (major) OR "urinalysis" (MeSH); "specimen handling" (major); "preservation, biological" (MeSH) OR preservation, biological (text word) OR "boric acids" (MeSH) OR boric acid (text word) OR boric acid/borate (text word) OR boric acids (text word) OR "refrigeration" (MeSH) OR refrigeration (text word) OR preserv* (text word); storage (text word); "time factors" (MeSH) OR "transportation" (MeSH) OR transport time (text word) OR delay (text word) OR time delay (text word) OR time factor (text word) OR timing (text word); "urine specimen collection" (MeSH) OR urine specimen collection (text word) OR "catheters, indwelling" (MeSH) OR catheters, indwelling (text word) OR "urinary reservoirs, continent" (MeSH) OR urinary reservoirs, continent (text word) OR "urinary catheterization" (MeSH) OR urinary catheterization (text word) OR "intermittent urethral catheterization" (MeSH) OR intermittent urethral catheterization (text word) OR clean voided (text word) OR midstream (text word) OR foley (text word) OR suprapubic (text word); and "bacteriological techniques" (MeSH) OR bacteriological technique (text word) OR bacteriological techniques (text word) OR "microbiological techniques" (MeSH) OR microbiological technique (text word) OR microbiological techniques (text word).
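For illustration only, the sketch below shows one way the semicolon-delimited concept groups above could be assembled into a single Boolean query. The AND operators joining the groups, the abbreviated term lists, and the PubMed-style field tags are assumptions made for the example; they do not reproduce the exact syntax submitted to each database.

```python
# Hypothetical assembly of the concept groups into one Boolean search string.
# Term lists are abbreviated; field tags ([MeSH], [Majr]) follow PubMed conventions.
concept_groups = [
    '"urinary tract infections"[MeSH] OR UTI OR "urinary tract infect*"',
    '"urine/analysis"[Majr] OR "urine/microbiology"[Majr] OR "urinalysis"[MeSH]',
    '"specimen handling"[Majr]',
    ('"preservation, biological"[MeSH] OR "boric acids"[MeSH] OR "boric acid" '
     'OR "refrigeration"[MeSH] OR preserv* OR storage'),
    ('"time factors"[MeSH] OR "transportation"[MeSH] OR "transport time" '
     'OR delay OR "time factor" OR timing'),
    ('"urine specimen collection"[MeSH] OR "urinary catheterization"[MeSH] '
     'OR "clean voided" OR midstream OR foley OR suprapubic'),
    '"bacteriological techniques"[MeSH] OR "microbiological techniques"[MeSH]',
]

# Combine the groups; AND between groups is assumed, not stated in the text.
query = " AND ".join(f"({group})" for group in concept_groups)
print(query)
```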
Titles and abstracts were initially screened by the review coordinator, with assistance from the expert panel when necessary, to select studies for full review. A study was included if it was considered likely to provide valid and useful information and met the PICO criteria described above. Specifically, these inclusion criteria required that a study (i) address a defined population or definable group of patients, (ii) evaluate a specific intervention/practice included in this review, (iii) describe at least one finding for a relevant outcome measure (percent contamination or diagnostic accuracy) reproducible in comparable settings, and (iv) present results in a format that was useful for statistical analysis. Studies failing to meet these inclusion criteria (i.e., studies that did not report a relevant practice, did not include a practice of interest, or did not present an outcome measure of interest) were excluded from further review.
Studies that cleared this initial screening were then abstracted and evaluated by the expert panel. For eligible studies, information on study characteristics, interventions, outcome measures, and findings was extracted using a standardized form, and each study was assigned a quality rating derived from points awarded for meeting quality criteria. Individual quality ratings were based on four dimensions: study quality, practice effectiveness, defined outcome measure(s), and findings/results. The objective of rating individual study quality was to judge whether sufficient evidence of practice effectiveness was available to support inclusion in an overall body of evidence for evaluation of a best-practice recommendation (that is, a practice likely to be effective in improving one or more outcomes of interest in comparison to other commonly used practices).
The four study quality dimensions were rated separately, with a rating score assigned up to the maximum for a given dimension. The rating scores for all four dimensions were added to reach a single summary score reflecting overall study quality. A total of 10 points was available for each study. Reviewers assigned one of three quality ratings to each study: good (8 to 10 points), fair (5 to 7 points), or poor (4 points or less). Each study was reviewed and rated by two expert panel members to minimize subjectivity and bias. Any study ranked as poor by one reviewer but good by the second reviewer was assigned to a third expert panel member for resolution. More detail on the rating process for individual studies can be found elsewhere (24–26). Studies that did not achieve a quality rating of fair or good were excluded from further consideration. Data from published studies that passed full review were transformed to a standardized, common metric according to LMBP methods (24). Summary data and quality scores for each publication included in this evidence-based review can be found in Appendix 3.
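As a concrete illustration of the scoring arithmetic, the following sketch maps per-dimension point awards to an overall rating and applies the two-reviewer rule described above. The individual dimension scores and the handling of disagreements other than poor-versus-good are hypothetical; the text specifies only the 10-point total, the rating cut points, and the third-reviewer rule.

```python
# Illustrative sketch of the LMBP-style study quality scoring described above.
# Per-dimension point awards below are hypothetical example values.

DIMENSIONS = ("study quality", "practice effectiveness", "outcome measures", "findings")

def overall_rating(dimension_scores):
    """Sum the four dimension scores (10 points total) and map to a rating."""
    total = sum(dimension_scores[d] for d in DIMENSIONS)
    if total >= 8:
        return "good"    # 8 to 10 points
    if total >= 5:
        return "fair"    # 5 to 7 points
    return "poor"        # 4 points or less

def consensus_rating(first, second, third=None):
    """Each study is rated by two reviewers; a poor-vs-good split goes to a third."""
    if {first, second} == {"poor", "good"}:
        if third is None:
            raise ValueError("poor/good disagreement: third reviewer needed")
        return third
    # Assumption: for other combinations, take the more conservative rating.
    order = {"poor": 0, "fair": 1, "good": 2}
    return min(first, second, key=order.get)

scores = {"study quality": 3, "practice effectiveness": 2,
          "outcome measures": 2, "findings": 2}   # hypothetical point awards
print(overall_rating(scores))                      # -> "good" (9 of 10 points)
print(consensus_rating("good", "fair"))            # -> "fair"
```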
The study quality ratings and results from the individual studies for each clinical question were aggregated into bodies of evidence. The consistency and patterns of effects across studies and the rating of the overall strength of the body of evidence (high, moderate, low, suggestive, or insufficient) were based on both qualitative and quantitative analyses. Estimates of effect and the strength of the body of evidence were then used to translate results into one of three evidence-based recommendations (recommend, no recommendation for or against, or recommend against). The rating criteria are described in greater detail elsewhere (24).
While recommendations are based on the entire body of evidence, meta-analyses to generate summary estimates of effect were undertaken for outcomes that provided sufficient data for measurements of diagnostic accuracy and contamination, i.e., proportions of specimens containing periurethral, perianal, epidermal, or vaginal flora. For the outcome of contamination proportion, summary odds ratios were calculated using Mantel-Haenszel methods in a random-effects model performed with Review Manager (RevMan) software version 5.0 (2008; The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen, Denmark). A contamination event was defined according to how individual studies defined contamination, because definitions varied between studies. Wherever possible, contamination proportions were determined for the entire test population rather than a subset population (such as only those individuals who tested negative for urinary tract infection). The I² statistic, which describes the percentage of variability in effect estimates due to statistical heterogeneity rather than sampling error, was used to assess between-study heterogeneity. For the outcomes of diagnostic accuracy, it was planned that point estimates of sensitivity and specificity would be summarized using the bivariate model when similar cutoff points were used; however, all models failed to converge because of the small number of studies and small sample sizes. Similarly, hierarchical summary receiver operating characteristic (HSROC) curves could not be generated because these models also failed to converge. Solutions for the failure of convergence, including removing individual studies, were explored but did not improve convergence. Given the limitations of univariate methods, meta-analysis of diagnostic accuracy outcomes and curve fitting were not pursued further. All work on summarizing diagnostic accuracy outcomes was performed using SAS software version 9.2 (2008; SAS Institute Inc., Cary, NC, USA) and the MetaDAS macro, version 1.3 (27). Significant growth (i.e., a positive sample) was defined according to how each individual study defined significant growth, because cutoff points varied among studies. All other results, including contaminated specimens and those with no growth, were considered nonsignificant (i.e., a negative sample), as this most closely reflects actual clinical practice. Two-by-two tables were used to determine sensitivity and specificity, and exact 95% confidence intervals were calculated.
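To make the heterogeneity and accuracy calculations above concrete, the sketch below computes I² from Cochran's Q (here using an inverse-variance approximation on log odds ratios rather than RevMan's Mantel-Haenszel weighting) and derives sensitivity and specificity with exact Clopper-Pearson 95% confidence intervals from a 2x2 table. All counts shown are hypothetical and are not drawn from the reviewed studies.

```python
# Minimal sketch of two calculations described above: the I^2 heterogeneity
# statistic and sensitivity/specificity with exact (Clopper-Pearson) 95% CIs.
import math
from scipy.stats import beta

def exact_ci(successes, n, alpha=0.05):
    """Clopper-Pearson exact confidence interval for a proportion."""
    lo = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, n - successes + 1)
    hi = 1.0 if successes == n else beta.ppf(1 - alpha / 2, successes + 1, n - successes)
    return lo, hi

def sensitivity_specificity(tp, fp, fn, tn):
    """Point estimates and exact 95% CIs from a 2x2 diagnostic accuracy table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return (sens, exact_ci(tp, tp + fn)), (spec, exact_ci(tn, tn + fp))

def i_squared(tables):
    """I^2 from Cochran's Q, with inverse-variance weights on log odds ratios.
    Each table is (a, b, c, d) = (exposed events, exposed non-events,
    control events, control non-events); 0.5 is added to each cell to avoid zeros."""
    log_or, weights = [], []
    for a, b, c, d in tables:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        log_or.append(math.log((a * d) / (b * c)))
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))
    pooled = sum(w * y for w, y in zip(weights, log_or)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_or))
    df = len(tables) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Hypothetical 2x2 table: midstream collection against a reference standard.
(sens, sens_ci), (spec, spec_ci) = sensitivity_specificity(tp=45, fp=10, fn=5, tn=90)
print(f"sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
print(f"specificity {spec:.2f} (95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
# Hypothetical contamination counts from three studies (with vs. without cleansing).
print(f"I^2 = {i_squared([(12, 88, 20, 80), (8, 92, 15, 85), (30, 70, 35, 65)]):.1f}%")
```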