Key messages
• When used with RT-PCR tests (molecular tests that detect the genetic material of the virus that causes COVID-19, using a technique called reverse transcription polymerase chain reaction), self-collected gargle and deep throat saliva samples have similar sensitivity to nasopharyngeal samples (taken from the back of the throat through the nose) collected by trained healthcare workers for detecting COVID-19.
• When used with RT-PCR, samples collected from the nose, the oropharynx (the throat, via the mouth) or the oral cavity, and saliva collected by other methods, are less sensitive for detecting COVID-19 than healthcare worker-collected nasopharyngeal samples.
• When used with rapid antigen tests (Ag-RDTs, which include at-home self-tests), samples collected from the nose have similar sensitivity to healthcare worker-collected nasopharyngeal samples for detecting COVID-19.
Why is improving the diagnosis of COVID-19 important?
Coronavirus disease (COVID-19) is caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). People with suspected COVID-19 may take a test to find out whether they are infected, so that they can receive treatment, follow recommended guidance to self-isolate, and inform close contacts. Failing to detect COVID-19 when it is present (a false negative result) risks spreading infection and means missed opportunities for treatment.
What types of sample are used to diagnose COVID-19?
The type and quality of the sample taken to confirm COVID-19 affect the reliability of diagnosis. The most accurate sample for diagnosing COVID-19 is one taken by a trained healthcare worker from the back of the throat through the nose (a nasopharyngeal sample) and tested with reverse transcription polymerase chain reaction (RT-PCR), a technique that detects the virus's genetic material. However, this sample is difficult to obtain correctly, causes discomfort, and risks spreading infection if individuals cough or sneeze while it is taken. Alternative sample types, particularly those that people can collect themselves, such as samples for rapid antigen self-tests (Ag-RDTs), may reduce cost and discomfort and make sampling safer. This may, in turn, improve access to and uptake of testing.
What did we want to find out?
We wanted to compare the sensitivity of different sample sites and collection methods for detecting COVID-19 with molecular tests (RT-PCR) or rapid antigen tests (Ag-RDTs).
What did we do?
We searched for studies that compared the accuracy of nasopharyngeal samples with any alternative that could be used in patients outside hospital, including nose (nasal) samples, throat samples taken through the mouth (oropharyngeal), gargle samples and saliva samples. We looked at samples used with either RT-PCR or Ag-RDTs. We also searched for studies that compared different methods of taking samples, such as samples collected by a healthcare worker versus those collected by individuals themselves with no or minimal instructions.
What did we find?
The review included 106 studies with a total of 60,523 participants, of whom 11,045 had COVID-19 infection. Fifty-nine per cent of studies were conducted in adults and 79% in symptomatic participants or in mixed groups of symptomatic and asymptomatic participants. Sixty per cent of studies took place in Europe or the USA; just over half (55%) took place in dedicated COVID-19 testing centres or in outpatient settings.
Main results
With RT-PCR, on average:
- 100% of positive nasopharyngeal samples collected by healthcare workers would also test positive on self-collected gargle samples or on deep throat saliva samples (collected by coughing and then spitting);
- 88% of positive nasopharyngeal samples collected by healthcare workers would also test positive with self- or healthcare worker-collected nose samples;
- 87% of positive nasopharyngeal samples collected by healthcare workers would also be detected by saliva self-collected using spitting, 84% by saliva self-collected using drooling and 79% by saliva self-collected by sucking on a swab; and
- 83% of positive nasopharyngeal samples collected by healthcare workers would also be detected by self- or healthcare worker-collected oropharyngeal samples.
With Ag-RDTs, on average:
- 100% of positive nasopharyngeal samples collected by healthcare workers would also be detected by self-collected or healthcare worker-collected nose samples.
Summary results
The results of these studies indicate that, in a group of 1000 people of whom 230 (23%) have COVID-19:
when used with RT-PCR, compared with healthcare worker-collected nasopharyngeal samples:
- no cases of COVID-19 would be missed using self-collected gargle samples (12 fewer to 5 more) or deep throat saliva samples (2 fewer to 48 more);
- 28 (16 to 39) fewer cases of COVID-19 infection would be detected using healthcare worker- or self-collected nose samples;
- 30 (18 to 41) fewer cases of COVID-19 infection would be detected by saliva self-collected by spitting, 37 (12 to 62) fewer by saliva self-collected by drooling and 48 (12 to 85) fewer by saliva self-collected by sucking on a swab;
- 39 (12 to 67) fewer cases of COVID-19 infection would be detected using healthcare worker- or self-collected oropharyngeal samples;
when used with Ag-RDTs, compared with healthcare worker-collected nasopharyngeal samples:
- no cases of COVID-19 infection would be missed using healthcare worker- or self-collected nose samples.
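The per-1000 figures above follow from multiplying each pooled sensitivity difference by the 230 infected people in the example group. The Python sketch below reproduces that arithmetic for the nasal and oropharyngeal RT-PCR estimates reported later in this section. The function name `change_in_detected_cases` is hypothetical, and rounding to the nearest whole person is an assumption about how the published figures were derived.

```python
# Sketch only: converts a pooled sensitivity difference (index minus reference,
# in percentage points) into the change in detected cases per 1000 people.
# Assumes 23% prevalence (230 infected per 1000), as in the summary above,
# and rounding to the nearest whole person (an assumption).

def change_in_detected_cases(diff_pp: float, prevalence: float = 0.23,
                             group_size: int = 1000) -> int:
    """Negative values mean fewer cases detected than with the reference sample."""
    infected = round(group_size * prevalence)  # 230 people with COVID-19
    return round(infected * diff_pp / 100)


# RT-PCR pooled estimates from this section: (point, lower 95% CI, upper 95% CI).
pooled_rt_pcr = {
    "nasal": (-12, -17, -7),
    "oropharyngeal": (-17, -29, -5),
}

for sample, (point, lower, upper) in pooled_rt_pcr.items():
    cases = [change_in_detected_cases(x) for x in (point, lower, upper)]
    print(f"{sample}: {cases[0]} ({cases[1]} to {cases[2]}) per 1000")
```

Run as-is, this prints -28 (-39 to -16) for nasal samples and -39 (-67 to -12) for oropharyngeal samples, matching the 28 (16 to 39) and 39 (12 to 67) fewer cases quoted above.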
What are the limitations of the evidence?
It was often unclear whether included studies deliberately excluded inadequate samples, or whether the results of the more accurate nasopharyngeal sample were known when alternative samples were interpreted. This may have made alternative sample types appear more accurate than they are in practice, so the number of missed cases of COVID-19 infection may be underestimated.
More than half of studies did not give information about how long participants had had symptoms at the time of sampling. This reduces our confidence in the comparison of different sample types.
Most studies evaluated samples self-collected by adults with symptoms and tested with RT-PCR; therefore, the findings of this review may not apply to asymptomatic individuals or children. For studies of Ag-RDTs, it is unclear whether the sensitivity estimates for nose samples apply to home use (self-collected and self-interpreted).
How up to date is this review?
The evidence is current to 22 February 2022.
When used with RT-PCR, there is no evidence of a difference in sensitivity between self-collected gargle or deep throat saliva samples and nasopharyngeal samples collected by healthcare workers. Use of these alternative, self-collected sample types has the potential to reduce cost and discomfort and to improve the safety of sampling by reducing the risk of transmission from aerosols generated by coughing and gagging during nasopharyngeal or oropharyngeal sample collection. This may, in turn, improve access to and uptake of testing. Other types of saliva, nasal, oral and oropharyngeal samples are, on average, less sensitive than healthcare worker-collected nasopharyngeal samples, and sensitivities of this magnitude are unlikely to be acceptable for confirming SARS-CoV-2 infection with RT-PCR.
When used with Ag-RDTs, there is no evidence of a difference in sensitivity between nasal samples and healthcare worker-collected nasopharyngeal samples for detecting SARS-CoV-2. The implications for self-testing are unclear, as evaluations did not report whether nasal samples were self-collected or collected by healthcare workers. Further research is needed in asymptomatic individuals and children, and with Ag-RDTs, and to investigate the effect of operator expertise on accuracy.
Quality assessment of the evidence base underpinning these conclusions was restricted by poor reporting. There is a need for further high-quality studies, adhering to reporting standards for test accuracy studies.
Sample collection is a key driver of accuracy in the diagnosis of SARS-CoV-2 infection. Viral load may vary between anatomical sampling sites, and accuracy may be compromised by difficulty obtaining specimens and by the expertise of the person taking the sample. It is important to optimise sampling accuracy within cost, safety and accessibility constraints.
To compare the sensitivity of different sample collection sites and methods for detecting current SARS-CoV-2 infection with any molecular or antigen-based test.
Electronic searches of the Cochrane COVID-19 Study Register and the COVID-19 Living Evidence Database from the University of Bern (which includes daily updates from PubMed and Embase and preprints from medRxiv and bioRxiv) were undertaken on 22 February 2022. We also included independent evaluations from national reference laboratories, FIND and the Diagnostics Global Health website. We did not apply language restrictions.
We included studies of symptomatic or asymptomatic people with suspected SARS-CoV-2 infection undergoing testing. We included studies of any design that compared results from different sample types (anatomical location, operator, collection device) collected from the same participant within a 24-hour period.
Within a sample pair, we defined a reference sample and an index sample collected from the same participant within the same clinical encounter (within 24 hours). Where the comparison was between different anatomical sites, the reference sample was a nasopharyngeal sample or a combined naso/oropharyngeal sample collected into the same container, and the index sample was taken from the alternative anatomical site. Where the comparison concerned different collection methods at the same site, the reference sample was the one closest to standard practice for that sample type. Where the comparison concerned the personnel collecting the sample, the sample collected by the more skilled or experienced operator was the reference sample.
Two review authors independently assessed the risk of bias and applicability concerns using the QUADAS-2 and QUADAS-C checklists, tailored to this review.
We present estimates of the difference in sensitivity (index sample sensitivity (%) minus reference sample sensitivity (%)) for each sample pair and as an average across studies for each index sampling method, using forest plots and tables. We examined heterogeneity between studies according to population characteristics (age, symptom status) and index sample characteristics (time post-symptom onset, operator expertise, use of transport medium).
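For a single study, the paired difference can be sketched in Python from counts of concordant and discordant pairs among infected participants. The sketch assumes that a participant counts as infected if either sample in the pair is positive (a composite reference; this section does not state the denominator), reports index minus reference so that negative values mean the alternative sample misses cases, as in the results reported below, and uses a simple Wald interval for paired proportions, which may differ from the interval the review actually used. The function name and the counts in the example are hypothetical.

```python
import math


def paired_sensitivity_difference(both_positive: int, index_only: int,
                                  reference_only: int, z: float = 1.96):
    """Index minus reference sensitivity, in percentage points, with a Wald CI.

    Assumption: a participant is counted as infected if either sample in the
    pair is positive, so every infected participant falls in one of the
    three counts below.
    """
    infected = both_positive + index_only + reference_only
    sens_index = (both_positive + index_only) / infected
    sens_reference = (both_positive + reference_only) / infected
    diff = sens_index - sens_reference  # equals (index_only - reference_only) / infected

    # Standard error of a difference between two proportions measured on the
    # same participants (paired binary data).
    p_index_only = index_only / infected
    p_reference_only = reference_only / infected
    se = math.sqrt((p_index_only + p_reference_only
                    - (p_index_only - p_reference_only) ** 2) / infected)
    return 100 * diff, 100 * (diff - z * se), 100 * (diff + z * se)


# Hypothetical study: 100 infected participants, 80 positive on both samples,
# 5 positive only on the index sample, 15 only on the nasopharyngeal sample.
diff, lower, upper = paired_sensitivity_difference(80, 5, 15)
print(f"{diff:.1f} percentage points (95% CI {lower:.1f} to {upper:.1f})")
```

With these hypothetical counts the sketch prints -10.0 percentage points (95% CI -18.5 to -1.5): the index sample misses 15 infections that the reference detects but finds only 5 the reference misses.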
This review includes 106 studies reporting 154 evaluations and 60,523 sample pair comparisons, of which 11,045 were from people with SARS-CoV-2 infection. Ninety evaluations were of saliva samples, 37 of nasal, seven of oropharyngeal, six of gargle, six of oral and four of combined nasal/oropharyngeal samples. Four evaluations assessed the effect of operator expertise on the accuracy of three different sample types. Most included evaluations (146) used molecular tests, of which 140 used RT-PCR (reverse transcription polymerase chain reaction). Eight evaluations were of nasal samples used with Ag-RDTs (rapid antigen tests). Most studies were conducted in Europe (35/106, 33%) or the USA (27%), and in dedicated COVID-19 testing clinics or ambulatory hospital settings (53%). Targeted screening or contact tracing accounted for only 4% of evaluations. Where reported, most evaluations were of adults (91/154, 59%); 28 (18%) were of mixed populations and only seven (4%) were of children. The median prevalence of confirmed SARS-CoV-2 was 23% (interquartile range (IQR) 13% to 40%).
Risk of bias and applicability assessments were hampered by poor reporting in 77% and 65% of included studies, respectively. Only 3% of evaluations were at low risk of bias across all domains; the others were at high or unclear risk because of inappropriate inclusion or exclusion criteria, unclear recruitment, lack of blinding, non-randomised sampling order or differences in testing kit within a sample pair.
Sixty-eight per cent of evaluation cohorts were judged to be at high or unclear applicability concern, either because the prevalence of SARS-CoV-2 infection in study populations was inflated by selectively including individuals with confirmed PCR-positive samples, or because there was insufficient detail to allow replication of sample collection.
When used with RT-PCR
• There was no evidence of a difference in sensitivity between gargle and nasopharyngeal samples (on average -1 percentage points, 95% CI -5 to +2, based on 6 evaluations, 2138 sample pairs, of which 389 had SARS-CoV-2).
• There was no evidence of a difference in sensitivity between saliva collection from the deep throat and nasopharyngeal samples (on average +10 percentage points, 95% CI -1 to +21, based on 2192 sample pairs, of which 730 had SARS-CoV-2).
• There was evidence that saliva collected by spitting, drooling or salivating was, on average, 12 percentage points less sensitive than nasopharyngeal samples (95% CI -16 to -8, based on 27,253 sample pairs, of which 4636 had SARS-CoV-2). We did not find any evidence of a difference in sensitivity between saliva collected by spitting, drooling or salivating (sensitivity difference ranging from -13 percentage points (spit) to -21 percentage points (salivate)).
• Nasal samples (anterior and mid-turbinate collection combined) were, on average, 12 percentage points less sensitive than nasopharyngeal samples (95% CI -17 to -7), based on 9291 sample pairs, of which 1485 had SARS-CoV-2. We did not find any evidence of a difference in sensitivity between nasal samples collected from the mid-turbinates (3942 sample pairs) and those collected from the anterior nares (8272 sample pairs).
• There was evidence that oropharyngeal samples were, on average, 17 percentage points less sensitive than nasopharyngeal samples (95% CI -29 to -5), based on seven evaluations, 2522 sample pairs, of which 511 had SARS-CoV-2.
A much smaller volume of evidence was available for combined nasal/oropharyngeal samples and oral samples.
Age, symptom status and use of transport media do not appear to affect the sensitivity of saliva samples and nasal samples.
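The "on average" estimates above combine per-study differences across evaluations. This section does not say which meta-analysis model the review used, so the Python sketch below assumes one common choice, a DerSimonian-Laird random-effects average of per-study differences and their standard errors. The function name and the two-study inputs are hypothetical and are not data from the review.

```python
import math


def random_effects_average(estimates, z: float = 1.96):
    """DerSimonian-Laird random-effects average of per-study differences.

    estimates: list of (difference_pp, standard_error_pp) pairs, one per study.
    Returns (pooled difference, lower CI, upper CI, between-study variance tau^2).
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    fixed_mean = sum(w * d for w, (d, _) in zip(weights, estimates)) / total

    # Cochran's Q measures how much studies disagree beyond chance.
    q = sum(w * (d - fixed_mean) ** 2 for w, (d, _) in zip(weights, estimates))
    df = len(estimates) - 1
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each study with the between-study variance added.
    re_weights = [1 / (se ** 2 + tau2) for _, se in estimates]
    re_total = sum(re_weights)
    pooled = sum(w * d for w, (d, _) in zip(re_weights, estimates)) / re_total
    se_pooled = math.sqrt(1 / re_total)
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical studies that disagree strongly: -2 and -20 percentage points, each SE 2.
print(random_effects_average([(-2.0, 2.0), (-20.0, 2.0)]))
```

When studies disagree strongly, tau² is large and the interval widens well beyond a fixed-effect average; when they agree, tau² is zero and the result reduces to the inverse-variance fixed-effect average.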
When used with Ag-RDTs
• There was no evidence of a difference in sensitivity between nasal and nasopharyngeal samples (on average 0 percentage points, 95% CI -0.2 to +0.2, based on 3688 sample pairs, of which 535 had SARS-CoV-2).