In December 2019, the novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in the city of Wuhan in Hubei province, People’s Republic of China, as the etiologic agent of coronavirus disease 2019 (COVID-19), which has since spread worldwide, causing a global pandemic (
1–3). The epidemic has grown exponentially in Italy over the past month, affecting more than 60,000 individuals so far and carrying a heavy mortality burden. Italy is only anticipating the trend that the rest of Europe and other regions will follow. At the beginning of March 2020, the first nasopharyngeal swabs positive for SARS-CoV-2 were detected in the northeastern region of Friuli-Venezia Giulia. These identifications followed the expansion of the two clusters in Lombardy and Veneto that had emerged in northern Italy in the previous weeks (
4). Swab contents were seeded on Vero E6 cells and monitored for cytopathic effect and by an RT-PCR protocol using primers for the N region (
5). Cell culture supernatants from passage 1 (P1) of four isolates were collected, and RNA was extracted with the QIAamp viral RNA minikit (Qiagen) and quantified with an
in vitro-transcribed RNA standard (S. Rajasekharan and A. Marcello, unpublished data). The quantity and quality of the RNA were assessed using a Qubit 2.0 fluorometer (Thermo Fisher Scientific) and an Agilent 2100 Bioanalyzer (Agilent Technologies). For each sample, 100 ng of total RNA was processed using the Zymo-Seq RiboFree ribosomal depletion library preparation kit (Zymo Research). All libraries passed the quality check and were quantified before being pooled at equimolar concentrations and sequenced on an Illumina Nano MiSeq in 2 × 150-bp paired-end mode following standard procedures. Sequenced reads that passed the quality check (Phred score ≥30) were adaptor and quality trimmed, and the remaining reads were assembled
de novo using Megahit (v.1.2.9) with default parameter settings. In all cases, Megahit generated 7 contigs of more than 1,000 bp and 100× coverage; all of these assembled contigs were compared (using BLASTn) against the entire nonredundant (nr) nucleotide and protein databases. In all cases, the longest and most highly covered contigs were identified as
MT019532.1, “Severe acute respiratory syndrome coronavirus 2 isolate BetaCoV/Wuhan/IPBCAMS-WH-04/2019, complete genome,” with 99% identity and 0 gaps. The longer sequences were named hCoV-19/Italy/FVG/ICGEB_S1, _S5, _S8, and _S9 and were deposited in GISAID (see below). Sequence analysis showed uneven coverage along the SARS-CoV-2 genome, with coverage ranging on average from 126 to 7,576 reads and a mean coverage per sample of 1,169× (
Fig. 1). Phylogenetic trees were inferred using the maximum likelihood method implemented in the MEGA X program from the GISAID sequences available as of 16 March 2020 (
6). Bootstrap support values were calculated from 500 pseudoreplicate trees of the whole data set (
Fig. 2).
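As an illustration of how a per-sample coverage summary such as the one reported above can be derived, the following minimal Python sketch computes per-base depth from read alignment intervals and then reports the minimum, maximum, and mean depth. The function names, the 0-based half-open interval convention, and the toy alignments are assumptions made for this example; it is not the pipeline used in this study.

```python
from itertools import accumulate


def per_base_depth(genome_length, alignments):
    """Return read depth at every genome position.

    `alignments` is an iterable of (start, end) pairs, 0-based and
    half-open (an assumed convention for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in alignments:
        delta[start] += 1
        delta[end] -= 1
    # The running sum of the differences gives the depth at each position.
    return list(accumulate(delta[:-1]))


def coverage_summary(depth):
    """Summarize a per-base depth profile."""
    return {
        "min": min(depth),
        "max": max(depth),
        "mean": sum(depth) / len(depth),
    }


# Toy example: a 10-nt "genome" covered by three hypothetical reads.
toy_depth = per_base_depth(10, [(0, 5), (3, 10), (4, 6)])
toy_summary = coverage_summary(toy_depth)
print(toy_depth)
print(toy_summary)
```

In practice the alignment intervals would come from mapping the trimmed reads back to the assembled genome; the summary step is the same regardless of how the depth profile is obtained.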
Despite the high burden of COVID-19 in Italy, very little information is available to date from full-length high-quality sequences. The first sequences deposited in GISAID (EPI_ISL_410545 and EPI_ISL_410546) were collected in Rome from a Chinese tourist from Hubei province who was infected before visiting Italy, and another one (EPI_ISL_412974) was from a test-positive Italian citizen returning from China. Only two sequences were reported from the Lombardy cluster (EPI_ISL_412973 and EPI_ISL_413489). In this report, four additional sequences from cases epidemiologically linked to northern Italy were examined. All infected individuals were connected to the city of Udine; S1 and S5 were from the same cluster of closely related cases, while S9 was probably infected in Lombardy and S8 visited Udine from a neighboring city (
Table 1). Sequence analysis showed good coverage along the SARS-CoV-2 genome for all four isolates (
Fig. 1). Based on the marker variant S D614G, all four sequences grouped in the Bavarian-rooted subclade G, which is dominant in Europe and includes the sequence from Lombardy, but were distinct from the three sequences mentioned above that originated directly from China (
7). Intriguingly, the new isolates were more closely related to EPI_ISL_412973, while EPI_ISL_413489 was more distant (
Fig. 2). No evidence could be found for the putative 382-nucleotide (nt) deletion in ORF8 detected in Singapore, which has been proposed to indicate an attenuated phenotype (
8).
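The marker-based grouping described above rests on the amino acid found at position 614 of the spike protein. A minimal Python sketch of such a check is given below, assuming the spike coding sequence has already been extracted from the genome in frame; the function names and the synthetic test sequences are hypothetical and serve only to illustrate the idea.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def residue_at(cds, position):
    """Return the amino acid encoded at a 1-based codon position of an in-frame CDS."""
    start = (position - 1) * 3
    codon = cds[start:start + 3].upper()
    if len(codon) < 3:
        raise ValueError("coding sequence is too short for this position")
    # Codons containing ambiguous bases (e.g., N) are reported as X.
    return CODON_TABLE.get(codon, "X")


def d614g_status(spike_cds):
    """Classify a spike CDS by the residue at position 614."""
    residue = residue_at(spike_cds, 614)
    if residue == "D":
        return "D614"
    if residue == "G":
        return "G614"
    return "other"


# Synthetic sequences (not real spike genes): Met, 612 Ala codons, then codon 614.
ancestral_like = "ATG" + "GCT" * 612 + "GAT"
variant_like = ancestral_like[:-3] + "GGT"
print(d614g_status(ancestral_like))
print(d614g_status(variant_like))
```

A real analysis would first align each genome to a reference to locate the spike gene; the sketch only covers the final codon lookup.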
These findings underscore the need for comprehensive studies that combine genomic data with epidemiological data and clinical records of symptoms from patients with COVID-19.