<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ mafft - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ mafft - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 22:44:40 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/mafft/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ The Novel Coronavirus Epidemic in China: How to Help Researchers Using Sequence Alignment on 2019-nCoV with MAFFT ]]>
                </title>
                <description>
                    <![CDATA[ By Shen Huang Novel Coronavirus (2019-nCoV) is a deadly virus that seems to have originated in Wuhan, China. As of January 26, the virus has already caused 76 deaths. As a coronavirus targeting human respiratory systems, 2019-nCoV is highly infectiou... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/china-novel-coronavirus-epidemic-sequence-alignment-2019-ncov-mafft/</link>
                <guid isPermaLink="false">66d460f03a8352b6c5a2aafd</guid>
                
                    <category>
                        <![CDATA[ mafft ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Genetics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 27 Jan 2020 11:04:41 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/01/image-64-1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Shen Huang</p>
<p>Novel Coronavirus (2019-nCoV) is a deadly virus that seems to have originated in Wuhan, China. As of January 26, the virus has already caused 76 deaths.</p>
<p>As a coronavirus targeting human respiratory systems, 2019-nCoV is highly infectious – especially during wet and cold seasons.</p>
<p>When people sneeze, they can expel respiratory pathogens at high speed. These can infect other people in many ways, most often through contact with the mouth, nose, and eyes.</p>
<p>To avoid infections, you should avoid outdoor activities – especially in crowded areas. It's also important to sanitize your hands often and not to rub your eyes with your hands.</p>
<p>I'm in China, and my plans for Lunar New Year are now ruined. So I decided to stay home and create this tutorial on how to obtain genetic sequence data of 2019-nCoV and perform a Sequence Alignment on it with MAFFT.</p>
<p>I hope this article raises your interest in bioinformatics research, so you can help scientists fight these viral outbreaks.</p>
<h2 id="heading-what-is-sequence-alignment-and-what-is-mafft">What is Sequence Alignment? And what is MAFFT?</h2>
<p><strong>Sequence Alignment</strong> is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may reveal functional, structural, or evolutionary relationships between them. A recent <a target="_blank" href="https://onlinelibrary.wiley.com/doi/epdf/10.1002/jmv.25682">publication</a> used sequence alignment with MAFFT to suggest cross-species transmission from snake to human.</p>
<p><strong>MAFFT</strong> (<strong>M</strong>ultiple <strong>A</strong>lignment using <strong>F</strong>ast <strong>F</strong>ourier <strong>T</strong>ransform) is a multiple sequence alignment program published in 2002. You can use it to align RNA sequences, such as the genomes of <strong>coronaviruses</strong>, which are viruses with single-stranded RNA enveloped in a shell derived from the cell membranes of the host.</p>
<h2 id="heading-where-can-you-obtain-rna-sequence-data">Where Can You Obtain RNA Sequence Data?</h2>
<p>The latest updates on 2019-nCoV can be found at the <a target="_blank" href="https://bigd.big.ac.cn/ncov#about">NGDC</a> (National Genomics Data Center of China). In this tutorial, we will analyze the <a target="_blank" href="https://www.ncbi.nlm.nih.gov/nuccore/MN938384">2019-nCoV</a> virus and the <a target="_blank" href="https://www.ncbi.nlm.nih.gov/nuccore/MK062184">SARS-CoV</a> virus, using sequences from the NCBI (National Center for Biotechnology Information) database.</p>
<p>SARS-CoV, infamously known as SARS (Severe Acute Respiratory Syndrome), resulted in 774 deaths in 17 reported countries around the year 2003.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/image-65.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example RNA Sequence Data from <a target="_blank" href="https://www.ncbi.nlm.nih.gov/">NCBI</a></em></p>
<p>I have copied and pasted the data for each virus into a file named after that virus. It should look something like the data in the screenshot above: an index number followed by bases in groups of 10, separated by spaces, for a total of 60 bases per line.</p>
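<p>To make that layout concrete, here is a minimal sketch of how one line in this format can be reduced to bare bases. The sample line and the function name <code>clean_sequence_line</code> are made up for illustration and are not taken from the NCBI records.</p>
<pre><code class="lang-python">def clean_sequence_line(line):
    """Return only the bases from one 'index + groups of 10' line."""
    # Keeping only base letters drops the leading index and every space.
    return "".join(ch for ch in line.lower() if ch in "acgt")


# Hypothetical line: an index followed by six groups of ten bases.
sample = "1 acgtacgtac ggttaaccgg ttaaccggtt acgtacgtac ccggaattcc ggaattccgg"
cleaned = clean_sequence_line(sample)
print(cleaned)
print(len(cleaned))
</code></pre>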
<h2 id="heading-how-to-perform-sequence-alignment-on-2019-ncov-with-mafft">How to Perform Sequence Alignment on 2019-nCoV with MAFFT</h2>
<p>First, you need to install MAFFT. Manual installation instructions for different operating systems can be found on the <a target="_blank" href="https://mafft.cbrc.jp/alignment/software/">MAFFT official website</a>.</p>
<p>Alternatively, you can install it via Anaconda with the following command:</p>
<pre><code class="lang-bash">conda install mafft
</code></pre>
<p>MAFFT is fairly easy to use, but it expects its input in FASTA format: a header line starting with "&gt;" followed by the sequence itself. You'll need to preprocess the data you obtained so that MAFFT can align it.</p>
<p>Here's the Python script that does this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> re

output = <span class="hljs-string">""</span>
<span class="hljs-keyword">for</span> filename <span class="hljs-keyword">in</span> sys.argv[<span class="hljs-number">1</span>:]:
    <span class="hljs-comment"># Read the raw sequence text copied from NCBI</span>
    <span class="hljs-keyword">with</span> open(filename) <span class="hljs-keyword">as</span> infile:
        data = infile.read()
    <span class="hljs-comment"># Keep only the bases and line breaks (drops index numbers and spaces)</span>
    data = re.sub(<span class="hljs-string">"[^atcg\n]"</span>, <span class="hljs-string">""</span>, data)
    <span class="hljs-comment"># Add a header line with the file name, followed by the sequence</span>
    output = output + <span class="hljs-string">"&gt;"</span> + filename + <span class="hljs-string">"\n"</span> + data + <span class="hljs-string">"\n"</span>
print(output)
<span class="hljs-keyword">with</span> open(<span class="hljs-string">'SEQUENCES.txt'</span>, <span class="hljs-string">'w'</span>) <span class="hljs-keyword">as</span> outfile:
    outfile.write(output)
</code></pre>
<p>You can save the above Python code into a file called "preprocess.py", inside the same folder as your virus RNA data. Then run the following bash command in that folder to preprocess the data.</p>
<pre><code class="lang-bash">python3 preprocess.py 2019-nCoV_HKU-SZ-002a_2020 icSARS-C7-MA
</code></pre>
<p>The output file, called "SEQUENCES.txt", should now look something like the image below. Each sequence starts with a header line containing the virus name (taken from the file name), and the white space and index numbers have been stripped out.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/image-66.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Now you can perform Sequence Alignment with MAFFT in your Terminal with the following steps:</p>
<ol>
<li>Navigate to your working folder.</li>
<li>Run "mafft" in your terminal.</li>
<li>For input file, put "SEQUENCES.txt".</li>
<li>For output file, put "output.txt".</li>
<li>Select "1" for "Clustal format" as your output format.</li>
<li>Select "1" for "auto" as your strategy.</li>
<li>Leave all other arguments blank.</li>
</ol>
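<p>If you would rather skip the interactive prompts, the same choices can be passed as command-line options. The sketch below assumes <code>mafft</code> is on your PATH and relies on MAFFT's <code>--auto</code> and <code>--clustalout</code> options. The helper names <code>build_mafft_command</code> and <code>run_mafft</code> are illustrative, not part of MAFFT.</p>
<pre><code class="lang-python">import os
import shutil
import subprocess


def build_mafft_command(input_path):
    """Build the argument list for a non-interactive MAFFT run."""
    # --auto picks a strategy automatically; --clustalout selects Clustal output.
    return ["mafft", "--auto", "--clustalout", input_path]


def run_mafft(input_path, output_path):
    """Run MAFFT and save the alignment it prints on standard output."""
    with open(output_path, "w") as outfile:
        subprocess.run(build_mafft_command(input_path), stdout=outfile, check=True)


if __name__ == "__main__":
    # Only attempt the run when both the program and the input file exist.
    if shutil.which("mafft") is None:
        print("mafft was not found on PATH")
    elif not os.path.exists("SEQUENCES.txt"):
        print("SEQUENCES.txt was not found in this folder")
    else:
        run_mafft("SEQUENCES.txt", "output.txt")
</code></pre>
<p>MAFFT prints the alignment to standard output, which is why the sketch redirects it into "output.txt" instead of passing an output file name.</p>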
<p>Here's a gif of me running this in my terminal:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/Jan-27-2020-18-46-10.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>After you hit enter, you just need to wait for MAFFT to align your RNA sequences.</p>
<p>The finished product should look something like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/image-67.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>Note that "-" marks a gap inserted to line the sequences up, and "*" marks a position where the sequences are identical.</p>
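<p>As a rough illustration of how these two symbols can be used, the sketch below rebuilds the "*" line for a pair of aligned sequences and turns it into a percentage. The two short sequences are made up, and dividing the identical columns by the alignment length is only one simple way to define identity.</p>
<pre><code class="lang-python">def conservation_line(seq_a, seq_b):
    """Mark each column with '*' when both sequences share the same base."""
    return "".join(
        "*" if a == b and a != "-" else " " for a, b in zip(seq_a, seq_b)
    )


def percent_identity(seq_a, seq_b):
    """Percentage of alignment columns that hold identical bases."""
    matches = conservation_line(seq_a, seq_b).count("*")
    return 100.0 * matches / len(seq_a)


# Two short, made-up aligned sequences of equal length.
first = "atg-cgta"
second = "atgacgaa"
print(first)
print(second)
print(conservation_line(first, second))
print(percent_identity(first, second))
</code></pre>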
<p>Congratulations, you have just learned how to perform Sequence Alignment with MAFFT! Now you can explore the genetic sequences and use the alignment information however you like.</p>
<p>Help Wuhan fight off this deadly disease as a developer, data scientist, and more:</p>
<p><a target="_blank" href="https://github.com/wuhan2020/wuhan2020">https://github.com/wuhan2020/wuhan2020</a></p>
<p>A bit more about me: I'm a developer who's into all kinds of things. I've written some other fun tutorials like these:</p>
<p><a target="_blank" href="https://www.freecodecamp.org/news/ghost/#/editor/post/5ceb787ee17b4228e0185dbf/">How to create beautiful LANTERNS that ARRANGE THEMSELVES into words</a></p>
<p><a target="_blank" href="https://www.freecodecamp.org/news/ghost/#/editor/post/5ceb767ee17b4228e01833b7/">How to drop LEPRECHAUN-HATS into your website with COMPUTER VISION</a></p>
<p>Want me to write a tutorial about something? Let me know. Happy coding.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
