Biopython to print lengths of downloaded sequences

Hi you guys! Does anyone know how to use Biopython to print the lengths of downloaded sequences?

I am trying to make a file in nano with:
import sys
from Bio import SeqIO

# Open the input file
fh = open(sys.argv[1], "r")

# Open the output file
fo = open(sys.argv[2], "w")

for record in SeqIO.parse(fh, "genbank"):
    organism = record.annotations["organism"]
    print(organism, record.seq)
    print(len(record.seq))
    # SeqIO.write() expects a SeqRecord, not a bare Seq
    SeqIO.write(record, fo, "fasta")

# Close the input file
fh.close()

# Close the output file
fo.close()


Well I can’t not attempt to help a fellow biologist.

This is how I normally do such things. You can also use the open-close file handling notation if you’d prefer. I just find it easier to use the ‘with open…’ notation.

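# Imports
import sys
from Bio import SeqIO

# Define variables (file paths taken from the command line)
input_path = sys.argv[1]
output_path = sys.argv[2]

# Open both files once. Writing to an open handle adds each record in turn;
# passing a filename to SeqIO.write() inside the loop would overwrite the file every time.
with open(input_path, 'r') as infile, open(output_path, 'w') as outfile:
    for record in SeqIO.parse(infile, 'genbank'):
        print('Length:', len(record.seq))
        SeqIO.write(record, outfile, 'fasta')  # Write the records individually in FASTA format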

Hopefully this helps. If your files aren’t always going to be GenBank in and FASTA out, you can add more command-line arguments and plug them in instead of the hardcoded 'genbank' and 'fasta' strings, which lets the script handle any format that SeqIO can both read and write.
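
Here is a rough sketch of that idea, in case it is useful. The function name convert_and_report and its parameter names are made up for illustration; only SeqIO.parse and SeqIO.write come from Biopython. It assumes the two format strings are names that SeqIO recognises. Not every pair converts cleanly (FASTA to FASTQ fails because FASTA carries no quality scores, for example), so treat it as a starting point.

```python
from Bio import SeqIO


def convert_and_report(in_path, out_path, in_format="genbank", out_format="fasta"):
    """Print each record's length, write it in out_format, and return the lengths.

    Hypothetical helper: the defaults mirror the GenBank-to-FASTA case above.
    """
    lengths = []
    # Open both files once so every record ends up in the same output file
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for record in SeqIO.parse(infile, in_format):
            print(record.id, "Length:", len(record.seq))
            lengths.append(len(record.seq))
            SeqIO.write(record, outfile, out_format)
    return lengths
```

To drive it from the command line you could import sys and call convert_and_report(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4]), with the two formats given as the third and fourth arguments.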