Biopython to print lengths of downloaded sequences


#1

Hi you guys! Does anyone know how to use Biopython to print the lengths of downloaded sequences?

I am trying to write a script in nano with:
import sys
from Bio import SeqIO

# Open the input file
fh = open(sys.argv[1], "r")

# Open the output file
fo = open(sys.argv[2], "w")

for record in SeqIO.parse(fh, "genbank"):
    organism = record.annotations["organism"]
    print(organism)
    print(record.seq)
    print(len(record.seq))
    # SeqIO.write() expects a SeqRecord, not the bare Seq
    SeqIO.write(record, fo, "fasta")

# Close the input file
fh.close()

# Close the output file
fo.close()


#2

Well, I can’t not try to help a fellow biologist.

This is how I normally do it. You can also open and close the files explicitly with open() and close() if you prefer; I just find the ‘with open(...)’ notation easier, because it closes the files automatically when the block ends, even if an error occurs.

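# Imports
import sys
from Bio import SeqIO

# Define variables (these are file names, not open handles)
input_file = sys.argv[1]
output_file = sys.argv[2]

# Open the input and output files once; 'with' closes both automatically
with open(input_file, 'r') as in_handle, open(output_file, 'w') as out_handle:
    for record in SeqIO.parse(in_handle, 'genbank'):
        print(record.annotations['organism'])
        print(record.seq)
        print('Length:', len(record.seq))
        # Write each record to the already-open handle; passing the file name
        # here instead would reopen (and overwrite) the file on every iteration
        SeqIO.write(record, out_handle, 'fasta')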
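For comparison, here is a minimal sketch of the explicit open()/close() style, wrapped in try/finally so both files are still closed if parsing fails. It prints only each record's ID and length (len() on a SeqRecord returns the length of its sequence) and copies the records to FASTA. The function name write_lengths and its in_format parameter are hypothetical, chosen for illustration; they are not part of Biopython.

```python
import sys
from Bio import SeqIO


def write_lengths(input_file, output_file, in_format="genbank"):
    """Hypothetical helper: print each record's ID and length, copy the
    records to FASTA, and return the (id, length) pairs."""
    lengths = []
    in_handle = open(input_file, "r")
    try:
        out_handle = open(output_file, "w")
        try:
            for record in SeqIO.parse(in_handle, in_format):
                # len() on a SeqRecord is the length of its sequence
                print(record.id, len(record))
                lengths.append((record.id, len(record)))
                SeqIO.write(record, out_handle, "fasta")
        finally:
            out_handle.close()  # runs even if parsing raises
    finally:
        in_handle.close()
    return lengths


# Run as: python lengths.py input.gb output.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    write_lengths(sys.argv[1], sys.argv[2])
```

The nested try/finally blocks are the bookkeeping that a ‘with’ statement handles for you, which is the main reason to prefer it.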

Hopefully this helps. If your input won’t always be GenBank and your output won’t always be FASTA, you can add more command-line arguments (e.g. sys.argv[3] and sys.argv[4]) and pass them to SeqIO.parse() and SeqIO.write() in place of the hard-coded format strings. That lets the script handle most of the formats Bio.SeqIO supports.
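One way to replace the hard-coded formats is sketched below. It assumes argparse rather than raw sys.argv indices, and the option names --in-format and --out-format (and the helper names report and main) are hypothetical. It passes a generator to a single SeqIO.write() call, which returns the number of records written, instead of calling it once per record; that can matter for formats that wrap all records in one header or footer. Not every conversion will work: some formats are read-only in Bio.SeqIO, and writing GenBank from FASTA input, for example, needs a molecule_type annotation that FASTA records don't carry.

```python
import argparse
import sys

from Bio import SeqIO


def report(records):
    """Print each record's ID and length, then pass the record through."""
    for record in records:
        print(record.id, len(record))
        yield record


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Print sequence lengths and convert between formats."
    )
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    # Hypothetical option names; defaults match the GenBank -> FASTA script
    parser.add_argument("--in-format", default="genbank")
    parser.add_argument("--out-format", default="fasta")
    args = parser.parse_args(argv)

    with open(args.input_file, "r") as in_handle, open(args.output_file, "w") as out_handle:
        records = SeqIO.parse(in_handle, args.in_format)
        # One write() call for all records; it returns how many were written
        return SeqIO.write(report(records), out_handle, args.out_format)


# Only run when command-line arguments are given
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

For example, python lengths.py input.fasta output.tab --in-format fasta --out-format tab would print the lengths and write a two-column, tab-separated file of IDs and sequences.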