Beginner python student (need help :-( )

Beginner python student (need help :-( )
0

#1

Hello everyone.

I have a CSV file with one thousand row and each row its structure is like that (Show 2 lines):

[icode]Line 219 (6420),Dh," ““url””: ““https://r3—sn-gxqpgpn-h5ql.googlevideo.com/videoplayback?ipbits=0&requiressl=yes&gir=yes&pcm2cms=yes&ip=150.214.57.8&fvip=5&c=WEB&id=o-AKPE5P1-L_x8zzDRYj5aoX3bYJ1W02pyCMpEI4-dsRC-&gcr=es&mm=31%2C29&mn=sn-gxqpgpn-h5ql%2Csn-h5q7dnlk&signature=201841BEA18EE01EE5C51CEE1487A4CAC75DED51.0B35DB9CB1ADB52AFFF24A3250F9B1C5708068ED&keepalive=yes&sparams=clen%2Cdur%2Cei%2Cgcr%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpcm2cms%2Cpl%2Crequiressl%2Csource%2Cexpire&mt=1541693372&mv=m&initcwndbps=642500&ms=au%2Crdu&itag=251&mime=audio%2Fwebm&pl=16&expire=1541715093&clen=1853458&ei=NWDkW6LwLJic1waombfoBQ&key=yt6&source=youtube&dur=117.241&txp=5511222&lmt=1538050727946448&alr=yes&cpn=mH58o0Uz_dI7ZuKp&cver=2.20181106&range=309760-378453&rn=27&rbuf=5457"”,”[/icode]

[icode]Line 479 (106021),Dh," ““url””: ““https://r3—sn-gxqpgpn-h5ql.googlevideo.com/videoplayback?ipbits=0&requiressl=yes&gir=yes&pcm2cms=yes&ip=150.214.57.8&fvip=5&xtags=vproj%3Dmesh&c=WEB&id=o-AKPE5P1-L_x8zzDRYj5aoX3bYJ1W02pyCMpEI4-dsRC-&aitags=133%2C134%2C135%2C136%2C137%2C160%2C242%2C243%2C244%2C247%2C248%2C264%2C266%2C271%2C278%2C298%2C299%2C302%2C303%2C304%2C305%2C308%2C313%2C315&mm=31%2C29&mn=sn-gxqpgpn-h5ql%2Csn-h5q7dnlk&signature=369A4451CD5098765653E430D3FBFF20D4F8EF36.349B11C21BE7DEDFA4CA70F8365808EB9153FE4C&keepalive=yes&sparams=aitags%2Cclen%2Cdur%2Cei%2Cgcr%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpcm2cms%2Cpl%2Crequiressl%2Csource%2Cxtags%2Cexpire&mt=1541693372&mv=m&initcwndbps=642500&ms=au%2Crdu&itag=315&mime=video%2Fwebm&pl=16&gcr=es&expire=1541715093&clen=347859091&ei=NWDkW6LwLJic1waombfoBQ&key=yt6&source=youtube&dur=117.183&lmt=1492587801555590&alr=yes&cpn=mH58o0Uz_dI7ZuKp&cver=2.20181106&range=32056563-38295938&rn=28&rbuf=5553"”,”[/icode]

I am new working with python and csv files as input. I just wanna extract from each line 9 labels value and save them, sorted by columns in an excel file, for example:

[output]Line url ip lmt mime itag dur clen rbuf
219 https://r3—sn-gxqpgpn-h5ql.googlevideo.com 150.214.57.8 1538050727946440 audio 251 117.241 1853458 5457
479 https://r3—sn-gxqpgpn-h5ql.googlevideo.com 150.214.57.9 1492587801555590 video 315 117.183 347859091 5553[/output]

i appreciate very much your help.


#2

Hey There,

Theres a few ways to do this, but before I launch into it I just wanted to make sure you were aware that excel can actually import and open CSV files itself, so no need for much coding. Just open the CSV file with excel, do your data manipulation, and save it as a .xlsx native Excel file. Baring that won’t work for you or you just really want to learn how to do this in Python, here you go:

Step 1) Install a Python library that can work with excel files. I’m using openpyxl in this example

Shell:

$ pip install openpyxl

Python:
Method 1) - Easiest method, but might produce a memory error on certain systems if the Excel document to be written is on the large side. As a note, CSV files are read in as Python Lists. Each line of the CSV file will correspond to a list, each item separated by a comma in the line will become a list item.
Ex: Line1: “Line, url, Ip, lmt, mime, itag, dur, clen, rbuf”
will look like this in Python: [“Line”, “url”, “Ip”, “lmt”, “mime”, “itag”, “dur”, “clen”, “rbuf”]
We will be enumerating over these lists

Make sure you place the CSV document in the working directory of your Python environment so that you can find it easily, or more advanced filepath manipulation will be required.

import csv  # For CSV stuff, part of the standard Python library
from openpyxl import Workbook  # For Excel stuff

csv_file_path = 'examplecsv.csv'  # The filename of the CSV file
excel_filename = 'exampleCSVtoXLSX.xlsx'  # The filename we'll title the excel doc
with open(csv_file_path, 'r') as file:
    csv_doc = csv.reader(file) # This method opens the file and will handle closing it when we're done
    wb = Workbook()  # Declare a new openpyxl Excel workbook
    ws1 = wb.active  # Make this the active worksheet in the Excel doc
    for r, row in enumerate(csv_doc):  # Loop through each row (CSV line) one-by-one
        for c, col in enumerate(row):  # Loop through each col (comma seperated item) in each row
            ws1.cell(row=r+1, column=c+1, value = col)  # Append a cell to the worksheet at cell position (r+1,c+1), because they write non-zero indexed...oddly, with value of the item
    wb.save(excel_filename)  # Save the Excel document

Method 2) - Preferred for writing large Excel documents. This uses far less memory than method 1 but isn’t quite as straightforward for a beginner.

import csv   # For CSV stuff, part of the standard Python library
from openpyxl import Workbook  # For Excel Stuff
from openpyxl.worksheet.write_only import WriteOnlyCell

csv_file_path = 'examplecsv.csv'  # The filename of the CSV file
excel_filename = 'exampleCSVtoXLSX.xlsx'  # The filename we'll title the excel doc
with open(csv_file_path, 'r') as file:
    csv_doc = csv.reader(file)  # This method opens the file and will handle closing it when we're done
    wb = Workbook(write_only=True)  # Declare a new openpyxl Excel Workbook in Write Only Mode
    ws1 = wb.create_sheet()  # Create a new worksheet within that workbook, one isn't auto created like the last method
    for r, row in enumerate(csv_doc):  # Enumerate over the CSV, row by row
        row_to_append = []
        for c, col in enumerate(row):  # In every row, enumerate over each column
            cell = WriteOnlyCell(ws1, value=col)  # Declare a new Cell with the value associated with (row, col)
            row_to_append.append(cell)  # Add each new cell to a temporary list
        ws1.append(row_to_append)  # Append that list as a new row of cells in the worksheet
    wb.save(excel_filename)  # Save the workbook to the current working directory

Hopefully that helps. Good luck!

EDIT: Spelling