I am using Python with the jsonlines library to load a JSON Lines (.ndjson) file. I convert each JSON record (a dictionary) into a one-row DataFrame with json_normalize and then concatenate all of them into a single DataFrame:
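```python
import jsonlines
import pandas as pd
import tqdm
from pandas import json_normalize

# Count and read the same file so the tqdm total matches the data being loaded
path = "/home/jovyan/data/onlyyouhotels.ndjson"

jsonlist = []  # not used below
increment = 0  # not used below
filecount = 0

# First pass: count the records so tqdm can show a total
with jsonlines.open(path) as counter:
    for j in counter:
        filecount += 1
print(filecount)

# Second pass: turn every record into a one-row DataFrame
with jsonlines.open(path) as reader:
    series = [json_normalize(j) for j in tqdm.tqdm(reader, total=filecount, unit="json files")]

# Concatenate all the one-row DataFrames and save the result
data_uonly = pd.concat(series)
data_uonly.to_pickle('raw_Uonly_pickle')
```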
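Much of the cost in this loop probably comes from building one DataFrame per record and then concatenating all of them. A minimal sketch, assuming every line is a JSON object: parse the records into a list and call `pd.json_normalize` once on the whole list. It reads lines with the standard-library `json` module instead of jsonlines (either should work), and `load_ndjson_flat` is a hypothetical helper name.

```python
import json

import pandas as pd


def load_ndjson_flat(path):
    """Parse every line of an NDJSON file and flatten all records in one call.

    pd.json_normalize accepts a list of dicts, so building a single DataFrame
    from the whole list avoids creating (and later concatenating) a separate
    one-row DataFrame per record.
    """
    with open(path, encoding="utf-8") as f:
        # One dict per non-empty line
        records = [json.loads(line) for line in f if line.strip()]
    # Nested keys become dotted column names, e.g. "hotel.name"
    return pd.json_normalize(records)
```

Usage would be `data_uonly = load_ndjson_flat("/home/jovyan/data/onlyyouhotels.ndjson")` followed by `data_uonly.to_pickle('raw_Uonly_pickle')`. As a side effect, the result gets a 0..n-1 index instead of the repeated 0s that concatenating one-row frames produces.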
This takes a long time to load the data, and CPU usage spikes to 100% while it runs. How can I use parallel processing in this code so that loading takes less time and uses less memory? Right now it takes 2 to 3 hours to load a 1.3 GB file.
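If parallelism is still wanted, one option is to split the file into batches of raw lines and let a process pool parse and flatten each batch, so the parent only concatenates a handful of batch DataFrames. This is a sketch under stated assumptions, not a tuned solution: `load_ndjson_parallel`, `_normalize_batch`, `_batches` and the `batch_size` default are hypothetical, and it uses the `"fork"` start method, which exists only on POSIX systems (assumed here because `/home/jovyan` suggests a Linux Jupyter container, where fork also lets workers use functions defined in a notebook).

```python
import json
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

import pandas as pd


def _normalize_batch(lines):
    """Worker (hypothetical helper): parse a batch of lines into one flat DataFrame."""
    records = [json.loads(line) for line in lines if line.strip()]
    return pd.json_normalize(records)


def _batches(path, batch_size):
    """Yield lists of up to batch_size raw lines from the file."""
    with open(path, encoding="utf-8") as f:
        while True:
            batch = list(islice(f, batch_size))
            if not batch:
                return
            yield batch


def load_ndjson_parallel(path, batch_size=100_000, max_workers=None):
    # Assumption: a POSIX system, since the "fork" start method is not available on Windows
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        # map() returns results in input order, so row order is preserved
        frames = list(pool.map(_normalize_batch, _batches(path, batch_size)))
    if not frames:  # empty file: pd.concat would raise on an empty list
        return pd.DataFrame()
    # One concat over a few batch DataFrames instead of one per record;
    # columns missing from some batches are filled with NaN
    return pd.concat(frames, ignore_index=True)
```

Note that `Executor.map` submits every batch up front, so the raw text is still held in memory, and each batch DataFrame is pickled back to the parent process. This targets wall-clock time rather than peak memory, and whether it beats a single `pd.json_normalize` call depends on the core count and the shape of the records.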