by David Venturi

Developing Data Scientists and Engineers

Free Code Camp asked 15,000 people who they are, and how they’re learning to code. I isolated those focused on data science and data engineering.

Image courtesy of Data Science Europe

More than 15,000 people responded to Free Code Camp’s 2016 New Coder Survey, granting researchers (like me!) an unprecedented glimpse into how people are learning to code. They released the entire dataset on Kaggle.

646 respondents answered “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”
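The analysis behind this article was done in R, so the following is only a rough Python sketch of how a subset like this could be isolated. The field name `JobRoleInterest` and the toy rows are assumptions for illustration, not the real survey data.

```python
# Toy rows standing in for survey responses (made up for illustration).
# The field name "JobRoleInterest" is an assumption about the dataset.
responses = [
    {"id": 1, "JobRoleInterest": "Data Scientist/Data Engineer"},
    {"id": 2, "JobRoleInterest": "Full-Stack Web Developer"},
    {"id": 3, "JobRoleInterest": ""},  # question left unanswered
    {"id": 4, "JobRoleInterest": "Data Scientist/Data Engineer"},
]

TARGET_ROLE = "Data Scientist/Data Engineer"


def data_focused(rows, role=TARGET_ROLE):
    """Return only the rows whose stated role of interest matches `role`."""
    return [row for row in rows if row["JobRoleInterest"].strip() == role]


subset = data_focused(responses)
print(len(subset), "of", len(responses), "toy respondents are data-focused")
```

Every statistic that follows would then be computed on `subset` rather than on the full list of responses.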

Here are a few high-level statistics from this data-focused subset, which complement Free Code Camp’s exploration of new coders in general.

I’ve borrowed the structure of Free Code Camp’s announcement article for ease of comparison. I’ve also included my comments where findings differ notably. And a few bonus plots, too!

We asked 15,000 people who they are, and how they’re learning to code
More than 15,000 people responded to the 2016 New Coder Survey, granting researchers an unprecedented glimpse into how…medium.freecodecamp.com

Who participated?

Of the 646 developing data scientists and data engineers who responded to the survey:

  • 25% are women (4 percentage points more than new coders in general)
  • their median age is 26 years old (one year younger)
  • they started programming an average of 16 months ago (5 months earlier)

Learner goals and approaches

14 hours each week, on average, are spent learning.

This is one hour less than new coders in general.

0% want to freelance or start their own business.*

Compared to 40% for the full new coder survey, this is a bit shocking. I have a hunch these zero counts are caused by the survey’s design: every respondent who answered the job role of interest question has a zero count for both “start your own business” and “freelance.”
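One way to test a hunch like this is to tally job preferences separately for respondents who did and did not answer the role question. The Python sketch below does that on made-up rows. The field names `JobPref` and `JobRoleInterest` and the answer labels are assumptions, and the original analysis used R.

```python
from collections import Counter

# Made-up rows for illustration only; field names and values are assumptions.
responses = [
    {"JobPref": "freelance", "JobRoleInterest": ""},
    {"JobPref": "start your own business", "JobRoleInterest": ""},
    {"JobPref": "work for a company", "JobRoleInterest": "Data Scientist/Data Engineer"},
    {"JobPref": "work for a company", "JobRoleInterest": "Front-End Web Developer"},
    {"JobPref": "freelance", "JobRoleInterest": ""},
]


def job_pref_by_role_answered(rows):
    """Count job preferences, split by whether the role question was answered."""
    counts = {True: Counter(), False: Counter()}
    for row in rows:
        answered = bool(row["JobRoleInterest"].strip())
        counts[answered][row["JobPref"]] += 1
    return counts


counts = job_pref_by_role_answered(responses)
print("answered role question:", dict(counts[True]))
print("skipped role question: ", dict(counts[False]))
```

If the real data looked like these toy rows, with “freelance” and “start your own business” appearing only among respondents who skipped the role question, the 0% would come from how the questions were routed rather than from what data-focused learners want.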

52% are already applying for jobs, or will start applying within the next year.

This is a longer time horizon than new coders in general, where 65% are applying within the next year.

Most of them want to work in an office, as opposed to remotely.

And a majority are willing to relocate.

Most of them have not yet attended any in-person coding events.

64% have used at least one of Coursera, edX, or Udacity.

Only 46% of new coders in general have used at least one of these resources. These companies cover a wider range of subject areas than some of the coding-specific resources listed.
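An “at least one of” figure counts each respondent once, no matter how many of the three platforms they used. Here is a minimal Python sketch of that calculation on made-up rows. The 0/1 flag layout and the field names are assumptions, not the dataset’s actual schema.

```python
PLATFORMS = ("Coursera", "edX", "Udacity")

# Made-up rows; 1 means the respondent reported using that resource.
responses = [
    {"Coursera": 1, "edX": 0, "Udacity": 0},
    {"Coursera": 0, "edX": 0, "Udacity": 0},
    {"Coursera": 1, "edX": 1, "Udacity": 1},
    {"Coursera": 0, "edX": 0, "Udacity": 1},
]


def share_using_any(rows, platforms=PLATFORMS):
    """Fraction of rows with at least one of the platform flags set."""
    used_any = sum(1 for row in rows if any(row[p] for p in platforms))
    return used_any / len(rows)


share = share_using_any(responses)
print(f"{share:.0%} of toy respondents used at least one platform")
```

Adding up the three per-platform percentages instead would double-count anyone who used more than one.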

Among the podcasts respondents noted, Partially Derivative, Becoming A Data Scientist, and Talking Machines are the only data-specific ones.

Only 1% have attended a bootcamp.

6% of new coders have attended a bootcamp.

Demographics and Socioeconomics

Data-focused respondents represent 166 countries.

More than 90% are from North America, Europe, and Asia.

The dominating percentage of North Americans should be expected because Free Code Camp is based in the United States.

Their cities span a wide range of urbanization levels.

Just under a quarter of respondents are ethnic minorities in their country.

And nearly half are non-native English speakers. They grew up speaking one of 148 languages.

67% have earned at least a bachelor’s degree.

Compared to 58% for new coders in general, the data-focused subset is more skewed towards post-secondary studies.

Majors are more diverse than in the full survey, where Computer Science and Information Technology checked in at #1 and #2 with 17% and 5%, respectively.

Just over one-half are currently working.

Two-thirds of the new coder population are currently working.

A quarter work in the tech industry.

Employment fields are more varied than in the full dataset, where 50% of respondents work in software development and IT.

Median current salary is $44k.

The median current salary for the full dataset is $37k.

And they expect to earn a median of $60k with their new data science/engineering skills.

The median for the full survey dataset is $50k. With data science/engineering being notoriously lucrative in 2016, some respondents might be seeking higher wages.

7% have served in their country’s military.

Image courtesy of Cpl Jamie Peters RLC

13% have children, and another 3% financially support an elderly or disabled relative. And one-fifth are doing this without the help of a spouse.

Images courtesy of Stay at Home Mum and Stay at Home Dad

47% consider themselves underemployed (working a job that is below their education level).

This is 5 percentage points higher than new coders in general.

If they have a home mortgage, they owe an average of $194k.

If they have student loans, they owe an average of $37k.

This average is $3k more than the full survey dataset.

Image courtesy of Andrew Burton

14% don’t yet have high-speed internet at home.

And 3% are currently receiving disability benefits from their government.

These are the people who are learning data science and engineering. Free, self-paced learning resources are definitely important.

What’s next?

You can find a more detailed version of this analysis on Kaggle, where I outline my exploratory data analysis (EDA) process.

Be sure to check out my initial exploration of Free Code Camp’s dataset, where I dive deeper into the characteristics of new coders:

New Coders: How Salary and Time Spent Learning Vary by Demographic
I analyzed the 15,000 respondents to Free Code Camp’s New Coder Survey by continent, gender, and whether they’re an…medium.freecodecamp.com

The 6 most desirable coding jobs (and the types of people drawn to each)
Free Code Camp asked 15,000 people who they are, and how they’re learning to code. I separated them by their job…medium.freecodecamp.com

If you have questions or concerns about this series or the R code that generated it, don’t hesitate to let me know.

David Venturi (@venturidb) | Twitter
The latest Tweets from David Venturi (@venturidb). Creating my own data science master's degree. @queensu chem eng/econ…twitter.com