<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ data scientist - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ data scientist - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Tue, 23 Jun 2026 22:45:24 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/data-scientist/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Use ChatGPT – Prompts for Data Scientists ]]>
                </title>
                <description>
                    <![CDATA[ By Shittu Olumide These days, many people are concerned with the way AI is evolving. They see it effortlessly solving problems, performing tasks quickly and accurately, and doing jobs easily and efficiently.  And while some people are scared, others ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-chatgpt-for-data-scientists/</link>
                <guid isPermaLink="false">66d4610c38f2dc3808b790f7</guid>
                
                    <category>
                        <![CDATA[ Artificial Intelligence ]]>
                    </category>
                
                    <category>
                        <![CDATA[ chatgpt ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data scientist ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 24 Apr 2023 23:01:14 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/04/Shittu-Olumide-How-to-use-ChatGPT-for-Data-Scientists.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Shittu Olumide</p>
<p>These days, many people are concerned with the way AI is evolving. They see it effortlessly solving problems, performing tasks quickly and accurately, and doing jobs easily and efficiently. </p>
<p>And while some people are scared, others are actually embracing it. I'm one of those people who are embracing it, and I advise you to do the same. Learn to use AI to your own advantage. </p>
<p>This isn't a new phenomenon. When the internet first started out, many people were skeptical about it and didn't want to move their businesses online. Now the reverse is true: almost all businesses have an online presence. </p>
<p>I believe this is how it's going to be in the very near future: anyone who can't use AI to their benefit and leverage its capabilities may suffer. </p>
<p>To get the best results from an AI tool, you need to ask the right questions and write clear, well-constructed prompts. </p>
<p>In this article, I'll share some useful and important prompts that data scientists can use. We'll cover areas such as machine learning, data visualization, and lots more. </p>
<h2 id="heading-what-is-chatgpt">What is ChatGPT?</h2>
<p><a target="_blank" href="https://chat.openai.com/">ChatGPT</a> is a language model developed by OpenAI that's based on the GPT-3.5 architecture. It uses deep learning to generate human-like responses to user queries or inputs. </p>
<p>As a language model, it has been trained on a massive amount of textual data from various sources and can understand and generate text in multiple languages. </p>
<p>ChatGPT can do many things, including the following:</p>
<ol>
<li>Answering general knowledge questions.</li>
<li>Providing simpler explanations of complex concepts.</li>
<li>Generating creative writing prompts.</li>
<li>Summarizing long passages of text.</li>
<li>Recommending products or services based on user preferences.</li>
<li>Translating text into different languages.</li>
<li>Generating human-like responses to conversations.</li>
<li>Helping with language learning by providing definitions and examples of words and phrases.</li>
<li>Generating personalized content, such as emails and social media posts.</li>
<li>Creating chatbots and virtual assistants for businesses.</li>
<li>Assisting with research by providing relevant information and sources.</li>
<li>Generating jokes and humorous responses.</li>
<li>Assisting with mental health by providing coping strategies and resources.</li>
<li>Analyzing data and generating reports.</li>
<li>Generating music and art based on user input and preferences.</li>
</ol>
<p>Note that this list is not exhaustive. ChatGPT's capabilities are constantly evolving and expanding with advancements in artificial intelligence technology.</p>
<h2 id="heading-best-practices-for-creating-good-prompts">Best Practices for Creating Good Prompts</h2>
<p>Coming up with good question prompts for ChatGPT can be challenging, but some general principles and strategies can help guide you in the process.</p>
<ol>
<li><strong>Clearly define the goal of the question</strong>: Before formulating a good question for ChatGPT, it's important to be clear on what you're trying to achieve. What information or insight are you hoping to get from the model? Once you have a clear goal in mind, you can start to think about the types of questions that might be most useful.</li>
<li><strong>Keep it specific and focused</strong>: ChatGPT is good at generating answers to specific questions, so it's important to frame your questions in a specific and focused way. Avoid broad or vague questions, and be as clear and concise as possible.</li>
<li><strong>Use natural language</strong>: ChatGPT is designed to understand and generate natural language, so it's important to use it when formulating your questions. Avoid using technical jargon or complex language that might be difficult for the model to understand.</li>
<li><strong>Provide context</strong>: ChatGPT works best when it has context to work with, so it can be helpful to provide some context when formulating your questions. This might include providing background information or explaining the context of the question.</li>
<li><strong>Test and refine</strong>: Finally, testing your question prompts and refining them over time is important. Try different types of questions and see how ChatGPT responds. Pay attention to the quality and accuracy of the answers you get, and use this feedback to refine your question prompts and improve the quality of the information you get from ChatGPT.</li>
</ol>
<p>Now let's explore some key areas of data science and highlight the types of questions you can ask ChatGPT to get helpful responses.</p>
<h2 id="heading-data-exploration">Data Exploration</h2>
<p>Data exploration is the process of analyzing, visualizing, and understanding a dataset to discover patterns, trends, and relationships between its variables. </p>
<p>Here are 9 example questions you can ask ChatGPT to get some really helpful answers about data exploration:</p>
<ol>
<li>What is data exploration and how is it useful in data science?</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/Screen-Shot-2023-04-24-at-12.54.34.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example question and answer about data exploration</em></p>
<ol start="2">
<li><p>What are some popular Python libraries for data exploration and how are they used?</p>
</li>
<li><p>Can you provide an example of a basic data exploration script using Python?</p>
</li>
<li><p>How can you identify patterns and trends in time series data using Pandas and Matplotlib?</p>
</li>
<li><p>How can you generate scatterplots and line charts to explore relationships between variables using Pandas and Matplotlib?</p>
</li>
<li><p>How can you perform dimensionality reduction using PCA to explore relationships between variables?</p>
</li>
<li><p>Can you provide an example of a data exploration script that identifies patterns and trends in data using Pandas and Seaborn?</p>
</li>
<li><p>What are some common techniques for exploring relationships between variables using Pandas and Matplotlib?</p>
</li>
<li><p>Can you provide an example of a data exploration script that uses t-SNE, PCA, and clustering to explore relationships between variables?</p>
</li>
</ol>
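<p>Many of the questions above ask ChatGPT for Pandas or Matplotlib scripts. To give a rough idea of what a basic exploration script computes, here is a minimal sketch that uses only Python's standard library instead of Pandas. The <code>rows</code> dataset and the <code>column</code>, <code>summarize</code>, and <code>pearson</code> helpers are hypothetical, made up for illustration.</p>

```python
import statistics
from math import sqrt

# Hypothetical toy dataset: each dict is one observation
rows = [
    {"hours_studied": 2, "score": 55},
    {"hours_studied": 4, "score": 62},
    {"hours_studied": 6, "score": 71},
    {"hours_studied": 8, "score": 80},
    {"hours_studied": 10, "score": 88},
]


def column(rows, name):
    """Extract one column as a list of values."""
    return [row[name] for row in rows]


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    for name in ("hours_studied", "score"):
        print(name, summarize(column(rows, name)))
    r = pearson(column(rows, "hours_studied"), column(rows, "score"))
    print(f"correlation(hours_studied, score) = {r:.3f}")
```

<p>With Pandas, roughly the same summary comes from calling <code>df.describe()</code> and <code>df.corr()</code> on a DataFrame, which is what ChatGPT will usually suggest.</p>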
<h2 id="heading-data-analysis">Data Analysis</h2>
<p>Data analysis involves collecting, cleaning, transforming, and modeling data to draw insights and conclusions from it. It is important in many fields such as business, science, engineering, and social sciences.   </p>
<p>Here are 10 questions that you can ask ChatGPT to get the most helpful answers about data analysis:</p>
<ol>
<li>What is the difference between data analysis and data mining?</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/Screen-Shot-2023-04-24-at-13.20.49.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example question and answer about data analysis</em></p>
<ol start="2">
<li><p>What are the most common data analysis methods?</p>
</li>
<li><p>What is the difference between descriptive and inferential statistics?</p>
</li>
<li><p>How do I select the appropriate statistical test for my data?</p>
</li>
<li><p>How can I use data visualization to analyze my data?</p>
</li>
<li><p>What are some common challenges in data analysis, and how can I overcome them?</p>
</li>
<li><p>What are some techniques for cleaning and preprocessing data?</p>
</li>
<li><p>How can I ensure the quality and accuracy of my data?</p>
</li>
<li><p>How can I use machine learning algorithms for data analysis?</p>
</li>
<li><p>What are some ethical considerations in data analysis, and how can I address them?</p>
</li>
</ol>
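<p>Data analysis typically moves from cleaning, to describing the data, to drawing conclusions from it. The sketch below, which uses only the standard library, cleans a list of hypothetical raw survey answers, computes descriptive statistics for the sample, and then makes a simple inferential estimate: a 95% confidence interval for the population mean. The data and function names are assumptions for illustration. The interval uses a normal approximation; for a sample this small, a t-distribution interval would usually be more appropriate.</p>

```python
import statistics
from statistics import NormalDist

# Hypothetical raw survey responses, including blanks and unusable entries
raw = ["42", "38", "", "45", "38", "n/a", "51", "47", " 40 ", "44"]


def clean(values):
    """Strip whitespace and drop entries that can't be parsed as numbers."""
    cleaned = []
    for v in values:
        v = v.strip()
        try:
            cleaned.append(float(v))
        except ValueError:
            continue  # skip unusable entries such as "" or "n/a"
    return cleaned


def describe(values):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def mean_confidence_interval(values, confidence=0.95):
    """Inferential statistics: estimate a range for the population mean.

    Uses a normal (z) approximation; small samples usually call for a
    t-distribution interval instead.
    """
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / n ** 0.5  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


if __name__ == "__main__":
    values = clean(raw)
    print(describe(values))
    print(mean_confidence_interval(values))
```
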
<h2 id="heading-data-visualization">Data Visualization</h2>
<p>Data visualization refers to representing data and information through graphical or visual means, such as charts, graphs, and maps, to help people understand complex data sets and patterns quickly and efficiently.  </p>
<p>Here are 10 possible questions you could ask ChatGPT to learn more about data visualization:</p>
<ol>
<li>What are some popular Python libraries for data visualization and how are they used?</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/Screen-Shot-2023-04-24-at-13.19.41.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example question and answer about data viz</em></p>
<ol start="2">
<li><p>Can you give me an example of creating interactive graphs with Plotly?</p>
</li>
<li><p>How can you ensure that your visualizations are accessible and readable?</p>
</li>
<li><p>How can you choose the right chart or graph for different types of data?</p>
</li>
<li><p>Can you provide an example of a basic data visualization script using Python?</p>
</li>
<li><p>What are some common techniques for creating static visualizations using Matplotlib and Seaborn?</p>
</li>
<li><p>How can you perform correlation analysis and heat mapping using Pandas and Matplotlib?</p>
</li>
<li><p>Can you provide an example of a data visualization that adheres to best practices for effective visualization design?</p>
</li>
<li><p>How can you create line charts, bar charts, scatterplots, and other visualizations using Matplotlib and Seaborn?</p>
</li>
<li><p>What are some examples of innovative data visualizations and how were they created?</p>
</li>
</ol>
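<p>In practice you would use Matplotlib, Seaborn, or Plotly, as the questions above suggest (for example, Matplotlib's <code>plt.hist(values, bins=...)</code>). To show the idea behind one of the most common charts without any dependencies, here is a minimal sketch of a text histogram: values are grouped into fixed-width bins, and each bin's count becomes the length of a bar. The <code>text_histogram</code> helper is hypothetical and assumes integer values.</p>

```python
from collections import Counter


def text_histogram(values, bin_width, bar_char="#"):
    """Group integer values into fixed-width bins and draw one text bar per bin.

    Empty bins between the smallest and largest bin are kept, so gaps in
    the distribution stay visible. Returns a list of lines like '  40-49 | ###'.
    """
    # Map each value to the start of its bin, e.g. 23 -> 20 for bin_width=10
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>7} | {bar_char * counts.get(start, 0)}")
    return lines


if __name__ == "__main__":
    ages = [23, 25, 31, 34, 35, 38, 42, 47, 51, 58, 59]
    print("\n".join(text_histogram(ages, bin_width=10)))
```
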
<h2 id="heading-data-mining">Data Mining</h2>
<p>Data mining involves discovering patterns and insights from large datasets using statistical and computational methods. It lets you extract useful information from data and transform it into an understandable structure for further use.</p>
<p>Here are 10 questions that you can ask ChatGPT to get the best answers about data mining:</p>
<ol>
<li>What is the difference between data mining and machine learning?</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/Screen-Shot-2023-04-24-at-13.18.07.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example question and answer about data mining</em></p>
<ol start="2">
<li><p>What are some standard techniques used in data mining?</p>
</li>
<li><p>How does data mining help in decision-making processes?</p>
</li>
<li><p>What are some of the benefits of data mining?</p>
</li>
<li><p>Can you explain the process of data mining in simple terms?</p>
</li>
<li><p>What are some of the challenges of data mining?</p>
</li>
<li><p>How can data mining be used in marketing and advertising?</p>
</li>
<li><p>Can you provide some examples of successful data mining applications?</p>
</li>
<li><p>What ethical considerations should be taken into account when using data mining?</p>
</li>
<li><p>How can data mining be used in healthcare and medical research?</p>
</li>
</ol>
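<p>One standard data mining technique is association rule mining, which looks for items that frequently occur together (market basket analysis is the classic example). Here is a minimal pure-Python sketch that counts item pairs in a few hypothetical transactions and reports each rule's support and confidence. It only handles pairs of items; full algorithms such as Apriori extend the same counting idea to larger itemsets. The data and the <code>pair_rules</code> function are illustrative assumptions.</p>

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def pair_rules(transactions, min_support=0.4):
    """Mine simple association rules A -> B between pairs of items.

    support(A, B)    = fraction of transactions containing both A and B
    confidence(A->B) = count(A and B) / count(A)
    Only pairs whose support reaches min_support are kept.
    Returns (lhs, rhs, support, confidence) tuples, highest confidence first.
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two rules: a -> b and b -> a
        for lhs, rhs in ((a, b), (b, a)):
            rules.append((lhs, rhs, support, count / item_counts[lhs]))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    for lhs, rhs, support, confidence in pair_rules(transactions):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```
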
<h2 id="heading-machine-learning">Machine Learning</h2>
<p>Machine learning is a branch of artificial intelligence that allows computer systems to learn from data without being explicitly programmed. The primary goal of machine learning is to enable machines to improve their performance on a particular task automatically through experience.  </p>
<p>Here are 10 questions that you can ask ChatGPT to learn more about Machine Learning:</p>
<ol>
<li>Can you provide an example of a basic machine learning script using Python?</li>
</ol>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/04/Screen-Shot-2023-04-24-at-13.15.58.png" alt="Image" width="600" height="400" loading="lazy">
<em>Example question and answer about machine learning</em></p>
<ol start="2">
<li><p>What are some popular Python libraries for machine learning and how are they used?</p>
</li>
<li><p>What is model selection and how can you choose the right algorithm for a machine learning problem?</p>
</li>
<li><p>How can you compare the performance of different machine learning models using different metrics?</p>
</li>
<li><p>Can you write code that applies 6 different classification algorithms at once, evaluates them using precision, recall, and F1 score, and appends the results to a data frame called <code>pred_df</code>?</p>
</li>
<li><p>How can you perform clustering and dimensionality reduction tasks using Scikit-Learn?</p>
</li>
<li><p>Can you provide an example of a machine learning script that performs model selection using Scikit-Learn?</p>
</li>
<li><p>How can you perform regression and classification tasks using Scikit-Learn?</p>
</li>
<li><p>What are some best practices for deploying machine learning models in production?</p>
</li>
<li><p>How can you evaluate the performance of an unsupervised learning model using different metrics?</p>
</li>
</ol>
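<p>Question 5 above asks for classifiers evaluated with precision, recall, and F1 score. As a rough sketch of what those metrics measure, the code below computes them from scratch for two sets of hypothetical binary predictions and collects one row per model, similar in spirit to appending results to a table like <code>pred_df</code>. The labels, predictions, and model names are made up. In a real project, <code>precision_score</code>, <code>recall_score</code>, and <code>f1_score</code> from <code>sklearn.metrics</code> compute the same quantities.</p>

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground-truth labels and predictions from two models
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "model_b": [1, 1, 1, 1, 1, 0, 0, 0],
}

# One row per model, like appending rows to a results table
results = []
for name, y_pred in predictions.items():
    p, r, f = precision_recall_f1(y_true, y_pred)
    results.append({"model": name, "precision": p, "recall": r, "f1": f})

if __name__ == "__main__":
    for row in results:
        print(row)
```
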
<h2 id="heading-more-tips-on-interacting-with-chatgpt">More Tips on Interacting with ChatGPT</h2>
<p>Once you have your responses from ChatGPT, there's still a lot you can do. ChatGPT keeps track of earlier messages in the same conversation (thread), so follow-up questions can build on what you've already discussed, although very long conversations can exceed how much it can keep in context. Try these techniques to get even more info:</p>
<ol>
<li><strong>Ask follow-up questions</strong>: If ChatGPT answers your question but you would like more information or clarification, don't hesitate to ask a follow-up question. For example, if you ask, "What is data science?" and ChatGPT responds with a definition, you could follow up with a question like "Can you give me an example of how data science is used in the real world?"</li>
<li><strong>Ask for clarification</strong>: If ChatGPT's response is unclear or you don't understand something, don't hesitate to ask for clarification. ChatGPT is designed to provide clear and concise responses, but sometimes additional explanation may be necessary.</li>
<li><strong>Be specific</strong>: Try to be as specific as possible when asking a question. This will help ChatGPT provide a more accurate and relevant response. For example, instead of asking, "What is the stock market?" you could ask, "How do stock prices fluctuate in response to economic indicators?"</li>
<li><strong>Use natural language</strong>: ChatGPT is designed to understand and respond to natural language, so try to ask questions in a conversational tone. You don't need technical jargon or complex language to get a good response.</li>
<li><strong>Keep the conversation focused</strong>: Stay on topic and avoid asking unrelated or tangential questions. This will help ChatGPT provide more focused and accurate responses. If you want to ask a different question, starting a new conversation is usually best.</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>ChatGPT is a powerful tool that data scientists can use to enhance their work. With its natural language processing capabilities, ChatGPT can provide quick answers to a wide range of data science questions, making it a valuable resource for those working in this field. </p>
<p>By asking the right questions, you can gain valuable insights into data science techniques, tools, and best practices, ultimately leading to better decision-making and improved outcomes. Just don't forget to fact-check its answers, especially on topics you're not very familiar with.</p>
<p>With the growing importance of data science in today's business world, ChatGPT can help data scientists get up to speed on new techniques and tools more quickly, giving them a competitive edge in the market.</p>
<p>Let's connect on <a target="_blank" href="https://www.twitter.com/Shittu_Olumide_">Twitter</a> and on <a target="_blank" href="https://www.linkedin.com/in/olumide-shittu">LinkedIn</a>. You can also subscribe to my <a target="_blank" href="https://www.youtube.com/channel/UCNhFxpk6hGt5uMCKXq0Jl8A">YouTube</a> channel.</p>
<p>Happy Coding!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Data Analyst VS Data Scientist – What's the Difference? ]]>
                </title>
                <description>
                    <![CDATA[ Data analyst and data scientist are two career paths in big data. And while they do have similarities, each requires different skills.  The basic difference between the two is that a data scientist works to capture data while a data analyst tries to ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/data-analyst-vs-data-scientist-whats-the-difference/</link>
                <guid isPermaLink="false">66adf0a53bf50764799b9ca7</guid>
                
                    <category>
                        <![CDATA[ data analysis ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data analytics ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data scientist ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Kolade Chris ]]>
                </dc:creator>
                <pubDate>Thu, 17 Nov 2022 16:18:21 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/11/web-g1c2368440_1280.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Data analyst and data scientist are two career paths in big data. And while they do have similarities, each requires different skills. </p>
<p>The basic difference between the two is that a data scientist works to capture data while a data analyst tries to gain insights from that data.</p>
<p>This article is for you if you’re interested in a career in big data and you don’t know whether you'd want to be a data analyst or data scientist. It will also help you if you just want to know the differences between a data analyst and a data scientist. </p>
<h2 id="heading-what-well-cover">What We'll Cover</h2>
<ul>
<li><a class="post-section-overview" href="#heading-what-is-data-analytics-and-who-is-a-data-analyst">What is Data Analytics and Who is a Data Analyst?</a><ul>
<li><a class="post-section-overview" href="#heading-what-does-a-data-analyst-do">What does a Data Analyst Do?</a>  </li>
<li><a class="post-section-overview" href="#heading-how-to-become-a-data-analyst">How to Become a Data Analyst</a></li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-what-is-data-science-and-who-is-a-data-scientist">What is Data Science and Who is a Data Scientist? </a><ul>
<li><a class="post-section-overview" href="#heading-what-does-a-data-scientist-do">What does a Data Scientist Do?</a></li>
<li><a class="post-section-overview" href="#heading-how-to-become-a-data-scientist">How to Become a Data Scientist</a> </li>
</ul>
</li>
<li><a class="post-section-overview" href="#heading-what-are-the-differences-between-data-analyst-and-data-scientist">What are the Differences between Data Analyst and Data Scientist?</a></li>
<li><a class="post-section-overview" href="#heading-conclusion">Conclusion</a></li>
</ul>
<h2 id="heading-what-is-data-analytics-and-who-is-a-data-analyst">What is Data Analytics and Who is a Data Analyst?</h2>
<p><strong>Data analytics</strong> bridges the gap between <strong>data science</strong> and <strong>business analytics</strong>. It is the systematic approach of processing raw data and subsequently extracting meaningful information from it. </p>
<p>The information extracted from the raw data is the focus of data analysis. The professional who does this analysis is a <strong>data analyst</strong>.</p>
<h3 id="heading-what-does-a-data-analyst-do">What does a Data Analyst Do?</h3>
<p>Data analysts make use of statistical and logical techniques to evaluate data. They use tools such as SQL to query databases and extract the needed information that can help companies make better decisions. </p>
<p>To dig into and assess the information from this data, a data analyst uses programming languages like R, SAS, and Python, and tools like <a target="_blank" href="https://www.freecodecamp.org/news/data-visualization-using-d3-course/">D3</a>, <a target="_blank" href="https://www.freecodecamp.org/news/tableau-for-data-science-and-data-visualization-crash-course/">Tableau</a>, and <a target="_blank" href="https://powerbi.microsoft.com/en-us/">Power BI</a>.</p>
<p>In addition, a data analyst cleans up the database by getting rid of redundant and unusable data.</p>
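<p>As a small illustration of the SQL side of this work, here is a minimal sketch using Python's built-in <code>sqlite3</code> module. It loads a few hypothetical sales records into an in-memory database, removes unusable rows (missing amounts) and exact duplicates, and then runs an aggregate query. Treating exact duplicates as data-entry errors is an assumption made for this example; in real data, repeated rows can be legitimate. The table and function names are hypothetical.</p>

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of hypothetical sales records, including messy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("North", "Widget", 120.0),
            ("North", "Widget", 120.0),  # exact duplicate row
            ("South", "Gadget", 80.0),
            ("South", "Widget", None),   # unusable: missing amount
            ("North", "Gadget", 200.0),
            ("South", "Gadget", 60.0),
        ],
    )
    return conn


def clean_sales(conn):
    """Delete rows with missing amounts, then exact duplicates (keeping one copy)."""
    conn.execute("DELETE FROM sales WHERE amount IS NULL")
    conn.execute(
        "DELETE FROM sales WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM sales GROUP BY region, product, amount)"
    )


def revenue_by_region(conn):
    """An aggregate query an analyst might run to answer a business question."""
    return conn.execute(
        "SELECT region, SUM(amount) AS total, COUNT(*) AS orders "
        "FROM sales GROUP BY region ORDER BY total DESC"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    clean_sales(conn)
    print(revenue_by_region(conn))
```
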
<h3 id="heading-how-to-become-a-data-analyst">How to Become a Data Analyst</h3>
<p>To become a data analyst, you can earn a relevant degree from an accredited college or university, attend a bootcamp, or learn it yourself. </p>
<p>You can teach yourself to become a data analyst because building a career in tech is mostly about <strong>skills</strong>. Once you have those skills and can put them to practical use, you can become a data analyst. </p>
<p>Some data analyst job postings require a degree and some don’t. So there’s room for anyone who doesn’t have a degree but has the skills.</p>
<p>The skills you need as a data analyst include: </p>
<ul>
<li>Soft skills (critical thinking, communication, and others)</li>
<li>Data visualization (D3, Tableau, Power BI)</li>
<li>SQL and (probably) NoSQL</li>
<li>Statistics </li>
<li>Spreadsheets (Excel, Google Sheets, and others)</li>
<li>A few programming languages like Python, R, SAS, and JavaScript for D3</li>
<li>Machine learning </li>
</ul>
<p>It doesn’t end there. You should try to work on projects that make you appear employable to recruiters. You should also try to get an entry-level job that can help you put those skills into real-world practice. And if you can’t find an entry-level job, then you can consider volunteering. </p>
<p>Here are a few resources you can use to get started:</p>
<ol>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-data-analysis-with-python-course/">Learn Data Analysis with Python</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/exploratory-data-analysis-with-numpy-pandas-matplotlib-seaborn/">What is Data Analysis? Full Handbook</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/data-analysis-with-python-for-excel-users-course/">Data Analysis with Python for Excel Users</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-does-a-data-analyst-do-data-analyst-job-description/">What does a Data Analyst Do?</a></li>
</ol>
<h2 id="heading-what-is-data-science-and-who-is-a-data-scientist">What is Data Science and Who is a Data Scientist?</h2>
<p>Data science is the development of strategies for capturing data and preparing it for analysis. It also involves processing and developing data models with programming languages like R and Python, then deploying those models into applications. The professional who develops these strategies is called a data scientist.</p>
<h3 id="heading-what-does-a-data-scientist-do">What does a Data Scientist Do?</h3>
<p>A data scientist is more focused on developing and implementing tools that help data analysts analyze the data and extract the needed information from it. </p>
<p>This means data scientists spend their time developing models and preparing algorithms. And if the organization needs to deploy a model, data scientists are in charge of that.</p>
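<p>To make "developing a model and deploying it" a little more concrete, here is a minimal sketch of a model in its simplest form: an ordinary least squares line fitted to hypothetical advertising and sales figures, wrapped in a <code>predict</code> function that an application could call. The data and function names are assumptions for illustration; real projects typically use libraries such as scikit-learn and serve models through an API or a batch job.</p>

```python
from statistics import mean


def fit_linear_model(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def make_predictor(slope, intercept):
    """Wrap the fitted parameters in a function an application could call."""
    def predict(x):
        return slope * x + intercept
    return predict


# Hypothetical training data: advertising spend (x) vs. sales (y)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_linear_model(ad_spend, sales)
predict = make_predictor(slope, intercept)

if __name__ == "__main__":
    print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
    print("predicted sales for spend 6.0:", round(predict(6.0), 2))
```
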
<h3 id="heading-how-to-become-a-data-scientist">How to Become a Data Scientist</h3>
<p>Most data science job openings require a relevant degree such as Statistics or Computer Science. That said, I’ve personally seen data science openings that don’t require degrees. </p>
<p>Toward the end of this article, I’ll link to an article that shows you where to find job openings like these. </p>
<p>Once again, what matters is the skills. Once you have those skills and can put them into use, then you can get a job as a data scientist. </p>
<p>Some of the skills you need to become a data scientist are:</p>
<ul>
<li>Mathematics</li>
<li>Programming (Python, R, SAS)</li>
<li>Statistics</li>
<li>Linear algebra</li>
<li>Machine learning</li>
<li>Cloud computing</li>
<li>SQL and NoSQL (Most openings won’t require NoSQL but it’s a good skill to learn)</li>
<li>Apache Hadoop</li>
<li>Calculus</li>
</ul>
<p>Here are some resources to get you started:</p>
<ol>
<li><a target="_blank" href="https://www.freecodecamp.org/news/hands-on-data-science-course/">Learn the Basics of Data Science - Hands-On Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-data-science-course-matplotlib-pandas-numpy/">Python for Data Science Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/top-statistics-concepts-to-know-before-getting-into-data-science/">Top Statistics Concepts to Know Before Getting Into Data Science</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/23-common-data-science-interview-questions-for-beginners/">Data Science Interview Questions for Beginners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/first-steps-to-learn-data-science-or-ml-after-the-roadmap/">Programming, Math, and Science Concepts to Know for Data Science</a></li>
</ol>
<h2 id="heading-what-are-the-differences-between-data-analyst-and-data-scientist">What are the Differences between Data Analyst and Data Scientist?</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Basis</th><th>Data Scientist</th><th>Data Analyst</th></tr>
</thead>
<tbody>
<tr>
<td><strong>Programming</strong></td><td>Advanced use of languages like Python, R, and SAS</td><td>Basic knowledge of Python, R, and SAS</td></tr>
<tr>
<td><strong>Skills</strong></td><td>Advanced programming, statistics, machine learning, cloud computing</td><td>Basic programming, statistics, probability, spreadsheets, visualization tools</td></tr>
<tr>
<td><strong>Work</strong></td><td>Spends more time developing models and tools and creating algorithms that ease analysis</td><td>Spends more time writing queries to retrieve data and turning that data into meaningful information</td></tr>
<tr>
<td><strong>Degree</strong></td><td>Foundational technical background with a Bachelor's degree in Computer Science, Statistics, or Information Systems. Master's degree in Data Science.</td><td>Foundational technical background with a Bachelor's degree in Computer Science, Statistics, or Information Systems. Master's degree in Data Analytics.</td></tr>
<tr>
<td><strong>Salary</strong></td><td>$144,729 /year base pay in the US (Indeed)</td><td>$71,717 /year base pay in the US (Indeed)</td></tr>
</tbody>
</table>
</div><h2 id="heading-conclusion">Conclusion</h2>
<p>Data scientist and data analyst are both in-demand career paths you can follow in big data. If you’re unsure which of the two to get into, here are some things to consider:</p>
<ul>
<li>if you’re well-versed in mathematics, statistics, and computer science, either of the two is good for you</li>
<li>if you want to create advanced machine learning models, you should consider getting into <strong>data science</strong></li>
<li>if you are interested in analytics, you’d probably make a great <strong>data analyst</strong>.</li>
</ul>
<p>There’s no black-and-white guide to help you choose between becoming a data scientist and a data analyst. And it's not helpful to say one is better than the other. </p>
<p>In the end, what matters is solving problems and helping humanity learn and improve, not how much a data analyst makes or how much a data scientist makes.</p>
<h3 id="heading-more-general-readings">More General Readings</h3>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-find-remote-jobs/">freeCodeCamp article on how to find remote jobs</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-is-data-science-what-a-data-scientist-actually-does/">What is Data Science</a></li>
<li><a target="_blank" href="https://ischoolonline.berkeley.edu/data-science/what-is-data-analytics/">What is Data Analytics</a> </li>
</ul>
<p>Thank you for reading. </p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ What is Data Science? What a Data Scientist Actually Does ]]>
                </title>
                <description>
                    <![CDATA[ Data Science is one of the most in-demand and desirable careers of the 21st century. Even though the term was introduced in the early 1960s, its meaning has changed considerably over time. And despite its rise in popularity in recent years, many peop... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-data-science-what-a-data-scientist-actually-does/</link>
                <guid isPermaLink="false">66b1e4eb88a49cff617991ed</guid>
                
                    <category>
                        <![CDATA[ Data Science ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data scientist ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Dionysia Lemonaki ]]>
                </dc:creator>
                <pubDate>Thu, 15 Sep 2022 20:11:02 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/09/pexels-alex-knight-2599244.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Data Science is one of the most in-demand and desirable careers of the 21st century.</p>
<p>Even though the term was introduced in the early 1960s, its meaning has changed considerably over time. And despite its rise in popularity in recent years, many people outside the field still find the term confusing and don't know what it entails.</p>
<p>So, what is data science, and what do data scientists actually do? </p>
<p>What is the data science process? Why are data scientists so in demand, and how do they help companies gain more customers and increase their profits?</p>
<p>My aim with this article is to answer those questions and to outline some of the skills you need to become a data scientist yourself, along with free resources for learning them.</p>
<p>Here is what we will cover:</p>
<ol>
<li><a class="post-section-overview" href="#heading-what-is-data-science">What is data science?</a></li>
<li><a class="post-section-overview" href="#heading-why-is-data-science-important-and-how-data-science-helps-businesses">Why is data science important and how data science helps businesses</a></li>
<li><a class="post-section-overview" href="#heading-what-does-a-data-scientist-actually-do-the-data-science-process-explained">What does a data scientist actually do? The data science process explained</a><ol>
<li><a class="post-section-overview" href="#heading-asking-the-right-questions-identifying-the-problem-at-hand">Asking the right questions - identifying the problem at hand</a></li>
<li><a class="post-section-overview" href="#heading-collecting-data">Collecting data</a></li>
<li><a class="post-section-overview" href="#heading-data-cleaning">Data cleaning</a></li>
<li><a class="post-section-overview" href="#heading-exploring-and-modeling-data">Exploring and modeling data</a></li>
<li><a class="post-section-overview" href="#heading-interpreting-and-communicating-results">Interpreting and communicating results</a></li>
</ol>
</li>
<li><a class="post-section-overview" href="#heading-what-skills-does-a-data-scientist-need-how-to-become-a-data-scientist">What skills does a data scientist need? How to become a data scientist</a><ol>
<li><a class="post-section-overview" href="#heading-statistics-and-mathematics">Statistics and mathematics</a></li>
<li><a class="post-section-overview" href="#heading-algorithms">Algorithms</a></li>
<li><a class="post-section-overview" href="#heading-computer-programming">Computer programming</a></li>
<li><a class="post-section-overview" href="#heading-sql">SQL</a></li>
<li><a class="post-section-overview" href="#heading-data-visualization-tools">Data visualization tools</a></li>
<li><a class="post-section-overview" href="#heading-machine-learning">Machine learning</a></li>
</ol>
</li>
</ol>
<h2 id="heading-what-is-data-science">What Is Data Science?</h2>
<p>Digital data are everywhere nowadays, and we produce large amounts daily. </p>
<p>You produce a lot of data just by going for a walk and scrolling on your phone while listening to your favorite music track on a streaming platform. </p>
<p>You produce data just by uploading a photo to a social media platform or browsing a website looking to buy shoes and then purchasing a pair.</p>
<p>And the amount of data we all produce keeps growing every year.</p>
<p>Data science is about collecting and analyzing digital data, extracting insights from it, and using those insights to make informed decisions and take meaningful, valuable action.</p>
<p>This is why data science matters to businesses of every size: it turns raw data into meaningful and practical information.</p>
<p>The type of data that data scientists analyze can be both structured and unstructured. </p>
<p>Structured data can look like numeric data or text values in an Excel spreadsheet or a Comma-Separated Values (CSV) file. Structured data is typically in a tabular format, organized in rows and columns, and is often stored in a database.</p>
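<p>To make "rows and columns" concrete, here is a minimal sketch that reads a small CSV file with Python's built-in <code>csv</code> module. The column names and values are made up for illustration, and the file is held in memory so the example runs on its own.</p>

```python
import csv
import io

# A tiny, made-up CSV file kept in memory. With a real file you would use
# open("orders.csv", newline="") instead (that filename is hypothetical).
raw_csv = """order_id,product,quantity,price
1,shoes,1,59.99
2,socks,3,4.50
3,shoes,2,59.99
"""

# DictReader turns each row into a dict keyed by the header row,
# mirroring the rows-and-columns layout of structured data.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# The csv module reads every value as a string, so convert numeric columns.
for row in rows:
    row["quantity"] = int(row["quantity"])
    row["price"] = float(row["price"])

total_revenue = sum(row["quantity"] * row["price"] for row in rows)
print(f"{len(rows)} rows, total revenue {total_revenue:.2f}")
```

<p>Because every row shares the same columns, questions like "what was the total revenue?" reduce to a one-line calculation. Unstructured data offers no such ready-made layout.</p>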
<p>Unstructured data, by contrast, has no predefined tabular structure: free-form text, images, videos, and audio files, to name a few.</p>
<p>Data scientists analyze those large volumes of structured and unstructured data, produce meaningful insights, and make informed decisions.</p>
<p>Data science is a multidisciplinary field that uses different tools, methods, and technologies that change over time. </p>
<p>Specifically, it is the intersection between probability, statistics, mathematics, data analysis, artificial intelligence, machine learning, computer science (algorithms and programming), and business.</p>
<h2 id="heading-why-is-data-science-important-and-how-data-science-helps-businesses">Why Is Data Science Important and How Data Science Helps Businesses</h2>
<p>As mentioned in the previous section, data science is necessary for businesses because it helps them extract meaningful insights and take actionable steps to reach their goals, grow, and remain competitive in the market.</p>
<p>Data scientists are essential for companies because of the value they provide. They help companies make better and more informed decisions.</p>
<p>Data science allows businesses to uncover new or recurring patterns, understand trends over time, and visualize the relationships between variables.</p>
<p>Investigating and uncovering such patterns can help a business maximize its profits, increase revenue, and avoid significant losses. Data science can also help anticipate problems before they occur, which ties in with fraud detection.</p>
<p>Businesses can now use data science tools to build fraud detection models that flag suspicious activity and help prevent fraud.</p>
<p>Data science can also help businesses gather customer feedback and come up with ideas for new products and services, as well as solutions to problems that customers face. This helps businesses meet customers' needs and increase profit.</p>
<p>By analyzing patterns and recurring trends, a business can recognize potential gaps, which leads to innovation, creative solutions, and greater customer satisfaction.</p>
<p>Another reason a data science strategy is essential for the growth of every business is that it can attract new customers via targeted ads.</p>
<p>Essentially, companies use your browsing history to learn more about you and gather insights into which of their products and services may interest you. With those insights, they can recommend products and services tailored to your interests.</p>
<h2 id="heading-what-does-a-data-scientist-actually-do-the-data-science-process-explained">What Does A Data Scientist Actually Do? The Data Science Process Explained</h2>
<p>What tasks does a data scientist carry out on a day-to-day basis?</p>
<p>The tasks will heavily depend on the company size as well as the sector of the company.</p>
<p>In a smaller company, a data scientist may be the only person responsible for all the data processes. In contrast, in a larger enterprise, a data scientist will most likely be part of a bigger team and have a higher degree of specialization in their role.</p>
<p>Below are the steps involved in the data science process.</p>
<h3 id="heading-asking-the-right-questions-identifying-the-problem-at-hand">Asking The Right Questions - Identifying The Problem At Hand</h3>
<p>The first step in the data science process is asking the right questions, some of which include:</p>
<ul>
<li>What happened?</li>
<li>Why did that happen?</li>
<li>What kind of information do I need to collect?</li>
<li>What will happen in the future?</li>
<li>What is the business trying to achieve?</li>
<li>What are the current challenges?</li>
<li>What can be done right now?</li>
</ul>
<p>In this first step, the goal is to understand the problem at hand as completely as possible and define the right questions that need answering. This first step is crucial for the rest of the process and for gathering the type of data that will help solve the problem.</p>
<h3 id="heading-collecting-data">Collecting Data</h3>
<p>The next step in the data science process, and a big chunk of a data scientist's work, is extracting and collecting the right kind of data.</p>
<p>This step involves:</p>
<ul>
<li>Checking what type of pre-existing data is available to them. </li>
<li>Collecting new data from selected sources.</li>
</ul>
<p>Data scientists need plenty of data to work with, and they get hold of data in different ways, some of which include:</p>
<ul>
<li>Using internal company data.</li>
<li>Using public data sets.</li>
<li>Querying relational databases.</li>
<li>Conducting market research.</li>
<li>Conducting surveys. </li>
<li>Performing web scraping - a technique that extracts information from websites. </li>
<li>Checking server logs.</li>
<li>Automatically collecting data via website cookies and third-party sources.</li>
</ul>
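<p>Web scraping, one of the methods listed above, usually means downloading a page and pulling specific values out of its HTML. The sketch below shows only the extraction half, using Python's built-in <code>html.parser</code> on a hard-coded page; the markup, class names, and products are all made up, and a real site's structure would differ.</p>

```python
from html.parser import HTMLParser

# A hard-coded page stands in for HTML you would normally download
# (for example with urllib.request); the markup below is made up.
PAGE = """
<html><body>
  <h2 class="product">Trail Runner</h2><span class="price">89.00</span>
  <h2 class="product">City Sneaker</h2><span class="price">64.50</span>
</body></html>
"""

class ProductParser(HTMLParser):
    """Collects product names and prices from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.current = None  # which field the next text node belongs to
        self.names = []
        self.prices = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "h2" and css_class == "product":
            self.current = "name"
        elif tag == "span" and css_class == "price":
            self.current = "price"

    def handle_endtag(self, tag):
        self.current = None  # text outside the tags we care about is ignored

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current is None:
            return
        if self.current == "name":
            self.names.append(text)
        else:
            self.prices.append(float(text))

parser = ProductParser()
parser.feed(PAGE)
products = list(zip(parser.names, parser.prices))
print(products)
```

<p>The result is a list of (name, price) pairs, which is exactly the kind of raw data that still needs checking and cleaning before analysis.</p>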
<p>At this stage, the data is raw: it may be corrupt, and it will likely contain missing values, mistakes, and errors.</p>
<h3 id="heading-data-cleaning">Data Cleaning</h3>
<p>Raw data is rarely usable as-is.</p>
<p>The next step in the data science process, and one of the most important and time-consuming parts of the job, is cleaning and preparing the data.</p>
<p>Data cleaning standardizes data to a uniform format.</p>
<p>This step includes:</p>
<ul>
<li>Looking for missing data values, asking why they are missing, and filling them in if needed.</li>
<li>Correcting errors and inaccuracies such as spelling mistakes. </li>
<li>Removing duplicate values. </li>
<li>Uncovering corrupt records.</li>
<li>Dealing with inconsistent data.</li>
<li>Identifying <a target="_blank" href="https://www.freecodecamp.org/news/what-is-an-outlier-definition-and-how-to-find-outliers-in-statistics/">outliers</a>. </li>
</ul>
<p>Cleaning the data reduces the risk of inaccurate results at the end of the data science process.</p>
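<p>Several of the cleaning steps above can be sketched in a few lines of plain Python. The records, the spelling map, the plausible age range, and the choice to fill missing ages with the median are all assumptions made for illustration; in practice, the right way to handle a missing value depends on why it is missing.</p>

```python
from statistics import median

# Made-up raw survey records showing typical problems: a duplicate row,
# a missing age, inconsistent country spellings, and an impossible age.
raw = [
    {"id": 1, "country": "USA", "age": 34},
    {"id": 2, "country": "usa ", "age": None},
    {"id": 2, "country": "usa ", "age": None},   # duplicate of id 2
    {"id": 3, "country": "U.S.A.", "age": 29},
    {"id": 4, "country": "Canada", "age": -5},   # corrupt record
    {"id": 5, "country": "canada", "age": 41},
]

# Hypothetical mapping from messy spellings to one canonical label.
COUNTRY_ALIASES = {"usa": "United States", "u.s.a.": "United States", "canada": "Canada"}

def clean(records):
    # 1. Remove duplicates, keeping the first record for each id.
    seen, deduped = set(), []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            deduped.append(dict(rec))  # copy so the raw data is left untouched

    # 2. Standardize inconsistent spellings and stray whitespace.
    for rec in deduped:
        key = rec["country"].strip().lower()
        rec["country"] = COUNTRY_ALIASES.get(key, rec["country"].strip())

    # 3. Drop corrupt records (ages outside an assumed plausible range).
    valid = [r for r in deduped if r["age"] is None or 0 <= r["age"] <= 120]

    # 4. Fill missing ages with the median of the known ages.
    fill = median(r["age"] for r in valid if r["age"] is not None)
    for rec in valid:
        if rec["age"] is None:
            rec["age"] = fill
    return valid

cleaned = clean(raw)
for rec in cleaned:
    print(rec)
```

<p>Libraries such as pandas offer the same operations on whole tables at once, but the logic stays the same: deduplicate, standardize, discard what is corrupt, and decide deliberately how to fill the gaps.</p>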
<h3 id="heading-exploring-and-modeling-data">Exploring and Modeling Data</h3>
<p>Exploring data is essentially analyzing it in-depth to gain a deeper understanding, narrowing down the data that will be crucial for answering the initial questions, uncovering patterns, and extracting meaningful insights. With those new insights, data scientists can go on to provide impactful recommendations.</p>
<p>This step in the data science process involves utilizing statistical methods and data visualization tools for creating diagrams, charts, and graphs to represent evident trends and correlations in the data. </p>
<p>Data scientists use algorithms, machine learning, and artificial intelligence techniques to build, evaluate, deploy, and monitor predictive models for the data.</p>
<p>They perform hypothesis testing and make predictions and forecasts to determine the best course of action for the future.</p>
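<p>Hypothesis testing asks whether an observed difference could plausibly be due to chance. One simple, assumption-light way to test this is a permutation test, sketched below in plain Python. The technique choice, the function name, and the "time on site under two page designs" data are all illustrative assumptions, not something the article prescribes.</p>

```python
import random
from statistics import fmean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean of a minus mean of b) and an
    estimated p-value: the share of random relabelings whose difference is
    at least as extreme as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Made-up minutes spent on a site under an old and a new page design.
old_design = [4.1, 3.8, 5.0, 4.4, 3.9, 4.2, 4.0, 4.6]
new_design = [5.2, 4.9, 5.8, 5.5, 4.7, 5.1, 6.0, 5.4]

diff, p = permutation_test(new_design, old_design)
print(f"difference in means: {diff:.3f}, p-value: {p:.4f}")
```

<p>A small p-value means that shuffling the group labels rarely produces a difference this large, which is evidence that the new design genuinely changed behavior rather than the gap arising by chance.</p>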
<h3 id="heading-interpreting-and-communicating-results">Interpreting and Communicating Results</h3>
<p>The last step in the data science process involves communicating and presenting the findings in a compelling and easy-to-understand way to other teams, decision-makers, company executives, stakeholders, and clients. The presentation needs to be accessible to non-technical staff.</p>
<p>Communication skills are one of the most important and underrated skills a data scientist can have in their toolbelt. They are just as important as the technical skills needed for the job.</p>
<p>This step is also known as <em>data storytelling</em>: the data scientist uses the data and insights they have gathered to tell a story about the work and explorations they have done, the conclusions they reached, and how the business can best use those findings.</p>
<p>During this presentation, the data scientists answer the questions they defined in the first step.</p>
<h2 id="heading-what-skills-does-a-data-scientist-need-how-to-become-a-data-scientist">What Skills Does A Data Scientist Need? How To Become A Data Scientist</h2>
<p>In the following sections, I will outline some of the technical skills you need as an aspiring data scientist.</p>
<h3 id="heading-statistics-and-mathematics">Statistics and Mathematics</h3>
<p>As a data scientist, you need a solid grasp of foundational math.</p>
<p>But what kind of math is required for data science?</p>
<p>The main areas of math you will need to familiarize yourself with for data science are:</p>
<ul>
<li>Calculus</li>
<li>Linear algebra</li>
<li>Probability and statistics</li>
</ul>
<p>Good knowledge of probability and statistics will help you gather and analyze data, figure out patterns, and draw conclusions from the data.</p>
<p>Here are some resources to get you started with calculus:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/precalculus-learn-college-math-prerequisites-with-this-free-5-hour-course/">Precalculus – Learn College Math Prerequisites with this Free 5-Hour Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-college-calculus-in-free-course/">Learn Calculus 1 in This Free 12-Hour Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-calculus-2-in-this-free-7-hour-course/">Learn Calculus 2 in This Free 7-Hour Course</a></li>
</ul>
<p>Here are some resources for linear algebra:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-algebra-to-improve-your-programming-skills/">College Algebra – Learn College Math Prerequisites with this Free 7-Hour Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/linear-algebra-full-course/">Learn Linear Algebra with This 20-Hour Course and Free Textbook</a></li>
</ul>
<p>And here are some resources for statistics:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/top-statistics-concepts-to-know-before-getting-into-data-science/">Statistics for Beginners – Top Stats Concepts to Know Before Getting into Data Science</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/statistics-for-data-science/">Statistics for Data Science — a Complete Guide for Aspiring ML Practitioners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/free-statistics-course/">Learn College-level Statistics in this free 8-hour course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/if-you-want-to-learn-data-science-take-a-few-of-these-statistics-classes-9bbabab098b9#.esdiw8wnk">If you want to learn Data Science, take a few of these statistics classes</a></li>
</ul>
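<p>As a small taste of how basic statistics helps you "figure out patterns and draw conclusions", the sketch below summarizes a made-up series of daily order counts with Python's built-in <code>statistics</code> module and estimates a simple empirical probability. The data and the "more than 14 orders" threshold are assumptions chosen for illustration.</p>

```python
from statistics import mean, median, stdev

# Made-up daily order counts for two weeks; one day had an unusual spike.
orders = [12, 15, 11, 14, 13, 40, 12, 16, 14, 13, 15, 12, 14, 13]

avg = mean(orders)
mid = median(orders)
spread = stdev(orders)  # sample standard deviation

# Empirical probability: the share of days with more than 14 orders.
p_busy = sum(1 for n in orders if n > 14) / len(orders)

print(f"mean={avg:.2f} median={mid} stdev={spread:.2f} P(>14 orders)={p_busy:.2f}")
```

<p>The single spike pulls the mean (about 15.3) above the median (13.5), a quick illustration of why it helps to look at more than one summary statistic before drawing conclusions.</p>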
<h3 id="heading-algorithms">Algorithms</h3>
<p>Knowledge of algorithms is one of the most important skills in data science.</p>
<p>Here are a couple of the most popular data science algorithms you can start with:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-build-and-train-linear-and-logistic-regression-ml-models-in-python/">Linear and logistic regression</a>. A linear regression algorithm is most often used for predictive analysis. It models the relationship between a dependent variable and one or more independent variables, so that you can predict the value of the former from the values of the latter. A logistic regression algorithm is a statistical analysis method used to predict a yes or no outcome.</li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-use-the-tree-based-algorithm-for-machine-learning/">Random forest</a>. A random forest algorithm is used for classification and regression problems and combines multiple decision trees into a single model.</li>
</ul>
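<p>To show what the two regression algorithms above actually compute, here is a from-scratch sketch in plain Python: ordinary least squares for a one-feature linear regression, and a one-feature logistic regression trained with batch gradient descent. The data, learning rate, and epoch count are made-up assumptions; in practice you would typically reach for a library such as scikit-learn rather than writing these yourself. Random forests are not sketched here.</p>

```python
import math

def fit_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y is modeled as slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, labels, lr=0.1, epochs=5000):
    """One-feature logistic regression trained with batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Linear regression: made-up hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
slope, intercept = fit_linear_regression(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")

# Logistic regression: made-up hours studied vs. passed (1) or failed (0).
study_hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(study_hours, passed)
for h in (2, 5):
    print(h, "hours -> P(pass) =", round(sigmoid(w * h + b), 2))
```

<p>The key difference is the output: linear regression returns a number on a continuous scale, while logistic regression passes a linear score through the sigmoid function to produce a probability, which you can then threshold (for example at 0.5) to get a yes or no answer.</p>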
<h3 id="heading-computer-programming">Computer Programming</h3>
<p>One of the most popular programming languages for data science is Python. </p>
<p>Python is a general-purpose programming language known for its versatility, and it is very beginner-friendly thanks to its readable syntax that resembles the English language.</p>
<p>Python offers a wealth of packages and external libraries for data manipulation, such as Pandas and NumPy, as well as for data visualization, such as Matplotlib.</p>
<p>Below are some free beginner Python resources to get you started:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-programming-course/">Free Python Programming Course [2022]</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/20-beginner-python-projects/">How to Code 20 Beginner Python Projects</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-fundamentals-for-data-science/">Python Fundamentals for Data Science</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/top-python-concepts-for-data-science/">Top Python Concepts to Know Before Learning Data Science</a></li>
</ul>
<p>Once you understand the fundamentals, you can move on to learning about Pandas, NumPy, and Matplotlib.</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/the-ultimate-guide-to-the-numpy-scientific-computing-library-for-python/">The Ultimate Guide to the NumPy Package for Scientific Computing in Python</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-pandas-functions/">How to Get Started with Pandas in Python – a Beginner's Guide</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/matplotlib-course-learn-python-data-visualization/">Matplotlib Course – Learn Python Data Visualization</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-analyze-data-with-python-pandas/">How to Analyze Data with Python, Pandas &amp; Numpy - 10 Hour Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/python-data-science-course-matplotlib-pandas-numpy/">Python Data Science – A Free 12-Hour Course for Beginners. Learn Pandas, NumPy, Matplotlib, and More.</a></li>
</ul>
<p>Another programming language used in data science is R. This programming language was designed specifically for statistical computing, statistical analysis, data analysis, and data manipulation.</p>
<p>To get started learning R, check out the following resources:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/r-programming-language-explained/">R Programming Language Explained</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/r-programming-course/">Learn R programming language basics in just 2 hours with this free course on statistical programming</a></li>
</ul>
<h3 id="heading-sql">SQL</h3>
<p>Data scientists need to know how to interact with a database system, such as a relational database, to organize, store, and extract large amounts of data.</p>
<p>A database is an organized electronic store of data that can be easily searched and retrieved.</p>
<p>A relational database stores data in tables made of rows and columns, and the data items it stores have predefined relationships with each other.</p>
<p>And this is where SQL comes in. SQL stands for Structured Query Language and is used for accessing, querying, manipulating, and interacting with relational databases.</p>
<p>With SQL queries, you can perform CRUD (Create, Read, Update, and Delete) operations on data.</p>
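<p>Here is a minimal sketch of all four CRUD operations, plus a join across two related tables, using the SQLite database engine that ships with Python's standard library (<code>sqlite3</code>). The table names, columns, and rows are made up for illustration.</p>

```python
import sqlite3

# An in-memory SQLite database; the tables and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
cur = conn.cursor()

# Two related tables: each order references a customer (a predefined relationship).
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    )
""")

# Create: insert rows using ? placeholders rather than string formatting.
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])
cur.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                [(1, 25.0), (1, 40.0), (2, 15.5)])

# Update: change the amount of the first order.
cur.execute("UPDATE orders SET amount = ? WHERE id = ?", (30.0, 1))

# Delete: remove small orders.
cur.execute("DELETE FROM orders WHERE amount < ?", (20.0,))
conn.commit()

# Read: join the related tables and aggregate per customer.
rows = cur.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
print(rows)
conn.close()
```

<p>The <code>LEFT JOIN</code> keeps customers who have no remaining orders, which is why Grace still appears with a count of zero. The same SQL would run, with minor dialect differences, on other relational databases.</p>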
<p>To learn SQL, check out the following resources:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-to-work-in-data-science/">Why You Should Learn SQL if You Want a Data Science Job</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-free-relational-database-courses-for-beginners/">Learn SQL – Free Relational Database Courses for Beginners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/learn-sql-in-10-minutes/">SQL Commands Cheat Sheet – How to Learn SQL in 10 Minutes</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/sql-recipes/">Learn SQL with These 5 Easy Recipes</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/sql-and-databases-full-course/">SQL and Databases - A Full Course for Beginners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/learn/relational-database/">Relational Database Certification</a></li>
</ul>
<h3 id="heading-data-visualization-tools">Data Visualization Tools</h3>
<p>Data visualization is the graphical representation and presentation of data: this includes creating graphs, charts, interactive dashboards, or maps that can be easily shared with other team members and stakeholders.</p>
<p>Data visualization tools are used to tell a story with data and drive decision-making.</p>
<p>One of the most popular data visualization tools is Tableau.</p>
<p>To learn Tableau, check out the following course:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/tableau-for-data-science-and-data-visualization-crash-course/">Tableau for Data Science and Data Visualization - Crash Course</a></li>
</ul>
<h3 id="heading-machine-learning">Machine Learning</h3>
<p>Machine learning (ML for short) is a subfield of artificial intelligence (AI) and computer science.</p>
<p>In machine learning, computer systems learn how to perform a specific task without being explicitly programmed for it.</p>
<p>Machine learning enables systems to recognize statistical patterns in data and to become more accurate with experience.</p>
<p>Data scientists use machine learning extensively in their work.</p>
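<p>"Learning without being explicitly programmed" can sound abstract, so here is a minimal sketch using k-nearest neighbors, one of the simplest machine learning techniques (chosen here purely for illustration; the article does not name it). No rule such as "more than 5 pages means buyer" is written by hand: predictions come entirely from labeled examples. The features, labels, and data points are made up.</p>

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbors."""
    # Pair each training example's Euclidean distance to the query with its label.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in by_distance[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Made-up examples: (minutes on site, pages viewed) -> visitor type.
points = [(2, 1), (3, 2), (1, 1), (10, 8), (12, 9), (11, 7)]
labels = ["browser", "browser", "browser", "buyer", "buyer", "buyer"]

print(knn_predict(points, labels, (11, 8)))  # buyer
print(knn_predict(points, labels, (2, 2)))   # browser
```

<p>Adding more labeled examples changes the predictions without touching the code, which is the sense in which such a system "improves with experience".</p>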
<p>Here are some machine learning resources to get you started:</p>
<ul>
<li><a target="_blank" href="https://www.freecodecamp.org/news/what-is-machine-learning-for-beginners/">What is Machine Learning? ML Tutorial for Beginners</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/ai-vs-ml-whats-the-difference/">AI vs ML – What’s the Difference Between Artificial Intelligence and Machine Learning?</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/how-to-learn-machine-learning-practical-tips-and-resources/">How to Learn Machine Learning – Tips and Resources to Learn ML the Practical Way</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/free-machine-learning-course-10-hourse/">Free 10-Hour Machine Learning Course</a></li>
<li><a target="_blank" href="https://www.freecodecamp.org/news/best-machine-learning-courses/">10 Best Machine Learning Courses to Take in 2022</a></li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>This marks the end of the article – thank you so much for making it to the end!</p>
<p>Hopefully, this guide was helpful, and it gave you some insight into what data science is, what a data scientist actually does, what the data science process entails, and what skills you need to enter the field.</p>
<p>Thank you for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to predict likes and shares based on your article’s title using Machine Learning ]]>
                </title>
                <description>
                    <![CDATA[ By Flavio H. Freitas Choosing a good title for an article is an important step in the writing process. The more interesting the title seems, the higher the chance a reader will interact with the whole thing. Furthermore, showing the user content they... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-predict-likes-and-shares-based-on-your-articles-title-using-machine-learning-47f98f0612ea/</link>
                <guid isPermaLink="false">66c353f2d73001a6c0054c03</guid>
                
                    <category>
                        <![CDATA[ data scientist ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                    <category>
                        <![CDATA[ social media ]]>
                    </category>
                
                    <category>
                        <![CDATA[ software development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 21 Sep 2018 22:24:18 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*gQRQ6x29YFA_ngSpaoDCUw.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Flavio H. Freitas</p>
<p>Choosing a good title for an article is an important step in the writing process. The more interesting the title seems, the higher the chance a reader will interact with the whole thing. Furthermore, showing the user content they prefer (to interact with) increases the user’s satisfaction.</p>
<p>This is how my final project for the <a target="_blank" href="https://udacity.com/course/machine-learning-engineer-nanodegree--nd009">Machine Learning Engineer Nanodegree</a> specialization started. I just finished it, and I feel <em>so proud and happy</em> that I want to share with you some insights I gained along the way. I also promised <a target="_blank" href="https://medium.com/@quincylarson">Quincy Larson</a> this article once I finished the project.</p>
<p>If you want to see the final technical document <a target="_blank" href="https://github.com/flaviohenriquecbc/machine-learning-capstone-project/blob/master/final-report.pdf">click here</a>. If you want the implementation of the code, check it out <a target="_blank" href="https://github.com/flaviohenriquecbc/machine-learning-capstone-project/blob/master/title-success-prediction.ipynb">here</a> or fork my project on <a target="_blank" href="https://github.com/flaviohenriquecbc/machine-learning-capstone-project">GitHub</a>. If you just want an overview using layperson’s terms, this is the right place — continue reading this article.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*gQRQ6x29YFA_ngSpaoDCUw.png" alt="Image" width="800" height="290" loading="lazy">
<em>freeCodeCamp Medium post on Twitter</em></p>
<p>Some of the most used platforms for spreading ideas nowadays are Twitter and Medium (you are here!). On Twitter, articles are usually shared as a post containing the title and an external URL. Users can follow the link to read the article and show their appreciation by liking or retweeting the original post.</p>
<p>Medium shows the full text with tags (to classify the article) and claps (similar to Twitter’s likes) to show how much the users appreciate the content. <em>A correlation between these two platforms can bring us valuable information.</em></p>
<h3 id="heading-the-project">The project</h3>
<p>The problem that I defined was a classification task using supervised learning: <em>Predict the number of likes and retweets an article receives based on the title.</em></p>
<p>Correlating the number of likes and retweets on Twitter with a Medium article is an attempt to account for the link between how many readers an article reaches and how many Medium claps it gets: the more widely an article is shared on different platforms, the more readers it reaches and the more Medium claps it will (likely) receive.</p>
<p>Using only the Twitter statistics, we’d expect each article to initially reach roughly the same number of readers (the followers of the freeCodeCamp Twitter account). Its performance and interactions would therefore depend on the characteristics of the tweet, such as the title of the article. And that is exactly what we want to measure.</p>
<p>I chose the <a target="_blank" href="https://twitter.com/freecodecamp">freeCodeCamp account</a> for this project because the idea was to limit the scope of the subject of the articles and better predict the response on a specific field. The same title can perform well in one category (e.g. Technology), but not necessarily in a different one (e.g. Culinary). Also, this account posts the title of the original article and the URL on Medium as the tweet content.</p>
<h3 id="heading-how-does-the-data-look">How does the data look?</h3>
<p>The first step of this project was to get the information from Twitter and Medium and then correlate it. The dataset can be found <a target="_blank" href="https://github.com/flaviohenriquecbc/machine-learning-capstone-project/blob/master/dataset/dataset-tweets-final.json">here</a>, and it has 711 data points. This is what the dataset looks like:</p>
<h3 id="heading-analyzing-and-learning-with-the-data">Analyzing and learning with the data</h3>
<p>After analyzing the dataset and plotting some graphs, I found some interesting patterns. For these analyses, <strong>the outliers were removed,</strong> and I only considered the <strong>top 25% of performers</strong> for each feature (retweets, likes, and claps).</p>
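<p>The article does not say which outlier rule was used, so the sketch below assumes a common one, Tukey's fences based on the interquartile range (IQR), and then keeps the values at or above the 75th percentile as the "top 25%". The like counts are made up, and the function names are hypothetical.</p>

```python
from statistics import quantiles

def remove_outliers_iqr(values, factor=1.5):
    """Drop values outside [Q1 - factor*IQR, Q3 + factor*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]

def top_quartile(values):
    """Return the values at or above the 75th percentile."""
    _, _, q3 = quantiles(values, n=4)
    return [v for v in values if v >= q3]

# Made-up like counts; one viral article (950) is far from the rest.
likes = [3, 8, 12, 5, 20, 15, 7, 30, 9, 11, 950, 14]
filtered = remove_outliers_iqr(likes)
best = top_quartile(filtered)
print(filtered)
print(best)
```

<p>Removing the outlier first matters: a single viral article would otherwise dominate averages and shift which articles count as top performers.</p>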
<p>So let’s take a look at what the numbers say for freeCodeCamp articles written on Medium and shared on Twitter.</p>
<h4 id="heading-what-is-a-good-title-length">What is a good title length?</h4>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*Mm7zCNram85z-qmQ2PYGgA.png" alt="Image" width="800" height="391" loading="lazy">
<em>Title length performance</em></p>
<p>Writing titles that have a length <strong>greater than 50 and less than 110</strong> characters helps to increase the chances of a successful article.</p>
<h4 id="heading-what-is-a-good-number-of-words-in-the-title">What is a good number of words in the title?</h4>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*fQ1kXH82jeikkfUtsl7baA.png" alt="Image" width="800" height="389" loading="lazy">
<em>Number of words performance</em></p>
<p>The most effective number of words in the title is <strong>9 to 17</strong>. To optimize the number of retweets and likes, try something from 9 to 18 words, and for claps from 7 to 17.</p>
<h4 id="heading-which-are-the-best-categories-to-tag">Which are the best categories to tag?</h4>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*NNmbj8LjKK4Mj1eBvRD2wQ.png" alt="Image" width="800" height="358" loading="lazy"></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*spIxtLO9qD042AP-XFiicA.png" alt="Image" width="800" height="351" loading="lazy"></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*WSluJ1QtQNwukYnW60TU1A.png" alt="Image" width="800" height="364" loading="lazy"></p>
<p><strong>Programming</strong>, <strong>Tech</strong>, <strong>Technology</strong>, <strong>JavaScript</strong>, and <strong>Web Development</strong> are categories you should consider when tagging your next article. They appear as good indicators for all three features.</p>
<h4 id="heading-which-are-the-best-words-to-use">Which are the best words to use?</h4>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*f1vJmkiXf0Nlxc9nCU0Vrw.png" alt="Image" width="800" height="391" loading="lazy"></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*vKj2TVnOSgLHWuv3WiAZUA.png" alt="Image" width="800" height="392" loading="lazy"></p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/1*Y4PqnyR2dF4da5WWKuqS1g.png" alt="Image" width="800" height="376" loading="lazy"></p>
<p>In this lexical analysis, you’ll notice that some words get much more attention than others in the freeCodeCamp community. If you want your articles to reach more readers, writing about JavaScript, React, or CSS tends to increase how much they are appreciated. Using the words “learn” or “guide” in the title also raises the odds.</p>
<h3 id="heading-using-machine-learning">Using Machine Learning</h3>
<p>OK! After exploring the data and extracting some insights from it, the goal was to create a Machine Learning model that predicts the number of retweets, likes, and claps based on the title of the article.</p>
<p>Predicting the number of retweets, likes, and claps of an article can be treated as a classification problem, which is a common machine learning (ML) task. To do this, though, we need to turn the outputs into discrete classes (ranges of numbers). The inputs are the title of the article, with each word as a token (t1, t2, t3, … tn), the title length, and the number of words in the title.</p>
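<p>This minimal Python sketch turns one title into the three inputs described above: tokens, title length, and word count. Two details are assumptions, not taken from the project:</p>
<ul>
<li>A token is a lowercase run of letters and digits.</li>
<li>"Title length" is a character count.</li>
</ul>

```python
import re


def title_features(title: str) -> dict:
    """Build the inputs for one title: tokens t1..tn, title length, word count."""
    # Assumption: a token is a lowercase run of letters or digits.
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return {
        "tokens": tokens,
        "title_length": len(title),        # characters, spaces included
        "word_count": len(title.split()),  # whitespace-separated words
    }


print(title_features("Learn React in 10 Minutes"))
```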
<p>The ranges for our features are:</p>
<ul>
<li>Retweets: 0–10, 10–30, 30+</li>
<li>Likes: 0–25, 25–60, 60+</li>
<li>Claps: 0–50, 50–400, 400+</li>
</ul>
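<p>Turning raw counts into these classes is a simple binning step. The list above does not say which class a boundary value such as exactly 10 retweets belongs to. This sketch assumes each range includes its lower bound and excludes its upper bound.</p>

```python
from bisect import bisect_right

# Lower edge of each class, from the ranges listed above.
# Assumption: a range includes its lower bound and excludes its upper bound,
# so exactly 10 retweets lands in "10-30" rather than "0-10".
CLASSES = {
    "retweets": ([0, 10, 30], ["0-10", "10-30", "30+"]),
    "likes": ([0, 25, 60], ["0-25", "25-60", "60+"]),
    "claps": ([0, 50, 400], ["0-50", "50-400", "400+"]),
}


def to_class(metric: str, count: int) -> str:
    """Map a raw engagement count to its discrete class label."""
    if count < 0:
        raise ValueError("counts cannot be negative")
    edges, labels = CLASSES[metric]
    # bisect_right counts the edges that are <= count; minus one gives the class index.
    return labels[bisect_right(edges, count) - 1]


print(to_class("retweets", 12), to_class("likes", 60), to_class("claps", 49))
```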
<p>Finally, after preprocessing the dataset and evaluating several models (everything is fully described <a target="_blank" href="https://github.com/flaviohenriquecbc/machine-learning-capstone-project/blob/master/final-report.pdf">here</a>), we concluded that the MultinomialNB model performed best for retweets, reaching an accuracy of 60.6%. For likes and claps, logistic regression reached 55.3% and 49%, respectively.</p>
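<p>scikit-learn's <code>MultinomialNB</code> is a multinomial Naive Bayes classifier. To show roughly what such a model does with title tokens, here is a from-scratch, standard-library sketch. It scores each class by the log of its prior plus the log of each word's add-alpha-smoothed probability within that class. The toy titles and labels are made up. Unlike the project, the sketch uses tokens only, with no title length or word count.</p>

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def _log_posterior(self, tokens, c):
        # log P(class) + sum of log P(token | class), up to a shared constant.
        denom = sum(self.token_counts[c].values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                score += math.log((self.token_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, docs):
        return [max(self.classes, key=lambda c: self._log_posterior(tokens, c)) for tokens in docs]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical tokenized titles labeled with their retweet class.
model = TinyMultinomialNB().fit(
    [["learn", "javascript"], ["react", "guide"], ["my", "trip"], ["random", "thoughts"]],
    ["30+", "30+", "0-10", "0-10"],
)
print(model.predict([["javascript", "guide"], ["thoughts", "trip"]]))
```

<p>Working in log space avoids underflow when many small word probabilities are multiplied together. The accuracy figures quoted above are simply the share of correctly predicted classes on held-out titles, which is what <code>accuracy</code> computes.</p>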
<p>As an experiment for this article, I ran the model on this article’s title, and it predicted that:</p>
<p>It will have 10–30 retweets and 25–60 favorites on Twitter and 400+ claps on Medium.</p>
<p>What do you think of this prediction?</p>
<p><a target="_blank" href="https://medium.com/@flaviohfreitas"><em>Follow me</em></a> <em>if you want to read more of my articles. And if you enjoyed this article, be sure to like it and give it lots of claps; it means the world to the writer.</em></p>
<p><strong>Flávio H. de Freitas</strong> is an Entrepreneur, Engineer, Tech lover, Dreamer and Traveler. He has worked as a <strong>CTO</strong> in <strong>Brazil</strong>, <strong>Silicon Valley and Europe</strong>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ I Built A Jupyter Notebook That Will Analyze Cryptocurrency Portfolios For You ]]>
                </title>
                <description>
                    <![CDATA[ By Grant Bartel The amount of engagement in the crypto investment space needs no introduction. With market caps, volumes, and public awareness on the rise, I thought I’d put together a simple Jupyter notebook to get a clearer and broader viewpoint in... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/i-built-a-jupyter-notebook-that-will-analyze-cryptocurrency-portfolios-for-you-bdaba618aeca/</link>
                <guid isPermaLink="false">66d45edb230dff01669057f1</guid>
                
                    <category>
                        <![CDATA[ Bitcoin ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Cryptocurrency ]]>
                    </category>
                
                    <category>
                        <![CDATA[ data scientist ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Investing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Sat, 20 Jan 2018 10:35:13 +0000</pubDate>
                <media:content url="https://cdn-media-1.freecodecamp.org/images/1*yclB_TfehNu8DxAADDBzXg.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Grant Bartel</p>
<p>The amount of engagement in the crypto investment space needs no introduction. With <a target="_blank" href="http://www.ibtimes.co.uk/year-cryptocurrencies-became-mainstream-1654616">market caps, volumes, and public awareness on the rise</a>, I thought I’d put together a simple Jupyter notebook to get a clearer and broader view of the investment activities within my own crypto portfolio.</p>
<p>TL;DR <a target="_blank" href="https://github.com/grantathon/crypto_portfolio_analysis">here’s the code</a> ;)</p>
<h3 id="heading-why-should-we-analyze-our-portfolios">Why Should We Analyze Our Portfolios?</h3>
<p>Because we’re definitely missing important details about our investments by only looking at the total value of our (potentially fat) wallets — even though I enjoy looking at Blockfolio from time to time. Because seeing our Ripple go to the moon and overshadow the rest of our investments is likely increasing our financial risk substantially. Because we all want our money to grow, but achieving this by picking a diverse set of cryptos is easier and safer than picking a moonshot that could end up a dud (and make us broke).</p>
<p>And let’s face it, the market gains are just too big for us to be left in the dark on the true characteristics of our investment portfolios.</p>
<h3 id="heading-important-portfolio-characteristics">Important Portfolio Characteristics</h3>
<p>Now there are several characteristics of our portfolio that we should take a good look at, including return <strong>and</strong> risk. But a lot of the time we’re fixated on one and not the other.</p>
<p>We can look at return in several ways: the amount of money we’ve made from the beginning to the current date, the average rate of money we’ve made over specific time periods (e.g., annual returns), how much better our investments did when compared to several characteristics of a benchmark (e.g., <a target="_blank" href="https://www.investopedia.com/terms/a/alpha.asp">alpha</a>), and even the annual compound rate it would have taken to get to our current investment based on our starting point (i.e., <a target="_blank" href="https://en.wikipedia.org/wiki/Compound_annual_growth_rate">CAGR</a>).</p>
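<p>Two of these return measures come down to one line each. This is a minimal sketch that assumes no deposits or withdrawals between the start and end values. The dollar amounts in the example are hypothetical.</p>

```python
def total_return(start_value: float, end_value: float) -> float:
    """Money made from the beginning to now, as a fraction of the starting value."""
    return end_value / start_value - 1


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Constant annual rate that grows start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical example: $1,000 grows to $4,000 over two years.
print(total_return(1_000, 4_000))  # 3.0, i.e. +300%
print(cagr(1_000, 4_000, 2))       # 1.0, i.e. +100% per year
```

<p>CAGR smooths the path away entirely: two years of +100% and a single year of +300% report the same total return but very different compound rates.</p>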
<p>As important, if not more, is how we look at risk and its effect on return. I don’t know about you, but I want to make sure I’m making a good return based on an amount of risk I feel comfortable with. If we take on a huge amount of risk to make one particular return when we could have taken much less risk to make that very same return, the path to take for a more <strong>efficient investment</strong> is clear.</p>
<p>This is where understanding volatility, correlations, and risk-adjusted returns comes into play, by computing statistics such as the standard deviation of returns (or volatility), <a target="_blank" href="https://www.investopedia.com/terms/b/beta.asp">beta</a>, the <a target="_blank" href="https://en.wikipedia.org/wiki/Sharpe_ratio">Sharpe ratio</a>, and the <a target="_blank" href="https://en.wikipedia.org/wiki/Sortino_ratio">Sortino ratio</a>.</p>
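<p>Here is a minimal sketch of three of these statistics computed from a list of daily returns. The code makes three assumptions:</p>
<ul>
<li>Results are annualized with 365 periods per year, since crypto markets trade every day.</li>
<li>The risk-free rate defaults to zero.</li>
<li>Downside deviation is the root mean square of below-target returns, taken over all periods. This is one common definition of several.</li>
</ul>
<p>The notebook may use different conventions.</p>

```python
import math
import statistics

PERIODS_PER_YEAR = 365  # assumption: crypto trades every day of the year


def annualized_volatility(returns):
    """Standard deviation of daily returns, scaled to a yearly figure."""
    return statistics.stdev(returns) * math.sqrt(PERIODS_PER_YEAR)


def sharpe_ratio(returns, risk_free_daily=0.0):
    """Mean excess return per unit of total volatility, annualized."""
    excess = [r - risk_free_daily for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(PERIODS_PER_YEAR)


def sortino_ratio(returns, target_daily=0.0):
    """Like Sharpe, but only returns below the target count as risk."""
    excess = [r - target_daily for r in returns]
    # Downside deviation: RMS of the below-target part of each return, over all periods.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return statistics.mean(excess) / downside * math.sqrt(PERIODS_PER_YEAR)


daily = [0.01, -0.02, 0.03, 0.0]  # hypothetical daily returns
print(annualized_volatility(daily), sharpe_ratio(daily), sortino_ratio(daily))
```

<p>Because upside swings do not count against it, the Sortino ratio comes out higher than the Sharpe ratio for a portfolio whose big moves are mostly up. The Sortino ratio is undefined (division by zero) when no return falls below the target.</p>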
<p>And while we can compute all the statistics under the sun to measure our portfolio’s performance, it doesn’t do much good if we don’t include a reference point to see how well we’re doing in comparison. This is called a <a target="_blank" href="https://www.investopedia.com/terms/b/benchmark.asp">benchmark</a>, and we’ll be using the golden boy of cryptocurrencies: Bitcoin.</p>
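<p>Beta and alpha are the statistics that explicitly compare the portfolio against that benchmark. This minimal sketch computes beta as the covariance of portfolio and benchmark returns divided by the benchmark variance. It computes alpha in the Jensen's alpha form, per period, with a risk-free rate that defaults to zero. The return series are hypothetical.</p>

```python
import statistics


def beta(portfolio, benchmark):
    """Covariance of portfolio and benchmark returns divided by benchmark variance."""
    mp, mb = statistics.mean(portfolio), statistics.mean(benchmark)
    cov = sum((p - mp) * (b - mb) for p, b in zip(portfolio, benchmark)) / (len(portfolio) - 1)
    return cov / statistics.variance(benchmark)  # both use the n - 1 denominator


def jensens_alpha(portfolio, benchmark, risk_free=0.0):
    """Average per-period return in excess of what beta alone would predict."""
    b = beta(portfolio, benchmark)
    return (statistics.mean(portfolio) - risk_free) - b * (statistics.mean(benchmark) - risk_free)


btc = [0.01, -0.02, 0.03]         # hypothetical daily Bitcoin returns
port = [x + 0.001 for x in btc]   # moves exactly with BTC, plus 0.1% per day
print(beta(port, btc), jensens_alpha(port, btc))  # roughly 1.0 and 0.001
```

<p>A beta of 1.0 (100%) means the portfolio moves one-for-one with Bitcoin. A beta below 100% means it swings less than the benchmark.</p>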
<h3 id="heading-notebook-walk-through">Notebook Walk-Through</h3>
<p>So I don’t want to display a bunch of code here, because I think you should go through the notebook yourself and get a feel for things. Don’t be afraid: the notebook includes some clear explanations and the code is commented! It’ll also help you better understand this post. If you want, clone <a target="_blank" href="https://github.com/grantathon/crypto_portfolio_analysis">the repo</a> and give it a whirl first. However, I will show you the results through some statistics and nice visualizations.</p>
<p>To start, we need to create a tradesheet that emulates how we invested our portfolio. The one below is included in <a target="_blank" href="https://github.com/grantathon/crypto_portfolio_analysis">the repo</a>. These are actually the same cryptos I invested in and the times I bought and sold them up until now, but the amount of money and the allocations (i.e., the amount I bought and sold) are not ;)</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/XzoSg0XtQ6s8pL4cdk3sKG-k381rkitWdTtr" alt="Image" width="427" height="382" loading="lazy"></p>
<p>You can think of the tradesheet as our <strong>investment strategy</strong>. These are the trades we decided to take based on our wizardry powers or what an algorithm told us.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/sikS8FYcWs5FmvVcfZY2naHH5UxGyMPDn3PJ" alt="Image" width="606" height="328" loading="lazy">
<em>Source: <a target="_blank" href="https://community.playstarbound.com/threads/glitch-ship-ai-feedback.80652/page-11">Playstarbound</a></em></p>
<p>Along with the tradesheet, we also need historical market data. I chose to go with something simple: download some CSVs from <a target="_blank" href="https://www.coingecko.com/">CoinGecko</a> and throw them into a data folder. Pulling data from an API would be better though!</p>
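<p>Loading such a folder of CSVs takes only a few lines. This sketch makes three assumptions: that each export has <code>snapped_at</code> and <code>price</code> columns, that the first ten characters of <code>snapped_at</code> are the date, and that each file is named after its asset (for example <code>data/btc.csv</code>). The folder name, file naming, and sample rows are all hypothetical.</p>

```python
import csv
import io
from pathlib import Path


def load_prices(csv_file) -> dict:
    """Read {YYYY-MM-DD: USD price} from one CSV (assumed 'snapped_at' and 'price' columns)."""
    return {row["snapped_at"][:10]: float(row["price"]) for row in csv.DictReader(csv_file)}


def load_data_folder(folder: str = "data") -> dict:
    """Load every CSV in the folder, keyed by file name without extension."""
    prices = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            prices[path.stem] = load_prices(f)
    return prices


# Made-up rows in the assumed layout.
sample = io.StringIO(
    "snapped_at,price,market_cap,total_volume\n"
    "2017-12-01 00:00:00 UTC,100.0,0,0\n"
)
print(load_prices(sample))  # {'2017-12-01': 100.0}
```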
<p>Now we want to run a backtest on our investment strategy. Simply put, running a backtest allows us to go back in time to our first trade, walk forward in time, and simulate the trading activity that occurred in our portfolio up until today. A backtester can be very sophisticated and can be used in a lot of different scenarios (to the finance geeks: pun intended), but in our case it’s rather straightforward.</p>
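<p>A straightforward backtest like this one can be sketched as a loop over days. On each day it applies that day's trades to the holdings and then values the holdings at that day's prices. The tradesheet and price formats below are hypothetical; the notebook's actual tradesheet may be laid out differently. Sale proceeds are treated as withdrawn, so every buy and sell shows up as a cash flow in the portfolio value.</p>

```python
from collections import defaultdict


def backtest(trades, prices, dates):
    """Replay a tradesheet day by day and value the holdings in USD.

    Hypothetical formats:
      trades: list of (date, asset, units); positive units buy, negative units sell
      prices: {asset: {date: usd_price}}
      dates:  chronologically sorted date strings, starting at the first trade
    """
    trades_by_date = defaultdict(list)
    for date, asset, units in trades:
        trades_by_date[date].append((asset, units))

    holdings = defaultdict(float)  # asset -> units currently held
    history = []                   # (date, portfolio USD value)
    for date in dates:
        # Apply the day's trades first, then mark the portfolio to market.
        for asset, units in trades_by_date.get(date, []):
            holdings[asset] += units
        value = sum(units * prices[asset][date] for asset, units in holdings.items() if units)
        history.append((date, value))
    return history


prices = {"BTC": {"d1": 100, "d2": 110, "d3": 120}, "ETH": {"d1": 10, "d2": 12, "d3": 9}}
trades = [("d1", "BTC", 1), ("d2", "ETH", 5), ("d3", "BTC", -1)]
print(backtest(trades, prices, ["d1", "d2", "d3"]))
# [('d1', 100.0), ('d2', 170.0), ('d3', 45.0)]
```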
<p><img src="https://cdn-media-1.freecodecamp.org/images/I05CqzVIRz3ccjLIjcy85QUBz5apBZ4xWAcG" alt="Image" width="544" height="202" loading="lazy"></p>
<p>Based on the statistics above, it’s clear that our portfolio did fairly well when compared to our benchmark. The returns are better, volatility is only slightly worse, and our beta is surprisingly below 100%. And look at that alpha!</p>
<p>OK. Numbers are nice, but I want to see some charts.</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/pnWUDH7fXvwbY-DlQ2wiYuK1inrGjP4P9vnY" alt="Image" width="738" height="493" loading="lazy"></p>
<p>Well, that’s intimidating. The above chart shows how the USD value of our portfolio evolved over time, including all of our cash flows (i.e., deposits and withdrawals). While it’s nice to visualize this, it’s hard to tell how our portfolio truly performed when cash flows are included. For example, if I deposited $1 million (I wish), the portfolio would appear to have a HUGE spike!</p>
<p><img src="https://cdn-media-1.freecodecamp.org/images/aI9DYZ7J2guZE1BcBaJ04dLfbPDaavgNcBPm" alt="Image" width="732" height="458" loading="lazy"></p>
<p>Now that’s better. By removing the daily returns on days when cash flows occurred, we get a more accurate representation of the true performance of our portfolio. Fortunately, we have very few cash flows, so this method is acceptable. As you can see, it took us some time to catch up to Bitcoin, but we eventually did and then surpassed it (thanks <a target="_blank" href="https://golem.network/">Golem</a> and <a target="_blank" href="https://neo.org/">NEO</a>).</p>
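<p>Here is a minimal sketch of the approach described above: chain the daily returns into a growth curve, but skip the return on any day with a deposit or withdrawal. The input formats and the example values are hypothetical. As noted above, this only works well when cash flows are rare, because the genuine market move on each skipped day is discarded too.</p>

```python
def performance_excluding_flows(history, flow_dates):
    """Growth of $1 invested, skipping the daily return on every cash-flow day.

    history:    list of (date, portfolio_value) in chronological order
    flow_dates: set of dates on which money was deposited or withdrawn
    """
    growth = 1.0
    curve = [(history[0][0], growth)]
    for (_, prev_value), (date, value) in zip(history, history[1:]):
        # A deposit or withdrawal distorts that day's return, so leave growth unchanged.
        if date not in flow_dates and prev_value:
            growth *= value / prev_value
        curve.append((date, growth))
    return curve


# Hypothetical values: +10% on d2, a $1,000 deposit on d3, +10% on d4.
values = [("d1", 100), ("d2", 110), ("d3", 1110), ("d4", 1221)]
print(performance_excluding_flows(values, {"d3"}))
```

<p>Without the exclusion, the deposit on d3 would show up as a tenfold "return".</p>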
<p><img src="https://cdn-media-1.freecodecamp.org/images/H5PJZhnLSD23zJVoy7jFUAO8vDDzNGesxa3c" alt="Image" width="710" height="493" loading="lazy"></p>
<p>Actually, you can see that after the crazy Bitcoin, Ethereum, and Litecoin boom (aka the Coinbase boom), our portfolio became more diversified. This surely had a lot to do with dampening the Bitcoin drawdowns that followed, and with the likely larger returns of the newly added assets.</p>
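<p>An allocation view like this one boils down to each asset's share of total portfolio value on a given day. This minimal sketch assumes holdings are given as units per asset and prices as USD per unit. Both formats and the example numbers are hypothetical.</p>

```python
def allocation_weights(holdings: dict, prices_on_day: dict) -> dict:
    """Share of total portfolio value held in each asset on one day."""
    values = {asset: units * prices_on_day[asset] for asset, units in holdings.items() if units}
    total = sum(values.values())
    return {asset: value / total for asset, value in values.items()}


# Hypothetical day: 1 BTC at $110 and 5 ETH at $12.
print(allocation_weights({"BTC": 1, "ETH": 5}, {"BTC": 110, "ETH": 12}))
```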
<p><img src="https://cdn-media-1.freecodecamp.org/images/jD3HVn19ylfyNF6clE3QMxThd9ZbX6Q0WGVH" alt="Image" width="727" height="466" loading="lazy"></p>
<p>Well there you have it. Clearly, our portfolio experienced much less volatility (i.e., risk) after diversifying. Diversification (and luck) for the win!</p>
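<p>One common way to see volatility change over time is a rolling standard deviation of daily returns. This sketch assumes a 30-day trailing window, annualized over 365 days; the chart above may have been built differently. The example series is made up: a wild stretch followed by a calm one.</p>

```python
import math
import statistics


def rolling_volatility(returns, window=30, periods_per_year=365):
    """Annualized standard deviation over each trailing window of daily returns."""
    return [
        statistics.stdev(returns[i - window:i]) * math.sqrt(periods_per_year)
        for i in range(window, len(returns) + 1)
    ]


# Made-up returns: 20 volatile days, then 20 calmer days.
daily = [0.05, -0.05] * 10 + [0.01, -0.01] * 10
vols = rolling_volatility(daily, window=10)
print(round(vols[0], 3), round(vols[-1], 3))
```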
<p><img src="https://cdn-media-1.freecodecamp.org/images/BcXX6mRbl6s-S5cUvheXUxWzYQNLcNAbIQbd" alt="Image" width="582" height="567" loading="lazy"></p>
<p>For me, this is the most interesting plot. This is a matrix that represents the correlations between all of the assets in our portfolio. While a lot of assets had a <a target="_blank" href="https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php">medium to high correlation</a> with one another, <a target="_blank" href="https://www.bitcoincash.org/">Bitcoin Cash</a> had a very low correlation to every single asset. You can even see that it was negatively correlated with <a target="_blank" href="https://omisego.network/">OmiseGO</a>! Correlations do change over time, but it’s nonetheless interesting to see these types of relationships within our portfolio.</p>
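<p>A matrix like this holds the Pearson correlation of every pair of assets' daily returns. Here is a minimal standard-library sketch; the asset names and return series are hypothetical.</p>

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def correlation_matrix(returns_by_asset):
    """Nested dict: matrix[a][b] is the correlation of asset a's and asset b's returns."""
    assets = sorted(returns_by_asset)
    return {a: {b: pearson(returns_by_asset[a], returns_by_asset[b]) for b in assets} for a in assets}


# Hypothetical returns: B moves with A, C moves against A.
m = correlation_matrix({"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1]})
print(m["A"]["B"], m["A"]["C"])  # 1.0 -1.0
```

<p>In a pandas-based notebook, <code>DataFrame.corr()</code> computes the same Pearson matrix by default, with one column per asset.</p>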
<p>Again, go ahead and clone <a target="_blank" href="https://github.com/grantathon/crypto_portfolio_analysis">the repo</a> and play around a bit so you can understand in more detail how we went about analyzing our portfolio. You can even add your own tradesheet to get a glimpse into yours. And if you find bugs, let me know!</p>
<h3 id="heading-summing-it-all-up">Summing It All Up</h3>
<p>I hope you’ve gained a better appreciation for why it’s important to look at your portfolio through various lenses. It’s hard to get a clear understanding from just visualizing asset price movements, especially with all that’s been going on lately in the crypto space. Also, it’s not always clear how much risk we’re taking on over time, and how those risks will evolve when we invest.</p>
<p>What is clear is that diversification in such a market is important, because none of us knows where this market is going. With that in mind, it’s best to keep an eye on your ship while weathering the storms and HODL.</p>
<p>By the way, none of this should be treated as investment advice, and the same goes for the code. Whichever investments you pursue are purely at your own discretion.</p>
<p>Full disclosure: At the time of writing this article I was invested in BCH, BTC, ETH, GNT, LTC, NEO, and OMG.</p>
<p><em>I’m Grant and I’m a freelance SEO and content professional. If you’re looking to grow your brand's organic search traffic, I can help with your <a target="_blank" href="https://www.writefintech.com/">fintech SEO</a>. Cheers!</em></p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
