<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Piotr Plonski - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Piotr Plonski - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sat, 13 Jun 2026 09:47:45 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/pplonski/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Perform Classification with Automated Machine Learning (AutoML) ]]>
                </title>
                <description>
                    <![CDATA[ In this article I will show you how to use Automated Machine Learning (AutoML) to build a classifier for tabular data. And don't worry – I will explain all strange definitions :) There won't be any math in this article (although I like math, it is co... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/classification-with-python-automl/</link>
                <guid isPermaLink="false">66d46089246e57ac83a2c7b7</guid>
                
                    <category>
                        <![CDATA[ automation ]]>
                    </category>
                
                    <category>
                        <![CDATA[ image classification ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Machine Learning ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Piotr Plonski ]]>
                </dc:creator>
                <pubDate>Mon, 11 May 2020 19:15:12 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/05/Untitled-Design--1-.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this article I will show you how to use Automated Machine Learning (AutoML) to build a classifier for tabular data. And don't worry – I will explain all strange definitions :)</p>
<p>There won't be any math in this article (although I like math, it is concise). I will try to show things in such a way that you can better understand Machine Learning (and AutoML).</p>
<h2 id="heading-first-things-first-what-is-machine-learning">First things first: what is Machine Learning?</h2>
<p><strong>Machine Learning (ML)</strong> is a very broad topic. We can use its definition to explain what it is: teaching a machine to do a task. This is very similar to programming!</p>
<p>The key difference is that in programming, you need to provide an exact recipe (code) that tells the machine how it should perform. In <strong>Machine Learning</strong> you also provide the code, but that code will tell the machine how to learn based on previous examples (historical data).</p>
<p>This code is then used to create a <strong>Machine Learning model</strong>. All future actions done by the machine will be computed by the model.</p>
<p>This is a very loose definition, but it should give you a basic understanding of ML. I've prepared some schematic pictures showing how programming and Machine Learning work. I hope they will help you visualize the difference.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/05/programming.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>In programming, humans need to provide exact steps (code) to tell a machine how it should process input data.</em></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/05/ml.jpg" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>In Machine Learning, humans need to provide code and historical data for creating Machine Learning Models. After ML Model training, it can be used for computing outputs on unseen data.</em></p>
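<p>To make the contrast concrete, here is a minimal sketch in plain Python. The measurements, the 2.5 cm cut-off, and all function names are illustrative assumptions and do not come from any library. The first function is "programming": a human writes the rule. The rest is a toy form of "learning": the rule (a single threshold) is derived from labeled examples.</p>

```python
# "Programming": a human writes the exact rule by hand.
def classify_by_hand(petal_length_cm):
    # The 2.5 cm cut-off is a hand-picked, illustrative value.
    return "setosa" if petal_length_cm < 2.5 else "other"


# "Machine Learning" (toy version): derive the rule from historical data.
def learn_threshold(examples):
    """Return a cut-off halfway between the two groups of examples.

    `examples` is a list of (petal_length_cm, label) pairs.
    """
    setosa = [x for x, label in examples if label == "setosa"]
    other = [x for x, label in examples if label != "setosa"]
    return (max(setosa) + min(other)) / 2


def make_model(threshold):
    # The "model" is just the learned threshold wrapped in a function.
    def predict(petal_length_cm):
        return "setosa" if petal_length_cm < threshold else "other"
    return predict


# Made-up historical data, for illustration only.
history = [(1.4, "setosa"), (1.7, "setosa"), (4.7, "other"), (6.0, "other")]
threshold = learn_threshold(history)
model = make_model(threshold)

print(threshold)   # the cut-off learned from the examples
print(model(1.5))  # prediction for an unseen flower
print(model(5.1))
```

<p>Real ML algorithms learn far richer rules than one threshold, but the division of labor is the same: you write the learning code, and the data decides the final rule.</p>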
<p>In the above pictures you can see that programming often looks much simpler than Machine Learning: there are fewer steps in total, and there is no need for historical data.</p>
<p>But there are situations where providing the exact program is impossible.</p>
<p>Take image classification, for example: say you would like to know what is in an image based on its content. It is impossible to write down all the conditions needed to recognize what is in a picture (pictures can have different sizes, scales, and so on). It is easy to see with the human eye, but writing an exact program is impossible.</p>
<p>But with ML you can create a model that will be able to recognize images. So let's look at some more definitions.</p>
<h3 id="heading-classification">Classification</h3>
<p>Classification is the process of assigning a label (class) to a sample (one instance of data). The ML model that is doing a classification is called a <strong>classifier</strong>.</p>
<h3 id="heading-tabular-data">Tabular data</h3>
<p>Tabular data is simply data in table format, similar to a spreadsheet. Other data formats can be images, video, text, documents, or audio. Data in tabular format has rows which represent samples (observations) and columns which represent features.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/05/tabular-data_01.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Example of tabular data (Titanic dataset).</em></p>
<p>In this article we will analyze only tabular data. The typical task in ML is to predict one of the columns. Such a column is called the <strong>target</strong> column.</p>
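<p>As a small illustration of rows, features, and a target column, here is a sketch that uses only the Python standard library. The table values and the column names (<code>age</code>, <code>fare</code>, <code>survived</code>) are made up for this example.</p>

```python
import csv
import io

# A tiny, made-up table in CSV form: each row is a sample,
# each column is a feature, and "survived" plays the role of the target.
raw = """age,fare,survived
30,10.0,0
41,72.5,1
25,8.0,1
"""

rows = list(csv.DictReader(io.StringIO(raw)))

target_column = "survived"
feature_columns = [name for name in rows[0] if name != target_column]

# Split every row into its features (inputs) and its target (what we predict).
X = [[float(row[name]) for name in feature_columns] for row in rows]
y = [int(row[target_column]) for row in rows]

print(feature_columns)
print(X)
print(y)
```

<p>The names <code>X</code> for the features and <code>y</code> for the target are a common convention in ML code.</p>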
<h2 id="heading-the-iris-data-set">The Iris data set</h2>
<p>I will show you how to build a Machine Learning model with AutoML on a very simple data set called <strong>Iris</strong>. The data can be downloaded from many places (it is the same data!):</p>
<ul>
<li><p>UCI data repository: <a target="_blank" href="https://archive.ics.uci.edu/ml/datasets/Iris">https://archive.ics.uci.edu/ml/datasets/Iris</a></p>
</li>
<li><p>my collection of good data sets for getting started with ML: <a target="_blank" href="https://github.com/pplonski/datasets-for-start/blob/master/iris/data.csv">https://github.com/pplonski/datasets-for-start/blob/master/iris/data.csv</a></p>
</li>
<li><p>Kaggle: <a target="_blank" href="https://www.kaggle.com/uciml/iris">https://www.kaggle.com/uciml/iris</a></p>
</li>
</ul>
<p>The <strong>Iris</strong> data set contains 150 rows, where each row describes a flower. Each row has 4 features (columns) which describe properties of the flower. These features are:</p>
<ul>
<li><p>sepal length (cm)</p>
</li>
<li><p>sepal width (cm)</p>
</li>
<li><p>petal length (cm)</p>
</li>
<li><p>petal width (cm)</p>
</li>
</ul>
<p>A label (class) is assigned to each flower which tells us what type of iris it is. In this data set there are 3 classes:</p>
<ul>
<li><p>setosa</p>
</li>
<li><p>versicolor</p>
</li>
<li><p>virginica</p>
</li>
</ul>
<p>Let's take the first row of the data. We have:</p>
<ul>
<li><p>sepal length = 5.1 cm</p>
</li>
<li><p>sepal width = 3.5 cm</p>
</li>
<li><p>petal length = 1.4 cm</p>
</li>
<li><p>petal width = 0.2 cm</p>
</li>
<li><p>class = setosa</p>
</li>
</ul>
<p>The first row tells us that someone took the iris type 'setosa', measured its sepal and petal properties, and saved it to the dataset.</p>
<p>Where is the Machine Learning here? Let's assume that we have a set of iris flowers but we don't know what types (classes) they are. We know how to measure the sepal and petal length and width, but we can't say what type or class of iris each flower is.</p>
<p>We can use Machine Learning to <strong>classify</strong> the flower based on our measures. The ML model will take as input the 4 numbers (our measures) and will output the class of the flower.</p>
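<p>To show what "4 numbers in, one class out" can look like, here is a minimal sketch of a nearest-centroid classifier in plain Python. This is a deliberately simple stand-in that I picked for illustration. It is not one of the algorithms used later in this article, and the six example rows are hand-typed.</p>

```python
import math

# Six hand-typed example rows (sepal length, sepal width, petal length,
# petal width, all in cm) with their labels. Illustrative only.
examples = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def fit_centroids(examples):
    """Compute the average measurement vector (centroid) of each class."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the class whose centroid is closest to the 4 measurements."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = fit_centroids(examples)
print(predict(centroids, (5.0, 3.4, 1.5, 0.2)))
print(predict(centroids, (6.0, 2.9, 4.5, 1.5)))
```

<p>Training here means computing one average per class, and predicting means picking the nearest average. Note that <code>math.dist</code> needs Python 3.8 or newer.</p>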
<h2 id="heading-lets-code">Let's code!</h2>
<p>I will use Python in this tutorial, so I assume that you have Python installed and know how to install packages.</p>
<p>We will need a few packages, and all of them will be installed with the AutoML package <a target="_blank" href="https://github.com/mljar/mljar-supervised">mljar-supervised</a>. To install it run:</p>
<pre><code class="lang-python">pip install mljar-supervised
</code></pre>
<p>All the code presented in this article is available on <a target="_blank" href="https://github.com/mljar/mljar-examples/blob/master/Iris_classification/Iris_classification.ipynb">GitHub</a>. To begin, let's import the packages we need:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> sklearn <span class="hljs-keyword">import</span> datasets
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> supervised.automl <span class="hljs-keyword">import</span> AutoML
</code></pre>
<p>Then load the data:</p>
<pre><code class="lang-python">data = datasets.load_iris()
X = pd.DataFrame(data[<span class="hljs-string">"data"</span>], columns=data[<span class="hljs-string">"feature_names"</span>])
y = pd.Series(data[<span class="hljs-string">"target"</span>], name=<span class="hljs-string">"target"</span>).map({i:v <span class="hljs-keyword">for</span> i, v <span class="hljs-keyword">in</span> enumerate(data[<span class="hljs-string">"target_names"</span>])})
</code></pre>
<p>This is what our data looks like:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/05/image-75.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>The</em> <code>X</code> variable ( <code>print(X)</code> )</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/05/image-76.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>The</em> <code>y</code> variable ( <code>print(y)</code> )</p>
<p>We will split our data into two separate sets:</p>
<ul>
<li><p><strong>train</strong> - samples which will be used to train the Machine Learning model</p>
</li>
<li><p><strong>test</strong> - samples which we will use to check how our Machine Learning model is working on unseen (in the training process) data</p>
</li>
</ul>
<pre><code class="lang-python">X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.1</span>)
</code></pre>
<p>We will use 90% of our data for training (90%*150=135 samples) and 10% (15 samples) for testing.</p>
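<p>If you are curious what such a split does, here is a rough sketch in plain Python. The helper <code>holdout_split</code> is a hypothetical name of mine, and it is not how scikit-learn implements <code>train_test_split</code>. It only shows the idea: shuffle the row indices, then cut them into two parts. The fixed seed makes the split repeatable. <code>train_test_split</code> accepts a <code>random_state</code> argument for the same purpose.</p>

```python
import random


def holdout_split(n_samples, test_size, seed):
    """Shuffle row indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_split(n_samples=150, test_size=0.1, seed=42)

print(len(train_idx), len(test_idx))
# No row may end up in both sets.
print(set(train_idx).isdisjoint(test_idx))
```

<p>Without a fixed seed you get a different split on every run, so your results may differ slightly from the ones shown in this article.</p>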
<p>Now that we have our data ready we can train the Machine Learning model. Maybe you've heard that there are many ML algorithms. Any of them can be used for model training, for example:</p>
<ul>
<li><p>Decision Tree,</p>
</li>
<li><p>Logistic Regression,</p>
</li>
<li><p>Random Forest,</p>
</li>
<li><p>Neural Networks,</p>
</li>
<li><p>XGBoost,</p>
</li>
</ul>
<p>just to name a few.</p>
<h3 id="heading-which-model-we-should-use-which-one-is-the-best">Which model should we use? Which one is the best?</h3>
<p>There is no single answer to the above questions. It all depends on the data itself. The common approach is to check as many as you can and select the best performing model. Very often the simplest algorithms are very good to start.</p>
<p>But this is not the end of our problems. Each of the algorithms usually has parameters which control the way the model is trained. They are so-called <strong>hyper-parameters</strong>. They should be carefully set for the algorithm. To select their values we also need to check a few of them.</p>
<p>To select the algorithm and its hyper-parameters we can use validation, which can be performed in many different ways. I won't go into the details of validation. I will just show you a tool which can handle all of the above problems. It is <strong>Automated Machine Learning (AutoML)</strong>.</p>
<p>AutoML can check many different ML algorithms and tune hyper-parameters for them. It will search for the best ML model for available data.</p>
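<p>The core idea of "try several settings and keep the best one" can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the data is made up, the model is a tiny k-nearest-neighbors classifier on one feature, and <code>k</code> is the hyper-parameter being searched. It is not a description of how any AutoML package works internally.</p>

```python
from collections import Counter


def knn_predict(train, k, x):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, k, validation):
    """Fraction of validation samples that get the correct label."""
    hits = sum(knn_predict(train, k, x) == label for x, label in validation)
    return hits / len(validation)


# Made-up (petal length in cm, label) pairs. The point at 1.8 is
# deliberately "noisy" so that the choice of k makes a difference.
train = [(1.3, "setosa"), (1.5, "setosa"), (1.6, "setosa"),
         (1.8, "versicolor"),
         (4.0, "versicolor"), (4.5, "versicolor"), (4.9, "versicolor")]
validation = [(1.4, "setosa"), (1.9, "setosa"),
              (3.5, "versicolor"), (4.7, "versicolor")]

# Try each candidate value of the hyper-parameter k and keep the best one.
scores = {k: accuracy(train, k, validation) for k in (1, 3, 7)}
best_k = max(scores, key=scores.get)

print(scores)
print(best_k)
```

<p>Here a very small <code>k</code> is fooled by the noisy point and a very large <code>k</code> ignores the local structure, so the middle value wins on the validation data.</p>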
<p>In real life, AutoML is used to do even more, like feature engineering (preparing features for analysis and constructing new ones) or deploying models as REST APIs.</p>
<p>I'm using <code>AutoML</code> from the <code>mljar-supervised</code> package (of which I'm the author). It has a very simple interface. Let's train the model:</p>
<pre><code class="lang-python">automl = AutoML(algorithms=[<span class="hljs-string">"Decision Tree"</span>, <span class="hljs-string">"Linear"</span>, <span class="hljs-string">"Random Forest"</span>],
                total_time_limit=<span class="hljs-number">5</span>*<span class="hljs-number">60</span>)
automl.fit(X_train, y_train)
</code></pre>
<p>The code above will check 3 different algorithms for us: Decision Tree, Linear (which means Logistic Regression for a classification task), and Random Forest. Then it'll select the best one. The total training time is limited to 5 minutes (5*60 seconds).</p>
<p>As a result of running <code>AutoML</code> you will get output like this:</p>
<pre><code class="lang-python">Create directory AutoML_1
AutoML task to be solved: multiclass_classification
AutoML will use algorithms: [<span class="hljs-string">'Decision Tree'</span>, <span class="hljs-string">'Linear'</span>, <span class="hljs-string">'Random Forest'</span>]
AutoML will optimize <span class="hljs-keyword">for</span> metric: logloss
AutoML will <span class="hljs-keyword">try</span> to check about <span class="hljs-number">33</span> models
Decision Tree final logloss <span class="hljs-number">0.5453226492448378</span> time <span class="hljs-number">30.04</span> seconds
Decision Tree final logloss <span class="hljs-number">0.6419811899692177</span> time <span class="hljs-number">21.25</span> seconds
Decision Tree final logloss <span class="hljs-number">0.4569697687554296</span> time <span class="hljs-number">16.73</span> seconds
Linear final logloss <span class="hljs-number">0.16507067466592637</span> time <span class="hljs-number">15.68</span> seconds
Random Forest final logloss <span class="hljs-number">0.11891177026579884</span> time <span class="hljs-number">28.72</span> seconds
Random Forest final logloss <span class="hljs-number">0.24256194594421207</span> time <span class="hljs-number">28.73</span> seconds
Random Forest final logloss <span class="hljs-number">0.2761028104749779</span> time <span class="hljs-number">27.61</span> seconds
Random Forest final logloss <span class="hljs-number">0.2536702528991272</span> time <span class="hljs-number">29.0</span> seconds
Random Forest final logloss <span class="hljs-number">0.1752405529204018</span> time <span class="hljs-number">27.86</span> seconds
Random Forest final logloss <span class="hljs-number">0.17394416017742964</span> time <span class="hljs-number">27.69</span> seconds
Ensemble final logloss <span class="hljs-number">0.11781603875353275</span> time <span class="hljs-number">0.36</span> seconds
</code></pre>
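<p>The log above reports a <code>logloss</code> value for each model, and lower is better. Here is a minimal sketch of how multiclass logloss can be computed, assuming each prediction is a dictionary of class probabilities. The two sets of probabilities are hypothetical, and production implementations typically also guard against a probability of exactly 0.</p>

```python
import math


def log_loss(true_labels, predicted_probabilities):
    """Average negative log of the probability given to the true class."""
    total = 0.0
    for label, probabilities in zip(true_labels, predicted_probabilities):
        total += -math.log(probabilities[label])
    return total / len(true_labels)


y_true = ["setosa", "versicolor", "virginica"]

# Hypothetical predicted probabilities from two imaginary models.
confident = [
    {"setosa": 0.9, "versicolor": 0.05, "virginica": 0.05},
    {"setosa": 0.1, "versicolor": 0.8, "virginica": 0.1},
    {"setosa": 0.05, "versicolor": 0.15, "virginica": 0.8},
]
unsure = [
    {"setosa": 0.4, "versicolor": 0.3, "virginica": 0.3},
    {"setosa": 0.3, "versicolor": 0.4, "virginica": 0.3},
    {"setosa": 0.3, "versicolor": 0.3, "virginica": 0.4},
]

print(log_loss(y_true, confident))  # lower: high probability on the true class
print(log_loss(y_true, unsure))     # higher: probability spread across classes
```

<p>Both imaginary models pick the right class every time, yet the confident one gets a lower logloss. That is why logloss can rank models more finely than accuracy alone.</p>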
<p>The results of running this AutoML experiment are available on <a target="_blank" href="https://github.com/mljar/mljar-examples/tree/master/Iris_classification/AutoML_1#automl-leaderboard">GitHub</a>. When you look into the directory created by <code>AutoML</code> you will see the <code>README.md</code> file. It contains the report from the training:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/05/image-77.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>What is more, you can check each trained model by clicking on its link:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/05/image-78.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>To compute predictions, just run the following lines:</p>
<pre><code class="lang-python">y_predicted = automl.predict(X_test)

print(pd.DataFrame({<span class="hljs-string">"Predicted"</span>: y_predicted[<span class="hljs-string">"label"</span>], <span class="hljs-string">"Target"</span>: np.array(y_test)}))
</code></pre>
<p>You will get the following:</p>
<pre><code class="lang-python">     Predicted      Target
<span class="hljs-number">0</span>       setosa      setosa
<span class="hljs-number">1</span>    virginica  versicolor
<span class="hljs-number">2</span>   versicolor  versicolor
<span class="hljs-number">3</span>    virginica   virginica
<span class="hljs-number">4</span>   versicolor  versicolor
<span class="hljs-number">5</span>       setosa      setosa
<span class="hljs-number">6</span>       setosa      setosa
<span class="hljs-number">7</span>   versicolor  versicolor
<span class="hljs-number">8</span>       setosa      setosa
<span class="hljs-number">9</span>   versicolor  versicolor
<span class="hljs-number">10</span>   virginica   virginica
<span class="hljs-number">11</span>  versicolor  versicolor
<span class="hljs-number">12</span>   virginica   virginica
<span class="hljs-number">13</span>   virginica   virginica
<span class="hljs-number">14</span>  versicolor  versicolor
</code></pre>
<p>From the above you can see that there was 1 mistake in the predictions (the row with index 1). The ML model predicted class <code>virginica</code> but it should be <code>versicolor</code>. The accuracy of the ML model is:</p>
<pre><code class="lang-python">Accuracy = <span class="hljs-number">14</span> (correct answers) / <span class="hljs-number">15</span> (total samples) = <span class="hljs-number">93.33</span>%
</code></pre>
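<p>You don't have to count by hand. As a small sketch, here is the same calculation in plain Python, with the predictions and targets typed in from the table above. The variable names are my own.</p>

```python
from collections import Counter

# Predictions and targets copied from the table above (rows 0 to 14).
predicted = ["setosa", "virginica", "versicolor", "virginica", "versicolor",
             "setosa", "setosa", "versicolor", "setosa", "versicolor",
             "virginica", "versicolor", "virginica", "virginica", "versicolor"]
target = ["setosa", "versicolor", "versicolor", "virginica", "versicolor",
          "setosa", "setosa", "versicolor", "setosa", "versicolor",
          "virginica", "versicolor", "virginica", "virginica", "versicolor"]

correct = sum(p == t for p, t in zip(predicted, target))
accuracy = correct / len(target)
print(f"Accuracy = {correct} / {len(target)} = {accuracy:.2%}")

# Count each (target, predicted) pair to see which classes get confused.
mistakes = Counter((t, p) for t, p in zip(target, predicted) if t != p)
print(mistakes)
```

<p>The <code>mistakes</code> counter shows which classes were confused with each other, which is often more informative than the accuracy number alone.</p>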
<h2 id="heading-summary">Summary</h2>
<p>In this article I showed you the differences between programming and Machine Learning. I hope you understand it a bit better.</p>
<p>Machine Learning is a very broad topic and for sure can't be presented in one article. Learning and applying ML can give you a lot of satisfaction, though, so I encourage everyone to explore further.</p>
<p>Automated Machine Learning improves the process of model training by automating the search for algorithms and hyper-parameters. I hope that AutoML will make ML more accessible to many developers out there.</p>
<p>If you have any questions or would like to read more articles like this please let me know.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
