<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ python beginner - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ python beginner - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 24 May 2026 22:24:30 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/python-beginner/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How Passing by Object Reference Works in Python ]]>
                </title>
                <description>
                    <![CDATA[ If you've ever modified a variable inside a Python function and been surprised or confused by what happened to it outside the function, you're not alone. This tripped me up for a long time. Coming fro ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-passing-by-object-reference-works-in-python/</link>
                <guid isPermaLink="false">69c5415810e664c5dadbf6e0</guid>
                
                    <category>
                        <![CDATA[ python beginner ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Programming Blogs ]]>
                    </category>
                
                    <category>
                        <![CDATA[ functions ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Mokshita V P ]]>
                </dc:creator>
                <pubDate>Thu, 26 Mar 2026 14:23:20 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/0fb11934-22c6-4304-948c-54c7d423c79d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've ever modified a variable inside a Python function and been surprised or confused by what happened to it outside the function, you're not alone. This tripped me up for a long time.</p>
<p>Coming from tutorials that talked about "call by value" and "call by reference," I assumed Python must follow one of those two models. It doesn't. Python does something slightly different, and once you understand it, a lot of previously confusing behavior will suddenly click.</p>
<p>In this article, you'll learn:</p>
<ul>
<li><p>What calling by value and calling by reference mean</p>
</li>
<li><p>How other languages like C handle this</p>
</li>
<li><p>What Python actually does (passing by object reference)</p>
</li>
<li><p>How mutable and immutable types affect behavior inside functions</p>
</li>
</ul>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-call-by-value-and-call-by-reference-explained">Call by Value and Call by Reference Explained</a></p>
</li>
<li><p><a href="#heading-how-it-works-in-c-with-examples">How It Works in C (with Examples)</a></p>
</li>
<li><p><a href="#heading-what-python-does-instead">What Python Does Instead</a></p>
</li>
<li><p><a href="#heading-mutable-vs-immutable-types">Mutable vs Immutable Types</a></p>
</li>
<li><p><a href="#heading-conclusion">Conclusion</a></p>
</li>
</ul>
<h2 id="heading-call-by-value-and-call-by-reference-explained">Call by Value and Call by Reference Explained</h2>
<p>Before we get to Python, let's quickly define these two terms.</p>
<p><strong>Call by value</strong> means a copy of the variable is passed to the function. Whatever you do to it inside the function, the original stays unchanged.</p>
<p><strong>Call by reference</strong> means the actual memory location of the variable is passed. Changes inside the function directly affect the original variable.</p>
<p>Many languages support one or both of these models. Python, however, uses neither – at least not in the traditional sense.</p>
<h3 id="heading-how-it-works-in-c-with-examples">How It Works in C (with Examples)</h3>
<p>C is a good example of a language that supports both models explicitly.</p>
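<p>Here's how you get call-by-reference behavior in C by passing a pointer to the variable. Because the function writes through that pointer, the original variable is modified:</p>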
<pre><code class="language-c">#include &lt;stdio.h&gt;

void modify(int *n) {

*n = *n + 10;

printf("Inside function: %d\n", *n); }

int main() {

int x = 5;

modify(&amp;x);

printf("Outside function: %d\n", x);

return 0; }
</code></pre>
<p>Output:</p>
<p>Inside function: 15</p>
<p>Outside function: 15 ← original changed!</p>
<p>In C, you explicitly choose the behavior by deciding whether to pass a pointer or a plain value. Python doesn't give you that choice, but what it does instead is actually quite logical.</p>
<h2 id="heading-what-python-does-instead">What Python Does Instead</h2>
<p>Python uses a model called <strong>passing by object reference</strong> (sometimes called passing by assignment).</p>
<p>When you pass a variable to a function in Python, you're passing a reference to the object that variable points to, not a copy of the value, and not the variable itself.</p>
<p>What happens next depends entirely on whether that object is <strong>mutable</strong> (can be changed in place) or <strong>immutable</strong> (cannot be changed in place).</p>
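<p>One way to see this for yourself is the built-in <code>id()</code> function, which returns an object's identity. In this minimal sketch (the function name <code>identity_inside</code> is just illustrative), the parameter inside the function refers to the very same object the caller passed, whether that object is a list or an int:</p>
<pre><code class="language-python">def identity_inside(param):
    # id() gives the identity of the object that the local name 'param' refers to
    return id(param)

data = [1, 2, 3]
print(identity_inside(data) == id(data))      # True: the list was not copied

number = 5
print(identity_inside(number) == id(number))  # True: an int is passed the same way
</code></pre>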
<h3 id="heading-mutable-vs-immutable-types">Mutable vs Immutable Types</h3>
<p><strong>Immutable types</strong> in Python include <code>int</code>, <code>float</code>, <code>str</code>, and <code>tuple</code>. These objects cannot be modified in place. When you "change" one inside a function, for example with <code>n = n + 10</code>, Python creates a brand new object and rebinds the function's local name to it. The caller's variable still refers to the original object, which is left untouched.</p>
<pre><code class="language-python">def modify_number(n):
    n = n + 10
    print("Inside function:", n)

x = 5

modify_number(x)

print("Outside function:", x)
</code></pre>
<p>Output:</p>
<p>Inside function: 15</p>
<p>Outside function: 5 ← original unchanged</p>
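<p>To see why the original is unchanged, it can help to compare identities before and after the assignment. This sketch (the function name <code>rebind_number</code> is hypothetical) relies only on the built-in <code>id()</code>: the parameter starts out pointing at the caller's object, and <code>n = n + 10</code> makes it point at a new one.</p>
<pre><code class="language-python">def rebind_number(n):
    before = id(n)   # identity of the object the caller passed in
    n = n + 10       # builds a new int object and rebinds only the local name 'n'
    after = id(n)    # identity of that new object
    return before, after, n

x = 5
before, after, result = rebind_number(x)
print(before == id(x))  # True: 'n' started out referring to the caller's object
print(after == id(x))   # False: 'n' now refers to a different object
print(x, result)        # 5 15
</code></pre>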
<p><strong>Mutable types</strong> include <code>list</code>, <code>dict</code>, and <code>set</code>. These can be changed in place. When you modify one inside a function, you're modifying the same object the caller is holding a reference to.</p>
<pre><code class="language-python">def modify_list(items):

    items.append(99)

    print("Inside function:", items)

my_list = [1, 2, 3]

modify_list(my_list)

print("Outside function:", my_list)
</code></pre>
<p>Output:</p>
<p>Inside function: [1, 2, 3, 99]</p>
<p>Outside function: [1, 2, 3, 99] ← original changed!</p>
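<p>This is the key insight: Python passes every argument the same way, as a reference to an object. What you see outside the function depends on the type of object you pass and on what the function does with it: changing a mutable object in place is visible to the caller, while reassigning the parameter name is not.</p>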
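<p>The type isn't the whole story, though: even a mutable argument is unaffected if the function only reassigns its parameter. In this sketch (both function names are hypothetical), <code>mutate</code> changes the list in place, while <code>rebind_list</code> builds a new list and points its local name at it:</p>
<pre><code class="language-python">def mutate(items):
    items.append(99)       # changes the list object the caller also refers to

def rebind_list(items):
    items = items + [99]   # builds a new list; only the local name is rebound
    return items

a = [1, 2, 3]
mutate(a)
print(a)       # [1, 2, 3, 99]

b = [1, 2, 3]
new_b = rebind_list(b)
print(b)       # [1, 2, 3]
print(new_b)   # [1, 2, 3, 99]
</code></pre>
<p>One subtlety: writing <code>items += [99]</code> inside the function behaves like <code>mutate</code>, not <code>rebind_list</code>, because <code>+=</code> on a list extends it in place.</p>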
<h2 id="heading-conclusion">Conclusion</h2>
<p>Python doesn't use call by value or call by reference. It <strong>passes by object reference</strong>, where the function receives a reference to the object, and whether that object can be modified in place determines what happens next.</p>
<p>To recap:</p>
<ul>
<li><p><strong>Immutable types</strong> (<code>int</code>, <code>float</code>, <code>str</code>, <code>tuple</code>): "changing" one inside the function creates a new object, and the original stays the same</p>
</li>
<li><p><strong>Mutable types</strong> (<code>list</code>, <code>dict</code>, <code>set</code>): in-place changes such as <code>append()</code> modify the original object directly</p>
</li>
</ul>
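<p>A practical consequence: if a function needs to change a mutable argument without affecting the caller, one common approach is to work on a copy. In this sketch (<code>with_extra_item</code> is a hypothetical name), <code>list(items)</code> makes a shallow copy, so any nested mutable objects would still be shared; <code>copy.deepcopy()</code> is the usual tool when that matters.</p>
<pre><code class="language-python">def with_extra_item(items):
    result = list(items)   # shallow copy: a new list holding the same elements
    result.append(99)      # modifies only the copy
    return result

original = [1, 2, 3]
updated = with_extra_item(original)
print(original)  # [1, 2, 3]
print(updated)   # [1, 2, 3, 99]
</code></pre>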
<p>Once this clicked for me, a lot of the "why is Python doing this?" moments started making sense. If you're just getting started with functions in Python, keep this in the back of your mind; it'll save you a lot of debugging headaches.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use the Polars Library in Python for Data Analysis ]]>
                </title>
                <description>
                    <![CDATA[ In this article, I’ll give you a beginner-friendly introduction to the Polars library in Python. Polars is an open-source library, originally written in Rust, which makes data wrangling easier in Python. The syntax of Polars is very similar to Pandas... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-the-polars-library-in-python-for-data-analysis/</link>
                <guid isPermaLink="false">6939b88a5a4b3354fde8c07b</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python 3 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ python beginner ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Polars ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Programming Blogs ]]>
                    </category>
                
                    <category>
                        <![CDATA[ dataset ]]>
                    </category>
                
                    <category>
                        <![CDATA[ dataframe ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sara Jadhav ]]>
                </dc:creator>
                <pubDate>Wed, 10 Dec 2025 18:14:34 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765325732081/94ab547b-fdaf-41bb-ae60-ad03be31211a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this article, I’ll give you a beginner-friendly introduction to the Polars library in Python.</p>
<p>Polars is an open-source library, originally written in Rust, which makes data wrangling easier in Python. The syntax of Polars is very similar to Pandas, so if you’ve worked with Pandas or the PySpark library before, using Polars should be a breeze.</p>
<p>Polars excels at giving fast results. It’s also memory efficient and helps you optimize your code using parallelism. It also lets you convert data from and to various libraries like NumPy, Pandas, and others.</p>
<p>In this tutorial, we’ll learn the Polars library from scratch, from installing and importing it to manipulating data in a dataset.</p>
<p>First, we’ll look at Polars’ basic functions. We’ll also write some practical code to help you apply what you’ve learned. Finally, we’ll work with an example dataset to solidify some more key Polars concepts. Let’s dive in.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-installing-and-importing-the-polars-library">Installing and Importing the Polars Library</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-series">What is a Series?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-dataframe">What is a DataFrame?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-read-csv-files-with-polars">How to Read CSV Files with Polars</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-some-other-important-functions">Some other Important Functions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary">Summary</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Even though this tutorial is beginner-friendly, having some basic knowledge of the following areas will help you understand this article better:</p>
<ul>
<li><p>Basic Python syntax</p>
</li>
<li><p>Data structures</p>
</li>
<li><p>Ability to import libraries and knowledge of using functions and methods</p>
</li>
<li><p>Basics of NumPy and Pandas will come in handy (not necessary).</p>
</li>
</ul>
<p>Now that you know the prerequisites, let’s get started with the tutorial.</p>
<h2 id="heading-installing-and-importing-the-polars-library">Installing and Importing the Polars Library</h2>
<p>To install the Polars library, you can use the following command in your terminal:</p>
<p><code>pip install polars</code></p>
<p>This works if you already have the pip package manager on your system. If you’re working in a conda environment, you can use this instead:</p>
<p><code>conda install -c conda-forge polars</code></p>
<p>But I strongly recommend using the pip package manager to avoid various inconveniences.</p>
<p>Let’s import Polars in our program. We’ll follow the same process as we use for importing other libraries in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl <span class="hljs-comment"># pl is a conventional alias</span>
</code></pre>
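<p>If you want to confirm the installation worked, one quick check is to print the installed version. The number you see depends on which release you installed:</p>
<pre><code class="lang-python">import polars as pl

# The import succeeds only if Polars is installed; __version__ holds the release number
print(pl.__version__)
</code></pre>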
<p>Before creating a Polars object, it helps to know how big your data is. By default, a Polars DataFrame can hold up to 2³² (about 4.3 billion) rows. To load more data than that, install Polars with the following command instead:</p>
<p><code>pip install "polars[rt64]"</code></p>
<p>If you want to use Polars right away without installing it on your own system, a Google Colab notebook is a convenient option: you can import Polars and start using it directly. I’ll be using a Google Colab notebook for this tutorial.</p>
<h2 id="heading-what-is-a-series">What is a Series?</h2>
<p>A series is the fundamental building block of a DataFrame. It’s a one-dimensional data structure that you can compare to a list in Python or a 1-D array in NumPy. The difference between a series and a 1-D array is that the former is labeled (it has a name) while the latter is not. Several series together form a DataFrame.</p>
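<p>To make the labeled-versus-unlabeled difference concrete, here’s a small sketch that converts a NumPy array into a series and back. It assumes NumPy is installed alongside Polars, and the label <code>"measurements"</code> is just an example:</p>
<pre><code class="lang-python">import numpy as np
import polars as pl

arr = np.array([1.5, 2.5, 3.5])        # an unlabeled 1-D NumPy array
s = pl.Series("measurements", arr)     # the same values, now with a label
print(s.name)                          # measurements
print(s.dtype)                         # Float64

back = s.to_numpy()                    # convert back to a plain NumPy array
print(back.shape)                      # (3,)
</code></pre>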
<p>We can create a series with homogeneous data as well as heterogeneous data.</p>
<h3 id="heading-creating-a-series-with-homogenous-data">Creating a Series with Homogeneous Data</h3>
<p>In a series, all elements must have the same data type. If they don’t, Polars raises an error by default.</p>
<p>The syntax to define a Polars series is as follows:</p>
<p><code>var_name = pl.Series("column_name", [values])</code></p>
<p>The following code shows an example of a homogeneous series definition in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
series_homo = pl.Series(<span class="hljs-string">"Numbers"</span>, [<span class="hljs-string">'One'</span>, <span class="hljs-string">'Two'</span>, <span class="hljs-string">'Three'</span>, <span class="hljs-string">'Four'</span>, <span class="hljs-string">'Five'</span>])
print(series_homo)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5,)
Series: 'Numbers' [str]
[
    "One"
    "Two"
    "Three"
    "Four"
    "Five"
]
</code></pre>
<p>In the above code, we first imported the Polars library using the <code>pl</code> alias to start using it throughout the code. Using aliases is a matter of choice, but <code>pl</code> is a conventional one (like <code>np</code> for NumPy and <code>pd</code> for Pandas). The benefit of using conventional aliases is that when you hand over the code to someone else, it’s easy for them to follow along.</p>
<p>Next, we used the <code>pl.Series()</code> function to create a Polars series object. As its first argument, we passed the label for our series (<code>Numbers</code> in this case). Then we passed the values to be stored, in the form of a list. Remember that the list of values we pass acts as a single argument. Finally, we printed our series.</p>
<p>The output shows the shape of the Polars object as well as the data type of the series. For a series, the shape is a one-element tuple holding its length, here <code>(5,)</code>; for a DataFrame, it’s <code>(rows, columns)</code>.</p>
<p>We can check the data type of a series explicitly with its <code>dtype</code> attribute:</p>
<pre><code class="lang-python">print(series_homo.dtype)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">String
</code></pre>
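<p>Besides <code>dtype</code>, a series also exposes its label and size as attributes. This sketch recreates <code>series_homo</code> so it can run on its own:</p>
<pre><code class="lang-python">import polars as pl

series_homo = pl.Series("Numbers", ["One", "Two", "Three", "Four", "Five"])

print(series_homo.name)                 # Numbers
print(series_homo.shape)                # (5,)
print(len(series_homo))                 # 5
print(series_homo.dtype == pl.String)   # True
</code></pre>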
<h3 id="heading-creating-a-series-with-heterogenous-data">Creating a Series with Heterogeneous Data</h3>
<p>Heterogeneous data means that not all elements share the same data type. The syntax to define a series with heterogeneous data is as follows:</p>
<p><code>var_name = pl.Series("column_name", [values], strict=False)</code></p>
<p>Based on what I said above, you’re probably wondering how a series can hold heterogeneous data at all. The key point is that a series is always homogeneous, regardless of the data you feed it. Let’s look at this code first, and then I’ll explain:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl

series_hetero = pl.Series(<span class="hljs-string">"Numbers"</span>, [<span class="hljs-number">1</span>, <span class="hljs-string">"Two"</span>, <span class="hljs-number">3</span>, <span class="hljs-string">"Four"</span>], strict=<span class="hljs-literal">False</span>)
print(series_hetero)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (4,)
Series: 'Numbers' [str]
[
    "1"
    "Two"
    "3"
    "Four"
]
</code></pre>
<p>Here, we created a series object using the <code>pl.Series()</code> function, labelled it, and passed the values that we want in our series.</p>
<p>Notice that we passed heterogeneous data (values that don’t share the same data type) to the function. Normally, this raises an error. But because we set the <code>strict</code> parameter to <code>False</code>, the function becomes lenient about the schema of the series. (The schema is just the expected data type of the values stored in the series.)</p>
<p>If you don’t specify a data type for a series that’s fed heterogeneous data, <code>pl.Series()</code> picks a type that can represent every value, which here is <code>pl.Utf8</code> (the string data type, also available as <code>pl.String</code>). You can see this in the output above: the integers <code>1</code> and <code>3</code> became the strings <code>"1"</code> and <code>"3"</code>. This avoids an error, because numbers and words alike can be stored as text.</p>
<p>We can also see that all the elements now have the same data type (<code>pl.Utf8</code>). This means the series is homogeneous, even though we put heterogeneous data in it.</p>
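<p>You can confirm this from code rather than by reading the printed output. In recent Polars versions, <code>pl.Utf8</code> is an alias of <code>pl.String</code>, so both comparisons below should hold; treat the exact type names as version-dependent:</p>
<pre><code class="lang-python">import polars as pl

series_hetero = pl.Series("Numbers", [1, "Two", 3, "Four"], strict=False)

print(series_hetero.dtype)             # String
print(series_hetero.dtype == pl.Utf8)  # True: pl.Utf8 is another name for pl.String
print(series_hetero.to_list())         # ['1', 'Two', '3', 'Four']
</code></pre>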
<p>If we do specify a data type for the series (with the <code>dtype</code> parameter) while keeping <code>strict=False</code>, Polars turns every value that can’t be converted to that type into a null. The following example shows this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-comment"># set the data type to a 32-bit signed integer</span>
series = pl.Series(<span class="hljs-string">"ints"</span>, [<span class="hljs-number">1</span>, <span class="hljs-number">-2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-string">'Thirteen'</span>, <span class="hljs-string">'Fourteen'</span>], dtype=pl.Int32, strict=<span class="hljs-literal">False</span>)
print(series)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (7,)
Series: 'ints' [i32]
[
    1
    -2
    3
    4
    5
    null
    null
]
</code></pre>
<p>Here, the last two values were strings, but since we set the data type to <code>pl.Int32</code>, they became null records.</p>
<p>So the leniency depends on whether you set the <code>strict</code> parameter to <code>True</code> or <code>False</code>. With <code>True</code> (the default), Polars enforces the schema strictly and raises an exception if any value doesn’t fit it. With <code>False</code>, the series still stays homogeneous, either by converting values to a common type or by turning values that don’t fit the specified type into nulls.</p>
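<p>Here’s a sketch of both modes side by side, using the same values as before. It uses <code>null_count()</code> to count the nulls, and it catches a generic <code>Exception</code> because the exact exception class Polars raises may vary between versions:</p>
<pre><code class="lang-python">import polars as pl

values = [1, -2, 3, 4, 5, "Thirteen", "Fourteen"]

# strict=False: values that can't become Int32 are turned into nulls
lenient = pl.Series("ints", values, dtype=pl.Int32, strict=False)
print(lenient.null_count())  # 2

# strict=True (the default): the same data is rejected outright
try:
    pl.Series("ints", values, dtype=pl.Int32)
    rejected = False
except Exception:  # the exact exception class may differ between Polars versions
    rejected = True
print(rejected)  # True
</code></pre>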
<p>Now that you understand how series work, we’re ready to move on to DataFrames.</p>
<h2 id="heading-what-is-a-dataframe">What is a DataFrame?</h2>
<p>A DataFrame is a two-dimensional, table-like data structure for storing and analyzing related data. It’s essentially a collection of series of equal length, each with its own label, where each series stores a different aspect of the data.</p>
<p>Here’s the syntax to create a Polars DataFrame object:</p>
<p><code>var_name = pl.DataFrame({"column_name": [values], ...}, schema=schema)</code></p>
<p>The following example shows you how to define a DataFrame object in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
print(df)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (10, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
│ ---    ┆ ---         ┆ ---         │
│ u32    ┆ f64         ┆ f64         │
╞════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.098612    ┆ 0.477121    │
│ 4      ┆ 1.386294    ┆ 0.60206     │
│ 5      ┆ 1.609438    ┆ 0.69897     │
│ 6      ┆ 1.791759    ┆ 0.778151    │
│ 7      ┆ 1.94591     ┆ 0.845098    │
│ 8      ┆ 2.079442    ┆ 0.90309     │
│ 9      ┆ 2.197225    ┆ 0.954243    │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>Above, we created a Polars DataFrame object with the <code>pl.DataFrame()</code> function, passing it a dictionary that holds the values of the DataFrame.</p>
<p>In the dictionary, each key-value pair represents a series: the key is the label of the series, and the value holds its values. The values are passed as a single sequence (a list, or a NumPy array as in the <code>Number</code> column), since each key can map to only one value.</p>
<p>We also passed a schema for the DataFrame. The schema is again a dictionary, where each key is the label of a series (so the schema maps to the correct series) and each value is that series’ data type. A value of <code>None</code> tells Polars to infer the data type, which is why the two logarithm columns ended up as <code>f64</code>.</p>
<p>In the output, we can see that we got a nice table representing our data. The labels are neatly separated from the data and below them, their schema is also represented.</p>
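<p>You can also inspect a DataFrame’s structure directly instead of reading the printed table. This sketch uses a smaller, made-up DataFrame; the <code>None</code> in the schema asks Polars to infer that column’s type, and indexing the DataFrame by column name returns a series, which illustrates that a DataFrame is a collection of series:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame(
    {"Number": [1, 2, 3], "Square": [1, 4, 9]},
    schema={"Number": pl.UInt32, "Square": None},  # None: let Polars infer the type
)

print(df.shape)                              # (3, 2)
print(df.columns)                            # ['Number', 'Square']
print(df.schema["Number"])                   # UInt32
print(df.schema["Square"])                   # Int64 (inferred)
print(isinstance(df["Number"], pl.Series))   # True: each column is a Series
</code></pre>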
<h3 id="heading-what-is-a-schema">What is a Schema?</h3>
<p>A schema defines the data type of each series (column). Fixing the data type of a homogeneous series keeps mixed data out of it.</p>
<p>For example, in the above code, we set the data type of the <code>Number</code> column to a 32-bit unsigned integer (<code>pl.UInt32</code>), since we don’t want to pass negative integers to NumPy’s logarithm functions.</p>
<p>If we want to hide the data type written below each column label, we can use the following function. Note that it changes a global display setting, so it affects every table printed afterward:</p>
<pre><code class="lang-python">pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
</code></pre>
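<p>Because <code>set_tbl_hide_column_data_types()</code> changes a global setting, you may prefer to limit it to one block of code. <code>pl.Config</code> can also be used as a context manager that applies options only inside a <code>with</code> block. This is a sketch that assumes a reasonably recent Polars version:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"Number": [1, 2, 3]}, schema={"Number": pl.UInt32})

# The option applies only inside the with-block and is restored afterward
with pl.Config(tbl_hide_column_data_types=True):
    hidden = str(df)

shown = str(df)
print("u32" in hidden)  # False: data types are hidden
print("u32" in shown)   # True: the default display is back
</code></pre>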
<h3 id="heading-the-head-tail-and-glimpse-functions">The Head, Tail, and Glimpse Functions</h3>
<p>The <code>head()</code>, <code>tail()</code>, and <code>glimpse()</code> methods give you a quick look at the data by showing a few records (rows). They’re especially useful for large datasets, for example to see which columns are present and what type of data each column holds.</p>
<p>The <code>head()</code> method returns the given number of rows (passed as its argument) from the top of the DataFrame. If no argument is passed, it returns the first five rows.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.head(<span class="hljs-number">3</span>))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (3, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.098612    ┆ 0.477121    │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>In this example, we used the same DataFrame that we just created, and then called <code>head(3)</code> to get its first three rows. You may also notice that the data types under the column names have disappeared. This is because we called <code>pl.Config.set_tbl_hide_column_data_types(active=True)</code>.</p>
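<p>Here’s a quick sketch of <code>head()</code>’s default and of what happens when you ask for more rows than exist, using a small made-up DataFrame:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"Number": list(range(1, 11))})

print(df.head().height)                  # 5: the default number of rows
print(df.head(3)["Number"].to_list())    # [1, 2, 3]
print(df.head(20).height)                # 10: asking for more rows than exist is fine
</code></pre>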
<p>The <code>glimpse()</code> method presents the data compactly in a transposed layout: each column is shown as a row, followed by its data type and its first few values. This makes wide DataFrames easier to read.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.glimpse())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Rows: 10
Columns: 3
$ Number      &lt;u32&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ Natural Log &lt;f64&gt; 0.0, 0.6931471805599453, 1.0986122886681098, 1.3862943611198906, 1.6094379124341003, 1.791759469228055, 1.9459101490553132, 2.0794415416798357, 2.1972245773362196, 2.302585092994046
$ Log Base 10 &lt;f64&gt; 0.0, 0.3010299956639812, 0.47712125471966244, 0.6020599913279624, 0.6989700043360189, 0.7781512503836436, 0.8450980400142568, 0.9030899869919435, 0.9542425094393249, 1.0

None
</code></pre>
<p>Here, we used the <code>glimpse()</code> method on our previously created DataFrame <code>df</code>, and the output shows its transposed view. It’s followed by <code>None</code> because, by default (<code>return_as_string=False</code>), <code>glimpse()</code> prints its summary itself and returns <code>None</code>, which our <code>print()</code> call then prints. To get the summary back as a string instead, set the <code>return_as_string</code> parameter to <code>True</code>. The following example shows how to do it:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(<span class="hljs-string">f'Returned as String: \n<span class="hljs-subst">{df.glimpse(return_as_string=<span class="hljs-literal">True</span>)}</span>'</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Returned as String: 
Rows: 10
Columns: 3
$ Number      &lt;u32&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ Natural Log &lt;f64&gt; 0.0, 0.6931471805599453, 1.0986122886681098, 1.3862943611198906, 1.6094379124341003, 1.791759469228055, 1.9459101490553132, 2.0794415416798357, 2.1972245773362196, 2.302585092994046
$ Log Base 10 &lt;f64&gt; 0.0, 0.3010299956639812, 0.47712125471966244, 0.6020599913279624, 0.6989700043360189, 0.7781512503836436, 0.8450980400142568, 0.9030899869919435, 0.9542425094393249, 1.0
</code></pre>
<p>This time, <code>glimpse()</code> returns the summary as a string instead of printing it. The f-string embeds that string, so the full summary is printed and no <code>None</code> appears.</p>
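<p>The sketch below shows the difference between the two modes directly. It uses a small made-up DataFrame (the column names are only for illustration) and stores the return value of each call so you can inspect it:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"Number": [1, 2, 3], "Square": [1, 4, 9]})

# Default: glimpse() prints the summary itself and returns None
printed = df.glimpse()
print(printed is None)  # True

# return_as_string=True: nothing is printed; the summary comes back as a str
summary = df.glimpse(return_as_string=True)
print(type(summary).__name__)   # str
print(summary.splitlines()[0])  # Rows: 3
</code></pre>
<p>The string form is handy when you want to write the summary to a log file or embed it in a report instead of printing it.</p>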
<p>Finally, the <code>tail()</code> function returns the given number of rows (passed as its argument) from the bottom of the DataFrame. When no argument is passed, it returns the last 5 rows.</p>
<p>This is useful for checking whether our data was loaded completely. Looking at the first few records with <code>head()</code> and the last few with <code>tail()</code> is a quick way to confirm that the whole dataset was loaded correctly.</p>
<p>It also lets us spot empty records at the end of the dataset, which can cause real problems. For example, if you train an ML model and split the dataset at a fixed position into training and test sets, any empty rows at the end will all end up in whichever split takes the last rows. Checking the data beforehand is a best practice, and these functions help us do it.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.tail(<span class="hljs-number">3</span>))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (3, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 8      ┆ 2.079442    ┆ 0.90309     │
│ 9      ┆ 2.197225    ┆ 0.954243    │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>In the above code, we called <code>tail()</code> on the DataFrame we created earlier and passed <code>3</code> as the argument, so the program returned the last three rows.</p>
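<p>Here’s a small sketch of both ideas from this section. It shows how <code>tail()</code> behaves with the default, an oversized, and a negative <code>n</code>, and one possible way to detect rows that are completely empty. The <code>padded</code> DataFrame is made up for illustration, and using <code>pl.all_horizontal()</code> to check whether every column in a row is null is just one option:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"Number": list(range(1, 11))})
print(df.tail().height)                 # 5: the default is n=5
print(df.tail(50).height)               # 10: asking for more rows than exist returns them all
print(df.tail(-7)["Number"].to_list())  # [8, 9, 10]: a negative n skips the first 7 rows

# A made-up DataFrame whose last two rows are entirely empty
padded = pl.DataFrame({
    "Number": [1, 2, 3, None, None],
    "Label": ["a", "b", "c", None, None],
})
print(padded.tail(2))  # the empty rows show up as nulls

# Expression that is True for rows in which every column is null
is_empty_row = pl.all_horizontal(pl.all().is_null())
print(padded.select(is_empty_row.alias("empty"))["empty"].sum())  # 2

# Drop those rows before splitting the data into training and test sets
clean = padded.filter(~is_empty_row)
print(clean.height)  # 3
</code></pre>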
<h3 id="heading-the-sample-function">The Sample Function</h3>
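<p>The <code>sample()</code> function returns a given number of randomly chosen rows. The sampled rows do not necessarily come back in the order in which they appear in the DataFrame. Drawing rows at random, rather than only inspecting the top or bottom of the data, helps you avoid a biased view of the dataset.</p>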
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.sample(<span class="hljs-number">3</span>))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (3, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 6      ┆ 1.791759    ┆ 0.778151    │
│ 5      ┆ 1.609438    ┆ 0.69897     │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>The output contains randomly chosen rows, and they are not in their original order: row 5 comes before row 6 in the DataFrame, yet the sample lists 6 before 5. Because the rows are random, your output will probably differ from the one shown here. Random sampling gives us a quick, representative look at a large dataset, and it is also the basis of unbiased train/test splits in ML.</p>
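<p>In practice, a few <code>sample()</code> parameters are worth knowing. <code>seed</code> makes the draw reproducible, <code>fraction</code> asks for a share of the rows instead of a fixed count, and <code>shuffle</code> randomizes their order. The sketch below uses them to build a shuffled 80/20 train/test split, which avoids the fixed-position split problem mentioned earlier. The seed values and the 80/20 ratio are arbitrary choices for illustration:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"Number": list(range(1, 11))})

# The same seed returns the same rows, so results can be reproduced
a = df.sample(3, seed=42)
b = df.sample(3, seed=42)
print(a.equals(b))  # True

# fraction= draws a share of the rows instead of a fixed count
half = df.sample(fraction=0.5, seed=0)
print(half.height)  # 5

# Shuffle all rows once, then cut the result into an 80/20 train/test split
shuffled = df.sample(fraction=1.0, shuffle=True, seed=1)
n_train = int(0.8 * shuffled.height)
train_df = shuffled.head(n_train)
test_df = shuffled.tail(shuffled.height - n_train)
print(train_df.height, test_df.height)  # 8 2
</code></pre>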
<h3 id="heading-concatenating-two-dataframes">Concatenating Two DataFrames</h3>
<p>In a nutshell, ‘concatenating’ means ‘linking’. Concatenating two datasets vertically stacks one on top of the other, appending the rows of the second to those of the first.</p>
<p>For example, the previous DataFrame holds the numbers 1 to 10 and their logarithms. To extend it to 1 to 20, we can concatenate a second DataFrame containing the numbers 11 to 20 onto the first one.</p>
<p>The following code shows how this works:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># new dataset created for concatenation</span>
df1 = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)],
    <span class="hljs-string">"Log Base 10"</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>,<span class="hljs-number">21</span>)],
    <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)]
}, schema=schema)

print(pl.concat([df, df1], how=<span class="hljs-string">'vertical'</span>)) <span class="hljs-comment"># concatenating the two datasets</span>
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (20, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.098612    ┆ 0.477121    │
│ 4      ┆ 1.386294    ┆ 0.60206     │
│ 5      ┆ 1.609438    ┆ 0.69897     │
│ …      ┆ …           ┆ …           │
│ 16     ┆ 2.772589    ┆ 1.20412     │
│ 17     ┆ 2.833213    ┆ 1.230449    │
│ 18     ┆ 2.890372    ┆ 1.255273    │
│ 19     ┆ 2.944439    ┆ 1.278754    │
│ 20     ┆ 2.995732    ┆ 1.30103     │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>In this code, we first created the DataFrame <code>df</code>. Then we created another DataFrame <code>df1</code>. Next, we used <code>pl.concat()</code> to concatenate the DataFrames.</p>
<p>The first argument is the list of DataFrames to be linked. The <code>how</code> parameter sets the manner of concatenation: <code>'vertical'</code> stacks the DataFrames on top of each other, so the result has the same columns and more rows.</p>
<p>Note that with <code>how='vertical'</code>, the DataFrames must have matching schemas: the same column names, in the same order, with the same data types. If they don’t, <code>pl.concat()</code> raises an exception, so keep the schemas of the DataFrames you concatenate identical.</p>
<p>That’s why we defined a variable named <code>schema</code>, holding the column names and data types, and passed it to both DataFrames. It also fixes the column order: <code>df1</code> lists its keys in a different order, but its columns end up arranged as in <code>schema</code>.</p>
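<p>Here’s a minimal sketch of what a schema mismatch looks like and two ways around it. The DataFrames <code>a</code> and <code>b</code> are made up, and differ only in the integer type of <code>Number</code>. The exact exception class and message may vary between Polars versions, so the sketch catches the general <code>pl.exceptions.PolarsError</code>:</p>
<pre><code class="lang-python">import polars as pl

a = pl.DataFrame({"Number": [1, 2]}, schema={"Number": pl.UInt32})
b = pl.DataFrame({"Number": [3, 4]})  # dtype inferred as Int64

failed = False
try:
    pl.concat([a, b], how="vertical")
except pl.exceptions.PolarsError as err:
    failed = True
    print("vertical concat failed:", type(err).__name__)

# Option 1: cast one side so both schemas match exactly
fixed = pl.concat([a, b.cast({"Number": pl.UInt32})], how="vertical")
print(fixed.schema["Number"])

# Option 2: let Polars cast the columns to a common supertype
relaxed = pl.concat([a, b], how="vertical_relaxed")
print(relaxed.schema["Number"])
</code></pre>
<p>Casting explicitly keeps you in control of the final type. <code>'vertical_relaxed'</code> is convenient, but it may widen types you did not expect to change.</p>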
<p>Concatenation also follows the order of the DataFrames in the list. In the code above, <code>df</code> comes before <code>df1</code>, so the rows of <code>df</code> appear first, followed by those of <code>df1</code>. If we swapped them, the result would start with the rows of <code>df1</code>.</p>
<p>The following code demonstrates this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># new dataset created for concatenation</span>
df1 = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)],
    <span class="hljs-string">"Log Base 10"</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>,<span class="hljs-number">21</span>)],
    <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)]
}, schema=schema)

print(pl.concat([df1, df], how=<span class="hljs-string">'vertical'</span>)) <span class="hljs-comment"># sequence changed from [df,df1] to [df1, df]</span>
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (20, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 11     ┆ 2.397895    ┆ 1.041393    │
│ 12     ┆ 2.484907    ┆ 1.079181    │
│ 13     ┆ 2.564949    ┆ 1.113943    │
│ 14     ┆ 2.639057    ┆ 1.146128    │
│ 15     ┆ 2.70805     ┆ 1.176091    │
│ …      ┆ …           ┆ …           │
│ 6      ┆ 1.791759    ┆ 0.778151    │
│ 7      ┆ 1.94591     ┆ 0.845098    │
│ 8      ┆ 2.079442    ┆ 0.90309     │
│ 9      ┆ 2.197225    ┆ 0.954243    │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>This time, the rows of <code>df1</code> come first, followed by those of <code>df</code> (unlike the previous example). So the order of the DataFrames in the list determines the row order of the result.</p>
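<p>The examples above stack rows with <code>how='vertical'</code>. For contrast, here’s a small sketch with made-up DataFrames. It repeats the ordering rule on a tiny scale and then uses <code>how='horizontal'</code>, which places DataFrames with the same number of rows side by side and adds columns instead of rows:</p>
<pre><code class="lang-python">import polars as pl

top = pl.DataFrame({"Number": [1, 2]})
bottom = pl.DataFrame({"Number": [3, 4]})
extra = pl.DataFrame({"Square": [1, 4]})

# vertical: same columns, more rows; the list order sets the row order
print(pl.concat([top, bottom], how="vertical")["Number"].to_list())  # [1, 2, 3, 4]
print(pl.concat([bottom, top], how="vertical")["Number"].to_list())  # [3, 4, 1, 2]

# horizontal: same number of rows, more columns
wide = pl.concat([top, extra], how="horizontal")
print(wide.columns)  # ['Number', 'Square']
print(wide.shape)    # (2, 2)
</code></pre>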
<h3 id="heading-how-to-join-two-dataframes">How to Join Two DataFrames</h3>
<p><strong>Joining</strong> datasets and <strong>concatenating</strong> datasets are two different concepts. Concatenating links two datasets end to end, while <a target="_blank" href="https://www.freecodecamp.org/news/understanding-sql-joins/">joining</a> combines datasets based on a shared column (a key): Polars matches rows from both datasets where the key values are equal and places their columns side by side.</p>
<p>Using the DataFrame <code>df</code> from above, we’ll add a new column by joining <code>df</code> with another DataFrame.</p>
<pre><code class="lang-python"><span class="hljs-comment"># new dataframe</span>
new_col = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)],
    <span class="hljs-string">"Log Base 2"</span> : [np.log2(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)]
})

new_data = df.join(new_col, on=<span class="hljs-string">"Number"</span>, how=<span class="hljs-string">"left"</span>) <span class="hljs-comment"># Both have one column same to map values</span>

print(new_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌────────┬─────────────┬─────────────┬────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 ┆ Log Base 2 │
╞════════╪═════════════╪═════════════╪════════════╡
│ 1      ┆ 0.0         ┆ 0.0         ┆ 0.0        │
│ 2      ┆ 0.693147    ┆ 0.30103     ┆ 1.0        │
│ 3      ┆ 1.098612    ┆ 0.477121    ┆ 1.584963   │
│ 4      ┆ 1.386294    ┆ 0.60206     ┆ 2.0        │
│ 5      ┆ 1.609438    ┆ 0.69897     ┆ 2.321928   │
└────────┴─────────────┴─────────────┴────────────┘
</code></pre>
<p>In this example, we called <code>join()</code> on <code>df</code> and passed <code>new_col</code> as its argument. That is why the columns of <code>df</code> come before the column of <code>new_col</code>. The <code>on</code> parameter takes the name of the column on which the two datasets are joined.</p>
<p>Polars matches each value in the <code>Number</code> column of <code>df</code> with the row that has the same value in <code>new_col</code>, and attaches that row’s <code>Log Base 2</code> value. Because we set <code>how="left"</code>, every row of <code>df</code> is kept; a row without a match in <code>new_col</code> would get <code>null</code> in the new column.</p>
<p>If we called <code>join()</code> on <code>new_col</code> instead, the columns of <code>df</code> would come after the column of <code>new_col</code>. The following code makes this clear:</p>
<pre><code class="lang-python"><span class="hljs-comment"># new dataframe</span>
new_col = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)],
    <span class="hljs-string">"Log Base 2"</span> : [np.log2(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)]
})

new_data = new_col.join(df, on=<span class="hljs-string">"Number"</span>, how=<span class="hljs-string">"left"</span>) <span class="hljs-comment"># passed df as argument</span>

print(new_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌────────┬────────────┬─────────────┬─────────────┐
│ Number ┆ Log Base 2 ┆ Natural Log ┆ Log Base 10 │
╞════════╪════════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0        ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 1.0        ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.584963   ┆ 1.098612    ┆ 0.477121    │
│ 4      ┆ 2.0        ┆ 1.386294    ┆ 0.60206     │
│ 5      ┆ 2.321928   ┆ 1.609438    ┆ 0.69897     │
└────────┴────────────┴─────────────┴─────────────┘
</code></pre>
<p>Now the column ‘Log Base 2’ comes right after the key column, before the columns from <code>df</code> (unlike in the previous example). So the DataFrame you call <code>join()</code> on determines the column order of the result.</p>
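<p>In our example every key has a match, so <code>how="left"</code> and an inner join would give the same rows. A sketch with made-up DataFrames in which the keys only partly overlap shows where the <code>how</code> parameter makes a difference. The results are sorted only to make the printed order predictable:</p>
<pre><code class="lang-python">import polars as pl

left = pl.DataFrame({"Number": [1, 2, 3]})
right = pl.DataFrame({"Number": [2, 3, 4], "Square": [4, 9, 16]})

# how="left": keep every row of `left`; keys without a match get null
left_join = left.join(right, on="Number", how="left").sort("Number")
print(left_join["Square"].to_list())   # [None, 4, 9]

# how="inner": keep only keys that appear in both DataFrames
inner_join = left.join(right, on="Number", how="inner").sort("Number")
print(inner_join["Number"].to_list())  # [2, 3]
</code></pre>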
<h3 id="heading-how-to-use-the-withcolumns-function">How to Use the <code>with_columns()</code> Function</h3>
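<p>The <code>with_columns()</code> function adds new columns (or replaces existing ones) computed from expressions, and returns a new DataFrame that keeps all the existing columns of the original. The result can look like the output of <code>join()</code>, but no second DataFrame is needed: the new column is computed from data already in <code>df</code>.</p>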
<p>The following example will make it clear:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
new_data = df.with_columns((np.log2(pl.col(<span class="hljs-string">"Number"</span>))).alias(<span class="hljs-string">"Log Base 2"</span>))

print(new_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌────────┬─────────────┬─────────────┬────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 ┆ Log Base 2 │
╞════════╪═════════════╪═════════════╪════════════╡
│ 1      ┆ 0.0         ┆ 0.0         ┆ 0.0        │
│ 2      ┆ 0.693147    ┆ 0.30103     ┆ 1.0        │
│ 3      ┆ 1.098612    ┆ 0.477121    ┆ 1.584963   │
│ 4      ┆ 1.386294    ┆ 0.60206     ┆ 2.0        │
│ 5      ┆ 1.609438    ┆ 0.69897     ┆ 2.321928   │
└────────┴─────────────┴─────────────┴────────────┘
</code></pre>
<p>In this example, we have a DataFrame <code>df</code>. To add a column to it, we use the <code>with_columns()</code> function. Inside it, we select the column named ‘Number’ with <code>pl.col()</code> and pass that expression to <code>np.log2()</code>, which computes the log base 2 of every value. Finally, we label the new column by passing its name to the <code>alias()</code> function.</p>
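<p>Polars also has its own expression methods for this, so NumPy isn’t strictly required. The sketch below, on a made-up DataFrame, uses <code>pl.col(...).log(base=2)</code> and adds two columns in a single <code>with_columns()</code> call. It then shows that an expression without <code>alias()</code> keeps its source column’s name and therefore replaces that column:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"Number": [1, 2, 4, 8]})

out = df.with_columns(
    pl.col("Number").log(base=2).alias("Log Base 2"),  # Polars' own log expression
    (pl.col("Number") * 10).alias("Number x 10"),       # several new columns in one call
)
print(out.columns)  # ['Number', 'Log Base 2', 'Number x 10']

# Without alias(), the result keeps the name "Number", so it replaces that column
doubled = df.with_columns(pl.col("Number") * 2)
print(doubled["Number"].to_list())  # [2, 4, 8, 16]
</code></pre>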
<p>Now that we know about the basics of DataFrames, let’s look at how we can work with CSV files.</p>
<h2 id="heading-how-to-read-csv-files-with-polars">How to Read CSV Files with Polars</h2>
<p>Reading CSV files with Polars works much like it does in Pandas. For this tutorial, I’ll be using the Titanic Dataset. Here’s the <a target="_blank" href="https://www.kaggle.com/datasets/yasserh/titanic-dataset?select=Titanic-Dataset.csv">link to the dataset</a> so you can download it. In this part of the tutorial, we’ll mainly cover column selection (useful in feature selection) and filtering the data.</p>
<p>Here’s the syntax for reading a CSV file:</p>
<p><code>var_name = pl.read_csv("path_dataset")</code></p>
<p>Example code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl

data = pl.read_csv(<span class="hljs-string">"/titanic_dataset.csv"</span>)
print(data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 12)
┌─────────────┬──────────┬────────┬─────────────────────┬───┬─────────┬─────────┬───────┬──────────┐
│ PassengerId ┆ Survived ┆ Pclass ┆ Name                ┆ … ┆ Ticket  ┆ Fare    ┆ Cabin ┆ Embarked │
╞═════════════╪══════════╪════════╪═════════════════════╪═══╪═════════╪═════════╪═══════╪══════════╡
│ 892         ┆ 0        ┆ 3      ┆ Kelly, Mr. James    ┆ … ┆ 330911  ┆ 7.8292  ┆ null  ┆ Q        │
│ 893         ┆ 1        ┆ 3      ┆ Wilkes, Mrs. James  ┆ … ┆ 363272  ┆ 7.0     ┆ null  ┆ S        │
│             ┆          ┆        ┆ (Ellen Need…        ┆   ┆         ┆         ┆       ┆          │
│ 894         ┆ 0        ┆ 2      ┆ Myles, Mr. Thomas   ┆ … ┆ 240276  ┆ 9.6875  ┆ null  ┆ Q        │
│             ┆          ┆        ┆ Francis             ┆   ┆         ┆         ┆       ┆          │
│ 895         ┆ 0        ┆ 3      ┆ Wirz, Mr. Albert    ┆ … ┆ 315154  ┆ 8.6625  ┆ null  ┆ S        │
│ 896         ┆ 1        ┆ 3      ┆ Hirvonen, Mrs.      ┆ … ┆ 3101298 ┆ 12.2875 ┆ null  ┆ S        │
│             ┆          ┆        ┆ Alexander (Helg…    ┆   ┆         ┆         ┆       ┆          │
└─────────────┴──────────┴────────┴─────────────────────┴───┴─────────┴─────────┴───────┴──────────┘
</code></pre>
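<p><code>pl.read_csv()</code> also accepts file-like objects and options that limit what gets loaded. In the sketch below, the CSV content is a tiny made-up stand-in for the Titanic file (the same column names, invented rows), held in memory with <code>io.BytesIO</code>. <code>columns=</code> reads only the listed columns, <code>n_rows=</code> stops after that many rows, and an empty field is read as <code>null</code>:</p>
<pre><code class="lang-python">import io
import polars as pl

# Made-up rows with Titanic-style column names
csv_bytes = b"""PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,"Doe, Mr. John",male,22
2,1,1,"Roe, Mrs. Jane",female,38
3,1,3,"Poe, Miss. Ann",female,
"""

# Read only three columns and only the first two rows
df = pl.read_csv(io.BytesIO(csv_bytes), columns=["Survived", "Sex", "Age"], n_rows=2)
print(df.shape)  # (2, 3)

# Reading everything: the empty Age field in the last row becomes null
full = pl.read_csv(io.BytesIO(csv_bytes))
print(full["Age"].null_count())  # 1
</code></pre>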
<p>We can get summary statistics for each column (count, null count, mean, standard deviation, minimum, quartiles, and maximum) with the <code>describe()</code> function.</p>
<pre><code class="lang-python">print(data.describe())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (9, 13)
┌────────────┬─────────────┬──────────┬──────────┬───┬─────────────┬───────────┬───────┬──────────┐
│ statistic  ┆ PassengerId ┆ Survived ┆ Pclass   ┆ … ┆ Ticket      ┆ Fare      ┆ Cabin ┆ Embarked │
╞════════════╪═════════════╪══════════╪══════════╪═══╪═════════════╪═══════════╪═══════╪══════════╡
│ count      ┆ 418.0       ┆ 418.0    ┆ 418.0    ┆ … ┆ 418         ┆ 417.0     ┆ 91    ┆ 418      │
│ null_count ┆ 0.0         ┆ 0.0      ┆ 0.0      ┆ … ┆ 0           ┆ 1.0       ┆ 327   ┆ 0        │
│ mean       ┆ 1100.5      ┆ 0.363636 ┆ 2.26555  ┆ … ┆ null        ┆ 35.627188 ┆ null  ┆ null     │
│ std        ┆ 120.810458  ┆ 0.481622 ┆ 0.841838 ┆ … ┆ null        ┆ 55.907576 ┆ null  ┆ null     │
│ min        ┆ 892.0       ┆ 0.0      ┆ 1.0      ┆ … ┆ 110469      ┆ 0.0       ┆ A11   ┆ C        │
│ 25%        ┆ 996.0       ┆ 0.0      ┆ 1.0      ┆ … ┆ null        ┆ 7.8958    ┆ null  ┆ null     │
│ 50%        ┆ 1101.0      ┆ 0.0      ┆ 3.0      ┆ … ┆ null        ┆ 14.4542   ┆ null  ┆ null     │
│ 75%        ┆ 1205.0      ┆ 1.0      ┆ 3.0      ┆ … ┆ null        ┆ 31.5      ┆ null  ┆ null     │
│ max        ┆ 1309.0      ┆ 1.0      ┆ 3.0      ┆ … ┆ W.E.P. 5734 ┆ 512.3292  ┆ G6    ┆ S        │
└────────────┴─────────────┴──────────┴──────────┴───┴─────────────┴───────────┴───────┴──────────┘
</code></pre>
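<p>Because <code>describe()</code> returns an ordinary DataFrame, you can filter it like any other. The sketch below, on made-up values with one missing <code>Age</code>, picks out the <code>null_count</code> row. It also uses <code>null_count()</code>, which reports only the missing values per column:</p>
<pre><code class="lang-python">import polars as pl

# Made-up values; Age has one missing entry
df = pl.DataFrame({
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 8.05, 53.1],
})

stats = df.describe()
print(stats["statistic"].to_list())

# Pick one statistic out of the summary with filter()
print(stats.filter(pl.col("statistic") == "null_count"))

# null_count() gives the missing values per column without the other statistics
print(df.null_count())
</code></pre>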
<h3 id="heading-how-to-select-columns-from-the-dataset">How to Select Columns from the Dataset</h3>
<p>Now we’re going to learn how to select certain columns from the dataset and transform those columns into a new DataFrame. This can be useful if we want to train an ML model based on only certain columns and not the entire dataset (that is, using feature selection).</p>
<p>Let’s first look at the code below:</p>
<pre><code class="lang-python">new_df = data.select(
    pl.col(<span class="hljs-string">"Survived"</span>),
    pl.col(<span class="hljs-string">"Name"</span>),
    pl.col(<span class="hljs-string">"Age"</span>),
    pl.col(<span class="hljs-string">"Sex"</span>)
)

print(new_df.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌──────────┬─────────────────────────────────┬──────┬────────┐
│ Survived ┆ Name                            ┆ Age  ┆ Sex    │
╞══════════╪═════════════════════════════════╪══════╪════════╡
│ 0        ┆ Kelly, Mr. James                ┆ 34.5 ┆ male   │
│ 1        ┆ Wilkes, Mrs. James (Ellen Need… ┆ 47.0 ┆ female │
│ 0        ┆ Myles, Mr. Thomas Francis       ┆ 62.0 ┆ male   │
│ 0        ┆ Wirz, Mr. Albert                ┆ 27.0 ┆ male   │
│ 1        ┆ Hirvonen, Mrs. Alexander (Helg… ┆ 22.0 ┆ female │
└──────────┴─────────────────────────────────┴──────┴────────┘
</code></pre>
<p>In the code above, we used <code>select()</code> with <code>pl.col()</code> to pick four columns from the Titanic Dataset and put them into a new DataFrame called <code>new_df</code>.</p>
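<p><code>select()</code> accepts columns in several forms. The sketch below, on a made-up two-row DataFrame, shows four equivalent ways to keep the same two columns. The last one uses <code>drop()</code> to name the columns to leave out instead:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({
    "Survived": [0, 1], "Name": ["A", "B"], "Age": [30.0, 40.0],
    "Sex": ["male", "female"], "Fare": [7.25, 71.28],
})

a = df.select(pl.col("Survived"), pl.col("Name"))  # the style used above
b = df.select("Survived", "Name")                   # plain column names also work
c = df.select(pl.col("Survived", "Name"))           # one pl.col() with several names
d = df.drop("Age", "Sex", "Fare")                   # name the columns to leave out
print(a.equals(b) and b.equals(c) and c.equals(d))  # True
</code></pre>
<p>The <code>pl.col()</code> form is the most flexible, because you can chain further expressions onto it, such as <code>.alias()</code> or arithmetic.</p>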
<p>Now, we can filter this data however we want. Let’s make a new DataFrame that keeps only the passengers who survived:</p>
<pre><code class="lang-python">survived_data = data.select(
    pl.col(<span class="hljs-string">"Survived"</span>),
    pl.col(<span class="hljs-string">"Name"</span>),
    pl.col(<span class="hljs-string">"Age"</span>),
    pl.col(<span class="hljs-string">"Sex"</span>)
).filter(pl.col(<span class="hljs-string">"Survived"</span>)==<span class="hljs-number">1</span>)

print(survived_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌──────────┬─────────────────────────────────┬──────┬────────┐
│ Survived ┆ Name                            ┆ Age  ┆ Sex    │
╞══════════╪═════════════════════════════════╪══════╪════════╡
│ 1        ┆ Wilkes, Mrs. James (Ellen Need… ┆ 47.0 ┆ female │
│ 1        ┆ Hirvonen, Mrs. Alexander (Helg… ┆ 22.0 ┆ female │
│ 1        ┆ Connolly, Miss. Kate            ┆ 30.0 ┆ female │
│ 1        ┆ Abrahim, Mrs. Joseph (Sophie H… ┆ 18.0 ┆ female │
│ 1        ┆ Snyder, Mrs. John Pillsbury (N… ┆ 23.0 ┆ female │
└──────────┴─────────────────────────────────┴──────┴────────┘
</code></pre>
<p>In the above code, we used the <code>filter()</code> function, which keeps only the rows for which a given condition is true. Here, the condition is that the value in the ‘Survived’ column equals 1, so only surviving passengers remain.</p>
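<p>Conditions can also be combined. The sketch below uses made-up rows with Titanic-style column names. Passing several conditions to <code>filter()</code> keeps only rows where all of them hold, and <code>is_in()</code> tests membership in a list. Because the code above runs <code>filter()</code> after <code>select()</code>, its condition can only use the selected columns. Filtering first lets the condition use a column you then drop:</p>
<pre><code class="lang-python">import polars as pl

# Made-up rows with Titanic-style column names
df = pl.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "male", "female"],
    "Pclass": [3, 1, 2, 3],
})

# Several conditions passed to filter() must all be true (a logical AND)
female_survivors = df.filter(pl.col("Survived") == 1, pl.col("Sex") == "female")
print(female_survivors.height)  # 1

# is_in() keeps rows whose value appears in the given list
upper_classes = df.filter(pl.col("Pclass").is_in([1, 2]))
print(upper_classes["Pclass"].to_list())  # [1, 2]

# Filter first, then select: the condition uses "Survived", which is not kept
survivor_sex = df.filter(pl.col("Survived") == 1).select("Sex")
print(survivor_sex["Sex"].to_list())  # ['female', 'male']
</code></pre>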
<h2 id="heading-some-other-important-functions">Some Other Important Functions</h2>
<h3 id="heading-how-to-print-the-names-of-the-columns-of-a-dataset">How to Print the Names of the Columns of a Dataset</h3>
<p>You can print the names of all the columns in a dataset using the <code>columns</code> attribute, which holds them as a list:</p>
<pre><code class="lang-python">print(data.columns) <span class="hljs-comment"># data --&gt; Titanic Dataset</span>
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']</p>
</blockquote>
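<p>Since <code>columns</code> is a plain Python list, ordinary list operations work on it. The related <code>schema</code> attribute pairs each name with its data type. A short sketch on a made-up DataFrame:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"PassengerId": [892, 893], "Name": ["Kelly", "Wilkes"]})

cols = df.columns
print(cols)                 # ['PassengerId', 'Name']
print(type(cols).__name__)  # list
print("Name" in cols)       # True

# schema maps each column name to its data type
print(df.schema["PassengerId"] == pl.Int64)  # True
</code></pre>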
<h3 id="heading-how-to-index-a-dataset">How to Index a Dataset</h3>
<p>Indexing a dataset means adding an index column to the existing dataset. It can prove useful in keeping track of the rows of the dataset.</p>
<p>We can index the dataset using the <code>with_row_index()</code> function. The argument we pass to it becomes the name of the new index column; if we don’t pass one, the column is named ‘index’ by default. The index starts at 0.</p>
<pre><code class="lang-python">data = pl.read_csv(<span class="hljs-string">"/titanic_dataset.csv"</span>).with_row_index(<span class="hljs-string">'#'</span>) <span class="hljs-comment"># naming the index column as '#'</span>
print(data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 13)
┌─────┬─────────────┬──────────┬────────┬───┬─────────┬─────────┬───────┬──────────┐
│ #   ┆ PassengerId ┆ Survived ┆ Pclass ┆ … ┆ Ticket  ┆ Fare    ┆ Cabin ┆ Embarked │
│ --- ┆ ---         ┆ ---      ┆ ---    ┆   ┆ ---     ┆ ---     ┆ ---   ┆ ---      │
│ u32 ┆ i64         ┆ i64      ┆ i64    ┆   ┆ str     ┆ f64     ┆ str   ┆ str      │
╞═════╪═════════════╪══════════╪════════╪═══╪═════════╪═════════╪═══════╪══════════╡
│ 0   ┆ 892         ┆ 0        ┆ 3      ┆ … ┆ 330911  ┆ 7.8292  ┆ null  ┆ Q        │
│ 1   ┆ 893         ┆ 1        ┆ 3      ┆ … ┆ 363272  ┆ 7.0     ┆ null  ┆ S        │
│ 2   ┆ 894         ┆ 0        ┆ 2      ┆ … ┆ 240276  ┆ 9.6875  ┆ null  ┆ Q        │
│ 3   ┆ 895         ┆ 0        ┆ 3      ┆ … ┆ 315154  ┆ 8.6625  ┆ null  ┆ S        │
│ 4   ┆ 896         ┆ 1        ┆ 3      ┆ … ┆ 3101298 ┆ 12.2875 ┆ null  ┆ S        │
└─────┴─────────────┴──────────┴────────┴───┴─────────┴─────────┴───────┴──────────┘
</code></pre>
<h3 id="heading-how-to-rename-columns-in-the-dataset">How to Rename Columns in the Dataset</h3>
<p>Lastly, to rename columns in the dataset, we use the <code>rename()</code> function, passing it a dictionary that maps old column names to new ones.</p>
<pre><code class="lang-python">data = pl.read_csv(<span class="hljs-string">"/titanic_dataset.csv"</span>).with_row_index(<span class="hljs-string">'#'</span>).rename({<span class="hljs-string">'PassengerId'</span>:<span class="hljs-string">'renamed_col'</span>})
print(data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 13)
┌─────┬─────────────┬──────────┬────────┬───┬─────────┬─────────┬───────┬──────────┐
│ #   ┆ renamed_col ┆ Survived ┆ Pclass ┆ … ┆ Ticket  ┆ Fare    ┆ Cabin ┆ Embarked │
│ --- ┆ ---         ┆ ---      ┆ ---    ┆   ┆ ---     ┆ ---     ┆ ---   ┆ ---      │
│ u32 ┆ i64         ┆ i64      ┆ i64    ┆   ┆ str     ┆ f64     ┆ str   ┆ str      │
╞═════╪═════════════╪══════════╪════════╪═══╪═════════╪═════════╪═══════╪══════════╡
│ 0   ┆ 892         ┆ 0        ┆ 3      ┆ … ┆ 330911  ┆ 7.8292  ┆ null  ┆ Q        │
│ 1   ┆ 893         ┆ 1        ┆ 3      ┆ … ┆ 363272  ┆ 7.0     ┆ null  ┆ S        │
│ 2   ┆ 894         ┆ 0        ┆ 2      ┆ … ┆ 240276  ┆ 9.6875  ┆ null  ┆ Q        │
│ 3   ┆ 895         ┆ 0        ┆ 3      ┆ … ┆ 315154  ┆ 8.6625  ┆ null  ┆ S        │
│ 4   ┆ 896         ┆ 1        ┆ 3      ┆ … ┆ 3101298 ┆ 12.2875 ┆ null  ┆ S        │
└─────┴─────────────┴──────────┴────────┴───┴─────────┴─────────┴───────┴──────────┘
</code></pre>
<p>In the above example, we renamed the column named ‘PassengerId’ to ‘renamed_col’.</p>
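<p>The dictionary can rename several columns at once. In recent Polars versions, <code>rename()</code> also accepts a function that is applied to every column name, which is handy for normalizing names. A sketch on a made-up DataFrame:</p>
<pre><code class="lang-python">import polars as pl

df = pl.DataFrame({"PassengerId": [892], "Pclass": [3]})

# Several columns in one call
renamed = df.rename({"PassengerId": "passenger_id", "Pclass": "ticket_class"})
print(renamed.columns)  # ['passenger_id', 'ticket_class']

# A function applied to every column name
lowered = df.rename(lambda name: name.lower())
print(lowered.columns)  # ['passengerid', 'pclass']
</code></pre>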
<h2 id="heading-summary">Summary</h2>
<p>Now you know how to work with the Polars Python library to analyze your data more effectively.</p>
<p>In this article, you learned:</p>
<ul>
<li><p>What Polars is and how to install it</p>
</li>
<li><p>How to define series and DataFrames in Polars</p>
</li>
<li><p>Functions for inspecting, combining, and transforming DataFrames</p>
</li>
<li><p>How to read and work with CSV files in Polars</p>
</li>
</ul>
<p>Thanks for reading, and happy data wrangling!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Transform JSON Data to Match Any Schema ]]>
                </title>
                <description>
                    <![CDATA[ Whether you’re transferring data between APIs or just preparing JSON data for import, mismatched schemas can break your workflow.  Learning how to clean and normalize JSON data ensures a smooth, error-free data transfer. This tutorial demonstrates ho... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/transform-json-data-schema/</link>
                <guid isPermaLink="false">686f40595293ca3e659585b7</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ pandas ]]>
                    </category>
                
                    <category>
                        <![CDATA[ json ]]>
                    </category>
                
                    <category>
                        <![CDATA[ json-schema ]]>
                    </category>
                
                    <category>
                        <![CDATA[ python beginner ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Nneoma Uche ]]>
                </dc:creator>
                <pubDate>Thu, 10 Jul 2025 04:23:53 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1752121420492/513db316-cdc7-47ef-8f20-4911cf5d41f9.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Whether you’re transferring data between APIs or just preparing JSON data for import, mismatched schemas can break your workflow. Learning how to clean and normalize JSON data helps ensure a smooth, error-free data transfer.</p>
<p>This tutorial demonstrates how to clean messy JSON based on a predefined schema and export the results to a new file. The JSON file we’ll be cleaning contains a dataset of 200 synthetic customer records.</p>
<p>In this tutorial, we’ll apply two methods for cleaning the input data:</p>
<ul>
<li><p>With pure Python</p>
</li>
<li><p>With <code>pandas</code></p>
</li>
</ul>
<p>You can use either approach in your code, but the <code>pandas</code> method is better suited to large, complex datasets. Let’s jump right into the process.</p>
<h3 id="heading-heres-what-well-cover">Here’s what we’ll cover:</h3>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-add-and-inspect-the-json-file">Add and Inspect the JSON File</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-define-the-target-schema">Define the Target Schema</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-clean-json-data-with-pure-python">How to Clean JSON Data with Pure Python</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-clean-json-data-with-pandas">How to Clean JSON Data with Pandas</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-validate-the-cleaned-json">How to Validate the Cleaned JSON</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-pandas-vs-pure-python-for-data-cleaning">Pandas vs Pure Python for Data Cleaning</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To follow along with this tutorial, you should have a basic understanding of:</p>
<ul>
<li><p>Python dictionaries, lists, and loops</p>
</li>
<li><p>JSON data structure (keys, values, and nesting)</p>
</li>
<li><p>How to read and write JSON files with Python’s <code>json</code> module</p>
</li>
</ul>
<h2 id="heading-add-and-inspect-the-json-file">Add and Inspect the JSON File</h2>
<p>Before you begin writing any code, make sure that the <strong>.json</strong> file you intend to clean is in your project directory. That way, you can load it in your script using the file name alone.</p>
<p>You can now inspect the data structure by viewing the file locally or by loading it in your script with Python’s built-in <code>json</code> module.</p>
<p>Here’s how (assuming the file name is <strong>“old_customers.json”</strong>):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1752079424973/3cd77410-6fa9-483d-9a73-edbe4c035327.jpeg" alt="Code to view or print contents of the raw JSON file in terminal" class="image--center mx-auto" width="407" height="231" loading="lazy"></p>
<p>This shows you whether the JSON file is structured as a dictionary or a list, and prints the entire file to your terminal. Mine is a dictionary whose <code>"customers"</code> key maps to a list of 200 customer entries. It’s still worth opening the raw JSON file in your IDE to get a closer look at its structure and schema.</p>
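<p>Because the code above is a screenshot, here is a text sketch of a similar inspection that you can copy. It isn’t the author’s exact code. <code>inspect_json()</code> is a hypothetical helper, and the two sample records are made up and written to a temporary folder so the snippet runs on its own. In your project, you’d call <code>inspect_json("old_customers.json")</code> instead.</p>
<pre><code class="lang-python">import json
import os
import tempfile


def inspect_json(path):
    """Load a JSON file and print a short summary of its top-level structure."""
    with open(path, encoding="utf-8") as file:
        data = json.load(file)

    # A dict means the records sit under one or more keys; a list means
    # the file itself is the list of records
    print("Top-level type:", type(data).__name__)
    if isinstance(data, dict):
        for key, value in data.items():
            size = len(value) if isinstance(value, (list, dict)) else 1
            print(f"  {key!r}: {type(value).__name__} with {size} item(s)")
    return data


# Hypothetical stand-in for old_customers.json, written to a temporary folder
sample = {
    "customers": [
        {"customer_id": 1, "name": "Ada Obi", "email": "ada@example.com",
         "phone": "555-0101", "membership_level": "Gold", "address": "1 Main St"},
        {"customer_id": 2, "name": "Ben Eze", "email": "ben@example.com",
         "phone": "555-0102", "membership_level": "Silver", "address": "2 Main St"},
    ]
}
path = os.path.join(tempfile.mkdtemp(), "old_customers.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

crm_data = inspect_json(path)
# Pretty-print one record to see the field names of the old schema
print(json.dumps(crm_data["customers"][0], indent=4))
</code></pre>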
<h2 id="heading-define-the-target-schema">Define the Target Schema</h2>
<p>If someone asks for JSON data to be cleaned, it probably means that the <a target="_blank" href="https://json-schema.org/understanding-json-schema/about">current schema</a> is unsuitable for its intended purpose. At this point, you want to be clear on what the final JSON export should look like.</p>
<p>A JSON schema is essentially a blueprint that describes:</p>
<ul>
<li><p>required fields</p>
</li>
<li><p>field names</p>
</li>
<li><p>data type for each field</p>
</li>
<li><p>standardized formats (for example, lowercase emails, trimmed whitespace, etc.)</p>
</li>
</ul>
<p>Here’s what the old schema versus the target schema looks like:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751956173106/d5957404-57ae-4de9-b61b-90eefa0b9260.jpeg" alt="A screenshot of the old JSON Schema to be transformed" class="image--center mx-auto" width="597" height="222" loading="lazy"></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751956365336/dcf6a024-1ae6-4c95-92ae-5544ba4cbb3e.jpeg" alt="The expected JSON Schema" class="image--center mx-auto" width="460" height="186" loading="lazy"></p>
<p>As you can see, the goal is to delete the <code>"customer_id"</code> and <code>"address"</code> fields in each entry and rename the remaining fields as follows:</p>
<ul>
<li><p><code>"name"</code> to <code>"full_name"</code></p>
</li>
<li><p><code>"email"</code> to <code>"email_address"</code></p>
</li>
<li><p><code>"phone"</code> to <code>"mobile"</code></p>
</li>
<li><p><code>"membership_level"</code> to <code>"tier"</code></p>
</li>
</ul>
<p>Each customer entry in the output should contain four fields instead of six, all renamed to fit the project requirements.</p>
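<p>One way to keep these rename rules in a single place is to write them as a dictionary that maps each old field name to its new one. This is only a sketch and not the approach this tutorial uses below. <code>FIELD_MAP</code> and <code>to_target_schema()</code> are hypothetical names, and the sample entry is made up. Because each new record is built only from the mapped keys, <code>"customer_id"</code> and <code>"address"</code> are dropped without a separate delete step.</p>
<pre><code class="lang-python"># Hypothetical mapping from old field names to target field names.
# Any field not listed here (customer_id, address) is left out.
FIELD_MAP = {
    "name": "full_name",
    "email": "email_address",
    "phone": "mobile",
    "membership_level": "tier",
}


def to_target_schema(record, field_map=FIELD_MAP):
    """Return a new dict with only the mapped fields, under their new names."""
    return {new: record[old] for old, new in field_map.items()}


old_entry = {
    "customer_id": 7,
    "name": "Ada Obi",
    "email": "ada@example.com",
    "phone": "555-0101",
    "membership_level": "Gold",
    "address": "1 Main St",
}
print(to_target_schema(old_entry))
</code></pre>
<p>With this design, you change the target schema by editing <code>FIELD_MAP</code> and leave the loop alone. Note that <code>record[old]</code> still raises a <code>KeyError</code> if an entry lacks one of the mapped fields.</p>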
<h2 id="heading-how-to-clean-json-data-with-pure-python">How to Clean JSON Data with Pure Python</h2>
<p>Let’s explore using Python’s built-in <code>json</code> module to align the raw data with the predefined schema.</p>
<h3 id="heading-step-1-import-json-and-time-modules">Step 1: Import <code>json</code> and <code>time</code> modules</h3>
<p>We need <code>json</code> to read and write JSON files, and we’ll use the <code>time</code> module to track how long the data cleaning process takes.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> time
</code></pre>
<h3 id="heading-step-2-load-the-file-with-jsonload">Step 2: Load the file with <code>json.load()</code></h3>
<pre><code class="lang-python">start_time = time.time()
<span class="hljs-keyword">with</span> open(<span class="hljs-string">'old_customers.json'</span>) <span class="hljs-keyword">as</span> file:
    crm_data = json.load(file)
</code></pre>
<h3 id="heading-step-3-write-a-function-to-loop-through-and-clean-each-customer-entry-in-the-dictionary">Step 3: Write a function to loop through and clean each customer entry in the dictionary</h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">clean_data</span>(<span class="hljs-params">records</span>):</span>
    transformed_records = []
    <span class="hljs-keyword">for</span> customer <span class="hljs-keyword">in</span> records[<span class="hljs-string">"customers"</span>]:
        transformed_records.append({
            <span class="hljs-string">"full_name"</span>: customer[<span class="hljs-string">"name"</span>],
            <span class="hljs-string">"email_address"</span>: customer[<span class="hljs-string">"email"</span>],
            <span class="hljs-string">"mobile"</span>: customer[<span class="hljs-string">"phone"</span>],
            <span class="hljs-string">"tier"</span>: customer[<span class="hljs-string">"membership_level"</span>],
        })
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"customers"</span>: transformed_records}

new_data = clean_data(crm_data)
</code></pre>
<p><code>clean_data()</code> receives the original data through its <code>records</code> parameter and transforms it to match our target schema.</p>
<p>Since the JSON file we loaded is a dictionary whose <code>"customers"</code> key maps to a list of customer entries, we access this key and loop through each entry in the list.</p>
<p>Inside the <code>for</code> loop, we copy only the four required fields under their new names and append each cleaned entry to a new list called <code>transformed_records</code>. Because <code>"customer_id"</code> and <code>"address"</code> are never copied, they don’t appear in the output.</p>
<p>Finally, we return a new dictionary that stores the cleaned list under the same <code>"customers"</code> key.</p>
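<p><code>clean_data()</code> assumes that every entry has all four source fields. If one is missing, <code>customer["name"]</code> (for example) raises a <code>KeyError</code> and the script stops. A minimal variant, for when you’d rather substitute defaults such as <strong>"Unknown"</strong> or <strong>"N/A"</strong> (the same defaults the <code>pandas</code> section mentions), replaces indexing with <code>dict.get()</code>. The function name <code>clean_data_safe()</code> and the sample records are hypothetical:</p>
<pre><code class="lang-python">def clean_data_safe(records):
    """Like clean_data(), but tolerates entries with missing fields."""
    transformed_records = []
    for customer in records.get("customers", []):
        transformed_records.append({
            # dict.get() returns the default instead of raising KeyError
            "full_name": customer.get("name", "Unknown"),
            "email_address": customer.get("email", "N/A"),
            "mobile": customer.get("phone", "N/A"),
            "tier": customer.get("membership_level", "Unknown"),
        })
    return {"customers": transformed_records}


raw = {"customers": [
    {"customer_id": 1, "name": "Ada Obi", "email": "ada@example.com",
     "phone": "555-0101", "membership_level": "Gold", "address": "1 Main St"},
    {"customer_id": 2, "name": "Ben Eze"},  # hypothetical incomplete entry
]}
print(clean_data_safe(raw))
</code></pre>
<p>Keep in mind that <code>.get()</code> only covers absent keys. A field that is present but set to <code>null</code> in the JSON still comes through as <code>None</code>.</p>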
<h3 id="heading-step-4-save-the-output-in-a-json-file">Step 4: Save the output in a .json file</h3>
<p>Decide on a name for your cleaned JSON data and assign that to an <code>output_file</code> variable, like so:</p>
<pre><code class="lang-python">output_file = <span class="hljs-string">"transformed_data.json"</span>
<span class="hljs-keyword">with</span> open(output_file, <span class="hljs-string">"w"</span>) <span class="hljs-keyword">as</span> f:
    json.dump(new_data, f, indent=<span class="hljs-number">4</span>)
</code></pre>
<p>You can also add a <code>print()</code> statement below this block to confirm that the file has been saved in your project directory.</p>
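<p>Beyond printing a message, you can re-open the file you just wrote and compare it with the data in memory. Here’s a minimal sketch. It uses a hypothetical one-record stand-in for <code>new_data</code> and writes to a temporary folder instead of your project directory:</p>
<pre><code class="lang-python">import json
import os
import tempfile

# Hypothetical stand-in for the new_data produced by clean_data()
new_data = {"customers": [
    {"full_name": "Ada Obi", "email_address": "ada@example.com",
     "mobile": "555-0101", "tier": "Gold"},
]}

# A temporary folder keeps this sketch from touching your project files
output_file = os.path.join(tempfile.mkdtemp(), "transformed_data.json")
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(new_data, f, indent=4)

# Re-read the file to confirm it exists and round-trips to the same data
with open(output_file, encoding="utf-8") as f:
    reloaded = json.load(f)

print(f"Saved {len(reloaded['customers'])} record(s) to {output_file}")
print("Round trip matches:", reloaded == new_data)
</code></pre>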
<h3 id="heading-step-5-time-the-data-cleaning-process">Step 5: Time the data cleaning process</h3>
<p>At the beginning of this process, we imported the <code>time</code> module to measure how long it takes to clean up JSON data using pure Python. To track the runtime, we stored the current time in a <code>start_time</code> variable before loading the file, and we’ll now add an <code>end_time</code> variable at the end of the script.</p>
<p>The difference between the <code>end_time</code> and <code>start_time</code> values gives you the total runtime in seconds.</p>
<pre><code class="lang-python">end_time = time.time()
elapsed_time = end_time - start_time

print(<span class="hljs-string">f"Transformed data saved to <span class="hljs-subst">{output_file}</span>"</span>)
print(<span class="hljs-string">f"Processing data took <span class="hljs-subst">{elapsed_time:<span class="hljs-number">.2</span>f}</span> seconds"</span>)
</code></pre>
<p>Here’s how long the data cleaning process took with the pure Python approach:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751957367537/4a33fc16-7158-427e-b715-bec10a586857.jpeg" alt="Script runtime displayed in terminal" class="image--center mx-auto" width="766" height="88" loading="lazy"></p>
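<p>For very short runs like this one, you might prefer <code>time.perf_counter()</code> over <code>time.time()</code>. Python documents <code>perf_counter()</code> as a high-resolution clock for measuring short durations. By contrast, <code>time.time()</code> reads the system clock, which can be adjusted while your script runs. The sketch below times the same kind of renaming on a hypothetical, generated set of 200 records. It reports milliseconds so that small differences stay visible:</p>
<pre><code class="lang-python">import time

# Hypothetical workload: 200 generated records in the old schema's shape
records = {"customers": [
    {"name": f"Customer {i}", "email": f"c{i}@example.com",
     "phone": "555-0100", "membership_level": "Gold"}
    for i in range(200)
]}

start = time.perf_counter()  # high-resolution clock meant for durations
cleaned = [
    {"full_name": c["name"], "email_address": c["email"],
     "mobile": c["phone"], "tier": c["membership_level"]}
    for c in records["customers"]
]
elapsed = time.perf_counter() - start

print(f"Cleaned {len(cleaned)} records in {elapsed * 1000:.3f} ms")
</code></pre>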
<h2 id="heading-how-to-clean-json-data-with-pandas">How to Clean JSON Data with Pandas</h2>
<p>Now let’s achieve the same result using <code>pandas</code>, an open-source, third-party library for data manipulation and analysis in Python.</p>
<p>To get started, install <code>pandas</code> in your Python environment. In your terminal, run:</p>
<pre><code class="lang-bash">pip install pandas
</code></pre>
<p>Then follow these steps:</p>
<h3 id="heading-step-1-import-the-relevant-libraries">Step 1: Import the relevant libraries</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> time
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
</code></pre>
<h3 id="heading-step-2-load-file-and-extract-customer-entries">Step 2: Load file and extract customer entries</h3>
<p>Unlike the pure Python method, where we simply indexed the key name <code>"customers"</code> to access the list of customer data, working with <code>pandas</code> requires a slightly different approach.</p>
<p>We must extract the list before loading it into a DataFrame. If you passed the whole dictionary to <code>pd.DataFrame()</code>, you’d get a single <code>customers</code> column holding one dictionary per row instead of one column per field. Extracting the list of customer dictionaries upfront isolates the records we want to clean and keeps any unrelated top-level JSON data out of the DataFrame.</p>
<pre><code class="lang-python">start_time = time.time()
<span class="hljs-keyword">with</span> open(<span class="hljs-string">'old_customers.json'</span>, <span class="hljs-string">'r'</span>) <span class="hljs-keyword">as</span> f:
    crm_data = json.load(f)

<span class="hljs-comment">#Extract the list of customer entries</span>
clients = crm_data.get(<span class="hljs-string">"customers"</span>, [])
</code></pre>
<h3 id="heading-step-3-load-customer-entries-into-a-dataframe">Step 3: Load customer entries into a DataFrame</h3>
<p>Once you’ve got a clean list of customer dictionaries, load it into a DataFrame and assign the result to a variable, like so:</p>
<pre><code class="lang-python"><span class="hljs-comment">#Load into a dataframe</span>
df = pd.DataFrame(clients)
</code></pre>
<p>This creates a tabular or spreadsheet-like structure, where each row represents a customer. Loading the list into a DataFrame also allows you to access <code>pandas</code>’ powerful data cleaning methods like:</p>
<ul>
<li><p><code>drop_duplicates()</code>: removes duplicate rows or entries from a DataFrame</p>
</li>
<li><p><code>dropna()</code>: drops rows with any missing or null data</p>
</li>
<li><p><code>fillna(value)</code>: replaces all missing or null data with a specified value</p>
</li>
<li><p><code>drop(columns=...)</code>: drops the columns you name explicitly</p>
</li>
</ul>
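<p>To see what each of these methods does, here’s a small sketch on three hypothetical records (not the tutorial’s dataset). Each call returns a new DataFrame and leaves <code>df</code> unchanged:</p>
<pre><code class="lang-python">import pandas as pd

# Hypothetical records: the first two are exact duplicates,
# and the third has no email address
clients = [
    {"customer_id": 1, "name": "Ada Obi", "email": "ada@example.com", "address": "1 Main St"},
    {"customer_id": 1, "name": "Ada Obi", "email": "ada@example.com", "address": "1 Main St"},
    {"customer_id": 2, "name": "Ben Eze", "email": None, "address": "2 Main St"},
]
df = pd.DataFrame(clients)

deduped = df.drop_duplicates()      # removes the repeated Ada row
complete = deduped.dropna()         # keeps only rows with no missing values
filled = deduped.fillna("N/A")      # Ben's missing email becomes "N/A"
trimmed = deduped.drop(columns=["customer_id", "address"])  # drops named columns

print(len(df), len(deduped), len(complete))
print(filled["email"].tolist())
print(list(trimmed.columns))
</code></pre>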
<h3 id="heading-step-4-write-a-custom-function-to-rename-relevant-fields">Step 4: Write a custom function to rename relevant fields</h3>
<p>At this point, we need a function that takes in a single customer entry (a row) and returns a cleaned version that fits the target schema (<code>"full_name"</code>, <code>"email_address"</code>, <code>"mobile"</code>, and <code>"tier"</code>).</p>
<p>The function should also handle missing data by setting default values like <strong>"Unknown"</strong> or <strong>"N/A"</strong> when a field is absent.</p>
<p><strong>P.S.:</strong> At first, I used <code>drop(columns=...)</code> to explicitly remove the <code>"address"</code> and <code>"customer_id"</code> fields. But it’s not needed in this case, as the <code>transform_fields()</code> function only selects and renames the required fields. Any extra columns are automatically excluded from the cleaned data.</p>
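<p>The code for this step isn’t reproduced above, so here is one possible <code>transform_fields()</code> based on the description. It isn’t necessarily the author’s exact function. It assumes <strong>"Unknown"</strong> as the default for names and tiers and <strong>"N/A"</strong> for contact fields, and it uses a hypothetical helper, <code>pick()</code>. Because it reads fields with <code>.get()</code>, it works on both a pandas row (a Series, which is what <code>apply()</code> passes when <code>axis=1</code>) and a plain dictionary:</p>
<pre><code class="lang-python">import pandas as pd


def pick(row, key, default):
    """Return row[key], or default if the key is absent or the value is missing."""
    value = row.get(key, default)
    # In a DataFrame, a field absent from some records shows up as NaN,
    # so check for missing values as well as missing keys
    return default if pd.isna(value) else value


def transform_fields(row):
    """Map one customer row (Series or dict) onto the target schema."""
    return {
        "full_name": pick(row, "name", "Unknown"),
        "email_address": pick(row, "email", "N/A"),
        "mobile": pick(row, "phone", "N/A"),
        "tier": pick(row, "membership_level", "Unknown"),
    }


# Hypothetical records; the second one has no email or membership level
df = pd.DataFrame([
    {"customer_id": 1, "name": "Ada Obi", "email": "ada@example.com",
     "phone": "555-0101", "membership_level": "Gold", "address": "1 Main St"},
    {"customer_id": 2, "name": "Ben Eze", "phone": "555-0102", "address": "2 Main St"},
])
print(df.apply(transform_fields, axis=1).tolist())
</code></pre>
<p>The <code>pd.isna()</code> check matters because, once records are loaded into a DataFrame, a field that is missing from some records becomes a column with <code>NaN</code> in those rows. Without the check, <code>.get()</code> would return <code>NaN</code> for those rows instead of the default.</p>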
<h3 id="heading-step-5-apply-schema-transformation-to-all-rows">Step 5: Apply schema transformation to all rows</h3>
<p>We’ll use the <code>pandas</code> <code>apply()</code> method with <code>axis=1</code> to call our custom function on each row of the DataFrame. Because the function returns a dictionary, this creates a Series of dictionaries (for example, 0 → {...}, 1 → {...}, 2 → {...}), which <code>json.dump()</code> can’t serialize directly.</p>
<p>To fix that, we call the Series’ <code>tolist()</code> method, which converts it to a plain list of dictionaries.</p>
<pre><code class="lang-python"><span class="hljs-comment">#Apply schema transformation to all rows</span>
transformed_df = df.apply(transform_fields, axis=<span class="hljs-number">1</span>)

<span class="hljs-comment">#Convert series to list of dicts</span>
transformed_data = transformed_df.tolist()
</code></pre>
<p>Another way to approach this is with list comprehension. Instead of using <code>apply()</code> at all, you can write:</p>
<pre><code class="lang-python">transformed_data = [transform_fields(row) <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> df.to_dict(orient=<span class="hljs-string">"records"</span>)]
</code></pre>
<p><code>orient="records"</code> is an argument to <code>df.to_dict()</code> that tells pandas to convert the DataFrame to a list of dictionaries, where each dictionary represents a single customer record (that is, one row).</p>
<p>The comprehension’s <code>for</code> clause then iterates through every customer record in that list and calls the custom function on each one. The surrounding brackets (<strong>[...]</strong>) collect the cleaned rows into a new list.</p>
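<p>Here’s a quick sketch of what <code>orient="records"</code> produces, using two hypothetical records. A field that is missing from one record comes back as <code>NaN</code> instead of being left out, so a row function called this way should treat <code>NaN</code> as missing too:</p>
<pre><code class="lang-python">import pandas as pd

df = pd.DataFrame([
    {"name": "Ada Obi", "email": "ada@example.com"},
    {"name": "Ben Eze"},  # hypothetical record without an email
])

# One plain dictionary per row, keyed by column name
records = df.to_dict(orient="records")
print(records)
print("Ben's email is missing:", pd.isna(records[1]["email"]))
</code></pre>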
<h3 id="heading-step-6-save-the-output-in-a-json-file">Step 6: Save the output in a .json file</h3>
<pre><code class="lang-python"><span class="hljs-comment">#Save the cleaned data</span>
output_data = {<span class="hljs-string">"customers"</span>: transformed_data}
output_file = <span class="hljs-string">"applypandas_customer.json"</span>
<span class="hljs-keyword">with</span> open(output_file, <span class="hljs-string">"w"</span>) <span class="hljs-keyword">as</span> f:
    json.dump(output_data, f, indent=<span class="hljs-number">4</span>)
</code></pre>
<p>I recommend picking a different file name for your <code>pandas</code> output. You can inspect both files side by side to see if this output matches the result you got from cleaning with pure Python.</p>
<h3 id="heading-step-7-track-runtime">Step 7: Track runtime</h3>
<p>Once again, subtract the start time from the end time to get the program’s execution time.</p>
<pre><code class="lang-python">end_time = time.time()
elapsed_time = end_time - start_time

print(<span class="hljs-string">f"Transformed data saved to <span class="hljs-subst">{output_file}</span>"</span>)
print(<span class="hljs-string">f"Processing data took <span class="hljs-subst">{elapsed_time:<span class="hljs-number">.2</span>f}</span> seconds"</span>)
</code></pre>
<p>On my machine, when I used <strong>list comprehension</strong> to apply the custom function, the script’s runtime was <strong>0.03 seconds</strong>, but with <code>pandas</code>’ <code>apply()</code> method, the total runtime dropped to <strong>0.01 seconds</strong>. With only 200 records, differences this small can vary from run to run, so treat them as rough indications.</p>
<h3 id="heading-final-output-preview">Final output preview:</h3>
<p>If you followed this tutorial closely, your JSON output should look like this – whether you used the <code>pandas</code> method or the pure Python approach:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1751961256627/d7b585f7-4585-4354-9fa7-a171adb31f90.jpeg" alt="The expected JSON output after schema transformation" class="image--center mx-auto" width="455" height="310" loading="lazy"></p>
<h2 id="heading-how-to-validate-the-cleaned-json">How to Validate the Cleaned JSON</h2>
<p>Validating your output ensures that the cleaned data follows the expected structure before being used or shared. This step helps to catch formatting errors, missing fields, and wrong data types early.</p>
<p>Below are the steps for validating your cleaned JSON file:</p>
<h3 id="heading-step-1-install-and-import-jsonschema">Step 1: Install and import <code>jsonschema</code></h3>
<p><code>jsonschema</code> is a third-party validation library for Python. It helps you define the expected structure of your JSON data and automatically check if your output matches that structure.</p>
<p>In your terminal, run:</p>
<pre><code class="lang-bash">pip install jsonschema
</code></pre>
<p>Import the required libraries:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> jsonschema <span class="hljs-keyword">import</span> validate, ValidationError
</code></pre>
<p><code>validate()</code> checks whether your JSON data matches the rules defined in your schema. If the data is valid, nothing happens. But if there’s an error – like a missing field or wrong data type – it raises a <code>ValidationError</code>.</p>
<h3 id="heading-step-2-define-a-schema">Step 2: Define a schema</h3>
<p>A schema is specific to the structure of the data it describes. If your JSON data differs from what we’ve been working with so far, you can learn how to create your own schema in the <a target="_blank" href="https://json-schema.org/learn/getting-started-step-by-step#validate-json-data-against-the-schema">JSON Schema getting-started guide</a>. Otherwise, the schema below defines the structure we expect for our cleaned JSON:</p>
<pre><code class="lang-python">schema = {
    <span class="hljs-string">"type"</span>: <span class="hljs-string">"object"</span>,
    <span class="hljs-string">"properties"</span>: {
        <span class="hljs-string">"customers"</span>: {
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"array"</span>,
            <span class="hljs-string">"items"</span>: {
                <span class="hljs-string">"type"</span>: <span class="hljs-string">"object"</span>,
                <span class="hljs-string">"properties"</span>: {
                    <span class="hljs-string">"full_name"</span>: {<span class="hljs-string">"type"</span>: <span class="hljs-string">"string"</span>},
                    <span class="hljs-string">"email_address"</span>: {<span class="hljs-string">"type"</span>: <span class="hljs-string">"string"</span>},
                    <span class="hljs-string">"mobile"</span>: {<span class="hljs-string">"type"</span>: <span class="hljs-string">"string"</span>},
                    <span class="hljs-string">"tier"</span>: {<span class="hljs-string">"type"</span>: <span class="hljs-string">"string"</span>}
                },
                <span class="hljs-string">"required"</span>: [<span class="hljs-string">"full_name"</span>, <span class="hljs-string">"email_address"</span>, <span class="hljs-string">"mobile"</span>, <span class="hljs-string">"tier"</span>]
            }
        }
    },
    <span class="hljs-string">"required"</span>: [<span class="hljs-string">"customers"</span>]
}
</code></pre>
<ul>
<li><p>The data is an object that must contain a key: <code>"customers"</code>.</p>
</li>
<li><p><code>"customers"</code> must be an <strong>array</strong> (a list), with each object representing one customer entry.</p>
</li>
<li><p>Each customer entry must have four fields, all strings:</p>
<ul>
<li><p><code>"full_name"</code></p>
</li>
<li><p><code>"email_address"</code></p>
</li>
<li><p><code>"mobile"</code></p>
</li>
<li><p><code>"tier"</code></p>
</li>
</ul>
</li>
<li><p>The two <code>"required"</code> lists ensure that the top-level <code>"customers"</code> key is present and that no customer record is missing any of the four fields.</p>
</li>
</ul>
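<p>This schema doesn’t check one thing: by default, JSON Schema allows properties that aren’t listed under <code>"properties"</code>. A record that still contained <code>"address"</code> or <code>"customer_id"</code> would therefore pass. If you want validation to confirm that those fields were removed, one option is to add <code>"additionalProperties": false</code> to the item schema (written <code>False</code> in Python). The sketch below uses the hypothetical name <code>strict_schema</code> and a made-up record to show the stricter schema rejecting a leftover field:</p>
<pre><code class="lang-python">from jsonschema import ValidationError, validate

# Stricter version of the tutorial's schema: "additionalProperties": False
# rejects any field not listed under "properties", such as "address"
strict_schema = {
    "type": "object",
    "properties": {
        "customers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "full_name": {"type": "string"},
                    "email_address": {"type": "string"},
                    "mobile": {"type": "string"},
                    "tier": {"type": "string"},
                },
                "required": ["full_name", "email_address", "mobile", "tier"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["customers"],
}

# Hypothetical record that still carries an old "address" field
leftover = {"customers": [{
    "full_name": "Ada Obi", "email_address": "ada@example.com",
    "mobile": "555-0101", "tier": "Gold", "address": "1 Main St",
}]}

try:
    validate(instance=leftover, schema=strict_schema)
    print("JSON is valid.")
except ValidationError as e:
    print("JSON is invalid:", e.message)
</code></pre>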
<h3 id="heading-step-3-load-the-cleaned-json-file">Step 3: Load the cleaned JSON file</h3>
<pre><code class="lang-python"><span class="hljs-keyword">with</span> open(<span class="hljs-string">"transformed_data.json"</span>) <span class="hljs-keyword">as</span> f:
    data = json.load(f)
</code></pre>
<h3 id="heading-step-4-validate-the-data">Step 4: Validate the data</h3>
<p>For this step, we’ll use a <code>try...except</code> block. If the data doesn’t match the schema, the script catches the <code>ValidationError</code> and prints a helpful message instead of crashing with a traceback.</p>
<pre><code class="lang-python"><span class="hljs-keyword">try</span>:
    validate(instance=data, schema=schema)
    print(<span class="hljs-string">"JSON is valid."</span>)
<span class="hljs-keyword">except</span> ValidationError <span class="hljs-keyword">as</span> e:
    print(<span class="hljs-string">"JSON is invalid:"</span>, e.message)
</code></pre>
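<p><code>validate()</code> raises a single error even when the data has several problems. When you’re checking a whole file, it can be more useful to list them all. jsonschema’s validator classes provide an <code>iter_errors()</code> method for this. Here’s a sketch that uses <code>Draft7Validator</code> with a trimmed-down schema, hypothetical bad data, and a hypothetical helper, <code>error_location()</code>:</p>
<pre><code class="lang-python">from jsonschema import Draft7Validator

# A trimmed-down version of the tutorial's schema (two fields only)
schema = {
    "type": "object",
    "properties": {
        "customers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "full_name": {"type": "string"},
                    "mobile": {"type": "string"},
                },
                "required": ["full_name", "mobile"],
            },
        }
    },
    "required": ["customers"],
}

# Hypothetical data with problems in two different records
data = {"customers": [
    {"full_name": "Ada Obi", "mobile": 5550101},  # mobile is a number
    {"mobile": "555-0102"},                       # full_name is missing
]}


def error_location(err):
    """Turn an error's path, such as ['customers', 0, 'mobile'], into a string."""
    return "/".join(str(part) for part in err.absolute_path) or "(root)"


validator = Draft7Validator(schema)
errors = sorted(validator.iter_errors(data), key=error_location)
for err in errors:
    print(f"{error_location(err)}: {err.message}")
</code></pre>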
<h2 id="heading-pandas-vs-pure-python-for-data-cleaning">Pandas vs Pure Python for Data Cleaning</h2>
<p>From this tutorial, you can probably tell that using pure Python to clean and restructure JSON is the more straightforward approach. It is fast and ideal for handling small datasets or simple transformations.</p>
<p>But as data grows and becomes more complex, you might need advanced data cleaning operations that would take much more code to write by hand in pure Python. In such cases, <code>pandas</code> becomes the better choice. It handles large, complex datasets effectively and provides built-in functions for handling missing data and removing duplicates.</p>
<p>You can study the <a target="_blank" href="https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf">Pandas cheat sheet</a> to learn more data manipulation methods.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Create a Basic CI/CD Pipeline with Webhooks on Linux ]]>
                </title>
                <description>
                    <![CDATA[ In the fast-paced world of software development, delivering high-quality applications quickly and reliably is crucial. This is where CI/CD (Continuous Integration and Continuous Delivery/Deployment) comes into play. CI/CD is a set of practices and to... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/create-a-basic-cicd-pipeline-with-webhooks-on-linux/</link>
                <guid isPermaLink="false">67995e567a54c877fce42276</guid>
                
                    <category>
                        <![CDATA[ Linux ]]>
                    </category>
                
                    <category>
                        <![CDATA[ linux for beginners ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python 3 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ python beginner ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ci-cd ]]>
                    </category>
                
                    <category>
                        <![CDATA[ CI/CD ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Juan P. Romano ]]>
                </dc:creator>
                <pubDate>Tue, 28 Jan 2025 22:46:46 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1737640144719/9035597c-0a69-4146-93cc-8bd659384169.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In the fast-paced world of software development, delivering high-quality applications quickly and reliably is crucial. This is where <strong>CI/CD</strong> (Continuous Integration and Continuous Delivery/Deployment) comes into play.</p>
<p>CI/CD is a set of practices and tools designed to automate and streamline the process of integrating code changes, testing them, and deploying them to production. By adopting CI/CD, your team can reduce manual errors, speed up release cycles, and ensure that your code is always in a deployable state.</p>
<p>In this tutorial, we’ll focus on a beginner-friendly approach to setting up a basic CI/CD pipeline using Bitbucket, a Linux server, and Python with Flask. Specifically, we’ll create an automated process that pulls the latest changes from a Bitbucket repository to your Linux server whenever there’s a push or merge to a specific branch.</p>
<p>This process will be powered by Bitbucket webhooks and a simple Flask-based Python server that listens for incoming webhook events and triggers the deployment.</p>
<p>It’s important to note that CI/CD is a vast and complex field, and this tutorial is designed to provide a foundational understanding rather than to be an exhaustive guide.</p>
<p>We’ll cover the basics of setting up a CI/CD pipeline using tools that are accessible to beginners. Just keep in mind that real-world CI/CD systems often involve more advanced tools and configurations, such as containerization, orchestration, and multi-stage testing environments.</p>
<p>By the end of this tutorial, you’ll have a working example of how to automate deployments using Bitbucket, Linux, and Python, which you can build upon as you grow more comfortable with CI/CD concepts.</p>
<h3 id="heading-table-of-contents">Table of Contents:</h3>
<ol>
<li><p><a class="post-section-overview" href="#heading-why-is-cicd-important">Why is CI/CD Important?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-1-set-up-a-webhook-in-bitbucket">Step 1: Set Up a Webhook in Bitbucket</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-2-set-up-the-flask-listener-on-your-linux-server">Step 2: Set Up the Flask Listener on Your Linux Server</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-3-expose-the-flask-app-optional">Step 3: Expose the Flask App (Optional)</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-4-test-the-setup">Step 4: Test the Setup</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-step-5-security-considerations">Step 5: Security Considerations</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ol>
<h2 id="heading-why-is-cicd-important">Why is CI/CD Important?</h2>
<p>CI/CD has become a cornerstone of modern software development for several reasons. First and foremost, it accelerates the development process. By automating repetitive tasks like testing and deployment, developers can focus more on writing code and less on manual processes. This leads to faster delivery of new features and bug fixes, which is especially important in competitive markets where speed can be a differentiator.</p>
<p>Another key benefit of CI/CD is reduced errors and improved reliability. Automated testing ensures that every code change is rigorously checked for issues before it’s integrated into the main codebase. This minimizes the risk of introducing bugs that could disrupt the application or require costly fixes later. Automated deployment pipelines also reduce the likelihood of human error during the release process, ensuring that deployments are consistent and predictable.</p>
<p>CI/CD also fosters better collaboration among team members. In traditional development workflows, integrating code changes from multiple developers can be a time-consuming and error-prone process. With CI/CD, code is integrated and tested frequently, often multiple times a day. This means that conflicts are detected and resolved early, and the codebase remains in a stable state. As a result, teams can work more efficiently and with greater confidence, even when multiple contributors are working on different parts of the project simultaneously.</p>
<p>Finally, CI/CD supports continuous improvement and innovation. By automating the deployment process, teams can release updates to production more frequently and with less risk. This enables them to gather feedback from users faster and iterate on their products more effectively.</p>
<h3 id="heading-what-well-cover-in-this-tutorial">What We’ll Cover in This Tutorial</h3>
<p>In this tutorial, we’ll walk through the process of setting up a simple CI/CD pipeline that automates the deployment of code changes from a Bitbucket repository to a Linux server. Here’s what you’ll learn:</p>
<ol>
<li><p>How to configure a Bitbucket repository to send webhook notifications whenever there’s a push or merge to a specific branch.</p>
</li>
<li><p>How to set up a Flask-based Python server on your Linux server to listen for incoming webhook events.</p>
</li>
<li><p>How to write a script that pulls the latest changes from the repository and deploys them to the server.</p>
</li>
<li><p>How to test and troubleshoot your automated deployment process.</p>
</li>
</ol>
<p>By the end of this tutorial, you’ll have a working example of a basic CI/CD pipeline that you can customize and expand as needed. Let’s get started!</p>
<h2 id="heading-step-1-set-up-a-webhook-in-bitbucket"><strong>Step 1: Set Up a Webhook in Bitbucket</strong></h2>
<p>Before starting with the setup, let’s briefly explain what a <strong>webhook</strong> is and how it fits into our CI/CD process.</p>
<p>A webhook is a mechanism that allows one system to notify another system about an event in real time. In the context of Bitbucket, a webhook can be configured to send an HTTP request (often a POST request with payload data) to a specified URL whenever a specific event occurs in your repository, such as a push to a branch or a pull request merge.</p>
<p>In our case, the webhook will notify our Flask-based Python server (running on your Linux server) whenever there’s a push or merge to a specific branch. This notification will trigger a script on the server to pull the latest changes from the repository and deploy them automatically. Essentially, the webhook acts as the bridge between Bitbucket and your server, enabling seamless automation of the deployment process.</p>
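<p>Before building the Flask listener, it helps to see what the server has to decide when a request arrives. For a push, Bitbucket Cloud names the event in an <code>X-Event-Key</code> request header (<code>repo:push</code>) and describes the updated branches under <code>push.changes</code> in the JSON body. The sketch below is a hypothetical helper, <code>should_deploy()</code>, that checks whether a push touched the branch you deploy from. The branch name <code>"main"</code> and the trimmed payload are assumptions, so compare them with Bitbucket’s webhook event payload documentation before relying on them:</p>
<pre><code class="lang-python">import json

TARGET_BRANCH = "main"  # hypothetical: the branch your server deploys from


def should_deploy(event_key, payload, target_branch=TARGET_BRANCH):
    """Return True if a Bitbucket push event updated target_branch.

    event_key is the value of the X-Event-Key request header, and payload
    is the parsed JSON body of the webhook request.
    """
    if event_key != "repo:push":
        return False
    for change in payload.get("push", {}).get("changes", []):
        # "new" describes the ref after the push; it can be null (None),
        # for example when the push deleted the branch
        new = change.get("new") or {}
        if new.get("type") == "branch" and new.get("name") == target_branch:
            return True
    return False


# A heavily trimmed, hypothetical payload in the shape Bitbucket Cloud uses
body = '{"push": {"changes": [{"new": {"type": "branch", "name": "main"}}]}}'
print(should_deploy("repo:push", json.loads(body)))  # True
</code></pre>
<p>Keeping this decision in a plain function, separate from the web framework, makes it easy to test without sending real webhook requests.</p>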
<p>Now that you understand the role of a webhook, let’s set one up in Bitbucket:</p>
<ol>
<li><p>Log in to Bitbucket and navigate to your repository.</p>
</li>
<li><p>On the left-hand sidebar, click on <strong>Settings</strong>.</p>
</li>
<li><p>Under the <strong>Workflow</strong> section, find and click on <strong>Webhooks</strong>.</p>
</li>
<li><p>Click the <strong>Add webhook</strong> button.</p>
</li>
<li><p>Enter a name for your webhook (for example, "Automatic Pull").</p>
</li>
<li><p>In the <strong>URL</strong> field, enter the public URL where Bitbucket should send the request, for example <code>http://your-server-ip/pull-repo</code>. That address assumes NGINX forwards port 80 to Flask, as described in Step 3. If you skip the NGINX setup, include Flask's port instead (<code>http://your-server-ip:5000/pull-repo</code>). Bitbucket must be able to reach this address over the internet, so an app that only runs on your local machine won't receive the webhook. (For production environments, it’s highly recommended to use HTTPS to secure the communication between Bitbucket and your server.)</p>
</li>
<li><p>In the <strong>Triggers</strong> section, choose the events you want to listen to. For this example, we will select <strong>Push</strong> (and optionally, <strong>Pull Request Merged</strong> if you want to deploy after merges, too).</p>
</li>
<li><p>Click <strong>Save</strong> to create the webhook.</p>
</li>
</ol>
<p>Once the webhook is set up, Bitbucket will send a POST request to the specified URL every time the selected event occurs. In the next steps, we’ll set up a Flask server to handle these incoming requests and trigger the deployment process.</p>
<p>Here is what you should see when you set up the Bitbucket webhook:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738092826221/e0d96fd3-d843-4064-a08d-4de95b985800.png" alt="Bitbucket screen showing the creation of a webhook that tells your server to pull the changes whenever you push or merge in your repository." class="image--center mx-auto" width="600" height="400" loading="lazy"></p>
<h2 id="heading-step-2-set-up-the-flask-listener-on-your-linux-server"><strong>Step 2: Set Up the Flask Listener on Your Linux Server</strong></h2>
<p>In the next step, you’ll set up a simple web server on your Linux machine that listens for the webhook from Bitbucket. When it receives the notification, it fetches the latest changes and forces the local copy to match the remote branch (a "force pull"), which discards any local changes that would otherwise block a regular <code>git pull</code>.</p>
<h3 id="heading-install-flask"><strong>Install Flask:</strong></h3>
<p>To create the Flask application, first install Flask by running:</p>
<pre><code class="lang-bash">pip install flask
</code></pre>
<h3 id="heading-create-the-flask-app"><strong>Create the Flask App:</strong></h3>
<p>Create a new Python script (for example, <code>app_repo_pull.py</code>) on your server and add the following code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask
<span class="hljs-keyword">import</span> subprocess

app = Flask(__name__)

<span class="hljs-meta">@app.route('/pull-repo', methods=['POST'])</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">pull_repo</span>():</span>
    <span class="hljs-keyword">try</span>:
        <span class="hljs-comment"># Fetch the latest changes from the remote repository</span>
        subprocess.run([<span class="hljs-string">"git"</span>, <span class="hljs-string">"-C"</span>, <span class="hljs-string">"/path/to/your/repository"</span>, <span class="hljs-string">"fetch"</span>], check=<span class="hljs-literal">True</span>)
        <span class="hljs-comment"># Force reset the local branch to match the remote 'test' branch</span>
        subprocess.run([<span class="hljs-string">"git"</span>, <span class="hljs-string">"-C"</span>, <span class="hljs-string">"/path/to/your/repository"</span>, <span class="hljs-string">"reset"</span>, <span class="hljs-string">"--hard"</span>, <span class="hljs-string">"origin/test"</span>], check=<span class="hljs-literal">True</span>)  <span class="hljs-comment"># Replace 'test' with your branch name</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Force pull successful"</span>, <span class="hljs-number">200</span>
    <span class="hljs-keyword">except</span> subprocess.CalledProcessError:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Failed to force pull the repository"</span>, <span class="hljs-number">500</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    app.run(host=<span class="hljs-string">'0.0.0.0'</span>, port=<span class="hljs-number">5000</span>)
</code></pre>
<p>Here’s what this code does:</p>
<ul>
<li><p><a target="_blank" href="http://subprocess.run"><code>subprocess.run</code></a><code>(["git", "-C", "/path/to/your/repository", "fetch"])</code>: This command fetches the latest changes from the remote repository without affecting the local working directory.</p>
</li>
<li><p><a target="_blank" href="http://subprocess.run"><code>subprocess.run</code></a><code>(["git", "-C", "/path/to/your/repository", "reset", "--hard", "origin/test"])</code>: This command performs a hard reset, forcing the local repository to match the remote <code>test</code> branch. Replace <code>test</code> with the name of your branch.</p>
</li>
</ul>
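<p>Make sure to replace <code>/path/to/your/repository</code> with the actual path to your local Git repository. Because each call passes <code>check=True</code>, a failing Git command raises <code>subprocess.CalledProcessError</code>, and the <code>except</code> block turns it into a <code>500</code> response.</p>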
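<p>The listener above resets to <code>origin/test</code> on every webhook call, even when the push went to a different branch. One way to guard against that is to read the pushed branch names from the JSON body first. The sketch below assumes Bitbucket Cloud's push payload shape, where a <code>push</code> object holds a <code>changes</code> list whose <code>new</code> entries carry a <code>type</code> and a <code>name</code>. Confirm that shape against a real request your server receives. The helper name <code>pushed_branches</code> is hypothetical.</p>
<pre><code class="lang-python"># Hypothetical helper: collect branch names from a Bitbucket push payload.
# Assumed shape: {"push": {"changes": [{"new": {"type": "branch", "name": "..."}}]}}
def pushed_branches(payload):
    branches = []
    for change in payload.get("push", {}).get("changes", []):
        new = change.get("new") or {}  # "new" may be null, e.g. when a branch is deleted
        if new.get("type") == "branch":
            branches.append(new.get("name"))
    return branches


# Example: a trimmed-down payload for a push to the "test" branch
sample = {"push": {"changes": [{"new": {"type": "branch", "name": "test"}}]}}
print(pushed_branches(sample))  # ['test']
</code></pre>
<p>Inside <code>pull_repo()</code>, you could pass <code>request.get_json(silent=True) or {}</code> to this helper (after importing <code>request</code> from Flask) and return early, for example with <code>"Ignored", 200</code>, when <code>"test"</code> is not in the returned list.</p>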
<h2 id="heading-step-3-expose-the-flask-app-optional"><strong>Step 3: Expose the Flask App (Optional)</strong></h2>
<p>If you want the Flask app to be accessible from outside your server, you need to expose it publicly. For this, you can set up a reverse proxy with NGINX. Here's how to do that:</p>
<p>First, install NGINX if you don't have it already by running this command:</p>
<pre><code class="lang-bash">sudo apt-get install nginx
</code></pre>
<p>Next, you’ll need to configure NGINX to proxy requests to your Flask app. Open the NGINX configuration file:</p>
<pre><code class="lang-bash">sudo nano /etc/nginx/sites-available/default
</code></pre>
<p>Modify the configuration to include this block:</p>
<pre><code class="lang-bash">server {
    listen 80;
    server_name your-server-ip;

    location /pull-repo {
        proxy_pass http://localhost:5000;
        proxy_set_header Host <span class="hljs-variable">$host</span>;
        proxy_set_header X-Real-IP <span class="hljs-variable">$remote_addr</span>;
        proxy_set_header X-Forwarded-For <span class="hljs-variable">$proxy_add_x_forwarded_for</span>;
        proxy_set_header X-Forwarded-Proto <span class="hljs-variable">$scheme</span>;
    }
}
</code></pre>
<p>Now just reload NGINX to apply the changes:</p>
<pre><code class="lang-bash">sudo systemctl reload nginx
</code></pre>
<h2 id="heading-step-4-test-the-setup"><strong>Step 4: Test the Setup</strong></h2>
<p>Now that everything is set up, go ahead and start the Flask app by executing this Python script:</p>
<pre><code class="lang-bash">python3 app_repo_pull.py
</code></pre>
<p>Now to test if everything is working:</p>
<ol>
<li><p><strong>Make a commit</strong>: Push a commit to the <code>test</code> branch in your Bitbucket repository. This action will trigger the webhook.</p>
</li>
<li><p><strong>Webhook trigger</strong>: The webhook will send a POST request to your server. The Flask app will receive this request, perform a force pull from the <code>test</code> branch, and update the local repository.</p>
</li>
<li><p><strong>Verify the pull</strong>: Check the log output of your Flask app or inspect the local repository to verify that the changes have been pulled and applied successfully.</p>
</li>
</ol>
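<p>You can also exercise the endpoint without pushing a commit. The script below sends a POST request that loosely imitates a Bitbucket push, using only the standard library. It is a sketch: <code>WEBHOOK_URL</code>, <code>build_fake_push</code>, and <code>send_fake_push</code> are hypothetical names, and the body is a trimmed-down stand-in for a real Bitbucket payload. With the listener above, a successful request runs a real <code>git reset --hard</code> on the server.</p>
<pre><code class="lang-python">import json
import urllib.request

# Hypothetical address: point this at your own server and route
WEBHOOK_URL = "http://your-server-ip/pull-repo"


def build_fake_push(url, branch="test"):
    """Build a POST request whose JSON body loosely mimics a Bitbucket push."""
    payload = {"push": {"changes": [{"new": {"type": "branch", "name": branch}}]}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_fake_push(url, branch="test"):
    """Send the fake push and return (status code, response body)."""
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses
    with urllib.request.urlopen(build_fake_push(url, branch), timeout=10) as response:
        return response.status, response.read().decode("utf-8")
</code></pre>
<p>Calling <code>send_fake_push(WEBHOOK_URL)</code> from a Python shell should return something like <code>(200, 'Force pull successful')</code> when everything is wired up. A <code>500</code> from the listener surfaces as an <code>HTTPError</code> exception instead.</p>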
<h2 id="heading-step-5-security-considerations"><strong>Step 5: Security Considerations</strong></h2>
<p>When exposing a Flask app to the internet, securing your server and application is crucial to protect it from unauthorized access, data breaches, and attacks. Here are the key areas to focus on:</p>
<h4 id="heading-1-use-a-secure-server-with-proper-firewall-rules"><strong>1. Use a Secure Server with Proper Firewall Rules</strong></h4>
<p>A secure server is one that is configured to minimize exposure to external threats. This involves using firewall rules, minimizing unnecessary services, and ensuring that only required ports are open for communication.</p>
<h5 id="heading-example-of-a-secure-server-setup"><strong>Example of a secure server setup:</strong></h5>
<ul>
<li><p><strong>Minimal software</strong>: Only install the software you need (for example, Python, Flask, NGINX) and remove unnecessary services.</p>
</li>
<li><p><strong>Operating system updates</strong>: Ensure your server's operating system is up-to-date with the latest security patches.</p>
</li>
<li><p><strong>Firewall configuration</strong>: Use a firewall to control incoming and outgoing traffic and limit access to your server.</p>
</li>
</ul>
<p>For example, a basic <strong>UFW (Uncomplicated Firewall)</strong> configuration on Ubuntu might look like this:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Allow SSH (port 22) for remote access</span>
sudo ufw allow ssh

<span class="hljs-comment"># Allow HTTP (port 80) and HTTPS (port 443) for web traffic</span>
sudo ufw allow http
sudo ufw allow https

<span class="hljs-comment"># Enable the firewall</span>
sudo ufw <span class="hljs-built_in">enable</span>

<span class="hljs-comment"># Check the status of the firewall</span>
sudo ufw status
</code></pre>
<p>In this case:</p>
<ul>
<li><p>The firewall allows incoming SSH connections on port 22, HTTP on port 80, and HTTPS on port 443.</p>
</li>
<li><p>Any port without an allow rule stays closed, because UFW denies incoming connections by default. This limits your exposure to attacks.</p>
</li>
</ul>
<h5 id="heading-additional-firewall-rules"><strong>Additional Firewall Rules:</strong></h5>
<ul>
<li><p><strong>Limit access to webhook endpoint</strong>: Ideally, only allow traffic to the webhook endpoint from Bitbucket's IP addresses to prevent external access. You can set this up in your firewall or using your web server (for example, NGINX) by only accepting requests from Bitbucket's IP range.</p>
</li>
<li><p><strong>Deny all other incoming traffic</strong>: For any service that does not need to be exposed to the internet (for example, database ports), ensure those ports are blocked.</p>
</li>
</ul>
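<p>If you also want to enforce the allowlist inside the Flask app, Python's <code>ipaddress</code> module can check whether a client address falls inside a set of CIDR ranges. The ranges below are documentation placeholders, not Bitbucket's real addresses. Copy the current ranges from the list Atlassian publishes. Behind the NGINX proxy from Step 3, <code>request.remote_addr</code> is the proxy's address, so this sketch assumes you read the client address from the <code>X-Real-IP</code> header that the proxy sets. The name <code>is_allowed</code> is hypothetical.</p>
<pre><code class="lang-python">import ipaddress

# Placeholder ranges reserved for documentation (RFC 5737 and RFC 3849).
# Replace them with the ranges Atlassian publishes for Bitbucket.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(address):
    """Return True if the address string falls inside an allowed network."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # missing or malformed address
    # An IPv4 address is never "in" an IPv6 network, and vice versa
    return any(ip in network for network in ALLOWED_NETWORKS)


print(is_allowed("203.0.113.7"))   # True
print(is_allowed("198.51.100.7"))  # False
</code></pre>
<p>In the route, you might call <code>is_allowed(request.headers.get("X-Real-IP"))</code> and <code>abort(403)</code> when it returns <code>False</code>. Only trust that header if port 5000 is closed to the outside world, as it is with the UFW rules above. Otherwise, a client talking to Flask directly could set the header to anything.</p>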
<h4 id="heading-2-add-authentication-to-the-flask-app"><strong>2. Add Authentication to the Flask App</strong></h4>
<p>Since your Flask app will be publicly accessible via the webhook URL, you should consider adding authentication to ensure only authorized users (such as Bitbucket's servers) can trigger the pull.</p>
<h5 id="heading-basic-authentication-example"><strong>Basic Authentication Example:</strong></h5>
<p>You can use a simple token-based authentication to secure your webhook endpoint. Here’s an example of how to modify your Flask app to require an authentication token:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> flask <span class="hljs-keyword">import</span> Flask, request, abort
<span class="hljs-keyword">import</span> subprocess

app = Flask(__name__)

<span class="hljs-comment"># Define a secret token for webhook verification</span>
SECRET_TOKEN = <span class="hljs-string">'your-secret-token'</span>

<span class="hljs-meta">@app.route('/pull-repo', methods=['POST'])</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">pull_repo</span>():</span>
    <span class="hljs-comment"># Check if the request contains the correct token</span>
    token = request.headers.get(<span class="hljs-string">'X-Hub-Signature'</span>)
    <span class="hljs-keyword">if</span> token != SECRET_TOKEN:
        abort(<span class="hljs-number">403</span>)  <span class="hljs-comment"># Forbidden if the token is incorrect</span>

    <span class="hljs-keyword">try</span>:
        subprocess.run([<span class="hljs-string">"git"</span>, <span class="hljs-string">"-C"</span>, <span class="hljs-string">"/path/to/your/repository"</span>, <span class="hljs-string">"fetch"</span>], check=<span class="hljs-literal">True</span>)
        subprocess.run([<span class="hljs-string">"git"</span>, <span class="hljs-string">"-C"</span>, <span class="hljs-string">"/path/to/your/repository"</span>, <span class="hljs-string">"reset"</span>, <span class="hljs-string">"--hard"</span>, <span class="hljs-string">"origin/test"</span>], check=<span class="hljs-literal">True</span>)
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Force pull successful"</span>, <span class="hljs-number">200</span>
    <span class="hljs-keyword">except</span> subprocess.CalledProcessError:
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Failed to force pull the repository"</span>, <span class="hljs-number">500</span>

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">'__main__'</span>:
    app.run(host=<span class="hljs-string">'0.0.0.0'</span>, port=<span class="hljs-number">5000</span>)
</code></pre>
<h5 id="heading-how-it-works"><strong>How it works:</strong></h5>
<ul>
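<li><p>This example reads the token from the <code>X-Hub-Signature</code> header, the header Bitbucket uses for webhook signatures. If you fill in the webhook's <strong>Secret</strong> field, Bitbucket sends an HMAC-SHA256 signature of the request body in this header (as <code>sha256=</code> followed by a hex digest), not the secret itself. That means the direct string comparison above won't match a signed request. To authenticate genuine Bitbucket requests, verify that signature instead.</p>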
</li>
<li><p>Only requests with the correct token will be allowed to trigger the pull. If the token is missing or incorrect, the request is rejected with a <code>403 Forbidden</code> response.</p>
</li>
</ul>
<p>You can also use more complex forms of authentication, such as OAuth or HMAC (Hash-based Message Authentication Code), but this simple token approach works for many cases.</p>
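<p>Here is a sketch of the HMAC approach, using only the standard library. It assumes the signature format described above: Bitbucket computes an HMAC-SHA256 of the raw request body with your webhook secret and sends it as <code>sha256=</code> plus the hex digest. Check Atlassian's webhook documentation for your Bitbucket edition to confirm that format. The function name <code>signature_is_valid</code> is hypothetical.</p>
<pre><code class="lang-python">import hashlib
import hmac

# In a real deployment, load the secret from an environment variable instead
WEBHOOK_SECRET = b"your-secret-token"


def signature_is_valid(body, header_value, secret=WEBHOOK_SECRET):
    """Compare the X-Hub-Signature header with an HMAC-SHA256 of the raw body."""
    if not header_value or not header_value.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


# Example: sign a body the way the sender would, then verify it
body = b'{"push": {}}'
header = "sha256=" + hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
print(signature_is_valid(body, header))         # True
print(signature_is_valid(b"tampered", header))  # False
</code></pre>
<p>In the Flask route, pass <code>request.get_data()</code> (the raw bytes, before any JSON parsing) and <code>request.headers.get("X-Hub-Signature")</code>, and call <code>abort(403)</code> when the function returns <code>False</code>.</p>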
<h4 id="heading-3-use-https-for-secure-communication"><strong>3. Use HTTPS for Secure Communication</strong></h4>
<p>It’s crucial to encrypt the data transmitted between your Flask app and the Bitbucket webhook, as well as any sensitive data (such as tokens or passwords) being transmitted over the network. This ensures that attackers cannot intercept or modify the data.</p>
<h5 id="heading-why-https"><strong>Why HTTPS?</strong></h5>
<ul>
<li><p><strong>Data encryption</strong>: HTTPS encrypts the communication, ensuring that sensitive data like your authentication token is not exposed to man-in-the-middle attacks.</p>
</li>
<li><p><strong>Trust and integrity</strong>: HTTPS helps ensure that the data received by your server hasn’t been tampered with.</p>
</li>
</ul>
<h5 id="heading-using-lets-encrypt-to-secure-your-flask-app-with-ssl"><strong>Using Let’s Encrypt to Secure Your Flask App with SSL:</strong></h5>
<p><strong>Install Certbot</strong> (the tool for obtaining Let’s Encrypt certificates):</p>
<pre><code class="lang-bash">sudo apt-get update
sudo apt-get install certbot python3-certbot-nginx
</code></pre>
<p><strong>Obtain a free SSL certificate for your domain</strong>:</p>
<pre><code class="lang-bash">sudo certbot --nginx -d your-domain.com
</code></pre>
<p>This command will automatically configure NGINX to use HTTPS with a free SSL certificate from Let’s Encrypt.</p>
<p><strong>Ensure HTTPS is used</strong>: Make sure that your Flask app or NGINX configuration forces all traffic to use HTTPS. You can do this by setting up a redirection rule in NGINX:</p>
<pre><code class="lang-bash">server {
    listen 80;
    server_name your-domain.com;

    <span class="hljs-comment"># Redirect HTTP to HTTPS</span>
    <span class="hljs-built_in">return</span> 301 https://<span class="hljs-variable">$host</span><span class="hljs-variable">$request_uri</span>;
}

server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    <span class="hljs-comment"># Other Nginx configuration...</span>
}
</code></pre>
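<p><strong>Automatic Renewal</strong>: Let’s Encrypt certificates are valid for 90 days, so they need to be renewed regularly. The Certbot packages on Ubuntu typically install a systemd timer or cron job that handles renewal automatically, and you can check that renewal will work with:</p>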
<pre><code class="lang-bash">sudo certbot renew --dry-run
</code></pre>
<p>This command simulates a renewal against Let’s Encrypt’s staging environment without saving new certificates, so you can confirm that the process works.</p>
<h4 id="heading-4-logging-and-monitoring"><strong>4. Logging and Monitoring</strong></h4>
<p>Implement logging and monitoring for your Flask app to track any unauthorized attempts, errors, or unusual activity:</p>
<ul>
<li><p><strong>Log requests</strong>: Log all incoming requests, including the IP address, request headers, and response status, so you can monitor for any suspicious activity.</p>
</li>
<li><p><strong>Use monitoring tools</strong>: Set up tools like <strong>Prometheus</strong>, <strong>Grafana</strong>, or <strong>New Relic</strong> to monitor server performance and app health.</p>
</li>
</ul>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>In this tutorial, we explored how to set up a simple, beginner-friendly CI/CD pipeline that automates deployments using Bitbucket, a Linux server, and Python with Flask. Here’s a recap of what you’ve learned:</p>
<ol>
<li><p><strong>CI/CD Fundamentals</strong>: We discussed the basics of Continuous Integration (CI) and Continuous Delivery/Deployment (CD), which are essential practices for automating the integration, testing, and deployment of code. You learned how CI/CD helps speed up development, reduce errors, and improve collaboration among developers.</p>
</li>
<li><p><strong>Setting Up Bitbucket Webhooks</strong>: You learned how to configure a Bitbucket webhook to notify your server whenever there’s a push or merge to a specific branch. This webhook serves as a trigger to initiate the deployment process automatically.</p>
</li>
<li><p><strong>Creating a Flask-based Webhook Listener</strong>: We showed you how to set up a Flask app on your Linux server to listen for incoming webhook requests from Bitbucket. This Flask app receives the notifications and runs the necessary Git commands to pull and deploy the latest changes.</p>
</li>
<li><p><strong>Automating the Deployment Process</strong>: Using Python and Flask, we automated the process of pulling changes from the Bitbucket repository and performing a force pull to ensure the latest code is deployed. You also learned how to configure the server to expose the Flask app and accept requests securely.</p>
</li>
<li><p><strong>Security Considerations</strong>: We covered critical security steps to protect your deployment process:</p>
<ul>
<li><p><strong>Firewall Rules</strong>: We discussed configuring firewall rules to limit exposure and ensure only authorized traffic (from Bitbucket) can access your server.</p>
</li>
<li><p><strong>Authentication</strong>: We added token-based authentication to ensure only authorized requests can trigger deployments.</p>
</li>
<li><p><strong>HTTPS</strong>: We explained how to secure the communication between your server and Bitbucket using SSL certificates from Let's Encrypt.</p>
</li>
<li><p><strong>Logging and Monitoring</strong>: Lastly, we recommended setting up logging and monitoring to keep track of any unusual activity or errors.</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-next-steps"><strong>Next Steps</strong></h3>
<p>You now have a working example of an automated deployment pipeline. While this is a basic implementation, it serves as a foundation you can build on. As you grow more comfortable with CI/CD, you can explore advanced topics like:</p>
<ul>
<li><p>Multi-stage deployment pipelines</p>
</li>
<li><p>Integration with containerization tools like Docker</p>
</li>
<li><p>More complex testing and deployment strategies</p>
</li>
<li><p>Use of orchestration tools like Kubernetes for scaling</p>
</li>
</ul>
<p>CI/CD practices are continually evolving, and by mastering the basics, you’ve set yourself up for success as you expand your skills in this area. Happy automating and thank you for reading!</p>
<p>You can <a target="_blank" href="https://github.com/jpromanonet/ci_cd_fcc/tree/main">fork the code from here</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Python’s zip() Function Explained with Simple Examples ]]>
                </title>
                <description>
                    <![CDATA[ The zip() function in Python is a neat tool that allows you to combine multiple lists or other iterables (like tuples, sets, or even strings) into one iterable of tuples. Think of it like a zipper on a jacket that brings two sides together. In this g... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/python-zip-function-explained-with-examples/</link>
                <guid isPermaLink="false">6707eb818bd3718987eac606</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python 3 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ python beginner ]]>
                    </category>
                
                    <category>
                        <![CDATA[ programming languages ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sahil ]]>
                </dc:creator>
                <pubDate>Thu, 10 Oct 2024 14:58:09 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1728351007032/90a321bb-4079-4480-90e7-7aa847c54d9d.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>The <code>zip()</code> function in Python is a neat tool that allows you to combine multiple lists or other iterables (like tuples, sets, or even strings) into one iterable of tuples. Think of it like a zipper on a jacket that brings two sides together.</p>
<p>In this guide, we’ll explore the ins and outs of the <code>zip()</code> function with simple, practical examples that will help you understand how to use it effectively.</p>
<h2 id="heading-how-does-the-zip-function-work">How Does the <code>zip()</code> Function Work?</h2>
<p>The <code>zip()</code> function pairs elements from multiple iterables, like lists, based on their positions. This means that the first elements of each list will be paired, then the second, and so on. If the iterables are not the same length, <code>zip()</code> will stop at the end of the shortest iterable.</p>
<p>The syntax for <code>zip()</code> is pretty straightforward:</p>
<pre><code class="lang-python">zip(*iterables)
</code></pre>
<p>You can pass in any number of iterables (lists, tuples, strings, and so on), and <code>zip()</code> returns an iterator that yields one tuple per position.</p>
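<p>One detail worth knowing early: in Python 3, <code>zip()</code> returns a lazy iterator rather than a list. It produces each tuple only when you ask for it, and once it has been consumed, it stays empty. That is why the examples below wrap it in <code>list()</code> before printing.</p>
<pre><code class="lang-python">letters = ["a", "b", "c"]
numbers = [1, 2, 3]

pairs = zip(letters, numbers)
print(type(pairs).__name__)  # zip (an iterator, not a list)
print(next(pairs))           # ('a', 1), produced on demand

first_pass = list(pairs)     # consumes the remaining tuples
second_pass = list(pairs)    # the iterator is now exhausted
print(first_pass)            # [('b', 2), ('c', 3)]
print(second_pass)           # []
</code></pre>
<p>If you need to loop over the pairs more than once, store <code>list(zip(...))</code> in a variable first.</p>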
<h3 id="heading-example-1-combining-two-lists">Example 1: Combining Two Lists</h3>
<p>Let’s start with a simple case where we have two lists, and we want to combine them. Imagine you have a list of names and a corresponding list of scores, and you want to pair them up.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Two lists to combine</span>
names = [<span class="hljs-string">"Alice"</span>, <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"Charlie"</span>]
scores = [<span class="hljs-number">85</span>, <span class="hljs-number">90</span>, <span class="hljs-number">88</span>]

<span class="hljs-comment"># Using zip() to combine them</span>
zipped = zip(names, scores)

<span class="hljs-comment"># Convert the result to a list so we can see it</span>
zipped_list = list(zipped)
print(zipped_list)
</code></pre>
<p>In this example, the <code>zip()</code> function takes the two lists—<code>names</code> and <code>scores</code>—and pairs them element by element. The first element from <code>names</code> (<code>"Alice"</code>) is paired with the first element from <code>scores</code> (<code>85</code>), and so on. When we convert the result into a list, it looks like this:</p>
<p><strong>Output:</strong></p>
<pre><code class="lang-python">[(<span class="hljs-string">'Alice'</span>, <span class="hljs-number">85</span>), (<span class="hljs-string">'Bob'</span>, <span class="hljs-number">90</span>), (<span class="hljs-string">'Charlie'</span>, <span class="hljs-number">88</span>)]
</code></pre>
<p>This makes it easy to work with related data in a structured way.</p>
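<p>A common next step with paired data like this is building a dictionary. <code>dict()</code> accepts an iterable of key-value pairs, so you can pass it the output of <code>zip()</code> directly.</p>
<pre><code class="lang-python">names = ["Alice", "Bob", "Charlie"]
scores = [85, 90, 88]

# Each (name, score) tuple becomes one key-value entry
score_by_name = dict(zip(names, scores))
print(score_by_name)         # {'Alice': 85, 'Bob': 90, 'Charlie': 88}
print(score_by_name["Bob"])  # 90
</code></pre>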
<h3 id="heading-example-2-what-happens-when-the-lists-are-uneven">Example 2: What Happens When the Lists Are Uneven?</h3>
<p>Let’s say you have lists of different lengths. What happens then? The <code>zip()</code> function is smart enough to stop as soon as it reaches the end of the shortest list.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Lists of different lengths</span>
fruits = [<span class="hljs-string">"apple"</span>, <span class="hljs-string">"banana"</span>]
prices = [<span class="hljs-number">100</span>, <span class="hljs-number">200</span>, <span class="hljs-number">150</span>]

<span class="hljs-comment"># Zipping them together</span>
result = list(zip(fruits, prices))
print(result)
</code></pre>
<p>In this case, the <code>fruits</code> list has two elements, and the <code>prices</code> list has three. But <code>zip()</code> will only combine the first two elements, ignoring the extra value in <code>prices</code>.</p>
<p><strong>Output:</strong></p>
<pre><code class="lang-python">[(<span class="hljs-string">'apple'</span>, <span class="hljs-number">100</span>), (<span class="hljs-string">'banana'</span>, <span class="hljs-number">200</span>)]
</code></pre>
<p>Notice how the last value (<code>150</code>) in the <code>prices</code> list is ignored because there’s no third fruit to pair it with. The <code>zip()</code> function ensures that you don’t get errors when working with uneven lists, but it also means you might lose some data if your lists are not balanced.</p>
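<p>If silently dropping data isn't what you want, there are two alternatives. On Python 3.10 and later, <code>zip()</code> accepts <code>strict=True</code>, which raises a <code>ValueError</code> when the iterables have different lengths. The standard library's <code>itertools.zip_longest()</code> keeps going until the longest iterable is exhausted and fills the gaps with a <code>fillvalue</code> (which defaults to <code>None</code>).</p>
<pre><code class="lang-python">from itertools import zip_longest

fruits = ["apple", "banana"]
prices = [100, 200, 150]

# Option 1: fail loudly on a length mismatch (Python 3.10+)
try:
    list(zip(fruits, prices, strict=True))
    mismatch_detected = False
except ValueError as error:
    mismatch_detected = True
    print("Mismatch:", error)

# Option 2: keep every price and fill in the missing fruit name
print(list(zip_longest(fruits, prices, fillvalue="unknown")))
# [('apple', 100), ('banana', 200), ('unknown', 150)]
</code></pre>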
<h3 id="heading-example-3-unzipping-a-zipped-object">Example 3: Unzipping a Zipped Object</h3>
<p>What if you want to reverse the <code>zip()</code> operation? For example, after zipping two lists together, you might want to split them back into individual lists. You can do this easily using the unpacking operator <code>*</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Zipped lists</span>
cities = [<span class="hljs-string">"New York"</span>, <span class="hljs-string">"London"</span>, <span class="hljs-string">"Tokyo"</span>]
populations = [<span class="hljs-number">8000000</span>, <span class="hljs-number">9000000</span>, <span class="hljs-number">14000000</span>]

zipped = zip(cities, populations)

<span class="hljs-comment"># Unzipping them</span>
unzipped_cities, unzipped_populations = zip(*zipped)

print(unzipped_cities)
print(unzipped_populations)
</code></pre>
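<p>Here, we first zip the <code>cities</code> and <code>populations</code> lists together. Then <code>zip(*zipped)</code> "unzips" them. The <code>*</code> operator unpacks the zipped pairs into separate arguments, so <code>zip()</code> receives <code>('New York', 8000000)</code>, <code>('London', 9000000)</code>, and <code>('Tokyo', 14000000)</code>. It then groups their first elements together and their second elements together, giving one tuple of cities and one tuple of populations.</p>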
<p><strong>Output:</strong></p>
<pre><code class="lang-python">(<span class="hljs-string">'New York'</span>, <span class="hljs-string">'London'</span>, <span class="hljs-string">'Tokyo'</span>)
(<span class="hljs-number">8000000</span>, <span class="hljs-number">9000000</span>, <span class="hljs-number">14000000</span>)
</code></pre>
<p>This shows how you can reverse the zipping process to get the original data back.</p>
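<p>Two small follow-ups. First, the unzipped results are tuples, so wrap them in <code>list()</code> if you need lists you can modify. Second, unzipping an empty input fails, because <code>zip(*[])</code> produces nothing to assign to the two names.</p>
<pre><code class="lang-python">pairs = [("New York", 8000000), ("London", 9000000), ("Tokyo", 14000000)]

# map(list, ...) turns each unzipped tuple into a list
cities, populations = map(list, zip(*pairs))
print(cities)       # ['New York', 'London', 'Tokyo']
print(populations)  # [8000000, 9000000, 14000000]

# With no pairs at all, there is nothing to unpack
try:
    empty_cities, empty_populations = zip(*[])
except ValueError:
    empty_cities, empty_populations = [], []
print(empty_cities, empty_populations)  # [] []
</code></pre>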
<h3 id="heading-example-4-zipping-more-than-two-lists">Example 4: Zipping More Than Two Lists</h3>
<p>You aren’t limited to just two lists with <code>zip()</code>. You can zip together as many iterables as you want. Here’s an example with three lists.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Three lists to zip</span>
subjects = [<span class="hljs-string">"Math"</span>, <span class="hljs-string">"English"</span>, <span class="hljs-string">"Science"</span>]
grades = [<span class="hljs-number">88</span>, <span class="hljs-number">79</span>, <span class="hljs-number">92</span>]
teachers = [<span class="hljs-string">"Mr. Smith"</span>, <span class="hljs-string">"Ms. Johnson"</span>, <span class="hljs-string">"Mrs. Lee"</span>]

<span class="hljs-comment"># Zipping three lists together</span>
zipped_info = zip(subjects, grades, teachers)

<span class="hljs-comment"># Convert to a list to see the result</span>
print(list(zipped_info))
</code></pre>
<p>In this example, we are zipping three lists—<code>subjects</code>, <code>grades</code>, and <code>teachers</code>. The first item from each list is grouped together, then the second, and so on.</p>
<p><strong>Output:</strong></p>
<pre><code class="lang-python">[(<span class="hljs-string">'Math'</span>, <span class="hljs-number">88</span>, <span class="hljs-string">'Mr. Smith'</span>), (<span class="hljs-string">'English'</span>, <span class="hljs-number">79</span>, <span class="hljs-string">'Ms. Johnson'</span>), (<span class="hljs-string">'Science'</span>, <span class="hljs-number">92</span>, <span class="hljs-string">'Mrs. Lee'</span>)]
</code></pre>
<p>This way, you can combine multiple related pieces of information into easy-to-handle tuples.</p>
<h3 id="heading-example-5-zipping-strings">Example 5: Zipping Strings</h3>
<p>Strings are also iterables in Python, so you can zip them just as you would lists. Let’s try combining two strings.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Zipping two strings</span>
str1 = <span class="hljs-string">"ABC"</span>
str2 = <span class="hljs-string">"123"</span>

<span class="hljs-comment"># Zipping the characters together</span>
zipped_strings = list(zip(str1, str2))
print(zipped_strings)
</code></pre>
<p>Here, the first character of <code>str1</code> is combined with the first character of <code>str2</code>, and so on.</p>
<p><strong>Output:</strong></p>
<pre><code class="lang-python">[(<span class="hljs-string">'A'</span>, <span class="hljs-string">'1'</span>), (<span class="hljs-string">'B'</span>, <span class="hljs-string">'2'</span>), (<span class="hljs-string">'C'</span>, <span class="hljs-string">'3'</span>)]
</code></pre>
<p>This is especially useful if you need to process or pair characters from multiple strings together.</p>
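<p>For instance, pairing characters by position makes it easy to compare two strings of equal length. The sketch below counts the positions where they differ, a measure often called the Hamming distance. <code>count_differences</code> is just an illustrative name.</p>
<pre><code class="lang-python">def count_differences(a, b):
    """Count positions where two equal-length strings have different characters."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    # zip() yields one (char_from_a, char_from_b) pair per position
    return sum(1 for x, y in zip(a, b) if x != y)


print(count_differences("karolin", "kathrin"))  # 3
</code></pre>
<p>The explicit length check matters here: without it, <code>zip()</code> would quietly ignore the extra characters in the longer string.</p>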
<h3 id="heading-example-6-zipping-dictionaries">Example 6: Zipping Dictionaries</h3>
<p>Dictionaries work with <code>zip()</code> too. Iterating over a dictionary yields its keys, so passing dictionaries directly to <code>zip()</code> pairs up their keys. Let’s look at an example:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Two dictionaries</span>
dict1 = {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">25</span><span class="hljs-string">"}
dict2 = {"</span>name<span class="hljs-string">": "</span>Bo<span class="hljs-string">b", "</span>age<span class="hljs-string">": 30"</span>}

<span class="hljs-comment"># Zipping dictionary keys</span>
zipped_keys = list(zip(dict1, dict2))
print(zipped_keys)
</code></pre>
<p>Here, <code>zip()</code> pairs up the keys from both dictionaries.</p>
<p><strong>Output:</strong></p>
<pre><code class="lang-python">[(<span class="hljs-string">'name'</span>, <span class="hljs-string">'name'</span>), (<span class="hljs-string">'age'</span>, <span class="hljs-string">'age'</span>)]
</code></pre>
<p>If you want to zip the values of the dictionaries, you can do that using the <code>.values()</code> method:</p>
<pre><code class="lang-python">zipped_values = list(zip(dict1.values(), dict2.values()))
print(zipped_values)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-python">[(<span class="hljs-string">'Alice'</span>, <span class="hljs-string">'Bob'</span>), (<span class="hljs-number">25</span>, <span class="hljs-number">30</span>)]
</code></pre>
<p>Now you can easily combine the values of the two dictionaries.</p>
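<p>One caveat worth keeping in mind: because <code>zip()</code> pairs values by position, it only lines up related values when both dictionaries store their keys in the same order. The sketch below uses a hypothetical <code>dict3</code> whose keys were inserted in the opposite order, and shows one way to pair values by key instead.</p>

```python
dict1 = {"name": "Alice", "age": 25}
# Hypothetical dictionary with the same keys inserted in a different order
dict3 = {"age": 30, "name": "Bob"}

# zip() pairs values by position, so these pairs are misaligned
print(list(zip(dict1.values(), dict3.values())))  # [('Alice', 30), (25, 'Bob')]

# Looking up each key in both dictionaries keeps related values together
aligned = [(dict1[key], dict3[key]) for key in dict1 if key in dict3]
print(aligned)  # [('Alice', 'Bob'), (25, 30)]
```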
<h3 id="heading-example-7-using-zip-in-loops">Example 7: Using <code>zip()</code> in Loops</h3>
<p>One of the most common uses of <code>zip()</code> is in loops when you want to process multiple lists at the same time. Here’s an example:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Lists of names and scores</span>
names = [<span class="hljs-string">"Alice"</span>, <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"Charlie"</span>]
scores = [<span class="hljs-number">85</span>, <span class="hljs-number">90</span>, <span class="hljs-number">88</span>]

<span class="hljs-comment"># Using zip() in a loop</span>
<span class="hljs-keyword">for</span> name, score <span class="hljs-keyword">in</span> zip(names, scores):
    print(<span class="hljs-string">f"<span class="hljs-subst">{name}</span> scored <span class="hljs-subst">{score}</span>"</span>)
</code></pre>
<p>This loop iterates over both the <code>names</code> and <code>scores</code> lists simultaneously, pairing up each name with its corresponding score.</p>
<p><strong>Output:</strong></p>
<pre><code class="lang-python">Alice scored <span class="hljs-number">85</span>
Bob scored <span class="hljs-number">90</span>
Charlie scored <span class="hljs-number">88</span>
</code></pre>
<p>Using <code>zip()</code> in loops like this makes your code cleaner and easier to read when working with related data.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The <code>zip()</code> function is a handy tool in Python that lets you combine multiple iterables into tuples, making it easier to work with related data. Whether you're pairing up items from lists, tuples, or strings, <code>zip()</code> simplifies your code and can be especially useful in loops.</p>
<p>With the examples in this article, you should now have a good understanding of how to use <code>zip()</code> in various scenarios.</p>
<p>If you found this explanation of Python's <code>zip()</code> function helpful, you might also enjoy more in-depth programming tutorials and concepts I cover on my <a target="_blank" href="https://blog.theenthusiast.dev">blog</a>.</p>
<p>Happy coding!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Build Good Coding Habits as a New Python Developer ]]>
                </title>
                <description>
                    <![CDATA[ When you're starting out as a new Python developer, you'll likely develop some habits, both good and bad. Coding is something of an art form. Flexibility and customization are encouraged — and you can usually write code how you want within the contex... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-build-good-coding-habits/</link>
                <guid isPermaLink="false">66c5008909edd2016542d012</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ python beginner ]]>
                    </category>
                
                    <category>
                        <![CDATA[ coding ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #codingNewbies ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Eleanor Hecks ]]>
                </dc:creator>
                <pubDate>Tue, 20 Aug 2024 20:46:01 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724179204764/68fe386c-336f-4f05-9652-bbf5644b5a1b.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>When you're starting out as a new Python developer, you'll likely develop some habits, both good and bad.</p>
<p>Coding is something of an art form. Flexibility and customization are encouraged — and you can usually write code how you want within the context of the language.</p>
<p>The catch is that you're not only communicating with the computer. Other people will read your code too, so you need to write it in a way that makes sense to them.</p>
<p>Careless syntax and sloppy structure also lead to errors in your programs, and messy code makes those errors extremely difficult to find later. Clean, readable code is the way to go, which means forming good coding habits early so you follow them throughout your entire career.</p>
<p>Here are six tips for building good coding habits as you start out in Python.</p>
<h2 id="heading-1-follow-the-pep-8-style-guide"><strong>1. Follow the PEP 8 Style Guide</strong></h2>
<p>Copywriters and other content writers typically use something called a style guide. A style guide sets rules about the formatting and organization of the text. It might explain whether to use the Oxford comma or when to use title caps and other structured approaches.</p>
<p>Python has a style guide just like this, known as PEP 8 (also written PEP8 or PEP-8). Several core Python developers <a target="_blank" href="https://peps.python.org/pep-0008/">published the guide in 2001</a> to describe how to write readable, consistent code.</p>
<p>Some tenets include:</p>
<ul>
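<li><p>Using four spaces per indentation level.</p>
</li>
<li><p>Keeping lines to a maximum of 79 characters.</p>
</li>
<li><p>Wrapping long lines inside parentheses, brackets, or braces rather than with backslashes.</p>
</li>
<li><p>Surrounding top-level function and class definitions with two blank lines, and method definitions inside a class with one.</p>
</li>
<li><p>Following naming conventions, such as <code>snake_case</code> for functions and variables, <code>CapWords</code> for classes, and <code>UPPER_CASE</code> for constants.</p>
</li>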
</ul>
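<p>As a rough illustration, here's a short snippet that follows the conventions above. The <code>ShoppingCart</code> class and <code>format_total()</code> function are hypothetical examples, not part of PEP 8 itself.</p>

```python
CURRENCY = "USD"  # Constants use UPPER_CASE


class ShoppingCart:  # Class names use CapWords
    """Store items and compute their total price."""

    def __init__(self):
        self.items = []  # Four spaces per indentation level

    def add_item(self, name, price):  # One blank line between methods
        self.items.append((name, price))

    def total_price(self):
        return sum(price for _, price in self.items)


def format_total(cart):  # Two blank lines around top-level definitions
    # Wrap long expressions inside parentheses instead of using backslashes
    return (
        f"{len(cart.items)} items, "
        f"total: {cart.total_price():.2f} {CURRENCY}"
    )


cart = ShoppingCart()
cart.add_item("apple", 0.5)
cart.add_item("bread", 2.25)
print(format_total(cart))  # 2 items, total: 2.75 USD
```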
<p>If you haven’t yet, read through the PEP 8 style guide and make sure you’re following its recommendations.</p>
<h2 id="heading-2-use-the-newest-python-version"><strong>2. Use the Newest Python Version</strong></h2>
<p>Programming languages like Python go through many iterations during their life cycles. Old versions are typically phased out for newer releases. Generally, the newest release includes bug fixes, as well as security or performance improvements.</p>
<p>At a minimum, use Python 3 rather than Python 2, as the older version <a target="_blank" href="https://www.python.org/doc/sunset-python-2/">reached end of life</a> in January 2020. Also, when working with third-party modules, frameworks, or repositories, always check the minimum Python version they require. This is the oldest version of Python that is compatible with those components.</p>
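<p>If you're not sure which version you're running, you can check from inside Python itself. This is a minimal sketch; the <code>MINIMUM_VERSION</code> value is a hypothetical requirement, so swap in whatever your project or its dependencies actually need.</p>

```python
import platform
import sys

# Hypothetical minimum; replace with the version your dependencies require
MINIMUM_VERSION = (3, 8)

print(f"Running Python {platform.python_version()}")

# sys.version_info compares element by element, like a tuple
if sys.version_info >= MINIMUM_VERSION:
    print("This Python version is new enough.")
else:
    required = ".".join(str(part) for part in MINIMUM_VERSION)
    raise SystemExit(f"Python {required} or newer is required.")
```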
<h2 id="heading-3-always-comment-out-specific-code"><strong>3. Always Comment Your Code</strong></h2>
<p>In the moment as you’re writing your code, you know what you’re trying to achieve. When you read that code later, you might forget — or worse yet, if someone else is reading that code, they might find themselves perplexed. That’s what comments are for.</p>
<p>Nearly every language lets you add comments, which are notes that the interpreter or compiler ignores. The idea is to use descriptive yet succinct comments to explain what’s happening. Some developers forget to do this entirely, but if you start early and always follow the rule, you’ll write code that’s easy to follow.</p>
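<p>In Python, a comment starts with a <code>#</code> symbol, and Python ignores everything after it on that line. Python has no dedicated multi-line comment syntax, so the usual approach is to start each line with <code>#</code>. You’ll also see triple-quoted strings (<code>'''</code> or <code>"""</code>) used for longer notes. Technically, these are string literals rather than comments, and one placed as the first statement of a module, function, or class becomes its docstring.</p>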
<p><code># This is a regular comment.</code></p>
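<pre><code class="lang-python"><span class="hljs-comment"># This is a multi-line comment.</span>
<span class="hljs-comment"># It explains what the code is doing.</span>

<span class="hljs-string">'''
This triple-quoted string is often used like a multi-line comment.
Python treats it as a string literal, not a comment.
'''</span>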
</code></pre>
<p>Commenting can be a vital part of the coding process as it allows you to better remember and visualize the ideas going through your mind as you’re coding.</p>
<p>According to one source, handwriting your notes and then transcribing them digitally, for example as code comments, can <a target="_blank" href="https://blog.box.com/best-note-taking-methods">improve your retention by 75 percent</a>. This means that when you discover a bug or want to make improvements later, you can more easily recall the relevant code snippets.</p>
<p>Comments can also appear on the same line as code. These are called inline comments. For example:</p>
<p><code>print("Hello World. This is my first code.")  # This is how you create an inline comment</code></p>
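<p>Here’s a small sketch of how the two styles differ in practice, using a hypothetical <code>circle_area()</code> function. Python discards <code>#</code> comments entirely, but a triple-quoted string placed first in a function is kept as that function’s docstring, which tools like <code>help()</code> can display.</p>

```python
import math


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    # Comments explain the code to readers; Python ignores them at runtime
    return math.pi * radius ** 2  # area = pi * r squared


print(circle_area.__doc__)  # Return the area of a circle with the given radius.
print(round(circle_area(2), 2))  # 12.57
```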
<h2 id="heading-4-use-a-linter"><strong>4. Use a Linter</strong></h2>
<p>A Python linter reviews code spacing, line length, and other design issues such as argument positioning. As a result, your code looks clean, organized, and consistently written across all the files in your project.</p>
<p>Bear in mind that a linter is different from an auto-formatter or beautifier, although in modern coding the same tool may handle both jobs. You can think of a linter as something that flags practical problems in your code, while an auto-formatter rewrites its styling and layout.</p>
<p>Linters can analyze and identify coding errors, potential bugs, misspellings or syntax problems, but also stylistic inconsistencies, such as how you’re using indents and spacing. Auto-formatters focus on the writing or stylistic part of syntax like commas, quotes, proper line length and so on. Both are helpful, but you seldom want to code without a linter handy.</p>
<p>Popular Python linters include Pylint, Flake8, and Ruff, while tools such as Radon and Xenon add checks for code complexity. The linter used in the following screenshot is Ruff, installed via VS Code.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/Python-Linter-in-VSCode-with-Ruff.jpg" alt="Python Linter in VSCode with Ruff" width="600" height="400" loading="lazy"></p>
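<p>To get a feel for what a linter catches, here’s a hypothetical snippet that runs fine but contains a few problems. Running a linter such as Ruff (<code>ruff check</code>) or Flake8 on it should flag each one. The rule codes in the comments are the ones those tools use, though which rules are enabled depends on your configuration.</p>

```python
import os  # F401: "os" is imported but never used


def average(values):
    if values == None:  # E711: compare to None with "is None" instead
        return 0.0
    doubled = len(values) * 2  # F841: local variable is assigned but never used
    return sum(values) / len(values)


# A cleaned-up version that a linter would accept
def average_clean(values):
    if values is None:
        return 0.0
    return sum(values) / len(values)


print(average([2, 4, 6]))  # 4.0
print(average_clean([2, 4, 6]))  # 4.0
```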
<h2 id="heading-5-rely-on-built-in-functions-and-libraries"><strong>5. Rely on Built-In Functions and Libraries</strong></h2>
<p>The beauty of Python and languages like it is that you’re never starting from scratch. You don’t have to write every single function yourself. Instead, you can rely on built-in functions, libraries, frameworks, and repositories.</p>
<p>Built-in functions save you time, are well tested, and are maintained by Python’s core developers. They can also help performance: in CPython, the standard interpreter, most built-ins are implemented in C, so they often run faster than equivalent code written in pure Python. You can <a target="_blank" href="https://docs.python.org/3/library/functions.html">reference the official Python documentation</a> to see all of the built-in functions.</p>
<p>Some examples include:</p>
<ul>
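<li><p><code>append()</code><strong>:</strong> Strictly speaking a list method rather than a built-in function; <code>my_list.append(item)</code> adds a single item to the end of an existing list, increasing its length by one</p>
</li>
<li><p><code>eval()</code><strong>:</strong> Parses a string as a Python expression, evaluates it, and returns the result. Never pass it untrusted input, since it can run arbitrary code</p>
</li>
<li><p><code>id()</code><strong>:</strong> Returns an integer that uniquely identifies an object for as long as that object exists</p>
</li>
<li><p><code>max()</code><strong>:</strong> Returns the largest item in an iterable, or the largest of two or more arguments</p>
</li>
<li><p><code>print()</code><strong>:</strong> Writes text (or the string form of any object) to standard output, usually the console; it returns <code>None</code></p>
</li>
<li><p><code>round()</code><strong>:</strong> Rounds a number to a given number of decimal places, or to the nearest integer if you omit the second argument. Exact halves round to the nearest even number, so <code>round(2.5)</code> returns <code>2</code></p>
</li>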
</ul>
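<p>Here’s a quick sketch that exercises each of the functions above so you can see what they return. The <code>scores</code> list is made up for illustration.</p>

```python
scores = [85, 90, 88]

# append() is a list method: it changes the list in place
scores.append(92)
print(scores)                   # [85, 90, 88, 92]

print(max(scores))              # 92 (largest item in an iterable)
print(max(3, 7, 5))             # 7 (largest of several arguments)

print(round(3.14159, 2))        # 3.14
print(round(2.5), round(3.5))   # 2 4 (halves go to the nearest even number)

# eval() runs a string as a Python expression, so only use it on trusted input
print(eval("2 * (3 + 4)"))      # 14

same_list = scores
print(id(same_list) == id(scores))  # True: both names point to one object

result = print("print() returns None")
print(result)                   # None
```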
<p>Here’s the classic beginner’s example of the <code>print()</code> function:</p>
<pre><code class="lang-python">print(“Hello world I am coding.”)
</code></pre>
<p>That will print:</p>
<p>Hello world I am coding.</p>
<p>Built-in functions like <code>print()</code> are always available in any Python environment without an import, regardless of the IDE or editor you’re using. The same goes for every other built-in function, from <code>eval()</code> to <code>round()</code>.</p>
<p>Libraries, on the other hand, are numerous and varied. They’re much larger collections of pre-written code and functions. Modules such as asyncio and tkinter are part of Python’s standard library, so you can simply import them into your script. Third-party libraries such as Requests, FastAPI, and aiohttp must be installed first, usually with pip, before you can import them.</p>
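<p>As a sketch of the difference, standard library modules such as <code>statistics</code> and <code>collections</code> can be imported right away, while a third-party package such as Requests has to be installed first. The data below is made up for illustration.</p>

```python
# Standard library modules ship with Python, so no installation is needed
from collections import Counter
import statistics

scores = [85, 90, 88, 90]
print(statistics.mean(scores))         # 88.25
print(Counter(scores).most_common(1))  # [(90, 2)]

# Third-party libraries such as Requests must be installed first, e.g.:
#   python -m pip install requests
# and can then be imported the same way:
#   import requests
```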
<h2 id="heading-6-fix-code-issues-as-soon-as-possible"><strong>6. Fix Code Issues as Soon as Possible</strong></h2>
<p>When writing code, if you notice something is awry, fix it right then and there. Don’t put it off until you’re testing later, because you might lose track of the bug or error and never find it again. Between <a target="_blank" href="https://codescene.com/blog/measuring-the-business-impact-of-low-code-quality">23% and 42% of a developer’s time</a> is wasted due to bad code, which is valuable time you could be spending elsewhere.</p>
<p>Most of all, bugs and errors compound over time, so the longer you leave them, the more likely it is that entire segments of your code will fail or stop working. Many IDEs and linters can help with this process, especially <a target="_blank" href="https://docs.python.org/3/library/logging.html#module-logging">if you’re using the logging module</a> instead of merely printing results.</p>
<p>Python’s logging module records events while your program is running. Each message has a severity level (<code>DEBUG</code>, <code>INFO</code>, <code>WARNING</code>, <code>ERROR</code>, or <code>CRITICAL</code>), so you can surface warnings and errors while testing and filter out detailed debugging output when you don’t need it. It also helps you understand the runtime behavior of your project, which is easy to overlook during the writing process.</p>
<p>You can see and analyze user interactions, for example, especially if external users are testing your application. Most importantly, the logging module is an audit tool that’s invaluable once you start testing or running the code you’ve written. Don’t code without it.</p>
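<p>Here’s a minimal sketch of the logging module in action. The <code>divide()</code> function is a hypothetical example; the <code>logging</code> calls themselves are part of the standard library.</p>

```python
import logging

# Configure the root logger once, near the program's entry point
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)


def divide(a, b):
    logger.debug("divide called with a=%s, b=%s", a, b)
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception() logs at ERROR level and includes the traceback
        logger.exception("Cannot divide %s by zero", a)
        return None


result = divide(10, 2)
logger.info("Result: %s", result)
divide(1, 0)
```

<p>Changing <code>level=logging.DEBUG</code> to <code>level=logging.WARNING</code> would hide the debug and info messages while still showing the error, which is one way to keep output quiet once your code is working.</p>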
<h2 id="heading-practice-makes-perfect"><strong>Practice Makes Perfect</strong></h2>
<p>There are many things to consider when working with Python, but no matter how skilled or adept you are, following Python best practices is always the way to go. In the end, the best way to learn is to take a hands-on approach, which means practice.</p>
<p>Continue using Python, even just to create simple or small projects for yourself. Practice using the habits discussed here and writing clean code. You should also read code from other developers to see how they approach the process.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
