<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Sara Jadhav - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Sara Jadhav - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sat, 23 May 2026 22:19:36 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/Eccentric-/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Build a Calculator with Tkinter in Python  ]]>
                </title>
                <description>
                    <![CDATA[ In this tutorial, you'll learn how to create a simple arithmetic calculator in Python with Tkinter. The project will be one of your first steps towards building an actual GUI in Python. This is a hand ]]>
                </description>
                <link>https://www.freecodecamp.org/news/build-a-calculator-with-tkinter-in-python/</link>
                <guid isPermaLink="false">6a07203c99d875f5cd667635</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ GUI ]]>
                    </category>
                
                    <category>
                        <![CDATA[ tkinter ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sara Jadhav ]]>
                </dc:creator>
                <pubDate>Fri, 15 May 2026 13:31:40 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5fc16e412cae9c5b190b6cdd/0ae14c91-3e47-464c-b392-1026321a7764.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this tutorial, you'll learn how to create a simple arithmetic calculator in Python with Tkinter. The project will be one of your first steps towards building an actual GUI in Python.</p>
<p>This is a hands-on tutorial, which will help you form your early GUI projects. It's meant for anyone who wants to start building visual projects in Python.</p>
<p>The Tkinter library is a standard built-in Python library which helps us make Graphical User Interfaces in Python. Since it's a built-in library, we don't have to separately install it. So, once you have Python installed on your computer, you just have to set it up and you're good to follow along here.</p>
<p>But keep in mind that some Python distributions don't bundle Tkinter. To check whether it's installed, open your command prompt and type:</p>
<pre><code class="language-plaintext">python -m tkinter
</code></pre>
<p>This will open a small Tkinter demo window if Tkinter is installed and working on your computer.</p>
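<p>If you'd rather check from inside a script, a minimal sketch is to attempt the import and report the result. The function name <code>tkinter_available</code> is just a name chosen for this example. It only confirms that the module can be imported, and it doesn't open a window.</p>

```python
def tkinter_available():
    """Return True if the tkinter module can be imported by this Python."""
    try:
        import tkinter  # imported only to see whether it is present
    except ImportError:
        return False
    return True


if tkinter_available():
    print("Tkinter is available.")
else:
    print("Tkinter is missing. Install it before following along.")
```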
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-what-do-we-want-to-see-in-our-project">What Do We Want to See in Our Project?</a></p>
</li>
<li><p><a href="#heading-how-to-set-up-the-window">How to Set Up the Window</a></p>
</li>
<li><p><a href="#heading-how-to-name-the-window">How to Name the Window</a></p>
</li>
<li><p><a href="#heading-how-to-create-frames-in-the-window">How to Create Frames in the Window</a></p>
</li>
<li><p><a href="#heading-how-to-add-buttons-to-the-window">How to Add Buttons to the Window</a></p>
</li>
<li><p><a href="#heading-how-to-add-the-output-screen-of-the-calculator">How to Add the Output Screen of the Calculator</a></p>
</li>
<li><p><a href="#heading-how-to-make-the-numbers-visible-on-the-output-screen">How to Make the Numbers Visible on the Output Screen</a></p>
</li>
<li><p><a href="#heading-how-to-add-a-scrollbar-to-the-output-screen">How to Add a Scrollbar to the Output Screen</a></p>
</li>
<li><p><a href="#heading-how-to-add-the-equal-to-button">How to Add the Equal To Button</a></p>
</li>
<li><p><a href="#heading-how-to-add-the-ac-button">How to Add the AC Button</a></p>
</li>
<li><p><a href="#heading-wrapping-up">Wrapping Up</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Before starting, here are some prerequisites for this tutorial which will help you get the most out of it:</p>
<ul>
<li><p>Basic Python Syntax</p>
</li>
<li><p>Understanding of how to import libraries and use their functions</p>
</li>
<li><p>Understanding of how to use the attributes of a module</p>
</li>
</ul>
<p>Now that we know what we need for this tutorial, let's dive into the process!</p>
<p>The first step for building any project is to create a clear-cut idea of what you want to build. Let's look at what we're going to make.</p>
<h2 id="heading-what-do-we-want-to-see-in-our-project">What Do We Want to See in Our Project?</h2>
<p>We're going to build a simple arithmetic calculator. The calculator works as follows:</p>
<ul>
<li><p>It has all the numerals (0, 1, 2, ..., 9) on a keyboard.</p>
</li>
<li><p>It has the basic arithmetic operators (+, -, x, /) and an equals (=) button lining the keyboard.</p>
</li>
<li><p>The calculator is non-resizable. That is, the user can't change the width or the height of the application window.</p>
</li>
<li><p>The calculator has a screen above the keyboard which shows the user input and the final answer.</p>
</li>
<li><p>Finally, the calculator has an 'AC' ('All Clear') button, which erases everything on the output screen so the user can start a new calculation.</p>
</li>
</ul>
<p>With this, we have a clear idea about what we're going to build.</p>
<p>It also helps to sketch the UI beforehand so that you know where each widget goes in the window. Here's an image of the UI we'll create in this tutorial:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/93d8458d-f829-4edb-9651-622a14f9444a.png" alt="UI of the calculator " style="display:block;margin:0 auto" width="249" height="388" loading="lazy">

<h2 id="heading-how-to-set-up-the-window">How to Set Up the Window</h2>
<p>To set up our main window where we'll later add our widgets, first we need to import the Tkinter library into our program. Then we'll create the window by calling <code>tk.Tk()</code>. To keep the window on the screen until we close it manually, we'll call its <code>mainloop()</code> function. Here's what the code looks like:</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# This keeps the window active
root.mainloop()
</code></pre>
<p>The <code>root</code> variable represents our window. So, from now on, we'll be adding the widgets to this window.</p>
<p>When you run the program, you'll see a blank window on your screen as shown in the image below. Congrats! This is your first GUI.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/ee60f1d8-ea5b-415d-96bf-90941ecd9424.png" alt="Blank tkinter window " style="display:block;margin:0 auto" width="214" height="241" loading="lazy">

<h2 id="heading-how-to-name-the-window">How to Name the Window</h2>
<p>The 'tk' written on the Title Bar is the default title of the window. To set our own window title, we can use the <code>title()</code> function. The following code shows how you can do that:</p>
<pre><code class="language-python">import tkinter as tk

root = tk.Tk()

# Naming the window
root.title("Calculator")

root.mainloop()
</code></pre>
<p>On executing the program, we get the following window:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/ef6ca391-3744-4915-9118-4996124adc85.png" alt="Blank window with title changed to 'Calculator'" style="display:block;margin:0 auto" width="302" height="281" loading="lazy">

<p>Now you should be able to see that the title of the window changed successfully.</p>
<h2 id="heading-how-to-create-frames-in-the-window">How to Create Frames in the Window</h2>
<p>After setting up the window, now we have to place the buttons on it. For placing the buttons, we need to create a container in which we'll put them.</p>
<p>The container could be the main window, but we'll avoid that for this project. This is because we want to place some buttons to the side of and below others to create our keyboard. To make it easier, we'll create Frame containers.</p>
<p>A Frame is a rectangular container that groups other widgets. In this project, each Frame acts as one vertical column of the keyboard. An empty Frame takes up almost no space, and it grows to fit the widgets we place in it.</p>
<p>We'll create four frames in our window. The first frame will contain the buttons 1, 4, 7, and AC. The second frame will contain the buttons 2, 5, 8, and 0, the third frame will contain the buttons 3, 6, 9, and =, and the last frame will contain the buttons +, -, x, and / (just like in the UI shown above).</p>
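<p>Before writing any Tkinter code, it can help to write this layout down as plain data. The sketch below is not part of the calculator itself. The names <code>COLUMNS</code> and <code>rows_from_columns</code> are made up for this illustration. It lists the labels of each frame from top to bottom and then prints the rows as they should appear on screen.</p>

```python
# Each inner list is one frame (a vertical column), top to bottom.
COLUMNS = [
    ["1", "4", "7", "AC"],  # frame1
    ["2", "5", "8", "0"],   # frame2
    ["3", "6", "9", "="],   # frame3
    ["+", "-", "x", "/"],   # frame4
]


def rows_from_columns(columns):
    """Transpose the columns to get the rows as they appear on screen."""
    return [list(row) for row in zip(*columns)]


# Print the keyboard row by row, right-aligning each label in two characters.
for row in rows_from_columns(COLUMNS):
    print("  ".join(label.rjust(2) for label in row))
```

<p>Comparing the printed grid with the UI image is a quick way to confirm that every button is assigned to the right frame.</p>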
<p>We can create frames in Tkinter using <code>tk.Frame()</code>. We pass the Frame's parent container (here, the main window) as its argument. The following code should make it clear:</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# Naming the window
root.title("Calculator")

# Creating Frames
frame1 = tk.Frame(root)
frame1.pack(side='left', anchor='n')
frame2 = tk.Frame(root)
frame2.pack(side='left', anchor='n')
frame3 = tk.Frame(root)
frame3.pack(side='left', anchor='n')
frame4 = tk.Frame(root)
frame4.pack(side='left', anchor='n')

# This keeps the window active
root.mainloop()
</code></pre>
<p>The <code>pack()</code> function places the Frame in the window. The <code>side='left'</code> parameter packs each Frame against the left edge of the space that is still free in the window. By default, <code>side</code> is set to <code>'top'</code>, which stacks widgets from the top down. <code>anchor='n'</code> pins each Frame to the top of the space allotted to it, so all four columns line up along their top edges. By default, <code>anchor</code> is set to <code>'center'</code>. The 'n' in <code>anchor='n'</code> stands for 'North'.</p>
<p>An important thing to note is that, since we packed <code>frame1</code> first, it occupies the leftmost portion of the window. Even though <code>frame2</code> is also packed with <code>side='left'</code>, the two frames won't overlap. Instead, <code>frame2</code> goes as far left as it can without overlapping <code>frame1</code>. As a result, <code>frame1</code>, <code>frame2</code>, <code>frame3</code>, and <code>frame4</code> sit side by side, starting from the left edge of the window.</p>
<h2 id="heading-how-to-add-buttons-to-the-window">How to Add Buttons to the Window</h2>
<p>We can create a button widget in Tkinter by using the <code>tk.Button()</code> function. It accepts various parameters:</p>
<ul>
<li><p><strong>master:</strong> This allows us to provide the parent container in which we have to place our button. This expects a container object.</p>
</li>
<li><p><strong>text:</strong> In this parameter, we have to pass the text which we want to display on our button. This expects a string.</p>
</li>
<li><p><strong>font:</strong> This expects a tuple with the first element providing the name of the font and the next element providing the font size.</p>
</li>
<li><p><strong>image:</strong> This allows us to put an image over our button.</p>
</li>
<li><p><strong>bg:</strong> This allows us to set the background colour for our button.</p>
</li>
<li><p><strong>fg:</strong> This allows us to set the foreground colour for our button.</p>
</li>
<li><p><strong>activebackground:</strong> This sets the background colour shown while the button is being pressed.</p>
</li>
<li><p><strong>command:</strong> This allows us to link a function to the button. The function runs each time the button is clicked.</p>
</li>
</ul>
<p>Now that we know the basics of creating a button, let's actually create the keyboard of our calculator.</p>
<p>To create the keyboard, we have to put quite a few buttons on the window. To make our work easier, we'll define a function that creates a button, so that each call differs only in its text. Let's look at the code below:</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# Naming the window
root.title("Calculator")

frame1 = tk.Frame(root)
frame1.pack(side='left', anchor='n')
frame2 = tk.Frame(root)
frame2.pack(side='left', anchor='n')
frame3 = tk.Frame(root)
frame3.pack(side='left', anchor='n')
frame4 = tk.Frame(root)
frame4.pack(side='left', anchor='n')

pixel = tk.PhotoImage(width=55, height=55)

def buttons(text, frame):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg="#333300", fg="white", compound="center")
    return button


def buttons_ops(text, frame, bg, fg):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg=bg, fg=fg, activebackground="black",
                        compound="center")
    return button

btn1 = buttons('1',frame1).pack()
btn4 = buttons('4', frame1).pack()
btn7 = buttons('7', frame1).pack()

btn2 = buttons('2', frame2).pack()
btn5 = buttons('5', frame2).pack()
btn8 = buttons('8', frame2).pack()
btn0 = buttons_ops('0', frame2, '#333300', 'white').pack()

plus = buttons_ops('+', frame4, 'black', 'white').pack()
minus= buttons_ops('-', frame4,  'black', 'white').pack()
mul = buttons_ops('x', frame4, 'black', 'white').pack()
div = buttons_ops('/', frame4, 'black', 'white').pack()

btn3 = buttons('3', frame3).pack()
btn6 = buttons('6', frame3).pack()
btn9 = buttons('9', frame3).pack()

# This keeps the window active
root.mainloop()
</code></pre>
<p>Now let's break it down:</p>
<p>First, we created a Tkinter image object via <code>tk.PhotoImage()</code>. This is a blank, transparent image. The button sizes itself to fit the image, so the image's <code>width</code> and <code>height</code> let us set the size of the button in pixels. The <code>compound='center'</code> ensures that the button text is aligned at the center of the transparent image.</p>
<p>You can change the size of the button by changing the <code>width</code> and <code>height</code> parameters of the <code>pixel</code> object.</p>
<p>Secondly, we created a function which takes the 'text' and the 'container frame' as arguments. Inside the function, we created a button object and returned it. For the numerical buttons, we've created the function <code>buttons</code>, whereas for the operator buttons, we've created the function <code>buttons_ops</code>. This was done only to give the two groups different styles (in terms of background, foreground, and so on).</p>
<p>You can change the colours of the buttons by making changes in the <code>bg</code> and <code>fg</code> parameters of the <code>tk.Button()</code> function.</p>
<p>Then we created all the buttons with these two functions. The <code>pack()</code> function puts the buttons in their respective places. Remember that we haven't created the <code>=</code> and <code>AC</code> buttons.</p>
<p>When we execute the program, the following window will pop up:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/012c6883-d79b-44fa-8171-0c6cfc837b4e.png" alt="Window with embedded buttons " style="display:block;margin:0 auto" width="269" height="289" loading="lazy">

<p>You can try clicking the buttons to make sure that everything is working great up to this point.</p>
<h2 id="heading-how-to-add-the-output-screen-of-the-calculator">How to Add the Output Screen of the Calculator</h2>
<p>For the output screen of the calculator, we'll be using the <code>Entry</code> object in Tkinter. The <code>Entry</code> object will be the best match in this case because we want a single line screen to showcase the user input. We could also use a <code>Text</code> object, but it provides a multiline area. So here, we'll just be using the <code>Entry</code> object.</p>
<p>Also, since we want the output screen to sit above the keyboard, we need to define and embed this object before embedding the frames.</p>
<p>The <code>Entry</code> object is created using the <code>tk.Entry()</code> function. This has similar parameters to the <code>tk.Button()</code> function. The following code creates an entry box:</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# Naming the window
root.title("Calculator")

# creating the output screen
entry = tk.Entry(root, width=9, font=('Arial', 38, 'bold'), state='readonly')
entry.pack(pady=(30, 10))


frame1 = tk.Frame(root)
frame1.pack(side='left', anchor='n')
frame2 = tk.Frame(root)
frame2.pack(side='left', anchor='n')
frame3 = tk.Frame(root)
frame3.pack(side='left', anchor='n')
frame4 = tk.Frame(root)
frame4.pack(side='left', anchor='n')

pixel = tk.PhotoImage(width=55, height=55)

def buttons(text, frame):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg="#333300", fg="white", compound="center")
    return button


def buttons_ops(text, frame, bg, fg):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg=bg, fg=fg, activebackground="black",
                        compound="center")
    return button

btn1 = buttons('1',frame1).pack()
btn4 = buttons('4', frame1).pack()
btn7 = buttons('7', frame1).pack()

btn2 = buttons('2', frame2).pack()
btn5 = buttons('5', frame2).pack()
btn8 = buttons('8', frame2).pack()
btn0 = buttons_ops('0', frame2, '#333300', 'white').pack()

plus = buttons_ops('+', frame4, 'black', 'white').pack()
minus= buttons_ops('-', frame4,  'black', 'white').pack()
mul = buttons_ops('x', frame4, 'black', 'white').pack()
div = buttons_ops('/', frame4, 'black', 'white').pack()

btn3 = buttons('3', frame3).pack()
btn6 = buttons('6', frame3).pack()
btn9 = buttons('9', frame3).pack()

# This keeps the window active
root.mainloop()
</code></pre>
<p>In the code above, we set the main window <code>root</code> as the parent container of the <code>entry</code> object. I set the <code>width</code> parameter (which is measured in characters) to 9 as it fit well with the dimensions of the window and the keyboard. You can try different values for the width until the output screen is the size you want.</p>
<p>You may have noticed that we didn't call <code>pack()</code> on the same line as the object definition. This is because <code>pack()</code> returns <code>None</code>. If we wrote <code>entry = tk.Entry(...).pack()</code>, the variable <code>entry</code> would hold <code>None</code> instead of the widget, and we couldn't call functions such as <code>insert()</code> on it later.</p>
<p>So, why did we call <code>pack()</code> on the same line while creating the buttons? We never refer to the buttons again after creating them in this project (variables such as <code>btn1</code> simply hold <code>None</code>), so chaining the call just saves a few lines of code.</p>
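<p>The difference between the two styles can be seen without opening a window. The sketch below uses <code>StandInWidget</code>, a hypothetical class written only for this illustration. It copies a single behaviour of Tkinter widgets: <code>pack()</code> does its work and returns <code>None</code>.</p>

```python
class StandInWidget:
    """Hypothetical stand-in that mimics one Tkinter behaviour:
    pack() does its work and returns None."""

    def __init__(self):
        self.packed = False

    def pack(self):
        self.packed = True
        # No return statement, so the method returns None


# Chained: the variable receives the return value of pack()
chained = StandInWidget().pack()

# Separate lines: the variable keeps the widget itself
widget = StandInWidget()
widget.pack()

print(chained)        # None
print(widget.packed)  # True
```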
<p>In the <code>tk.Entry()</code> function, we set <code>state='readonly'</code>. This prohibits any direct text input into the output screen. That means we can only use the buttons to show characters on the output screen. By default, this is set to <code>state='normal'</code>, which allows direct input from the keyboard into the entry box.</p>
<p>The <code>pady</code> parameter inside the <code>pack()</code> function leaves the given number of pixels above and below the object. For example, to pad 10 pixels both above and below the object, we can write <code>pady=10</code>.</p>
<p>Here, we didn't want the same amount of padding above and below the object. So we used a tuple, with the first element representing the pixels to pad above the output screen and the second element representing the pixels to pad below it.</p>
<p>Up until now, our GUI looks as shown below:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/dc15d02d-2243-4751-b825-53d5b4061daf.png" alt="Window with embedded output screen" style="display:block;margin:0 auto" width="262" height="389" loading="lazy">

<p>We can now see that the output screen is set perfectly.</p>
<h2 id="heading-how-to-make-the-numbers-visible-on-the-output-screen">How to Make the Numbers Visible on the Output Screen</h2>
<p>The next step is to make characters visible on the output screen. Every button that we click should appear on the output screen. For this, we have to link commands to each button. Let's first look at the code and then see how it works:</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# Naming the window
root.title("Calculator")

entry = tk.Entry(root, width=9, font=('Arial', 38, 'bold'),state='readonly')
entry.pack(pady=(30, 10))


frame1 = tk.Frame(root)
frame1.pack(side='left', anchor='n')
frame2 = tk.Frame(root)
frame2.pack(side='left', anchor='n')
frame3 = tk.Frame(root)
frame3.pack(side='left', anchor='n')
frame4 = tk.Frame(root)
frame4.pack(side='left', anchor='n')

pixel = tk.PhotoImage(width=55, height=55)

def command(text):
    entry.config(state='normal')
    entry.insert(tk.END, text) 
    entry.config(state='readonly')  


def buttons(text, frame):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg="#333300", fg="white", compound="center",
                       command=lambda :command(text))
    return button


def buttons_ops(text, frame, bg, fg):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg=bg, fg=fg, activebackground="black",
                        compound="center", command=lambda:command(text))
    return button

btn1 = buttons('1',frame1).pack()
btn4 = buttons('4', frame1).pack()
btn7 = buttons('7', frame1).pack()

btn2 = buttons('2', frame2).pack()
btn5 = buttons('5', frame2).pack()
btn8 = buttons('8', frame2).pack()
btn0 = buttons_ops('0', frame2, '#333300', 'white').pack()

plus = buttons_ops('+', frame4, 'black', 'white').pack()
minus= buttons_ops('-', frame4,  'black', 'white').pack()
mul = buttons_ops('x', frame4, 'black', 'white').pack()
div = buttons_ops('/', frame4, 'black', 'white').pack()

btn3 = buttons('3', frame3).pack()
btn6 = buttons('6', frame3).pack()
btn9 = buttons('9', frame3).pack()

# This keeps the window active
root.mainloop()
</code></pre>
<p>In the code above, we defined a new function called <code>command()</code>. This function takes one argument <code>text</code>. Inside the function, we changed the <code>state</code> of the <code>entry</code> object to <code>normal</code> via <code>config</code>. By doing this, we can now make changes in the text of the <code>entry</code> object.</p>
<p>Then we used the <code>insert()</code> function for the <code>entry</code> object. The <code>insert()</code> function appends the <code>text</code> argument to the existing set of characters.</p>
<p>The first argument of the <code>insert()</code> function is the index where the text will be inserted. <code>tk.END</code> represents the position just after the last character of the text in the object, so inserting there appends to the text. The second argument of the <code>insert()</code> function is the text to insert.</p>
<p>Finally, we change the <code>state</code> of the object again to <code>readonly</code> to prohibit any outside input other than our defined calculator keyboard.</p>
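<p>The unlock, edit, lock sequence shows up in every function that writes to the output screen. One possible way to avoid repeating it is a small context manager. This is only a sketch: <code>editable</code> is a hypothetical helper and not part of Tkinter, and <code>RecordingEntry</code> is a stand-in for <code>tk.Entry</code> so that the example runs without a window.</p>

```python
from contextlib import contextmanager


@contextmanager
def editable(widget):
    """Temporarily switch a read-only widget to 'normal', then lock it again."""
    widget.config(state='normal')
    try:
        yield widget
    finally:
        # Runs even if the body raises, so the widget never stays unlocked
        widget.config(state='readonly')


class RecordingEntry:
    """Hypothetical stand-in for tk.Entry that records what happens to it."""

    def __init__(self):
        self.state = 'readonly'
        self.text = ''

    def config(self, state):
        self.state = state

    def insert(self, index, text):
        # Simplified: the index is ignored and text is always appended.
        # Like the real widget, edits are ignored unless state is 'normal'.
        if self.state == 'normal':
            self.text += text


entry = RecordingEntry()
with editable(entry):
    entry.insert('end', '7')

print(entry.text, entry.state)  # 7 readonly
```

<p>With a real <code>tk.Entry</code>, the body of <code>command()</code> could then shrink to a <code>with editable(entry):</code> block containing the <code>insert()</code> call.</p>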
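<p>Now let's look at the <code>buttons</code> and the <code>buttons_ops</code> functions. You may have noticed that we've added the <code>command</code> parameter to the <code>tk.Button()</code> function. The <code>lambda</code> wraps the call in a small function, so <code>command(text)</code> runs only when the button is clicked. Without it, <code>command=command(text)</code> would call the function immediately, while the button is still being created.</p>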
<p>Collectively, <code>command=lambda:command(text)</code> means that, on clicking the buttons which we have defined up until now, it executes the <code>command()</code> function and shows the pressed button character on the output screen.</p>
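<p>The effect of the <code>lambda</code> can be checked in plain Python, without any widgets. In this sketch, the list <code>calls</code> and the variables <code>immediate</code> and <code>deferred</code> are names invented for the illustration, and <code>command()</code> just records its argument instead of writing to an entry box.</p>

```python
calls = []


def command(text):
    # Record the call instead of writing to the output screen
    calls.append(text)


# Without lambda: command('1') runs right now, and its return value (None)
# is what would be handed to the button
immediate = command('1')

# With lambda: a function is stored, and command('2') has not run yet
deferred = lambda: command('2')

print(calls)  # ['1']
deferred()    # this is what Tkinter does when the button is clicked
print(calls)  # ['1', '2']
```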
<p>Now try clicking some buttons on your window. They should appear on the output screen as shown below:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/e8cc0cd5-dc0c-4f01-9b21-7dec6874d00f.png" alt="Image of the calculator showing input on the calculator screen" style="display:block;margin:0 auto" width="262" height="395" loading="lazy">

<h2 id="heading-how-to-add-a-scrollbar-to-the-output-screen">How to Add a Scrollbar to the Output Screen</h2>
<p>Now, you might have encountered a problem: when you input a large number of characters, you were able to see only the first few characters. The rest were invisible.</p>
<p>To tackle this, we'll add a scrollbar to the output screen.</p>
<p>First, we'll create a scrollbar object via <code>tk.Scrollbar()</code> before the <code>entry</code> object. The following code shows how:</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# Naming the window
root.title("Calculator")

scrollbar = tk.Scrollbar(root, orient='horizontal')

entry = tk.Entry(root, width=9, font=('Arial', 38, 'bold'), state='readonly', xscrollcommand=scrollbar.set)
entry.pack(pady=(30, 10))

scrollbar.config(command=entry.xview)
scrollbar.pack()

frame1 = tk.Frame(root)
frame1.pack(side='left', anchor='n')
frame2 = tk.Frame(root)
frame2.pack(side='left', anchor='n')
frame3 = tk.Frame(root)
frame3.pack(side='left', anchor='n')
frame4 = tk.Frame(root)
frame4.pack(side='left', anchor='n')

pixel = tk.PhotoImage(width=55, height=55)

def command(text):
    entry.config(state='normal')
    entry.insert(tk.END, text)
    entry.config(state='readonly')


def buttons(text, frame):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg="#333300", fg="white", compound="center",
                       command=lambda :command(text))
    return button


def buttons_ops(text, frame, bg, fg):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg=bg, fg=fg, activebackground="black",
                        compound="center", command=lambda:command(text))
    return button

btn1 = buttons('1',frame1).pack()
btn4 = buttons('4', frame1).pack()
btn7 = buttons('7', frame1).pack()

btn2 = buttons('2', frame2).pack()
btn5 = buttons('5', frame2).pack()
btn8 = buttons('8', frame2).pack()
btn0 = buttons_ops('0', frame2, '#333300', 'white').pack()

plus = buttons_ops('+', frame4, 'black', 'white').pack()
minus= buttons_ops('-', frame4,  'black', 'white').pack()
mul = buttons_ops('x', frame4, 'black', 'white').pack()
div = buttons_ops('/', frame4, 'black', 'white').pack()

btn3 = buttons('3', frame3).pack()
btn6 = buttons('6', frame3).pack()
btn9 = buttons('9', frame3).pack()

# This keeps the window active
root.mainloop()
</code></pre>
<p>The <code>orient</code> parameter of <code>tk.Scrollbar()</code> sets the direction of the scrollbar. Here, we've made it horizontal. We also added a parameter to the original <code>entry</code> object: <code>xscrollcommand=scrollbar.set</code> lets the entry tell the scrollbar which part of the text is currently visible.</p>
<p>Then we completed the link in the other direction by setting <code>command=entry.xview</code>, so that moving the scrollbar scrolls the text in the entry. Finally, we packed the scrollbar just below the output screen.</p>
<p>The following image shows the scrollbar. You can use the arrow signs to navigate forward or backward through the text:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/29d0fd37-a94f-4bad-af24-6627055fd4b3.png" alt="Image of calculator with the scrollbar" style="display:block;margin:0 auto" width="283" height="429" loading="lazy">
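<p>If you'd also like the newest characters to stay in view as you type, one option is to scroll the entry to its right end after every insert. The sketch below is an optional variation and not part of the tutorial's code. <code>append_and_scroll</code> is a hypothetical helper name, and it uses the string <code>'end'</code>, which is the value of <code>tk.END</code>, so that the snippet has no imports.</p>

```python
def append_and_scroll(entry, text):
    """Append text to a read-only entry and scroll to show the newest part."""
    entry.config(state='normal')
    entry.insert('end', text)  # 'end' is the same value as tk.END
    entry.config(state='readonly')
    # 1.0 asks the entry to scroll as far to the right as possible
    entry.xview_moveto(1.0)
```

<p>To try it, you could call <code>append_and_scroll(entry, text)</code> from inside <code>command()</code> in place of the three lines that are already there.</p>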

<h2 id="heading-how-to-add-the-equal-to-button">How to Add the Equal To Button</h2>
<p>We haven't yet made the <code>equal to</code> button – so let's do that now. To start, we'll define a function called <code>cmd_equal()</code>. In this function, we'll first change the <code>state</code> of the <code>entry</code> to <code>normal</code>. Then we'll extract the text in the output screen using the <code>entry.get()</code> function and replace 'x' by '*'. We do this because multiplication is represented by '*' and not 'x'.</p>
<p>Then we'll add a <code>try-except</code> block. We'll try to evaluate the mathematical expression that we extracted using Python's built-in <code>eval()</code> function. If the expression is invalid, instead of letting the program raise an error, we'll show 'INVALID' on our screen.</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# Naming the window
root.title("Calculator")

scrollbar = tk.Scrollbar(root, orient='horizontal')

entry = tk.Entry(root, width=9, font=('Arial', 38, 'bold'), state='readonly', xscrollcommand=scrollbar.set)
entry.pack(pady=(30, 10))

scrollbar.config(command=entry.xview)
scrollbar.pack()

frame1 = tk.Frame(root)
frame1.pack(side='left', anchor='n')
frame2 = tk.Frame(root)
frame2.pack(side='left', anchor='n')
frame3 = tk.Frame(root)
frame3.pack(side='left', anchor='n')
frame4 = tk.Frame(root)
frame4.pack(side='left', anchor='n')

pixel = tk.PhotoImage(width=55, height=55)

def command(text):
    entry.config(state='normal')
    entry.insert(tk.END, text)
    entry.config(state='readonly')

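def cmd_equal():
    entry.config(state='normal')
    # The keyboard shows 'x', but Python multiplies with '*'
    txt = entry.get().replace('x', '*')

    try:
        result = eval(txt)
    except Exception:
        # Covers malformed input such as '2+' and division by zero
        result = 'INVALID'
    # Replace the expression on the screen with the result
    entry.delete(0, tk.END)
    entry.insert(tk.END, result)
    entry.config(state='readonly')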


def buttons(text, frame):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg="#333300", fg="white", compound="center",
                       command=lambda :command(text))
    return button


def buttons_ops(text, frame, bg, fg):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg=bg, fg=fg, activebackground="black",
                        compound="center", command=lambda:command(text))
    return button

btn1 = buttons('1',frame1).pack()
btn4 = buttons('4', frame1).pack()
btn7 = buttons('7', frame1).pack()

btn2 = buttons('2', frame2).pack()
btn5 = buttons('5', frame2).pack()
btn8 = buttons('8', frame2).pack()
btn0 = buttons_ops('0', frame2, '#333300', 'white').pack()

plus = buttons_ops('+', frame4, 'black', 'white').pack()
minus= buttons_ops('-', frame4,  'black', 'white').pack()
mul = buttons_ops('x', frame4, 'black', 'white').pack()
div = buttons_ops('/', frame4, 'black', 'white').pack()

btn3 = buttons('3', frame3).pack()
btn6 = buttons('6', frame3).pack()
btn9 = buttons('9', frame3).pack()
equal= tk.Button(frame3, text='=', font=('Arial', 20), image=pixel, bg='white', fg='black', activebackground="black",
                        compound="center", command=lambda: cmd_equal()).pack()

# This keeps the window active
root.mainloop()
</code></pre>
<p>Here, we've also used <code>entry.delete()</code>. This function will delete all the text on the output screen from the first argument's index (that is from the 0th index) to the last argument's index, that is to the end of the text (represented by <code>tk.END</code>).</p>
<p>Then we inserted our result onto the output screen using <code>entry.insert()</code>. An important thing to note is that we've embedded the <code>equal to</code> button below the definition of <code>btn9</code> in the same frame. This puts our <code>equal to</code> button in just the right place.</p>
<p>The following images show the initial and final screens, respectively.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/8051ce51-5139-4668-97e2-b5c742c687a1.png" alt="Calculator window showing mathematical expression " style="display:block;margin:0 auto" width="270" height="407" loading="lazy">

<p>On clicking the equal to button:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67e55054a0be57d730442ec0/1ddc6b4c-3e51-4237-b9f6-90986bff1963.png" alt="Calculator window showing evaluated mathematical expression" style="display:block;margin:0 auto" width="267" height="407" loading="lazy">

<h2 id="heading-how-to-add-the-ac-button">How to Add the AC Button</h2>
<p>Finally, we'll define our last function: <code>cmd_ac()</code>. This function deletes everything on the output screen. We'll do this by first changing the <code>state</code> to <code>normal</code>, then calling <code>entry.delete()</code>, and lastly changing the <code>state</code> back to <code>readonly</code>. Then we'll pass this function to the <code>command</code> parameter of the <code>ac</code> button.</p>
<p>To keep the layout from breaking when the window is resized, we'll use the <code>resizable()</code> method. This method takes two arguments: the first controls whether the width can change and the second controls the height. To prevent resizing, we'll set both to <code>False</code> (the code below passes <code>0</code>, which Tkinter treats the same way).</p>
<p>So the final code will be:</p>
<pre><code class="language-python">import tkinter as tk

# screen initialization
root = tk.Tk()

# Naming the window
root.title("Calculator")

scrollbar = tk.Scrollbar(root, orient='horizontal')

entry = tk.Entry(root, width=9, font=('Arial', 38, 'bold'), state='readonly', xscrollcommand=scrollbar.set)
entry.pack(pady=(30, 10))

scrollbar.config(command=entry.xview)
scrollbar.pack()

frame1 = tk.Frame(root)
frame1.pack(side='left', anchor='n')
frame2 = tk.Frame(root)
frame2.pack(side='left', anchor='n')
frame3 = tk.Frame(root)
frame3.pack(side='left', anchor='n')
frame4 = tk.Frame(root)
frame4.pack(side='left', anchor='n')

pixel = tk.PhotoImage(width=55, height=55)

def command(text):
    entry.config(state='normal')
    entry.insert(tk.END, text)
    entry.config(state='readonly')

def cmd_ac():
    entry.config(state='normal')
    entry.delete(0, tk.END)
    entry.config(state='readonly')

def cmd_equal():
    entry.config(state='normal')
    txt = entry.get().replace('x', '*')

    try:
        result = eval(txt)

    except Exception:
        result = 'INVALID'
    entry.delete(0, tk.END)
    entry.insert(tk.END, result)
    entry.config(state='readonly')


def buttons(text, frame):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg="#333300", fg="white", compound="center",
                       command=lambda :command(text))
    return button


def buttons_ops(text, frame, bg, fg):
    button = tk.Button(frame, text=text, font=('Arial', 20), image=pixel, bg=bg, fg=fg, activebackground="black",
                        compound="center", command=lambda:command(text))
    return button

btn1 = buttons('1',frame1).pack()
btn4 = buttons('4', frame1).pack()
btn7 = buttons('7', frame1).pack()
ac = tk.Button(frame1, text="AC", font=('Arial', 20), image=pixel, bg="#666699", fg="white", compound="center",
                        command=lambda: cmd_ac()).pack()

btn2 = buttons('2', frame2).pack()
btn5 = buttons('5', frame2).pack()
btn8 = buttons('8', frame2).pack()
btn0 = buttons_ops('0', frame2, '#333300', 'white').pack()

plus = buttons_ops('+', frame4, 'black', 'white').pack()
minus= buttons_ops('-', frame4,  'black', 'white').pack()
mul = buttons_ops('x', frame4, 'black', 'white').pack()
div = buttons_ops('/', frame4, 'black', 'white').pack()

btn3 = buttons('3', frame3).pack()
btn6 = buttons('6', frame3).pack()
btn9 = buttons('9', frame3).pack()
equal= tk.Button(frame3, text='=', font=('Arial', 20), image=pixel, bg='white', fg='black', activebackground="black",
                        compound="center", command=lambda: cmd_equal()).pack()


# Prevent the window from being resized (0 is treated as False)
root.resizable(0, 0)
# This keeps the window active
root.mainloop()
</code></pre>
<p>When we hit run, this should display our final project.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>So now you know how to build a simple arithmetic calculator. To strengthen and build upon the concepts you learned, you can try adding more functionality to this calculator. Here are some ideas to practice with:</p>
<ul>
<li><p>Adding a decimal point button to the calculator to allow users to work with fractional numbers.</p>
</li>
<li><p>Adding a percentage button to the calculator to allow users to calculate percentages.</p>
</li>
<li><p>Adding a delete button to the calculator which, instead of clearing the entire screen, deletes one character at a time.</p>
</li>
<li><p>Making the calculator 'computer keyboard interactive', that is, allowing input directly from the computer keyboard. (Hint for this task: change the <code>state</code> of the <code>entry</code> object to <code>normal</code>, and add checks for 'invalid' expressions.)</p>
</li>
</ul>
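<p>For the last idea, here is a minimal sketch of what a check for 'invalid' expressions could look like. The helper name <code>safe_calculate</code> and the <code>ALLOWED_CHARS</code> set are assumptions made for illustration and aren't part of the calculator above. The sketch only filters characters before calling <code>eval()</code>, so treat it as a starting point rather than a complete validator.</p>

```python
# Characters the calculator is allowed to evaluate (an assumed whitelist)
ALLOWED_CHARS = set("0123456789+-*/.")


def safe_calculate(expression):
    """Return the value of a calculator expression, or 'INVALID'."""
    # Mirror the tutorial: the 'x' button stands for multiplication
    expression = expression.replace("x", "*")

    # Reject empty input and anything that is not a digit or an operator
    if not expression or any(ch not in ALLOWED_CHARS for ch in expression):
        return "INVALID"

    try:
        return eval(expression)
    except Exception:
        # Covers malformed input such as '2+' and division by zero
        return "INVALID"


print(safe_calculate("2x3"))   # 6
print(safe_calculate("2+"))    # INVALID
print(safe_calculate("abc"))   # INVALID
```

<p>With typed input, <code>cmd_equal()</code> could call a helper like this instead of calling <code>eval()</code> directly on whatever the user entered.</p>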
<p>Thanks for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Use the Polars Library in Python for Data Analysis ]]>
                </title>
                <description>
                    <![CDATA[ In this article, I’ll give you a beginner-friendly introduction to the Polars library in Python. Polars is an open-source library, originally written in Rust, which makes data wrangling easier in Python. The syntax of Polars is very similar to Pandas... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-use-the-polars-library-in-python-for-data-analysis/</link>
                <guid isPermaLink="false">6939b88a5a4b3354fde8c07b</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python 3 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ python beginner ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Polars ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Programming Blogs ]]>
                    </category>
                
                    <category>
                        <![CDATA[ dataset ]]>
                    </category>
                
                    <category>
                        <![CDATA[ dataframe ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sara Jadhav ]]>
                </dc:creator>
                <pubDate>Wed, 10 Dec 2025 18:14:34 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765325732081/94ab547b-fdaf-41bb-ae60-ad03be31211a.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this article, I’ll give you a beginner-friendly introduction to the Polars library in Python.</p>
<p>Polars is an open-source library, originally written in Rust, which makes data wrangling easier in Python. The syntax of Polars is very similar to Pandas, so if you’ve worked with Pandas or the PySpark library before, using Polars should be a breeze.</p>
<p>Polars excels at giving fast results. It’s also memory efficient and helps you optimize your code using parallelism. It also lets you convert data from and to various libraries like NumPy, Pandas, and others.</p>
<p>In this tutorial, we’ll learn about the Polars library from scratch, from installing and importing the library to manipulating data in a dataset with its help.</p>
<p>First, we’ll look at Polars’ basic functions. We’ll also write some practical code, which will help you apply what you’ve learned. Finally, we’ll work with an example dataset to solidify some more key Polars concepts. Let’s dive in.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a class="post-section-overview" href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-installing-and-importing-the-polars-library">Installing and Importing the Polars Library</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-series">What is a Series?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-what-is-a-dataframe">What is a DataFrame?</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-read-csv-files-with-polars">How to Read CSV Files with Polars</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-some-other-important-functions">Some other Important Functions</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary">Summary</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>Even though this tutorial is beginner-friendly, having some basic knowledge of the following areas will help you understand this article better:</p>
<ul>
<li><p>Basic Python syntax</p>
</li>
<li><p>Data structures</p>
</li>
<li><p>Ability to import libraries and knowledge of using functions and methods</p>
</li>
<li><p>Basics of NumPy and Pandas will come in handy (not necessary).</p>
</li>
</ul>
<p>Now that you’re aware of the prerequisites, let’s get started with our tutorial.</p>
<h2 id="heading-installing-and-importing-the-polars-library">Installing and Importing the Polars Library</h2>
<p>To install the Polars library, you can use the following command in your terminal:</p>
<p><code>pip install polars</code></p>
<p>This works if you already have the pip package manager on your system. If you’re in a conda environment, you can use this instead:</p>
<p><code>conda install -c conda-forge polars</code></p>
<p>But I strongly recommend using the pip package manager to avoid various inconveniences.</p>
<p>Let’s import Polars in our program. We’ll follow the same process as we use for importing other libraries in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl <span class="hljs-comment"># pl is a conventional alias</span>
</code></pre>
<p>When creating a Polars object, it’s important to know the size of your data. By default, a Polars DataFrame can hold up to 2³² rows (about 4.3 billion). If you need to load more data than that, install the Polars library with the following command instead:</p>
<p><code>pip install polars[rt64]</code></p>
<p>If you want to use the Polars library right away without actually installing it on your system, using a Google Colab notebook is the best option. When using a Google Colab Notebook, you can directly import and start using Polars in your program. I’ll be using Google Colab Notebook for this tutorial.</p>
<h2 id="heading-what-is-a-series">What is a Series?</h2>
<p>A series is a fundamental element of a DataFrame. It’s a 1-dimensional data structure that you can compare to a ‘list’ in Python or a ‘1-D array’ in NumPy. But the difference between a series and a 1-D array is that the former is labeled while the latter is not. Many series come together to form a DataFrame.</p>
<p>We can create a series with homogenous data as well as heterogenous data.</p>
<h3 id="heading-creating-a-series-with-homogenous-data">Creating a Series with Homogenous Data</h3>
<p>In a series, the datatype of all the elements should be the same. If it’s not, an error is thrown.</p>
<p>The syntax to define a Polars series is as follows:</p>
<p><code>var_name = pl.Series("column_name", [values])</code></p>
<p>The following code shows an example of a homogenous series definition in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
series_homo = pl.Series(<span class="hljs-string">"Numbers"</span>, [<span class="hljs-string">'One'</span>, <span class="hljs-string">'Two'</span>, <span class="hljs-string">'Three'</span>, <span class="hljs-string">'Four'</span>, <span class="hljs-string">'Five'</span>])
print(series_homo)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5,)
Series: 'Numbers' [str]
[
    "One"
    "Two"
    "Three"
    "Four"
    "Five"
]
</code></pre>
<p>In the above code, we first imported the Polars library using the <code>pl</code> alias to start using it throughout the code. Using aliases is a matter of choice, but <code>pl</code> is a conventional one (like <code>np</code> for NumPy and <code>pd</code> for Pandas). The benefit of using conventional aliases is that when you hand over the code to someone else, it’s easy for them to follow along.</p>
<p>Next, we used the <code>pl.Series()</code> function to create a Polars series object. As its first parameter, we passed the label for our series (<code>Numbers</code> in this case). Then we passed the values to be stored in the form of a list. Remember that the list of values that we pass acts as a single argument. Finally, we printed our series.</p>
<p>We can see that the output tells us about the dimensions of the Polars object as well as the datatype of the series. The shape tells us the number of rows in the series. (For a DataFrame, the shape is given as rows and columns.)</p>
<p>We can find the datatype of a series explicitly by using the <code>dtype</code> attribute.</p>
<pre><code class="lang-python">print(series_homo.dtype)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">String
</code></pre>
<h3 id="heading-creating-a-series-with-heterogenous-data">Creating a Series with Heterogenous Data</h3>
<p>Heterogenous data means that the data-type of all the elements is not the same. The syntax to define a series with heterogenous data is as follows:</p>
<p><code>var_name = pl.Series("Column_name", [values], strict=False)</code></p>
<p>So you’re probably wondering, based on what I said above: how can we have a series with heterogenous data? Well, one thing to note is that a series is always homogenous irrespective of the data that is fed to it. I’ll explain below - first let’s look at this code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl

series_hetero = pl.Series(<span class="hljs-string">"Numbers"</span>, [<span class="hljs-number">1</span>, <span class="hljs-string">"Two"</span>, <span class="hljs-number">3</span>, <span class="hljs-string">"Four"</span>], strict=<span class="hljs-literal">False</span>)
print(series_hetero)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (4,)
Series: 'Numbers' [str]
[
    "1"
    "Two"
    "3"
    "Four"
]
</code></pre>
<p>Here, we created a series object using the <code>pl.Series()</code> function, labelled it, and passed the values that we want in our series.</p>
<p>But you’ll notice that we have provided heterogenous data (data that doesn’t have the same datatype) to the function. Usually, this throws an error. But as we have set the <code>strict</code> parameter as False, the function now becomes lenient with the schema of the series. (The schema is just the expected data-type of the values that are to be recorded in the series.)</p>
<p>If no particular schema is defined for a series that’s fed heterogenous data, <code>pl.Series()</code> sets the schema to <code>pl.Utf8</code> (the string datatype, also available as <code>pl.String</code>). You can see this automatic fixing of the schema in the above example. This prevents the program from raising an error, as a string can represent any character: digits as well as letters and symbols.</p>
<p>Also, we can see that the datatype of all the elements is the same (<code>pl.Utf8</code>). This means that the series is homogenous, even though we put heterogenous data in it.</p>
<p>If we define a schema for the series while keeping <code>strict=False</code>, the Polars library converts every value that doesn’t fit the defined datatype to null. This should be clear in the following example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-comment"># defined the schema as Integer bit 32</span>
series = pl.Series(<span class="hljs-string">"ints"</span>, [<span class="hljs-number">1</span>, <span class="hljs-number">-2</span>, <span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>, <span class="hljs-string">'Thirteen'</span>, <span class="hljs-string">'Fourteen'</span>], dtype=pl.Int32, strict=<span class="hljs-literal">False</span>)
print(series)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (7,)
Series: 'ints' [i32]
[
    1
    -2
    3
    4
    5
    null
    null
]
</code></pre>
<p>Here, we can see that the last two values were strings, but since we set the schema to a 32-bit integer, they were stored as null records.</p>
<p>So as you can see, the leniency of the program depends on whether you set the <code>strict</code> parameter to True or False. If we set it to True, we enforce the schema on the data strictly, and the program raises an exception when a value fails to obey the schema. On the other hand, if we set the <code>strict</code> parameter to False, the series still preserves its homogenous nature by turning schema-disobeying elements to null.</p>
<p>Now that you understand how series work, we’re ready to move on to DataFrames.</p>
<h2 id="heading-what-is-a-dataframe">What is a DataFrame?</h2>
<p>A DataFrame is a two-dimensional data structure that stores related attributes of your data as labelled columns, which also makes it useful for analyzing that data. A DataFrame is nothing more than a collection of series, each labelled differently to store a different aspect of the data.</p>
<p>Here’s the syntax to create a Polars DataFrame object:</p>
<p><code>var_name = pl.DataFrame({key: value pairs}, schema)</code></p>
<p>The following example shows you how to define a DataFrame object in Python:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
print(df)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (10, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
│ ---    ┆ ---         ┆ ---         │
│ u32    ┆ f64         ┆ f64         │
╞════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.098612    ┆ 0.477121    │
│ 4      ┆ 1.386294    ┆ 0.60206     │
│ 5      ┆ 1.609438    ┆ 0.69897     │
│ 6      ┆ 1.791759    ┆ 0.778151    │
│ 7      ┆ 1.94591     ┆ 0.845098    │
│ 8      ┆ 2.079442    ┆ 0.90309     │
│ 9      ┆ 2.197225    ┆ 0.954243    │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>Above, we created a Polars DataFrame object with the <code>pl.DataFrame()</code> function. In the function, we created a dictionary as an argument for passing the values of the DataFrame.</p>
<p>In the dictionary, each key-value pair represents a series. Each key represents the label of the series, whereas its value represents the values of the series. The values are passed as a list (or another sequence, such as a NumPy array), as each key can map to only one value.</p>
<p>Then we defined the schema for the DataFrame. Again, the schema is a dictionary, where each key-value pair corresponds to the schema of the series. In the schema, every key represents the label of the series (to map the schema to the correct series) and its value represents the schema.</p>
<p>In the output, we can see that we got a nice table representing our data. The labels are neatly separated from the data and below them, their schema is also represented.</p>
<h3 id="heading-what-is-a-schema">What is a Schema?</h3>
<p>A schema refers to the definition of the datatype of the series. We fix a particular datatype for the homogenous series to avoid ending up with mixed data.</p>
<p>For example, in the above code, we set the datatype of the column <code>Number</code> to a 32-bit unsigned integer (<code>pl.UInt32</code>), as we don’t want to put negative integers in our NumPy logarithm function.</p>
<p>Now, if we want to hide the datatype (that’s written below each label), we can use the following function:</p>
<pre><code class="lang-python">pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
</code></pre>
<h3 id="heading-the-head-tail-and-glimpse-functions">The Head, Tail, and Glimpse Functions</h3>
<p>The <code>head()</code>, <code>tail()</code>, and <code>glimpse()</code> functions let you take a quick look at the data by reviewing a few records (rows). They’re especially useful with large datasets, for example to see which columns are present, what type of data each column holds, and so on.</p>
<p>The <code>head()</code> function prints the given number of rows (passed as the argument of the <code>head()</code> function) from the top of the DataFrame. If no argument is passed, it prints the first five rows of the DataFrame.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.head(<span class="hljs-number">3</span>))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (3, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.098612    ┆ 0.477121    │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>In this example, we have used the same DataFrame that we just created. Then we used the <code>head()</code> function to output the first three rows of the DataFrame. Also, you may now notice that the schema representation under the column names has disappeared. This is because we used <code>pl.Config.set_tbl_hide_column_data_types(active=True)</code>.</p>
<p>The <code>glimpse()</code> function presents the data briefly and in a horizontal manner (rows are represented as columns and columns are represented as rows) for better readability.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.glimpse())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Rows: 10
Columns: 3
$ Number      &lt;u32&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ Natural Log &lt;f64&gt; 0.0, 0.6931471805599453, 1.0986122886681098, 1.3862943611198906, 1.6094379124341003, 1.791759469228055, 1.9459101490553132, 2.0794415416798357, 2.1972245773362196, 2.302585092994046
$ Log Base 10 &lt;f64&gt; 0.0, 0.3010299956639812, 0.47712125471966244, 0.6020599913279624, 0.6989700043360189, 0.7781512503836436, 0.8450980400142568, 0.9030899869919435, 0.9542425094393249, 1.0

None
</code></pre>
<p>Here, we used the <code>glimpse()</code> function on our previously created DataFrame <code>df</code>. The output looks like a transposed DataFrame. Also, <code>None</code> is printed at the end. This is because, by default, <code>glimpse()</code> prints the summary itself and returns <code>None</code>, which our own <code>print()</code> call then displays. To get the summary back as a string instead, we can set the <code>return_as_string</code> parameter to True. The following example shows how to do it:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(<span class="hljs-string">f'Returned as String: \n<span class="hljs-subst">{df.glimpse(return_as_string=<span class="hljs-literal">True</span>)}</span>'</span>)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">Returned as String: 
Rows: 10
Columns: 3
$ Number      &lt;u32&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ Natural Log &lt;f64&gt; 0.0, 0.6931471805599453, 1.0986122886681098, 1.3862943611198906, 1.6094379124341003, 1.791759469228055, 1.9459101490553132, 2.0794415416798357, 2.1972245773362196, 2.302585092994046
$ Log Base 10 &lt;f64&gt; 0.0, 0.3010299956639812, 0.47712125471966244, 0.6020599913279624, 0.6989700043360189, 0.7781512503836436, 0.8450980400142568, 0.9030899869919435, 0.9542425094393249, 1.0
</code></pre>
<p>In the above output, we can see that the summary is returned as a string, so the stray <code>None</code> no longer appears.</p>
<p>Finally, the <code>tail()</code> function outputs the given number of rows (passed as the argument of the <code>tail()</code> function) from the bottom of the dataset. When no argument is passed, it outputs the last 5 rows by default.</p>
<p>This is useful for checking if our data was completely loaded. Checking the first few records using the <code>head()</code> function and the last few records with the <code>tail()</code> function helps confirm that the data is completely and correctly loaded.</p>
<p>Also, we can check if there are any empty records at the end of the dataset. Having empty records at the end of the dataset can cause serious problems in some cases. For example, if you have to train an ML model on a dataset and you split the dataset statically into testing and training datasets, the empty rows at the end are going to cause an issue. So, checking our data beforehand is a best practice, and these functions help us do it.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.tail(<span class="hljs-number">3</span>))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (3, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 8      ┆ 2.079442    ┆ 0.90309     │
│ 9      ┆ 2.197225    ┆ 0.954243    │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>In the above code, we used the <code>tail()</code> function on the dataset (that we created earlier) and passed ‘3’ as our argument. Thus our program returned the last three rows of the dataset.</p>
<h3 id="heading-the-sample-function">The Sample Function</h3>
<p>The <code>sample()</code> function returns a given number of randomly chosen rows from the DataFrame, and the rows don’t necessarily come back in their original order. This helps to avoid biased sampling of data.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)
print(df.sample(<span class="hljs-number">3</span>))
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (3, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 6      ┆ 1.791759    ┆ 0.778151    │
│ 5      ┆ 1.609438    ┆ 0.69897     │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>We can see in the output that we got random rows, and not in the order in which they occur in the dataset: row 5 comes before row 6 in the DataFrame, yet in the sample row 5 appears after row 6. Since the rows are picked at random, your output will most likely differ from the one shown here. Sampling is a quick way to get a general idea of the entire dataset, and random subsets are also commonly used in ML, for example to build training and test splits.</p>
<h3 id="heading-concatenating-two-dataframes">Concatenating Two DataFrames</h3>
<p>In a nutshell, ‘concatenating’ simply means ‘linking’. Adding or linking one dataset to another – basically, stacking one on top of another – is concatenating the two datasets.</p>
<p>For example, in the previous DataFrame, we had numbers from 1 to 10 and their logarithms. Now, if we want to make it 1 to 20, we have to concatenate a different dataset containing numbers 11 to 20 to the former dataset.</p>
<p>The following code shows how this works:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># new dataset created for concatenation</span>
df1 = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)],
    <span class="hljs-string">"Log Base 10"</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>,<span class="hljs-number">21</span>)],
    <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)]
}, schema=schema)

print(pl.concat([df, df1], how=<span class="hljs-string">'vertical'</span>)) <span class="hljs-comment"># concatenating the two datasets</span>
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (20, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.098612    ┆ 0.477121    │
│ 4      ┆ 1.386294    ┆ 0.60206     │
│ 5      ┆ 1.609438    ┆ 0.69897     │
│ …      ┆ …           ┆ …           │
│ 16     ┆ 2.772589    ┆ 1.20412     │
│ 17     ┆ 2.833213    ┆ 1.230449    │
│ 18     ┆ 2.890372    ┆ 1.255273    │
│ 19     ┆ 2.944439    ┆ 1.278754    │
│ 20     ┆ 2.995732    ┆ 1.30103     │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>In this code, we first created the DataFrame <code>df</code>. Then we created another DataFrame <code>df1</code>. Next, we used <code>pl.concat()</code> to concatenate the DataFrames.</p>
<p>The first argument that we passed is the list of the DataFrames that are to be linked. The <code>how</code> parameter defines the manner of concatenation. ‘Vertical’ in this context means that we are linking DataFrames vertically (adding more rows).</p>
<p>The important thing to note here is that schema incompatibility raises an exception. With <code>how='vertical'</code>, the DataFrames that are to be concatenated must have the same column names and data types. If their schemas differ, Polars can’t stack them. So it’s best to keep the schemas of all the datasets that are to be concatenated the same.</p>
<p>Here, we stored the schema in a variable named <code>schema</code> and passed it to both DataFrames through the <code>schema</code> parameter to avoid schema incompatibility.</p>
<p>Also, concatenation occurs in the order of the DataFrames in the list. For example, in the above code, <code>df</code> appears before <code>df1</code>, so in the linked DataFrame the rows of <code>df</code> come first, followed by the rows of <code>df1</code>. If we had swapped the order of the DataFrames in the list, the concatenated DataFrame would start with <code>df1</code>, followed by <code>df</code>.</p>
<p>The following code explains that:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
pl.Config.set_tbl_hide_column_data_types(active=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># new dataset created for concatenation</span>
df1 = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)],
    <span class="hljs-string">"Log Base 10"</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>,<span class="hljs-number">21</span>)],
    <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">11</span>, <span class="hljs-number">21</span>)]
}, schema=schema)

print(pl.concat([df1, df], how=<span class="hljs-string">'vertical'</span>)) <span class="hljs-comment"># sequence changed from [df,df1] to [df1, df]</span>
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (20, 3)
┌────────┬─────────────┬─────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 │
╞════════╪═════════════╪═════════════╡
│ 11     ┆ 2.397895    ┆ 1.041393    │
│ 12     ┆ 2.484907    ┆ 1.079181    │
│ 13     ┆ 2.564949    ┆ 1.113943    │
│ 14     ┆ 2.639057    ┆ 1.146128    │
│ 15     ┆ 2.70805     ┆ 1.176091    │
│ …      ┆ …           ┆ …           │
│ 6      ┆ 1.791759    ┆ 0.778151    │
│ 7      ┆ 1.94591     ┆ 0.845098    │
│ 8      ┆ 2.079442    ┆ 0.90309     │
│ 9      ┆ 2.197225    ┆ 0.954243    │
│ 10     ┆ 2.302585    ┆ 1.0         │
└────────┴─────────────┴─────────────┘
</code></pre>
<p>Here, we can see that <code>df1</code> appears first, followed by <code>df</code> (unlike the previous example). So the order of the DataFrames in the list matters.</p>
<h3 id="heading-how-to-join-two-dataframes">How to Join Two DataFrames</h3>
<p><strong>Joining</strong> datasets and <strong>concatenating</strong> datasets are two different concepts. While concatenating means ‘linking’ two separate datasets, <a target="_blank" href="https://www.freecodecamp.org/news/understanding-sql-joins/">joining</a> refers to combining datasets based on a shared column (a key).<br>The computer matches rows from both datasets where the key values are the same.</p>
<p>In the above dataset ‘df’, we’ll add a new column by joining the dataset ‘df’ with another DataFrame.</p>
<pre><code class="lang-python"><span class="hljs-comment"># new dataframe</span>
new_col = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)],
    <span class="hljs-string">"Log Base 2"</span> : [np.log2(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)]
})

new_data = df.join(new_col, on=<span class="hljs-string">"Number"</span>, how=<span class="hljs-string">"left"</span>) <span class="hljs-comment"># Both have one column same to map values</span>

print(new_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌────────┬─────────────┬─────────────┬────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 ┆ Log Base 2 │
╞════════╪═════════════╪═════════════╪════════════╡
│ 1      ┆ 0.0         ┆ 0.0         ┆ 0.0        │
│ 2      ┆ 0.693147    ┆ 0.30103     ┆ 1.0        │
│ 3      ┆ 1.098612    ┆ 0.477121    ┆ 1.584963   │
│ 4      ┆ 1.386294    ┆ 0.60206     ┆ 2.0        │
│ 5      ┆ 1.609438    ┆ 0.69897     ┆ 2.321928   │
└────────┴─────────────┴─────────────┴────────────┘
</code></pre>
<p>In this example, we called the <code>join()</code> function on <code>df</code> and passed <code>new_col</code> as its argument. This is why the columns of <code>df</code> appear before the column of the <code>new_col</code> DataFrame. The <code>on</code> parameter takes the name of the column on which the two DataFrames are to be joined, and <code>how="left"</code> keeps every row of <code>df</code>.</p>
<p>Here, Polars matched the values of the <code>Number</code> column in both DataFrames and combined the rows whose values were equal.</p>
<p>If we used the <code>join()</code> function on the <code>new_col</code> DataFrame, the columns of <code>df</code> would appear later than the column in <code>new_col</code>. The following code will make it clear:</p>
<pre><code class="lang-python"><span class="hljs-comment"># new dataframe</span>
new_col = pl.DataFrame({
    <span class="hljs-string">"Number"</span> : [x <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)],
    <span class="hljs-string">"Log Base 2"</span> : [np.log2(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>)]
})

new_data = new_col.join(df, on=<span class="hljs-string">"Number"</span>, how=<span class="hljs-string">"left"</span>) <span class="hljs-comment"># passed df as argument</span>

print(new_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌────────┬────────────┬─────────────┬─────────────┐
│ Number ┆ Log Base 2 ┆ Natural Log ┆ Log Base 10 │
╞════════╪════════════╪═════════════╪═════════════╡
│ 1      ┆ 0.0        ┆ 0.0         ┆ 0.0         │
│ 2      ┆ 1.0        ┆ 0.693147    ┆ 0.30103     │
│ 3      ┆ 1.584963   ┆ 1.098612    ┆ 0.477121    │
│ 4      ┆ 2.0        ┆ 1.386294    ┆ 0.60206     │
│ 5      ┆ 2.321928   ┆ 1.609438    ┆ 0.69897     │
└────────┴────────────┴─────────────┴─────────────┘
</code></pre>
<p>Notice that the column ‘Log Base 2’ now appears before the other logarithm columns (unlike in the previous example). So the DataFrame on which we call <code>join()</code> determines the column order of the result.</p>
<h3 id="heading-how-to-use-the-withcolumns-function">How to Use the <code>with_columns()</code> Function</h3>
<p>The <code>with_columns()</code> function lets us compute new columns from expressions and returns a new DataFrame that contains them alongside the existing columns of the original dataset. The result looks similar to what we got with <code>join()</code>, but we don’t need a second DataFrame.</p>
<p>The following example will make it clear:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl
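<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

schema = {<span class="hljs-string">"Number"</span>: pl.UInt32, <span class="hljs-string">"Natural Log"</span>: <span class="hljs-literal">None</span>, <span class="hljs-string">"Log Base 10"</span>: <span class="hljs-literal">None</span>}

df = pl.DataFrame(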
    {
        <span class="hljs-string">"Number"</span> : np.arange(<span class="hljs-number">1</span>, <span class="hljs-number">11</span>),
        <span class="hljs-string">"Natural Log"</span> : [np.log(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)],
        <span class="hljs-string">'Log Base 10'</span> : [np.log10(x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> range(<span class="hljs-number">1</span>,<span class="hljs-number">11</span>)]
        },
    schema=schema
    )
new_data = df.with_columns((np.log2(pl.col(<span class="hljs-string">"Number"</span>))).alias(<span class="hljs-string">"Log Base 2"</span>))

print(new_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌────────┬─────────────┬─────────────┬────────────┐
│ Number ┆ Natural Log ┆ Log Base 10 ┆ Log Base 2 │
╞════════╪═════════════╪═════════════╪════════════╡
│ 1      ┆ 0.0         ┆ 0.0         ┆ 0.0        │
│ 2      ┆ 0.693147    ┆ 0.30103     ┆ 1.0        │
│ 3      ┆ 1.098612    ┆ 0.477121    ┆ 1.584963   │
│ 4      ┆ 1.386294    ┆ 0.60206     ┆ 2.0        │
│ 5      ┆ 1.609438    ┆ 0.69897     ┆ 2.321928   │
└────────┴─────────────┴─────────────┴────────────┘
</code></pre>
<p>In this example, we have a DataFrame <code>df</code>. To add a column to it, we use the <code>with_columns()</code> function. Inside this function, we selected the column named ‘Number’ using the <code>pl.col()</code> function and passed it to <code>np.log2()</code> to get the log base 2 value for every record. Finally, to label the new column, we used the <code>alias()</code> function, with the label passed to it as an argument.</p>
<p>Now that we know about the basics of DataFrames, let’s look at how we can work with CSV files.</p>
<h2 id="heading-how-to-read-csv-files-with-polars">How to Read CSV Files with Polars</h2>
<p>Reading CSV files with Polars is very similar to how it works in pandas. For this tutorial, I’ll be using the Titanic Dataset. Here’s the <a target="_blank" href="https://www.kaggle.com/datasets/yasserh/titanic-dataset?select=Titanic-Dataset.csv">link to the dataset</a> so you can download it. In this part of the tutorial, we’ll mainly be talking about column selection (useful in feature selection) and filtering the data.</p>
<p>Here’s the syntax for reading a CSV file:</p>
<p><code>var_name = pl.read_csv("path_dataset")</code></p>
<p>Example code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl

data = pl.read_csv(<span class="hljs-string">"/titanic_dataset.csv"</span>)
print(data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 12)
┌─────────────┬──────────┬────────┬─────────────────────┬───┬─────────┬─────────┬───────┬──────────┐
│ PassengerId ┆ Survived ┆ Pclass ┆ Name                ┆ … ┆ Ticket  ┆ Fare    ┆ Cabin ┆ Embarked │
╞═════════════╪══════════╪════════╪═════════════════════╪═══╪═════════╪═════════╪═══════╪══════════╡
│ 892         ┆ 0        ┆ 3      ┆ Kelly, Mr. James    ┆ … ┆ 330911  ┆ 7.8292  ┆ null  ┆ Q        │
│ 893         ┆ 1        ┆ 3      ┆ Wilkes, Mrs. James  ┆ … ┆ 363272  ┆ 7.0     ┆ null  ┆ S        │
│             ┆          ┆        ┆ (Ellen Need…        ┆   ┆         ┆         ┆       ┆          │
│ 894         ┆ 0        ┆ 2      ┆ Myles, Mr. Thomas   ┆ … ┆ 240276  ┆ 9.6875  ┆ null  ┆ Q        │
│             ┆          ┆        ┆ Francis             ┆   ┆         ┆         ┆       ┆          │
│ 895         ┆ 0        ┆ 3      ┆ Wirz, Mr. Albert    ┆ … ┆ 315154  ┆ 8.6625  ┆ null  ┆ S        │
│ 896         ┆ 1        ┆ 3      ┆ Hirvonen, Mrs.      ┆ … ┆ 3101298 ┆ 12.2875 ┆ null  ┆ S        │
│             ┆          ┆        ┆ Alexander (Helg…    ┆   ┆         ┆         ┆       ┆          │
└─────────────┴──────────┴────────┴─────────────────────┴───┴─────────┴─────────┴───────┴──────────┘
</code></pre>
<p>We can get summary statistics for every column of the data (count, null count, mean, standard deviation, minimum, quartiles, and maximum) by using the <code>describe()</code> function.</p>
<pre><code class="lang-python">print(data.describe())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (9, 13)
┌────────────┬─────────────┬──────────┬──────────┬───┬─────────────┬───────────┬───────┬──────────┐
│ statistic  ┆ PassengerId ┆ Survived ┆ Pclass   ┆ … ┆ Ticket      ┆ Fare      ┆ Cabin ┆ Embarked │
╞════════════╪═════════════╪══════════╪══════════╪═══╪═════════════╪═══════════╪═══════╪══════════╡
│ count      ┆ 418.0       ┆ 418.0    ┆ 418.0    ┆ … ┆ 418         ┆ 417.0     ┆ 91    ┆ 418      │
│ null_count ┆ 0.0         ┆ 0.0      ┆ 0.0      ┆ … ┆ 0           ┆ 1.0       ┆ 327   ┆ 0        │
│ mean       ┆ 1100.5      ┆ 0.363636 ┆ 2.26555  ┆ … ┆ null        ┆ 35.627188 ┆ null  ┆ null     │
│ std        ┆ 120.810458  ┆ 0.481622 ┆ 0.841838 ┆ … ┆ null        ┆ 55.907576 ┆ null  ┆ null     │
│ min        ┆ 892.0       ┆ 0.0      ┆ 1.0      ┆ … ┆ 110469      ┆ 0.0       ┆ A11   ┆ C        │
│ 25%        ┆ 996.0       ┆ 0.0      ┆ 1.0      ┆ … ┆ null        ┆ 7.8958    ┆ null  ┆ null     │
│ 50%        ┆ 1101.0      ┆ 0.0      ┆ 3.0      ┆ … ┆ null        ┆ 14.4542   ┆ null  ┆ null     │
│ 75%        ┆ 1205.0      ┆ 1.0      ┆ 3.0      ┆ … ┆ null        ┆ 31.5      ┆ null  ┆ null     │
│ max        ┆ 1309.0      ┆ 1.0      ┆ 3.0      ┆ … ┆ W.E.P. 5734 ┆ 512.3292  ┆ G6    ┆ S        │
└────────────┴─────────────┴──────────┴──────────┴───┴─────────────┴───────────┴───────┴──────────┘
</code></pre>
<h3 id="heading-how-to-select-columns-from-the-dataset">How to Select Columns from the Dataset</h3>
<p>Now we’re going to learn how to select certain columns from the dataset and transform those columns into a new DataFrame. This can be useful if we want to train an ML model based on only certain columns and not the entire dataset (that is, using feature selection).</p>
<p>Let’s first look at the code below:</p>
<pre><code class="lang-python">new_df = data.select(
    pl.col(<span class="hljs-string">"Survived"</span>),
    pl.col(<span class="hljs-string">"Name"</span>),
    pl.col(<span class="hljs-string">"Age"</span>),
    pl.col(<span class="hljs-string">"Sex"</span>)
)

print(new_df.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌──────────┬─────────────────────────────────┬──────┬────────┐
│ Survived ┆ Name                            ┆ Age  ┆ Sex    │
╞══════════╪═════════════════════════════════╪══════╪════════╡
│ 0        ┆ Kelly, Mr. James                ┆ 34.5 ┆ male   │
│ 1        ┆ Wilkes, Mrs. James (Ellen Need… ┆ 47.0 ┆ female │
│ 0        ┆ Myles, Mr. Thomas Francis       ┆ 62.0 ┆ male   │
│ 0        ┆ Wirz, Mr. Albert                ┆ 27.0 ┆ male   │
│ 1        ┆ Hirvonen, Mrs. Alexander (Helg… ┆ 22.0 ┆ female │
└──────────┴─────────────────────────────────┴──────┴────────┘
</code></pre>
<p>In the code above, we selected four columns using the <code>select()</code> and <code>pl.col()</code> functions from the Titanic Dataset and transformed them into a new DataFrame called <code>new_df</code>.</p>
<p>Now, we can filter this data however we want. Let’s make a new DataFrame that keeps only the passengers who survived:</p>
<pre><code class="lang-python">survived_data = data.select(
    pl.col(<span class="hljs-string">"Survived"</span>),
    pl.col(<span class="hljs-string">"Name"</span>),
    pl.col(<span class="hljs-string">"Age"</span>),
    pl.col(<span class="hljs-string">"Sex"</span>)
).filter(pl.col(<span class="hljs-string">"Survived"</span>)==<span class="hljs-number">1</span>)

print(survived_data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 4)
┌──────────┬─────────────────────────────────┬──────┬────────┐
│ Survived ┆ Name                            ┆ Age  ┆ Sex    │
╞══════════╪═════════════════════════════════╪══════╪════════╡
│ 1        ┆ Wilkes, Mrs. James (Ellen Need… ┆ 47.0 ┆ female │
│ 1        ┆ Hirvonen, Mrs. Alexander (Helg… ┆ 22.0 ┆ female │
│ 1        ┆ Connolly, Miss. Kate            ┆ 30.0 ┆ female │
│ 1        ┆ Abrahim, Mrs. Joseph (Sophie H… ┆ 18.0 ┆ female │
│ 1        ┆ Snyder, Mrs. John Pillsbury (N… ┆ 23.0 ┆ female │
└──────────┴─────────────────────────────────┴──────┴────────┘
</code></pre>
<p>In the above code, we used the <code>filter()</code> function. This function keeps only the rows that satisfy the given condition. In the above example, the condition is that the value in the column named ‘Survived’ must be equal to 1. Hence, we got our required data.</p>
<h2 id="heading-some-other-important-functions">Some Other Important Functions</h2>
<h3 id="heading-how-to-print-the-names-of-the-columns-of-a-dataset">How to Print the Names of the Columns of a Dataset</h3>
<p>You can print the names of the columns using the <code>columns</code> attribute. Since it’s an attribute and not a function, it’s written without parentheses. The following code shows how to use it:</p>
<pre><code class="lang-python">print(data.columns) <span class="hljs-comment"># data --&gt; Titanic Dataset</span>
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']</p>
</blockquote>
<h3 id="heading-how-to-index-a-dataset">How to Index a Dataset</h3>
<p>Indexing a dataset means adding an index column to the existing dataset. It can prove useful in keeping track of the rows of the dataset.</p>
<p>We can index the dataset using the <code>with_row_index()</code> function. Inside this function, we can pass the argument to name this new index column. If we don’t pass any argument, the index column name is set as ‘index’ by default.</p>
<pre><code class="lang-python">data = pl.read_csv(<span class="hljs-string">"/titanic_dataset.csv"</span>).with_row_index(<span class="hljs-string">'#'</span>) <span class="hljs-comment"># naming the index column as '#'</span>
print(data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 13)
┌─────┬─────────────┬──────────┬────────┬───┬─────────┬─────────┬───────┬──────────┐
│ #   ┆ PassengerId ┆ Survived ┆ Pclass ┆ … ┆ Ticket  ┆ Fare    ┆ Cabin ┆ Embarked │
│ --- ┆ ---         ┆ ---      ┆ ---    ┆   ┆ ---     ┆ ---     ┆ ---   ┆ ---      │
│ u32 ┆ i64         ┆ i64      ┆ i64    ┆   ┆ str     ┆ f64     ┆ str   ┆ str      │
╞═════╪═════════════╪══════════╪════════╪═══╪═════════╪═════════╪═══════╪══════════╡
│ 0   ┆ 892         ┆ 0        ┆ 3      ┆ … ┆ 330911  ┆ 7.8292  ┆ null  ┆ Q        │
│ 1   ┆ 893         ┆ 1        ┆ 3      ┆ … ┆ 363272  ┆ 7.0     ┆ null  ┆ S        │
│ 2   ┆ 894         ┆ 0        ┆ 2      ┆ … ┆ 240276  ┆ 9.6875  ┆ null  ┆ Q        │
│ 3   ┆ 895         ┆ 0        ┆ 3      ┆ … ┆ 315154  ┆ 8.6625  ┆ null  ┆ S        │
│ 4   ┆ 896         ┆ 1        ┆ 3      ┆ … ┆ 3101298 ┆ 12.2875 ┆ null  ┆ S        │
└─────┴─────────────┴──────────┴────────┴───┴─────────┴─────────┴───────┴──────────┘
</code></pre>
<h3 id="heading-how-to-rename-columns-in-the-dataset">How to Rename Columns in the Dataset</h3>
<p>Lastly, to rename columns in the dataset, we use the <code>rename()</code> function and pass it a dictionary that maps each existing column name to its new name.</p>
<pre><code class="lang-python">data = pl.read_csv(<span class="hljs-string">"/titanic_dataset.csv"</span>).with_row_index(<span class="hljs-string">'#'</span>).rename({<span class="hljs-string">'PassengerId'</span>:<span class="hljs-string">'renamed_col'</span>})
print(data.head())
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="lang-plaintext">shape: (5, 13)
┌─────┬─────────────┬──────────┬────────┬───┬─────────┬─────────┬───────┬──────────┐
│ #   ┆ renamed_col ┆ Survived ┆ Pclass ┆ … ┆ Ticket  ┆ Fare    ┆ Cabin ┆ Embarked │
│ --- ┆ ---         ┆ ---      ┆ ---    ┆   ┆ ---     ┆ ---     ┆ ---   ┆ ---      │
│ u32 ┆ i64         ┆ i64      ┆ i64    ┆   ┆ str     ┆ f64     ┆ str   ┆ str      │
╞═════╪═════════════╪══════════╪════════╪═══╪═════════╪═════════╪═══════╪══════════╡
│ 0   ┆ 892         ┆ 0        ┆ 3      ┆ … ┆ 330911  ┆ 7.8292  ┆ null  ┆ Q        │
│ 1   ┆ 893         ┆ 1        ┆ 3      ┆ … ┆ 363272  ┆ 7.0     ┆ null  ┆ S        │
│ 2   ┆ 894         ┆ 0        ┆ 2      ┆ … ┆ 240276  ┆ 9.6875  ┆ null  ┆ Q        │
│ 3   ┆ 895         ┆ 0        ┆ 3      ┆ … ┆ 315154  ┆ 8.6625  ┆ null  ┆ S        │
│ 4   ┆ 896         ┆ 1        ┆ 3      ┆ … ┆ 3101298 ┆ 12.2875 ┆ null  ┆ S        │
└─────┴─────────────┴──────────┴────────┴───┴─────────┴─────────┴───────┴──────────┘
</code></pre>
<p>In the above example, we renamed the column named ‘PassengerId’ to ‘renamed_col’.</p>
<h2 id="heading-summary">Summary</h2>
<p>Now you know how to work with the Polars Python library to analyze your data more effectively.</p>
<p>In this article, you learned:</p>
<ul>
<li><p>What Polars is and how to install it</p>
</li>
<li><p>How to define series and DataFrames in Polars</p>
</li>
<li><p>Different functions to deal with DataFrames</p>
</li>
<li><p>How to read and work with CSV files in Polars</p>
</li>
</ul>
<p>Thanks for reading, and happy data wrangling!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Copy Objects in Python ]]>
                </title>
                <description>
                    <![CDATA[ In this tutorial, you’ll learn about copying objects in Python using the copy module. We’ll cover how to use the copy module and when to use its copy() function and deepcopy() function, depending on the scenario. You’ll also learn which way of copyin... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-copy-objects-in-python/</link>
                <guid isPermaLink="false">68014f101e21d8d8454b2cf5</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sara Jadhav ]]>
                </dc:creator>
                <pubDate>Thu, 17 Apr 2025 18:57:20 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/res/hashnode/image/upload/v1744913871670/5ed210bd-1d42-436e-907b-5b304010dbd7.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>In this tutorial, you’ll learn about copying objects in Python using the <code>copy</code> module. We’ll cover how to use the <code>copy</code> module and when to use its <code>copy()</code> function and <code>deepcopy()</code> function, depending on the scenario. You’ll also learn which way of copying is suitable for mutable and immutable objects.</p>
<p>By the end of this tutorial, you’ll understand:</p>
<ul>
<li><p>What the <code>copy</code> module is.</p>
</li>
<li><p>The difference between copying and referencing.</p>
</li>
<li><p>The difference between a deep copy and a shallow copy.</p>
</li>
<li><p>How to actually shallow copy and deep copy objects in Python.</p>
</li>
<li><p>The difference in referencing for immutable objects and mutable objects.</p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>To get the most out of this tutorial, you need to have a basic understanding of the following:</p>
<ol>
<li><p>Fundamental knowledge of programming and its terminology (such as objects, memory addresses, and so on)</p>
</li>
<li><p>Basic knowledge of Python programming, especially (for this tutorial),</p>
<ul>
<li><p>Function <code>id()</code>: Returns the identity of the object passed as an argument (in CPython, this is the object’s memory address).</p>
</li>
<li><p>Data structures: Dictionaries and Lists.</p>
</li>
<li><p>Modules: importing and using them in the program. Basic understanding of methods and functions.</p>
</li>
</ul>
</li>
</ol>
<h2 id="heading-table-of-contents">Table of Contents:</h2>
<ol>
<li><p><a class="post-section-overview" href="#heading-what-is-the-copy-module">What is the Copy Module in Python?</a></p>
<ul>
<li><a class="post-section-overview" href="#heading-why-cant-we-just-use-the-assignment-operator-for-copying-objects">Why Can’t We Just Use the Assignment Operator?</a></li>
</ul>
</li>
<li><p><a class="post-section-overview" href="#heading-how-to-properly-copy-objects-in-python">How to Properly Copy Objects in Python</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-more-about-copying-objects-in-python">More About Copying Objects in Python</a></p>
</li>
<li><p><a class="post-section-overview" href="#heading-summary">Summary</a></p>
</li>
</ol>
<h2 id="heading-what-is-the-copy-module">What is the <code>copy</code> Module?</h2>
<p>The <code>copy</code> module is a built-in module in Python that is primarily used for copying objects. It makes a copy of the original object and stores it in a different memory location, which lets you modify the copy of a mutable object and keep it as a separate object.</p>
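<p>As a first taste, here’s a minimal sketch of the module’s <code>copy()</code> function applied to a dictionary (the names <code>original</code> and <code>duplicate</code> are only illustrative):</p>

```python
import copy

original = {"A": 1, "B": 2}

# copy.copy() builds a new dictionary object with the same key-value pairs.
duplicate = copy.copy(original)
duplicate["C"] = 3  # only the copy gains the new key

print(original)
print(duplicate)
print(original is duplicate)
```

<p>The new key shows up only in <code>duplicate</code>, because the two names refer to two different dictionary objects.</p>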
<h3 id="heading-why-cant-we-just-use-the-assignment-operator-for-copying-objects">Why can’t we just use the assignment operator (<code>=</code>) for copying objects?</h3>
<p>If we use the assignment operator to copy an object, it doesn’t actually <strong>copy</strong> the object. Rather, it creates a binding between the object and the new identifier. This means that if the original name points at memory location <code>x</code>, then the identifier to which we assigned it with the <code>=</code> operator will also point at the same memory location, that is, location <code>x</code>.</p>
<p>This can cause problems when we manipulate the data, because any change made to the object through one name is visible through the other name as well.</p>
<p>Before jumping into the code, let’s first look at the difference between copying and referencing:</p>
<ul>
<li><p><strong>Copying:</strong> Creating a copy means replicating the target object and storing it separately in memory, making it an independent object with the same data.</p>
</li>
<li><p><strong>Referencing:</strong> Referencing an object means pointing to the same memory address where the target object is stored. The reference is just another name (which we call an ‘alias’ in programming) for the original object.</p>
</li>
</ul>
<p>Let’s understand this with an example:</p>
<pre><code class="lang-python"><span class="hljs-comment"># creating a dictionary object.</span>
d1 = {
    <span class="hljs-string">'A'</span> : <span class="hljs-number">1</span>,
    <span class="hljs-string">'B'</span> : <span class="hljs-number">2</span>,
    <span class="hljs-string">'C'</span> : <span class="hljs-number">3</span>
}

<span class="hljs-comment"># using assignment operator to copy d1 in d2.</span>
d2=d1

<span class="hljs-comment"># printing both the dictionaries.</span>
print(<span class="hljs-string">f'd1 = <span class="hljs-subst">{d1}</span> \nd2 = <span class="hljs-subst">{d2}</span>'</span>)
</code></pre>
<p><strong>Output</strong>:</p>
<blockquote>
<p>d1 = {'A': 1, 'B': 2, 'C': 3}</p>
<p>d2 = {'A': 1, 'B': 2, 'C': 3}</p>
</blockquote>
<p>From the above example, it may seem that the dictionary was copied into variable <code>d2</code>, but in reality <code>d2</code> just points to the same object as <code>d1</code>. In this case, <code>d2</code> is an alias for, or reference to, that object. We can prove this as follows:</p>
<pre><code class="lang-python">d1 = {
    <span class="hljs-string">'A'</span> : <span class="hljs-number">1</span>,
    <span class="hljs-string">'B'</span> : <span class="hljs-number">2</span>,
    <span class="hljs-string">'C'</span> : <span class="hljs-number">3</span>
}

d2=d1

d1[<span class="hljs-string">'D'</span>] = <span class="hljs-number">4</span> <span class="hljs-comment"># added a key-value pair in d1</span>

print(<span class="hljs-string">f'd1 = <span class="hljs-subst">{d1}</span> \nd2 = <span class="hljs-subst">{d2}</span>'</span>)
</code></pre>
<p><strong>Output</strong>:</p>
<blockquote>
<p>d1 = {'A': 1, 'B': 2, 'C': 3, 'D': 4}</p>
<p>d2 = {'A': 1, 'B': 2, 'C': 3, 'D': 4}</p>
</blockquote>
<p>In the above code, we added a key-value pair to dictionary <code>d1</code> only, but the change shows up in <code>d2</code>, too. This makes it evident that both identifiers reference the same object.</p>
<p>So the assignment operator <code>=</code> creates a reference to an object. It does not copy the object.</p>
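<p>As a quick check, here is a minimal sketch that uses the <code>is</code> operator, which tests whether two names refer to the same object. The variable names <code>same_object</code>, <code>same_id</code>, and <code>seen_through_d1</code> are only illustrative:</p>

```python
# Assignment binds a second name to the same dictionary object.
d1 = {'A': 1, 'B': 2, 'C': 3}
d2 = d1

same_object = d2 is d1        # identity check, not an equality check
same_id = id(d2) == id(d1)    # id() returns the object's identity

d2['E'] = 5                   # a change made through the alias...
seen_through_d1 = 'E' in d1   # ...is visible through the original name

print(same_object, same_id, seen_through_d1)  # True True True
```

<p>Because both names refer to one object, it makes no difference which name you use to change it.</p>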
<h2 id="heading-how-to-properly-copy-objects-in-python">How to Properly Copy Objects in Python</h2>
<p>Since you now understand the difference between copying and referencing, let’s see how you can actually copy objects in Python. For this, we will make use of the <code>copy</code> module (mentioned earlier).</p>
<p>Now, before using this module, you should understand the difference between a deep copy and a shallow copy.</p>
<ul>
<li><p><strong>Deep copy</strong>: When working with compound objects (also known as container objects or composite objects), deep copying means creating a copy of the inner objects as well as the outer object.</p>
</li>
<li><p><strong>Shallow copy:</strong> When working with compound objects, shallow copying means copying only the outer object and referencing the inner objects.</p>
</li>
</ul>
<p><strong>Note</strong>: Compound objects are objects that contain other objects inside them.</p>
<p>Let’s better understand the difference between deep and shallow copies by actually implementing them in a program:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> copy <span class="hljs-comment"># importing copy module</span>

<span class="hljs-comment"># creating a composite object</span>
categories = {
    <span class="hljs-string">'Fruits'</span> : [<span class="hljs-string">'Apple'</span>, <span class="hljs-string">'Banana'</span>, <span class="hljs-string">'Mango'</span>],
    <span class="hljs-string">'Flowers'</span> : [<span class="hljs-string">'Rose'</span>, <span class="hljs-string">'Sunflower'</span>, <span class="hljs-string">'Tulip'</span>],
}

<span class="hljs-comment"># copying the object by using the copy() function of the copy module</span>
categories_copy = copy.copy(categories)

print(<span class="hljs-string">f'Categories = <span class="hljs-subst">{categories}</span>\nCategories (Copied) = <span class="hljs-subst">{categories_copy}</span>'</span>)
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>Categories = {'Fruits': ['Apple', 'Banana', 'Mango'], 'Flowers': ['Rose', 'Sunflower', 'Tulip']}</p>
<p>Categories (Copied) = {'Fruits': ['Apple', 'Banana', 'Mango'], 'Flowers': ['Rose', 'Sunflower', 'Tulip']}</p>
</blockquote>
<p>In the above example, we made a composite object called <code>categories</code> that contains lists as the inner objects. Then, we used the <code>copy()</code> function of the <code>copy</code> module to shallow copy the original object. Since the <code>copy</code> module is part of Python’s standard library, there is no need to install it separately. At this point, both objects look identical.</p>
<p>Next, let’s modify the original object to see whether the object was really copied or just referenced:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> copy

categories = {
    <span class="hljs-string">'Fruits'</span> : [<span class="hljs-string">'Apple'</span>, <span class="hljs-string">'Banana'</span>, <span class="hljs-string">'Mango'</span>],
    <span class="hljs-string">'Flowers'</span> : [<span class="hljs-string">'Rose'</span>, <span class="hljs-string">'Sunflower'</span>, <span class="hljs-string">'Tulip'</span>],
}

categories_copy = copy.copy(categories)

<span class="hljs-comment"># added a key-value pair in the original dictionary.</span>
categories[<span class="hljs-string">'Color'</span>] = [<span class="hljs-string">'Red'</span>, <span class="hljs-string">'Yellow'</span>, <span class="hljs-string">'Blue'</span>]

print(<span class="hljs-string">f'Categories = <span class="hljs-subst">{categories}</span>\nCategories (Copied) = <span class="hljs-subst">{categories_copy}</span>'</span>)
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>Categories = {'Fruits': ['Apple', 'Banana', 'Mango'], 'Flowers': ['Rose', 'Sunflower', 'Tulip'], 'Color': ['Red', 'Yellow', 'Blue']}</p>
<p>Categories (Copied) = {'Fruits': ['Apple', 'Banana', 'Mango'], 'Flowers': ['Rose', 'Sunflower', 'Tulip']}</p>
</blockquote>
<p>Here, we can see that even when we changed the original dictionary, the copied dictionary (stored in variable <code>categories_copy</code>) remained the same. This means that we successfully copied the dictionary to a different memory location.</p>
<p>But we only shallow copied the dictionary. For a shallow-copied composite object, the inner objects are the same objects as those in the original, so they point at the same memory location. You can see this in the following example:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> copy

categories = {
    <span class="hljs-string">'Fruits'</span> : [<span class="hljs-string">'Apple'</span>, <span class="hljs-string">'Banana'</span>, <span class="hljs-string">'Mango'</span>],
    <span class="hljs-string">'Flowers'</span> : [<span class="hljs-string">'Rose'</span>, <span class="hljs-string">'Sunflower'</span>, <span class="hljs-string">'Tulip'</span>],
}

categories_copy = copy.copy(categories)

<span class="hljs-comment"># checking if the inner object list 'Fruits' of both the dictionaries point to same memory address.</span>
print(<span class="hljs-string">f"""
Do 'categories' and 'categories_copy' inner object share same memory address? 
--&gt; <span class="hljs-subst">{id(categories_copy[<span class="hljs-string">'Fruits'</span>]) == id(categories[<span class="hljs-string">'Fruits'</span>])}</span>
"""</span>)

<span class="hljs-comment"># checking if the outer objects (dictionaries) point to same memory address.</span>
print(<span class="hljs-string">f"""
Do 'categories' and 'categories_copy' outer object share same memory address? 
--&gt; <span class="hljs-subst">{id(categories_copy) == id(categories)}</span>
"""</span>)
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>Do 'categories' and 'categories_copy' inner object share same memory address?</p>
<p>--&gt; True</p>
<p>Do 'categories' and 'categories_copy' outer object share same memory address?</p>
<p>--&gt; False</p>
</blockquote>
<p>In the above code, we used the same example as earlier. Then, we used the built-in function <code>id()</code>, which returns an object’s identity (in CPython, its memory address), to compare the inner <code>'Fruits'</code> lists and the outer dictionaries.</p>
<p>The inner objects have the same memory address, but the outer objects are stored at different locations in memory. So when you shallow copy an object, the inner objects are only referenced, while the outer object is copied to a separate memory address.</p>
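<p>The practical consequence shows up when an inner object is changed in place. The following sketch reuses the <code>categories</code> dictionary from above (the name <code>shallow</code> is just illustrative) and contrasts an in-place change with rebinding a key:</p>

```python
import copy

categories = {
    'Fruits': ['Apple', 'Banana', 'Mango'],
    'Flowers': ['Rose', 'Sunflower', 'Tulip'],
}

shallow = copy.copy(categories)

# Changing an inner list in place affects both dictionaries,
# because the shallow copy references the same list object.
categories['Fruits'].append('Orange')

# Rebinding a key in the original only replaces the original's reference.
categories['Flowers'] = ['Lily']

print(shallow['Fruits'])   # ['Apple', 'Banana', 'Mango', 'Orange']
print(shallow['Flowers'])  # ['Rose', 'Sunflower', 'Tulip']
```

<p>The appended fruit appears in the shallow copy, while the rebound <code>'Flowers'</code> key does not, since rebinding changes only the outer dictionary.</p>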
<p>On the other hand, the <code>deepcopy()</code> function of the <code>copy</code> module copies the object completely (both inner and outer objects are stored at different memory locations). The following code shows how we can deep copy the objects within our code:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> copy

categories = {
    <span class="hljs-string">'Fruits'</span> : [<span class="hljs-string">'Apple'</span>, <span class="hljs-string">'Banana'</span>, <span class="hljs-string">'Mango'</span>],
    <span class="hljs-string">'Flowers'</span> : [<span class="hljs-string">'Rose'</span>, <span class="hljs-string">'Sunflower'</span>, <span class="hljs-string">'Tulip'</span>],
}

<span class="hljs-comment"># deep copying the dictionary</span>
categories_copy = copy.deepcopy(categories)

print(<span class="hljs-string">f"""
Do 'categories' and 'categories_copy' inner object share same memory address? 
--&gt; <span class="hljs-subst">{id(categories_copy[<span class="hljs-string">'Fruits'</span>]) == id(categories[<span class="hljs-string">'Fruits'</span>])}</span>
"""</span>)

print(<span class="hljs-string">f"""
Do 'categories' and 'categories_copy' outer object share same memory address? 
--&gt; <span class="hljs-subst">{id(categories_copy) == id(categories)}</span>
"""</span>)
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>Do 'categories' and 'categories_copy' inner object share same memory address?</p>
<p>--&gt; False</p>
<p>Do 'categories' and 'categories_copy' outer object share same memory address?</p>
<p>--&gt; False</p>
</blockquote>
<p>In the code, we deep copied the dictionary by using the <code>deepcopy()</code> function. Comparing the memory addresses of the inner and the outer objects shows that the copy’s objects are stored separately from the original’s.</p>
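<p>To see what this independence means in practice, here is a small sketch (the name <code>deep</code> is illustrative) that changes an inner list of the original after deep copying it:</p>

```python
import copy

categories = {
    'Fruits': ['Apple', 'Banana', 'Mango'],
    'Flowers': ['Rose', 'Sunflower', 'Tulip'],
}

deep = copy.deepcopy(categories)

# The deep copy has its own inner lists, so this in-place change
# to the original does not reach the copy.
categories['Fruits'].append('Orange')

print(categories['Fruits'])  # ['Apple', 'Banana', 'Mango', 'Orange']
print(deep['Fruits'])        # ['Apple', 'Banana', 'Mango']
```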
<p>So you’ll use a shallow copy or a deep copy depending upon the situation.</p>
<p>For example, if you just want to copy the outer object and keep the nested objects shared, you should opt for a shallow copy. Suppose you have defined a class that creates student IDs for grade X, and every student should keep the same <code>self.grade = X</code>. In such cases, you can just reference the nested object.</p>
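<p>One way this could look is sketched below. The <code>Grade</code> and <code>StudentID</code> classes, their attributes, and the sample names are all hypothetical, made up purely to illustrate the idea:</p>

```python
import copy


class Grade:
    """Hypothetical record shared by every student in one grade."""

    def __init__(self, name, teacher):
        self.name = name
        self.teacher = teacher


class StudentID:
    """Hypothetical student ID holding a reference to a shared Grade."""

    def __init__(self, student_name, grade):
        self.student_name = student_name
        self.grade = grade  # nested object, meant to be shared


grade_x = Grade('X', 'Ms. Rao')
first = StudentID('Asha', grade_x)

# Shallow copy: a new StudentID object that references the same Grade.
second = copy.copy(first)
second.student_name = 'Ravi'

# Updating the shared Grade is visible through every student ID.
grade_x.teacher = 'Mr. Iyer'

print(first.student_name, second.student_name)  # Asha Ravi
print(second.grade is first.grade)              # True
print(second.grade.teacher)                     # Mr. Iyer
```

<p>Each student ID is its own object, but the grade information lives in one place and only needs to be updated once.</p>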
<p>Also, for non-nested objects, a shallow copy is sufficient. There are no nested objects to share, so shallow copying produces a fully independent object at a different memory location.</p>
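<p>A minimal sketch of the non-nested case, using a made-up <code>scores</code> list that holds only integers. It also uses the list’s own <code>copy()</code> method, which makes a shallow copy just like <code>copy.copy()</code>:</p>

```python
import copy

scores = [72, 85, 91]  # flat list: the elements are immutable integers

scores_copy = copy.copy(scores)  # shallow copy via the copy module
scores_alt = scores.copy()       # list.copy() is also a shallow copy

scores_copy.append(100)
scores_copy[0] = 0

print(scores)       # [72, 85, 91]
print(scores_copy)  # [0, 85, 91, 100]
print(scores_alt)   # [72, 85, 91]
```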
<p>On the other hand, if you want a complete, independent copy of the object, you should deep copy the object.</p>
<h2 id="heading-more-about-copying-objects-in-python">More About Copying Objects in Python</h2>
<p>You can use the <code>copy</code> module for both immutable and mutable objects. But for immutable objects, the assignment operator <code>=</code> is usually enough.</p>
<p>As mentioned earlier, the <code>=</code> operator only creates a reference in this case, too. But an immutable object cannot be changed in place. Any operation that appears to modify it creates a new object at a different memory location and rebinds the name to it. The alias keeps pointing at the original object, at the same memory address as before, so it behaves like an independent copy.</p>
<p>Let’s understand this with an example:</p>
<pre><code class="lang-python">str1 = <span class="hljs-string">"String"</span> <span class="hljs-comment"># created a string object</span>

str2 = str1 <span class="hljs-comment"># using '=' to reference the string stored in 'str1'</span>

print(str2, str1, sep=<span class="hljs-string">'\n'</span>) <span class="hljs-comment"># printing the strings</span>
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>String</p>
<p>String</p>
</blockquote>
<p>Above, the variable <code>str2</code> referenced the string stored in variable <code>str1</code>. Basically, <code>str2</code> and <code>str1</code> point at the same memory address, and <code>str2</code> is just an alias of <code>str1</code>.</p>
<p>But if we go further and “modify” the string through <code>str1</code>, then <code>str1</code> starts to point at a new memory location (since a string is immutable, the modified result is stored as a new object at a different memory address). <code>str2</code> still points at the previous memory address, which in practice serves the purpose of a copy.</p>
<pre><code class="lang-python">str1 = <span class="hljs-string">"String"</span>

str2 = str1

<span class="hljs-comment"># printing memory addresses of both the variables before the concatenation.</span>
print(<span class="hljs-string">f"""
Memory address of str1: <span class="hljs-subst">{id(str1)}</span>
Memory address of str2: <span class="hljs-subst">{id(str2)}</span>
"""</span>)

str1+=<span class="hljs-string">'***'</span> <span class="hljs-comment"># concatenated the string '***' with str1.</span>

print(str2, str1, sep=<span class="hljs-string">'\n'</span>)

<span class="hljs-comment"># printing memory addresses of both the variables after the concatenation.</span>
print(<span class="hljs-string">f"""
Memory address of str1: <span class="hljs-subst">{id(str1)}</span>
Memory address of str2: <span class="hljs-subst">{id(str2)}</span>
"""</span>)
</code></pre>
<p><strong>Output:</strong></p>
<blockquote>
<p>Memory address of str1: 2652367074480</p>
<p>Memory address of str2: 2652367074480</p>
<p>String</p>
<p>String***</p>
<p>Memory address of str1: 2652367370736</p>
<p>Memory address of str2: 2652367074480</p>
</blockquote>
<p><strong>Note</strong>: Memory addresses may vary on your device from the ones shown above.</p>
<p>Now, in the above example, we first created a string object and referenced it in another variable, and printed the memory addresses of both the variables. Then, we modified the original string, and again, printed the memory addresses of both the variables.</p>
<p>In the output, we can see that the memory addresses of both variables were the same before the concatenation, so both were pointing at the same memory location. After the concatenation, <code>str1</code> started pointing at a new string object at a different memory location, while the alias <code>str2</code> still points at the original string at the previous memory address.</p>
<p>To sum up, for immutable objects you can use the <code>=</code> operator to keep a copy of the original value, even if you plan to modify the original variable later in the program.</p>
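<p>One caveat is worth sketching here: an immutable container such as a tuple can still hold mutable items, and those items can be changed through any alias. The example below is illustrative (the <code>record</code> tuple is made up). It also relies on CPython behavior where <code>copy.copy()</code> returns the very same object for a string:</p>

```python
import copy

text = "String"
# For an immutable string, CPython's copy.copy() returns the same object.
print(copy.copy(text) is text)  # True

# A tuple is immutable, but the list inside it is not.
record = ('Fruits', ['Apple', 'Banana'])
alias = record

record[1].append('Mango')  # in-place change to the inner list
print(alias[1])            # ['Apple', 'Banana', 'Mango']

# A deep copy gives the tuple its own inner list.
independent = copy.deepcopy(record)
record[1].append('Kiwi')
print(independent[1])      # ['Apple', 'Banana', 'Mango']
```

<p>So the assignment shortcut is safe only when the immutable object contains no mutable objects. Otherwise, a deep copy is the safer choice.</p>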
<h2 id="heading-summary">Summary</h2>
<p>In this tutorial, you learned about copying objects in Python. Specifically, we talked about:</p>
<ul>
<li><p>How the assignment operator <code>=</code> is used for referencing and not copying.</p>
</li>
<li><p>The built-in <code>copy</code> module, which provides functions that allow us to shallow copy and deep copy the objects in our program.</p>
</li>
<li><p>The concept of shallow copy and deep copy, which are essential when copying compound objects.</p>
</li>
<li><p>How a shallow copy copies the outer object and references the inner objects.</p>
</li>
<li><p>How a deep copy copies both the outer object and the inner objects.</p>
</li>
<li><p>How, for immutable objects, the assignment operator is usually enough to keep a copy.</p>
</li>
</ul>
<p>Thanks for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
