<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ #Scrapy - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ #Scrapy - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Fri, 26 Jun 2026 22:48:01 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/scrapy/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ The Complete Guide to Scraping the Web for Top Rated Movies on TV ]]>
                </title>
                <description>
                    <![CDATA[ By Bert Carremans In this article, I will show how to scrape the internet for top-rated films with the Scrapy framework. The goal of this web scraper is to find films that have a high user rating on The Movie Database. The list with these films will ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/scrape-the-web-for-top-rated-movies-on-tv/</link>
                <guid isPermaLink="false">66d45ddf4a7504b7409c3347</guid>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #Scrapy ]]>
                    </category>
                
                    <category>
                        <![CDATA[ web scraping ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 03 Jan 2020 22:30:01 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/01/0_D52CsZmqCYvifA3M.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Bert Carremans</p>
<p>In this article, I will show how to scrape the internet for top-rated films with the <a target="_blank" href="http://scrapy.org/"><strong><em>Scrapy framework</em></strong></a>. The <strong><em>goal</em></strong> of this web scraper is to find films that have a high user rating on <a target="_blank" href="https://www.themoviedb.org/">The Movie Database</a>. The list with these films will be stored in an <strong><em>SQLite database</em></strong> and <strong><em>emailed</em></strong>. This way you know you’ll never miss a blockbuster on TV again.</p>
<h1 id="heading-finding-a-good-web-page-to-scrape">Finding a good web page to scrape</h1>
<p>I start with an online TV guide to find films on Belgian TV channels. But you could easily adapt my code to use it for any other website. To make your life easier when scraping for films, make sure the website you want to scrape:</p>
<ul>
<li>has HTML tags with a <strong><em>comprehensible class or id</em></strong></li>
<li>uses classes and ids in a <strong><em>consistent</em></strong> way</li>
<li>has <strong><em>well-structured URLs</em></strong></li>
<li>contains all relevant <strong><em>TV channels on one page</em></strong></li>
<li>has a <strong><em>separate page per weekday</em></strong></li>
<li><em><strong>lists only films</strong></em> and no other program types like live shows, news, reportage, and so on. Unless you can easily distinguish the films from the other program types.</li>
</ul>
<p>With the results found we will scrape <a target="_blank" href="https://www.themoviedb.org/"><strong><em>The Movie Database</em></strong></a> (TMDB) for the film rating and some other information.</p>
<h1 id="heading-deciding-on-what-information-to-store">Deciding on what information to store</h1>
<p>I will scrape the following information about the films:</p>
<ul>
<li>film title</li>
<li>TV channel</li>
<li>the time that the film starts</li>
<li>the date the film is on TV</li>
<li>genre</li>
<li>plot</li>
<li>release date</li>
<li>link to the details page on TMDB</li>
<li>TMDB rating</li>
</ul>
<p>You could complement this list with all actors, the director, interesting film facts, and so on – all the information you’d like to know more about.</p>
<p>In Scrapy this information will be stored in the fields of an <strong><em>Item</em></strong>.</p>
<h1 id="heading-create-the-scrapy-project">Create the Scrapy project</h1>
<p>I am going to assume that you have Scrapy installed. If not, you can follow the excellent <a target="_blank" href="http://doc.scrapy.org/en/latest/intro/install.html">Scrapy installation guide</a>.</p>
<p>When Scrapy is installed, open the command line and go to the directory where you want to store the Scrapy project. Then run:</p>
<pre><code>scrapy startproject topfilms
</code></pre><p>This will create a folder structure for the top films project as shown below. You can ignore the topfilms.db file for now. This is the SQLite database that we will create in the next blog post on Pipelines.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_dZ6phochXc8Dq1L6.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h1 id="heading-defining-scrapy-items">Defining Scrapy Items</h1>
<p>We’ll be working with the file <strong><em>items.py</em></strong>. Items.py is created by default when creating your Scrapy project.</p>
<p>An <code>scrapy.Item</code> is a container that will be filled during the web scraping. It will hold all the fields that we want to extract from the web page(s). The contents of the Item can be accessed in the same way as a <strong><em>Python dict</em></strong>.</p>
<p>Open items.py and add a <code>Scrapy.Item class</code> with the following fields:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> scrapy
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TVGuideItem</span>(<span class="hljs-params">scrapy.Item</span>):</span>
    title = scrapy.Field()
    channel = scrapy.Field()
    start_ts = scrapy.Field()
    film_date_long = scrapy.Field()
    film_date_short = scrapy.Field()
    genre = scrapy.Field()
    plot = scrapy.Field()
    rating = scrapy.Field()
    tmdb_link = scrapy.Field()
    release_date = scrapy.Field()
    nb_votes = scrapy.Field()
</code></pre>
<h1 id="heading-processing-items-with-pipelines">Processing Items with Pipelines</h1>
<p>After starting a new Scrapy project, you’ll have a file called <strong>pipelines.py</strong>. Open this file and copy-paste the code shown below. Afterward, I’ll show you step-by-step what each part of the code does.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> sqlite3 <span class="hljs-keyword">as</span> lite
con = <span class="hljs-literal">None</span>  <span class="hljs-comment"># db connection</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">StoreInDBPipeline</span>(<span class="hljs-params">object</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.setupDBCon()
        self.dropTopFilmsTable()
        self.createTopFilmsTable()
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_item</span>(<span class="hljs-params">self, item, spider</span>):</span>
        self.storeInDb(item)
        <span class="hljs-keyword">return</span> item
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">storeInDb</span>(<span class="hljs-params">self, item</span>):</span>
        self.cur.execute(<span class="hljs-string">"INSERT INTO topfilms(\
        title, \
        channel, \
        start_ts, \
        film_date_long, \
        film_date_short, \
        rating, \
        genre, \
        plot, \
        tmdb_link, \
        release_date, \
        nb_votes \
        ) \
        VALUES( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )"</span>,
        (
        item[<span class="hljs-string">'title'</span>],
        item[<span class="hljs-string">'channel'</span>],
        item[<span class="hljs-string">'start_ts'</span>],
        item[<span class="hljs-string">'film_date_long'</span>],
        item[<span class="hljs-string">'film_date_short'</span>],
        float(item[<span class="hljs-string">'rating'</span>]),
        item[<span class="hljs-string">'genre'</span>],
        item[<span class="hljs-string">'plot'</span>],
        item[<span class="hljs-string">'tmdb_link'</span>],
        item[<span class="hljs-string">'release_date'</span>],
        item[<span class="hljs-string">'nb_votes'</span>]
        ))
        self.con.commit()
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">setupDBCon</span>(<span class="hljs-params">self</span>):</span>
        self.con = lite.connect(<span class="hljs-string">'topfilms.db'</span>)
        self.cur = self.con.cursor()
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__del__</span>(<span class="hljs-params">self</span>):</span>
        self.closeDB()
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">createTopFilmsTable</span>(<span class="hljs-params">self</span>):</span>
        self.cur.execute(<span class="hljs-string">"CREATE TABLE IF NOT EXISTS topfilms(id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, \
        title TEXT, \
        channel TEXT, \
        start_ts TEXT, \
        film_date_long TEXT, \
        film_date_short TEXT, \
        rating TEXT, \
        genre TEXT, \
        plot TEXT, \
        tmdb_link TEXT, \
        release_date TEXT, \
        nb_votes \
        )"</span>)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">dropTopFilmsTable</span>(<span class="hljs-params">self</span>):</span>
        self.cur.execute(<span class="hljs-string">"DROP TABLE IF EXISTS topfilms"</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">closeDB</span>(<span class="hljs-params">self</span>):</span>
        self.con.close()
</code></pre>
<p>First, we start by importing the <a target="_blank" href="https://docs.python.org/2/library/sqlite3.html">SQLite package</a> and give it the alias <code>lite</code>. We also initialize a variable <code>con</code> which is used for the database connection.</p>
<h2 id="heading-creating-a-class-to-store-items-in-the-database">Creating a class to store Items in the database</h2>
<p>Next, you create a <a target="_blank" href="https://docs.python.org/2/tutorial/classes.html"><strong><em>class</em></strong></a> with a logical name. After enabling the pipeline in the settings file (more on that later), this class will be called.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">StoreInDBPipeline</span>(<span class="hljs-params">object</span>):</span>
</code></pre>
<h2 id="heading-defining-the-constructor-method">Defining the constructor method</h2>
<p>The constructor method is the method with the name <code>__init__</code>. This method is automatically run when creating an instance of the <code>StoreInDBPipeline</code> class.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
    self.setupDBCon()
    self.dropTopFilmsTable()
    self.createTopFilmsTable()
</code></pre>
<p>In the constructor method, we launch three other methods which are defined below the constructor method.</p>
<h2 id="heading-setupdbcon-method">SetupDBCon Method</h2>
<p>With the method <code>setupDBCon</code>, we create the <code>topfilms</code> database (if it didn’t exist yet) and make a connection to it with the <code>connect</code> function.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">setupDBCon</span>(<span class="hljs-params">self</span>):</span>
    self.con = lite.connect(<span class="hljs-string">'topfilms.db'</span>)
    self.cur = self.con.cursor()
</code></pre>
<p>Here we use the alias lite for the SQLite package. Secondly, we create a Cursor object with the <code>cursor</code> function. With this Cursor object, we can execute SQL statements in the database.</p>
<h2 id="heading-droptopfilmstable-method">DropTopFilmsTable Method</h2>
<p>The second method that is called in the constructor is <code>dropTopFilmsTable</code>. As the name says, it drops the table in the SQLite database.</p>
<p>Each time the web scraper is run the database will be completely removed. It is up to you if you want to do that as well. If you want to do some querying or analysis of the films’ data, you could keep the scraping results of each run.</p>
<p>I just want to see the top rated films of the coming days and nothing more. Therefore I decided to delete the database in each run.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">dropTopFilmsTable</span>(<span class="hljs-params">self</span>):</span>
    self.cur.execute(<span class="hljs-string">"DROP TABLE IF EXISTS topfilms"</span>)
</code></pre>
<p>With the Cursor object <code>cur</code> we execute the <code>DROP</code> statement.</p>
<h2 id="heading-createtopfilmstable-method">CreateTopFilmsTable Method</h2>
<p>After dropping the top films table, we need to create it. This is done by the last method call in the constructor method.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">createTopFilmsTable</span>(<span class="hljs-params">self</span>):</span>
    self.cur.execute(<span class="hljs-string">"CREATE TABLE IF NOT EXISTS topfilms(id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, \
    title TEXT, \
    channel TEXT, \
    start_ts TEXT, \
    film_date_long TEXT, \
    film_date_short TEXT, \
    rating TEXT, \
    genre TEXT, \
    plot TEXT, \
    tmdb_link TEXT, \
    release_date TEXT, \
    nb_votes \
    )"</span>)
</code></pre>
<p>Again we use the Cursor object <code>cur</code> to execute the <code>CREATE TABLE</code> statement. The fields that are added to the table top films are the same as in the Scrapy Item we created before. To keep things easy, I use exactly the same names in the SQLite table as in the Item. Only the <code>id</code> field is extra.</p>
<p><em><strong>Sidenote</strong></em>: a good application to look at your SQLite databases is the <a target="_blank" href="https://addons.mozilla.org/nl/firefox/addon/sqlite-manager/">SQLite Manager plugin in Firefox</a>. You can watch this <a target="_blank" href="https://youtu.be/y-yA7YT-7gw">SQLite Manager tutorial on Youtube</a> to learn how to use this plugin.</p>
<h2 id="heading-processitem-method">Process_item Method</h2>
<p>This method must be implemented in the Pipeline class and it must return a dict, an Item or DropItem exception. In our web scraper, we will return the item.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_item</span>(<span class="hljs-params">self, item, spider</span>):</span>
    self.storeInDb(item)
    <span class="hljs-keyword">return</span> item
</code></pre>
<p>In contrast with the other methods explained, it has two extra arguments. The <code>item</code> that was scraped and the <code>spider</code> that scraped the item. From this method, we launch the <code>storeInDb</code> method and afterward return the item.</p>
<h2 id="heading-storeindb-method">StoreInDb Method</h2>
<p>This method executes an <code>INSERT</code> statement to insert the scraped item into the SQLite database.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">storeInDb</span>(<span class="hljs-params">self, item</span>):</span>
    self.cur.execute(<span class="hljs-string">"INSERT INTO topfilms(\
    title, \
    channel, \
    start_ts, \
    film_date_long, \
    film_date_short, \
    rating, \
    genre, \
    plot, \
    tmdb_link, \
    release_date, \
    nb_votes \
    ) \
    VALUES( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )"</span>,
                     (
                         item[<span class="hljs-string">'title'</span>],
                         item[<span class="hljs-string">'channel'</span>],
                         item[<span class="hljs-string">'start_ts'</span>],
                         item[<span class="hljs-string">'film_date_long'</span>],
                         item[<span class="hljs-string">'film_date_short'</span>],
                         float(item[<span class="hljs-string">'rating'</span>]),
                         item[<span class="hljs-string">'genre'</span>],
                         item[<span class="hljs-string">'plot'</span>],
                         item[<span class="hljs-string">'tmdb_link'</span>],
                         item[<span class="hljs-string">'release_date'</span>],
                         item[<span class="hljs-string">'nb_votes'</span>]
                     ))
    self.con.commit()
</code></pre>
<p>The values for the table fields come from the item, which is an argument for this method. These values are simply called as a dict value (remember that an Item is nothing more than a dict?).</p>
<h2 id="heading-every-constructor-has-a-destructor">Every constructor has a... destructor</h2>
<p>The counterpart of the constructor method is the destructor method with the name <code>__del__</code>. In the destructor method for this pipelines class, we close the connection to the database.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__del__</span>(<span class="hljs-params">self</span>):</span>
    self.closeDB()
</code></pre>
<h2 id="heading-closedb-method">CloseDB Method</h2>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">closeDB</span>(<span class="hljs-params">self</span>):</span>
    self.con.close()
</code></pre>
<p>In this last method, we close the database connection with the <code>close</code> function. So now we have written a fully functional pipeline. There is still one last step left to enable the pipeline.</p>
<h2 id="heading-enabling-the-pipeline-in-settingspy">Enabling the pipeline in settings.py</h2>
<p>Open the <strong><em>settings.py</em></strong> file and add the following code:</p>
<pre><code class="lang-python">ITEM_PIPELINES = {
    <span class="hljs-string">'topfilms.pipelines.StoreInDBPipeline'</span>:<span class="hljs-number">1</span>
}
</code></pre>
<p>The <strong><em>integer value</em></strong> indicates the order in which the pipelines are run. As we have only one pipeline, we assign it the value 1.</p>
<h1 id="heading-creating-a-spider-in-scrapy">Creating a Spider in Scrapy</h1>
<p>Now we’ll be looking at the core of Scrapy, the <strong><em>Spider</em></strong>. This is where the heavy lifting of your web scraper will be done. I’ll show you step-by-step how to create one.</p>
<h2 id="heading-importing-the-necessary-packages">Importing the necessary packages</h2>
<p>First of all, we’ll import the necessary packages and modules. We use the <code>CrawlSpider</code> module to follow links throughout the online TV guide.</p>
<p><code>Rule</code> and <code>LinkExtractor</code> are used to determine which links we want to follow.</p>
<p>The <code>config</code> module contains some constants like <code>DOM_1, DOM_2</code> and <code>START_URL</code> that are used in the Spider. The config module is found one directory up to the current directory. That’s why you see two dots before the config module.</p>
<p>And lastly, we import the <code>TVGuideItem</code>. This TVGuideItem will be used to contain the information during the scraping.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> scrapy
<span class="hljs-keyword">from</span> scrapy.spiders <span class="hljs-keyword">import</span> CrawlSpider, Rule
<span class="hljs-keyword">from</span> scrapy.linkextractors <span class="hljs-keyword">import</span> LinkExtractor
<span class="hljs-keyword">from</span> fuzzywuzzy <span class="hljs-keyword">import</span> fuzz
<span class="hljs-keyword">from</span> ..config <span class="hljs-keyword">import</span> *
<span class="hljs-keyword">from</span> topfilms.items <span class="hljs-keyword">import</span> TVGuideItem
</code></pre>
<h2 id="heading-telling-the-spider-where-to-go">Telling the Spider where to go</h2>
<p>Secondly we subclass the CrawlSpider class. This is done by inserting CrawlSpider as an argument for the <code>TVGuideSpider</code> class.</p>
<p>We give the Spider a <code>name</code>, provide the <code>allowed_domains</code> (e.g. themoviedb.org) and the <code>start_urls</code>. The start_urls is in my case the web page of the TV guide, so you should change this by your own preferred website.</p>
<p>With <code>rules</code> and the <code>deny</code> argument we tell the Spider which URLs (not) to follow on the start URL. The URL not to follow is specified with a regular expression.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_r0r11AyaIBEC7ODH.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>I am not interested in the films that were shown yesterday, don't allow the Spider to follow URLs ending with “<em>gisteren</em>“.</p>
<p>OK, but which URLs should the Spider follow? For that, I use the <code>restrict_xpaths</code> argument. It says to follow all URLs with the <code>class=”button button–beta”</code>. These are in fact URLs with films for the coming week.</p>
<p>Finally, with the <code>callback</code> argument we let the Spider know what to do when it is following one of the URLs. It will execute the function <code>parse_by_day</code>. I’ll explain that in the next part.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TVGuideSpider</span>(<span class="hljs-params">CrawlSpider</span>):</span>
    name = <span class="hljs-string">"tvguide"</span>
    allowed_domains = [DOM_1, DOM_2]
    start_urls = [START_URL]
<span class="hljs-comment"># Extract the links from the navigation per day</span>
    <span class="hljs-comment"># We will not crawl the films for yesterday</span>
    rules = (
        Rule(LinkExtractor(allow=(), deny=(<span class="hljs-string">r'\/gisteren'</span>), restrict_xpaths=(<span class="hljs-string">'//a[@class="button button--beta"]'</span>,)), callback=<span class="hljs-string">"parse_by_day"</span>, follow= <span class="hljs-literal">True</span>),
    )
</code></pre>
<h2 id="heading-parsing-the-followed-urls">Parsing the followed urls</h2>
<p>The <code>parse_by_day</code> function, part of the TVGuideScraper, scrapes the web pages with the overview of all films per channel per day. The <code>response</code> argument comes from the <code>Request</code> that has been launched when running the web scraping program.</p>
<p>On the web page being scraped you need to find the HTML elements that are used to show the information we are interested in. Two good tools for this are the <a target="_blank" href="https://developers.google.com/web/tools/chrome-devtools/">Chrome Developer Tools</a> and the <a target="_blank" href="https://addons.mozilla.org/nl/firefox/addon/firebug/">Firebug plugin in Firefox</a>.</p>
<p>One thing we want to store is the <code>date</code> for the films we are scraping. This date can be found in the paragraph (p) in the div with <code>class="grid__col__inner"</code>. Clearly, this is something you should modify for the page you are scraping.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_-EML6UFd2TbqVCmY.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>With the <code>xpath method</code> of the Response object, we extract the text in the paragraph. I learned a lot of this in the <a target="_blank" href="http://zvon.org/comp/r/tut-XPath_1.html">tutorial on how to use the xpath function</a>.</p>
<p>By using <code>extract_first</code>, we make sure that we do not store this date as a list. Otherwise, this will give us issues when storing the date in the SQLite database.</p>
<p>Afterwards, I perform some data cleaning on film_date_long and create <code>film_date_short</code> with the format YYYYMMDD. I created this YYYYMMDD format to sort the films chronologically later on.</p>
<p>Next, the TV channel is scraped. If it is in the list of <code>ALLOWED_CHANNELS</code> (defined in the config module), we continue to scrape the title and starting time. This information is stored in the item, which is initiated by <code>TVGuideItem()</code>.</p>
<p>After this, we want to continue scraping on The Movie Database. We will use the URL <a target="_blank" href="https://www.themoviedb.org/search?query="><strong>https://www.themoviedb.org/search?query=</strong></a> to show search results for the film being scraped. To this URL, we want to add the film title (<code>url_part</code> in the code). We simply re-use the URL part that is found in the link on the TV guide web page.</p>
<p>With that URL, we create a new request and continue on TMDB. With <code>request.meta['item'] = item</code> we add the already scraped data to the request. This way we can continue to fill up our current TVGuideItem.</p>
<p><code>Yield request</code> actually launches the request.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">parse_by_day</span>(<span class="hljs-params">self, response</span>):</span>
    film_date_long = response.xpath(<span class="hljs-string">'//div[@class="grid__col__inner"]/p/text()'</span>).extract_first()
    film_date_long = film_date_long.rsplit(<span class="hljs-string">','</span>,<span class="hljs-number">1</span>)[<span class="hljs-number">-1</span>].strip()  <span class="hljs-comment"># Remove day name and white spaces</span>
    <span class="hljs-comment"># Create a film date with a short format like YYYYMMDD to sort the results chronologically</span>
    film_day_parts = film_date_long.split()
    months_list = [<span class="hljs-string">'januari'</span>, <span class="hljs-string">'februari'</span>, <span class="hljs-string">'maart'</span>,
                  <span class="hljs-string">'april'</span>, <span class="hljs-string">'mei'</span>, <span class="hljs-string">'juni'</span>, <span class="hljs-string">'juli'</span>,
                  <span class="hljs-string">'augustus'</span>, <span class="hljs-string">'september'</span>, <span class="hljs-string">'oktober'</span>,
                  <span class="hljs-string">'november'</span>, <span class="hljs-string">'december'</span> ]
    year = str(film_day_parts[<span class="hljs-number">2</span>])
    month = str(months_list.index(film_day_parts[<span class="hljs-number">1</span>]) + <span class="hljs-number">1</span>).zfill(<span class="hljs-number">2</span>)
    day = str(film_day_parts[<span class="hljs-number">0</span>]).zfill(<span class="hljs-number">2</span>)
    film_date_short = year + month + day
    <span class="hljs-keyword">for</span> col_inner <span class="hljs-keyword">in</span> response.xpath(<span class="hljs-string">'//div[@class="grid__col__inner"]'</span>):
        chnl = col_inner.xpath(<span class="hljs-string">'.//div[@class="tv-guide__channel"]/h6/a/text()'</span>).extract_first()
        <span class="hljs-keyword">if</span> chnl <span class="hljs-keyword">in</span> ALLOWED_CHANNELS:
            <span class="hljs-keyword">for</span> program <span class="hljs-keyword">in</span> col_inner.xpath(<span class="hljs-string">'.//div[@class="program"]'</span>):
                item = TVGuideItem()
                item[<span class="hljs-string">'channel'</span>] = chnl
                item[<span class="hljs-string">'title'</span>] = program.xpath(<span class="hljs-string">'.//div[@class="title"]/a/text()'</span>).extract_first()
                item[<span class="hljs-string">'start_ts'</span>] = program.xpath(<span class="hljs-string">'.//div[@class="time"]/text()'</span>).extract_first()
                item[<span class="hljs-string">'film_date_long'</span>] = film_date_long
                item[<span class="hljs-string">'film_date_short'</span>] = film_date_short
                detail_link = program.xpath(<span class="hljs-string">'.//div[@class="title"]/a/@href'</span>).extract_first()
                url_part = detail_link.rsplit(<span class="hljs-string">'/'</span>,<span class="hljs-number">1</span>)[<span class="hljs-number">-1</span>]
                <span class="hljs-comment"># Extract information from the Movie Database www.themoviedb.org</span>
                request = scrapy.Request(<span class="hljs-string">"https://www.themoviedb.org/search?query="</span>+url_part,callback=self.parse_tmdb)
                request.meta[<span class="hljs-string">'item'</span>] = item  <span class="hljs-comment"># Pass the item with the request to the detail page</span>
    <span class="hljs-keyword">yield</span> request
</code></pre>
<h2 id="heading-scraping-additional-information-on-the-movie-database">Scraping additional information on The Movie DataBase</h2>
<p>As you can see in the request created in the function <code>parse_by_day</code>, we use the callback function <code>parse_tmdb</code>. This function is used during the request to scrape the TMDB website.</p>
<p>In the first step, we get the item information that was passed by the <code>parse_by_day</code> function.</p>
<p>The page with search results on TMDB can possibly list multiple search results for the same film title (url_part passed in the query). We also check whether there are results with <code>if tmddb_titles</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_ncBMqbk9fzZ-Szi0.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We use the <a target="_blank" href="https://pypi.python.org/pypi/fuzzywuzzy">fuzzywuzzy</a> package to perform fuzzy matching on the film titles. In order to use the fuzzywuzzy package we need to add the <code>import</code> statement together with the previous import statements.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fuzzywuzzy <span class="hljs-keyword">import</span> fuzz
</code></pre>
<p>If we find a 90% match we use that search result to do the rest of the scraping. We do not look at the other search results anymore. To do that we use the <code>break</code> statement.</p>
<p>Next, we gather <code>genre</code>, <code>rating</code> and <code>release_date</code> from the search results page in a similar way we used the xpath function before. To get a YYYYMMDD format for the release date, we execute some data processing with the <code>split</code> and <code>join</code> functions.</p>
<p>Again we want to launch a new request to the details page on TMDB. This request will call the <code>parse_tmdb_detail</code> function to extract the film plot and number of votes on TMDB. This is explained in the next section.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">parse_tmdb</span>(<span class="hljs-params">self, response</span>):</span>
    item = response.meta[<span class="hljs-string">'item'</span>]  <span class="hljs-comment"># Use the passed item</span>


    tmdb_titles = response.xpath(<span class="hljs-string">'//a[@class="title result"]/text()'</span>).extract()
    <span class="hljs-keyword">if</span> tmdb_titles:  <span class="hljs-comment"># Check if there are results on TMDB</span>
        <span class="hljs-keyword">for</span> tmdb_title <span class="hljs-keyword">in</span> tmdb_titles:
            match_ratio = fuzz.ratio(item[<span class="hljs-string">'title'</span>], tmdb_title)
            <span class="hljs-keyword">if</span> match_ratio &gt; <span class="hljs-number">90</span>:
                item[<span class="hljs-string">'genre'</span>] = response.xpath(<span class="hljs-string">'.//span[@class="genres"]/text()'</span>).extract_first()
                item[<span class="hljs-string">'rating'</span>] = response.xpath(<span class="hljs-string">'//span[@class="vote_average"]/text()'</span>).extract_first()
                release_date = response.xpath(<span class="hljs-string">'.//span[@class="release_date"]/text()'</span>).extract_first()
                release_date_parts = release_date.split(<span class="hljs-string">'/'</span>)
                item[<span class="hljs-string">'release_date'</span>] = <span class="hljs-string">"/"</span>.join(
                    [release_date_parts[<span class="hljs-number">1</span>].strip(), release_date_parts[<span class="hljs-number">0</span>].strip(), release_date_parts[<span class="hljs-number">2</span>].strip()])
                tmdb_link = <span class="hljs-string">"https://www.themoviedb.org"</span> + response.xpath(
                    <span class="hljs-string">'//a[@class="title result"]/@href'</span>).extract_first()
                item[<span class="hljs-string">'tmdb_link'</span>] = tmdb_link
                <span class="hljs-comment"># Extract more info from the detail page</span>
                request = scrapy.Request(tmdb_link, callback=self.parse_tmdb_detail)
                request.meta[<span class="hljs-string">'item'</span>] = item  <span class="hljs-comment"># Pass the item with the request to the detail page</span>
    <span class="hljs-keyword">yield</span> request
    <span class="hljs-keyword">break</span>  <span class="hljs-comment"># We only consider the first match</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span>
</code></pre>
<h2 id="heading-scraping-the-film-plot-from-the-details-page">Scraping the film plot from the details page</h2>
<p>The last function we’ll discuss is a short one. As before we get the item passed by the parse_tmdb function and scrape the details page for the <code>plot</code> and <code>number of votes</code>.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_C-Tj8dZ8yxfx_3gV.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>At this stage, we are finished scraping the information for the film. In other words, the item for the film is completely filled up. Scrapy will then use the code written in the pipelines to process these data and put it in the database.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">parse_tmdb_detail</span>(<span class="hljs-params">self, response</span>):</span>
    item = response.meta[<span class="hljs-string">'item'</span>]  <span class="hljs-comment"># Use the passed item</span>
    item[<span class="hljs-string">'nb_votes'</span>] = response.xpath(<span class="hljs-string">'//span[@itemprop="ratingCount"]/text()'</span>).extract_first()
    item[<span class="hljs-string">'plot'</span>] = response.xpath(<span class="hljs-string">'.//p[@id="overview"]/text()'</span>).extract_first()
    <span class="hljs-keyword">yield</span> item
</code></pre>
<h1 id="heading-using-extensions-in-scrapy">Using Extensions in Scrapy</h1>
<p>In the section about Pipelines, we already saw how we store the scraping results in an SQLite database. Now I will show you how you can <strong><em>send the scraping results via email.</em></strong> This way you get a nice overview of the top rated films for the coming week in your mailbox.</p>
<h2 id="heading-importing-the-necessary-packages-1">Importing the necessary packages</h2>
<p>We will be working with the file <strong><em>extensions.py</em></strong>. This file is automatically created in the root directory when you created the Scrapy project. We start by importing the packages which we’ll use later in this file.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">from</span> scrapy <span class="hljs-keyword">import</span> signals
<span class="hljs-keyword">from</span> scrapy.exceptions <span class="hljs-keyword">import</span> NotConfigured
<span class="hljs-keyword">import</span> smtplib
<span class="hljs-keyword">import</span> sqlite3 <span class="hljs-keyword">as</span> lite
<span class="hljs-keyword">from</span> config <span class="hljs-keyword">import</span> *
</code></pre>
<p>The <code>logging</code> package is not really required. But this package can be useful for debugging your program or just to write some information to the log.<br>The <code>signals</code> module will help us to know when the spider has been opened and closed. We will send the email with the films after the spider has done its job.</p>
<p>From the <code>scrapy.exceptions</code> module we import the method <code>NotConfigured</code>. This will be raised when the extension is not configured in the <strong><em>settings.py</em></strong> file. Concretely the parameter <code>MYEXT_ENABLED</code> must be set to <code>True</code>. We’ll see this later in the code.</p>
<p>The <code>smtplib</code> package is imported to be able to send the email. I use my Gmail address to send the email, but you could adapt the code in config.py to use another email service.</p>
<p>Lastly, we import the <code>sqlite3</code> package to extract the top-rated films from the database and import <code>config</code> to get our constants.</p>
<h2 id="heading-creating-the-sendemail-class-in-the-extensions">Creating the SendEmail class in the extensions</h2>
<p>First, we define the <code>logger</code> object. With this object we can write messages to the log at certain events. Then we create the <code>SendEmail</code> class with the constructor method. In the constructor, we assign <code>FROMADDR</code> and <code>TOADDR</code> to the corresponding attributes of the class. These constants are set in the <strong><em>config.py</em></strong> file. I used my Gmail address for both attributes.</p>
<pre><code class="lang-python">logger = logging.getLogger(__name__)
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SendEmail</span>(<span class="hljs-params">object</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        self.fromaddr = FROMADDR
        self.toaddr  = TOADDR
</code></pre>
<h2 id="heading-instantiating-the-extension-object">Instantiating the extension object</h2>
<p>The first method of the <code>SendEmail</code> object is <code>from_crawler</code>. The first check we do is whether <code>MYEXT_ENABLED</code> is enabled in the settings.py file. If this is not the case, we raise a <code>NotConfigured</code> exception. When this happens, the rest of the code in the extension is not executed.</p>
<p>In the <strong><em>settings.py</em></strong> file we need to add the following code to enable this extension.</p>
<pre><code class="lang-python">MYEXT_ENABLED = <span class="hljs-literal">True</span>
EXTENSIONS = {
    <span class="hljs-string">'topfilms.extensions.SendEmail'</span>: <span class="hljs-number">500</span>,
    <span class="hljs-string">'scrapy.telnet.TelnetConsole'</span>: <span class="hljs-literal">None</span>
}
</code></pre>
<p>So we set the Boolean flag <code>MYEXT_ENABLED</code> to <code>True</code>. Then we add our own extension <code>SendEmail</code> to the <code>EXTENSIONS</code> dictionary. The integer value of 500 specifies the order in which the extension must be executed. I also had to disable the <code>TelnetConsole</code>. Otherwise sending the email did not work. This extension is disabled by putting <code>None</code>instead of an integer order value.</p>
<p>Next, we instantiate the extension object with the <code>cls()</code> function. To this extension object we connect some <code>signals</code>. We are interested in the <code>spider_opened</code> and <code>spider_closed</code>signals. And lastly we return the <code>ext</code> object.</p>
<pre><code>@classmethod
def from_crawler(cls, crawler):
    # first check <span class="hljs-keyword">if</span> the extension should be enabled and raise
    # NotConfigured otherwise
    <span class="hljs-keyword">if</span> not crawler.settings.getbool(<span class="hljs-string">'MYEXT_ENABLED'</span>):
        raise NotConfigured
    # instantiate the extension object
    ext = cls()
    # connect the extension object to signals
    crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
    crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
    # <span class="hljs-keyword">return</span> the extension object
    <span class="hljs-keyword">return</span> ext
</code></pre><h2 id="heading-define-the-actions-in-the-spideropened-event">Define the actions in the spider_opened event</h2>
<p>When the spider has been opened we simply want to write this to the log. Therefore we use the <code>logger</code> object which we created at the top of the code. With the <code>info</code> method we write a message to the log. <code>Spider.name</code> is replaced by the name we defined in the TVGuideSpider.py file.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">spider_opened</span>(<span class="hljs-params">self, spider</span>):</span>
    logger.info(<span class="hljs-string">"opened spider %s"</span>, spider.name)
</code></pre>
<h2 id="heading-sending-the-email-after-the-spiderclosed-event">Sending the email after the spider_closed event</h2>
<p>In the last method of the <code>SendEmail</code> class we send the email containing the overview with top rated films.</p>
<p>Again we send a notification to the log that the spider has been closed. Secondly, we create a connection to the SQLite database containing all the films of the coming week for the <strong>_ALLOWED<em>CHANNELS.</em></strong> We select the films with a <code>rating &gt;= 6.5</code>. You can change the rating to a higher or lower threshold as you wish. The resulting films are then sorted by <code>film_date_short</code>, which has the YYYYMMDD format and by the starting time <code>start_ts</code>.</p>
<p>We fetch all rows in the cursor <code>cur</code> and check whether we have some results with the <code>len</code> function. It is possible to have no results when you set the threshold rating too high, for example.</p>
<p>With <code>for row in data</code> we go through each resulting film. We extract all the interesting information from the <code>row</code>. For some data we apply some encoding with <code>encode('ascii','ignore')</code>. This is to ignore some of the special characters like é, à, è, and so on. Otherwise we get errors when sending the email.</p>
<p>When all data about the film is gathered, we compose a string variable <code>topfilm</code>. Each <code>topfilm</code> is then concatenated to the variable <code>topfilms_overview</code>, which will be the message of the email we send. If we have no film in our query result, we mention this in a short message.</p>
<p>At the end, we send the message with the Gmail address, thanks to the <code>smtplib</code> package.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">spider_closed</span>(<span class="hljs-params">self, spider</span>):</span>
    logger.info(<span class="hljs-string">"closed spider %s"</span>, spider.name)
    <span class="hljs-comment"># Getting films with a rating above a threshold</span>
    topfilms_overview = <span class="hljs-string">""</span>
    con = lite.connect(<span class="hljs-string">'topfilms.db'</span>)
    cur = con.execute(
        <span class="hljs-string">"SELECT title, channel, start_ts, film_date_long, plot, genre, release_date, rating, tmdb_link, nb_votes "</span>
        <span class="hljs-string">"FROM topfilms "</span>
        <span class="hljs-string">"WHERE rating &gt;= 6.5 "</span>
        <span class="hljs-string">"ORDER BY film_date_short, start_ts"</span>)


    data = cur.fetchall()
    <span class="hljs-keyword">if</span> len(data) &gt; <span class="hljs-number">0</span>:  <span class="hljs-comment"># Check if we have records in the query result</span>
        <span class="hljs-keyword">for</span> row <span class="hljs-keyword">in</span> data:
            title = row[<span class="hljs-number">0</span>].encode(<span class="hljs-string">'ascii'</span>, <span class="hljs-string">'ignore'</span>)
            channel = row[<span class="hljs-number">1</span>]
            start_ts = row[<span class="hljs-number">2</span>]
            film_date_long = row[<span class="hljs-number">3</span>]
            plot = row[<span class="hljs-number">4</span>].encode(<span class="hljs-string">'ascii'</span>, <span class="hljs-string">'ignore'</span>)
            genre = row[<span class="hljs-number">5</span>]
            release_date = row[<span class="hljs-number">6</span>].rstrip()
            rating = row[<span class="hljs-number">7</span>]
            tmdb_link = row[<span class="hljs-number">8</span>]
            nb_votes = row[<span class="hljs-number">9</span>]
            topfilm = <span class="hljs-string">' - '</span>.join([title, channel, film_date_long, start_ts])
            topfilm = topfilm + <span class="hljs-string">"\r\n"</span> + <span class="hljs-string">"Release date: "</span> + release_date
            topfilm = topfilm + <span class="hljs-string">"\r\n"</span> + <span class="hljs-string">"Genre: "</span> + str(genre)
            topfilm = topfilm + <span class="hljs-string">"\r\n"</span> + <span class="hljs-string">"TMDB rating: "</span> + rating + <span class="hljs-string">" from "</span> + nb_votes + <span class="hljs-string">" votes"</span>
            topfilm = topfilm + <span class="hljs-string">"\r\n"</span> + plot
            topfilm = topfilm + <span class="hljs-string">"\r\n"</span> + <span class="hljs-string">"More info on: "</span> + tmdb_link
            topfilms_overview = <span class="hljs-string">"\r\n\r\n"</span>.join([topfilms_overview, topfilm])
    con.close()
    <span class="hljs-keyword">if</span> len(topfilms_overview) &gt; <span class="hljs-number">0</span>:
        message = topfilms_overview
    <span class="hljs-keyword">else</span>:
        message = <span class="hljs-string">"There are no top rated films for the coming week."</span>
    msg = <span class="hljs-string">"\r\n"</span>.join([
        <span class="hljs-string">"From: "</span> + self.fromaddr,
        <span class="hljs-string">"To: "</span> + self.toaddr,
        <span class="hljs-string">"Subject: Top Films Overview"</span>,
        message
    ])
    username = UNAME
    password = PW
    server = smtplib.SMTP(GMAIL)
    server.ehlo()
    server.starttls()
    server.login(username, password)
    server.sendmail(self.fromaddr, self.toaddr, msg)
    server.quit()
</code></pre>
<h2 id="heading-result-of-sending-emails-via-extensions">Result of sending emails via Extensions</h2>
<p>The final result of this piece of code is an overview with top rated films in your mailbox. Great! Now you don’t have to look this up anymore on the online TV guide.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/01/0_SuRZuKi2RIkRJD3y.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h1 id="heading-tricks-to-avoid-ip-banning">Tricks to avoid IP banning</h1>
<p>When you make many requests in a short period of time, you risk being banned by the server. In this final section, I’ll show you some tricks to avoid IP banning.</p>
<h2 id="heading-delaying-your-requests">Delaying your requests</h2>
<p>One simple way to avoid IP banning is to <strong><em>pause between each request</em></strong>. In Scrapy this can be done by simply setting a parameter in the <strong><em>settings.py</em></strong> file. As you probably noticed, the settings.py file has a lot of parameters commented out.</p>
<p>Search for the parameter <code>DOWNLOAD_DELAY</code> and uncomment it. I set the <strong><em>pause length to 2 seconds</em></strong>. Depending on how many requests you have to make, you can change this. But I would set it to at least 1 second.</p>
<pre><code class="lang-python">DOWNLOAD_DELAY=<span class="hljs-number">2</span>
</code></pre>
<h2 id="heading-more-advanced-way-for-avoiding-ip-banning">More advanced way for avoiding IP banning</h2>
<p>By default, each time you make a request, you do this with the <strong><em>same user agent</em></strong>. Thanks to the package <code>fake_useragent</code> we can easily change the user agent for each request.</p>
<p>All credits for this piece of code go to <a target="_blank" href="https://github.com/alecxe/scrapy-fake-useragent">Alecxe</a> who wrote a nice Python script to make use of the fake_useragent package.</p>
<p>First, we create a folder <strong>_scrapy_fake<em>useragent</em></strong> in the root directory of our web scraper project. In this folder we add two files:</p>
<ul>
<li><strong><strong>_init</strong>.py_</strong> which is an empty file</li>
<li><strong><em>middleware.py</em></strong></li>
</ul>
<p>To use this <a target="_blank" href="http://doc.scrapy.org/en/latest/topics/spider-middleware.html">middleware</a> we need to enable it in the <strong><em>settings.py</em></strong> file. This is done with the code:</p>
<pre><code class="lang-python">DOWNLOADER_MIDDLEWARES = {
    <span class="hljs-string">'scrapy.downloadermiddleware.useragent.UserAgentMiddleware'</span>: <span class="hljs-literal">None</span>,
    <span class="hljs-string">'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware'</span>: <span class="hljs-number">400</span>,
}
</code></pre>
<p>First, we disable the default <code>UserAgentMiddleware</code> of Scrapy by specifying <em>None</em> instead of an integer value. Then we enable our own middleware <code>RandomUserAgentMiddleware</code>. Intuitively, middleware is a piece of code that is executed <strong><em>during a request</em></strong>.</p>
<p>In the file <strong><em>middleware.py</em></strong> we add the code to <strong><em>randomize the user agent</em></strong> for each request. Make sure you have the fake_useragent package installed. From the <a target="_blank" href="https://pypi.python.org/pypi/fake-useragent">fake_usergent package</a> we import the <code>UserAgent</code> module. This contains <strong><em>a list of different user agents</em></strong>. In the constructor of the RandomUserAgentMiddleware class, we instantiate the UserAgent object. In the method <strong>_process<em>request</em></strong> we set the user agent to a random user agent from the <code>ua</code> object in the header of the request.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fake_useragent <span class="hljs-keyword">import</span> UserAgent
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">RandomUserAgentMiddleware</span>(<span class="hljs-params">object</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super(RandomUserAgentMiddleware, self).__init__()
        self.ua = UserAgent()
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_request</span>(<span class="hljs-params">self, request, spider</span>):</span>
        request.headers.setdefault(<span class="hljs-string">'User-Agent'</span>, self.ua.random)
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>That was it! I hope you now have a clear view on how to use Scrapy for your web scraping projects.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
