Connecting a Python file with a SQL Database

Connecting a Python file with a SQL Database
0

#1

Hello, everyone. I need a little help. I am basically trying to make a crawler and then connect it to a database so that the crawled data can be stored and indexed there for retrieval later. But since I’m new to some aspects of Python and SQL, I need a little help. Any advice?


#2

Firstly, welcome to the forums.

While we are primarily here to help people with their Free Code Camp progress, we are open to people on other paths, too. Free Code Camp focuses on using JavaScript instead of Python and SQL.

With your current questions, we don’t have enough context to know what you already know or don’t know, so it is impossible to guide you without just telling you the answer (which we won’t do).

It is pretty typical on here for people to share a codepen / jsfiddle example of what they have tried so that anyone helping has more of an idea of what help is actually helpful.

Please provide some example of what you’ve tried and I’m sure you’ll get more help.

Happy coding :slight_smile:


#3

That’s the problem. As far as a connection between the two is concerned, I don’t have a code yet, not really. That’s what I am here for, to get an idea on how to handle this.


#4

Try looking into the pyodbc module. I’ve used it quite a bit for connecting to Microsoft Access Databases.


#5

Microsoft provides a python MSSQL driver. pymssql

https://docs.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-2017


#6

@owel So I download the driver, then what? What do I do next? Do I have to configure something with SQL or something?


#7

There are several example codes here.
http://pymssql.org/en/stable/pymssql_examples.html

This is just the driver for python to talk to an MSSQL server.

But you still need to know how to create/manage databases, tables and views in SQL Server, and know how to construct SQL commands. The driver is not magic, it’s just a go-between bridge between python and MSSQL server.

If you’ll be managing SQL databases, (creating tables, fields, indexes, views, stored procedures, full text, triggers, etc) you need to have Enterprise Manager software installed on your computer, and more importantly know how to use it.

https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms?view=sql-server-2017


#8

I’m making the SQL connection using Visual Studio, but I’m unclear on how I can get the .py file, which is the crawler, to store crawled data into the SQL database. Suggestions?


#9

Can you post the code you have so far in the .py file?


#10

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ElectronicsSpider(CrawlSpider):
name = “electronics”
allowed_domains = [“www.olx.com.pk”]
start_urls = [
https://www.olx.com.pk/tv-video-audio/’,
https://www.olx.com.pk/games-entertainment/
]

rules = (
    Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
         callback="parse_item",
         follow=True),)

def parse_item(self, response):
    print('Processing..' + response.url)
    # print(response.text)

#11

First off! What database are you using? Postgres, MySQL, Mongo…Hbase? Once you figure out that, you want to look if there is a client library that you can install and if you need to download native drivers (or if they are available). Native drivers are just drivers that the developers wrote to connect to the database instead of an open standard like say: ODBC. I use mostly postgres and so psycopg2 library works well. For psycopg2, I don’t think drivers are required on the client (but might be different for windows or might be a dependency that gets downloaded).

Usually, though this isn’t always the case, you establish a database connection using a connection function or by initiating a database engine that has a connect function. During this connection, you’ll have to provide a host, password and potentially a database name (if multiple entities exist within the database). Typically, you will either connect or get an error. This is normally binary: succeed or fail. If you get an error, there is a good chance that: 1. your credentials are wrong or 2. you are not establishing a connection to the database. The later is usually the result of: firewall, TCP/IP (internet) connection not being established or database permissions being off. You can usually figure this out through the error message.

If you succeed, it is often great to first query for the databases or users available within the database. This guarantees you are connected and have access to the right user/database. Afterwards, I’d query against actual data entities like relational tables or document stores. If you do this, you typically get returned an iterable object or generator, which you can manipulate.


#12

Well depending on what sql database you are using you can pip install pymssql for microsoft sql (mssql), psycopg2 for postgres (psql) or mysqldb for mysql databases Here are a few examples of using it

Microsoft sql
\\
import pymssql

conn = pymssql.connect(server=server, user=user, password=password, database=db)
cursor = conn.cursor()

cursor.execute(“SELECT COUNT(MemberID) as count FROM Members WHERE id = 1”)
row = cursor.fetchone()

conn.close()

print(row)
//////

Postgres

\\
import psycopg2

conn = psycopg2.connect(database=db, user=user, password=password, host=host, port=“5432”)
cursor = conn.cursor()

cursor.execute(‘SELECT COUNT(MemberID) as count FROM Members WHERE id = 1’)
row = cursor.fetchone()

conn.close()

print(row)
/////

mysql

\\
import MySQLdb

conn = MySQLdb.connect(host=host, user=user, passwd=passwd, db=db)
cursor = conn.cursor()

cursor.execute(‘SELECT COUNT(MemberID) as count FROM Members WHERE id = 1’)
row = cursor.fetchone()

conn.close()

print(row)
//////