In this post we will use Python to work with data in SQLite database files. We will create table, insert records into it, update table and finally perform SELECT based on some condition.
Problem statement: - Process this text file and count the number email messages per organisation (i.e. Domain name of the email address consider as unique organisation name)
Database Schema:-
Table with two columns ORGANISATION and COUNT of Emails.
< CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER) >
SQLite database:- SQLite is a light weight relational database management system. In contrast to many other database management systems, SQLite is not a client–server database engine. Rather, it is embedded into the end program.(Source: Wiki)
Using conn.cursor() we get a hadle of file (like a file handle we obtain using open() method). Once cursor(handle) is obtained, we can use to perform operations on the data stored in the database using execute() method. Below sample code demonstrate the same in sequence.
Now we will process the this text file which contains raw data and insert data into database. Consider following code lines for processing the raw data in file.
Raw data:-
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
... ...... . ..
...... . ..............
We need to open file mbox.txt and read each line, when we get Line starting from From ... process it and filter out domain name. If That domain name is not in Database Insert into database else update the count(increment count by 1).
Sample Output:-
[zytham@s158519-vm backup]$ /usr/bin/python emailCounts.py mbox.txt
Processing input data file......
-------------------------
Organization---Counts
-------------------------
iupui.edu 536
umich.edu 491
indiana.edu 178
caret.cam.ac.uk 157
vt.edu 110
uct.ac.za 96
media.berkeley.edu 56
ufp.pt 28
gmail.com 25
et.gatech.edu 17
Problem statement: - Process this text file and count the number email messages per organisation (i.e. Domain name of the email address consider as unique organisation name)
Database Schema:-
Table with two columns ORGANISATION and COUNT of Emails.
< CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER) >
How to connect SQLite database ? :-
Python provides support for SQLite database in sqlite3. First, we need to import this in and followed by connect method is used to connect database. The connect operation makes a “connection” to the database stored in the file "emaildb.sqlite" in the current directory. If the file does not exist, it will be created.Using conn.cursor() we get a hadle of file (like a file handle we obtain using open() method). Once cursor(handle) is obtained, we can use to perform operations on the data stored in the database using execute() method. Below sample code demonstrate the same in sequence.
import sqlite3 conn = sqlite3.connect('emaildb.sqlite') cur = conn.cursor() cur.execute('DROP TABLE IF EXISTS EmailCounts ') cur.execute('CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER)') conn.close()
Now we will process the this text file which contains raw data and insert data into database. Consider following code lines for processing the raw data in file.
Raw data:-
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
... ...... . ..
...... . ..............
We need to open file mbox.txt and read each line, when we get Line starting from From ... process it and filter out domain name. If That domain name is not in Database Insert into database else update the count(increment count by 1).
Sample program to count emails corresponding to organisation
#http://www.pythonlearn.com/code/mbox.txt import sqlite3 import sys conn = sqlite3.connect('emaildb.sqlite') cur = conn.cursor() #drop table in database, if exists cur.execute('DROP TABLE IF EXISTS EmailCounts ') #create table EmailCounts cur.execute('CREATE TABLE EmailCounts (ORGANISATION TEXT, COUNT INTEGER)') if ( len(sys.argv) < 2 ) : print 'Invalid arguments, Input file missing, exiting !!' exit filename = sys.argv[1] filehandle = open(filename) print "Processing input data file......" for line in filehandle: if not line.startswith('From: ') : continue emailPart = line.split() #stephen.marquard@uct.ac.za email = emailPart[1] org1=email.split("@") #stephen.marquard orgVal = org1[1] #print org1[1] cur.execute('SELECT COUNT FROM EmailCounts WHERE ORGANISATION = ? ', (orgVal, )) row = cur.fetchone() if row is None: cur.execute('''INSERT INTO EmailCounts (ORGANISATION, COUNT) VALUES ( ?, 1 )''', ( orgVal, ) ) else : cur.execute('UPDATE EmailCounts SET COUNT=COUNT+1 WHERE ORGANISATION = ?', (orgVal, )) # This statement commits outstanding changes to disk each # time through the loop - the program can be made faster # by moving the commit so it runs only after the loop completes conn.commit() # https://www.sqlite.org/lang_select.html sqlstr = 'SELECT ORGANISATION, COUNT FROM EmailCounts ORDER BY COUNT DESC LIMIT 10' print "\n-------------------------" print "Organization---Counts" print "-------------------------" for row in cur.execute(sqlstr) : print str(row[0]), row[1] #close cur/handle cur.close()
Sample Output:-
[zytham@s158519-vm backup]$ /usr/bin/python emailCounts.py mbox.txt
Processing input data file......
-------------------------
Organization---Counts
-------------------------
iupui.edu 536
umich.edu 491
indiana.edu 178
caret.cam.ac.uk 157
vt.edu 110
uct.ac.za 96
media.berkeley.edu 56
ufp.pt 28
gmail.com 25
et.gatech.edu 17
This comment has been removed by the author.
ReplyDeleteGood Blog!!! Thanks for sharing this great information with us
ReplyDeleteSelenium Training in Chennai
Selenium Course in Chennai
selenium certification in chennai
Best selenium Training Institute in Chennai
Selenium Training in Velachery
Selenium training in Adyar
Python Training in Chennai
Software testing training in chennai
JAVA Training in Chennai
Great Post!!! Thanks for sharing your blog and waiting for your new updates.
ReplyDeleteGerman Classes in Chennai
German Classes in Bangalore
German Classes in Coimbatore
German Classes in Madurai
German Language Course in Hyderabad
German Language Classes in Chennai
German language classes in bangalore
German Language Classes in Coimbatore
Ethical Hacking Course in Bangalore
IELTS Coaching in Bangalore
ReplyDeleteThe blog you shared is very good. I expect more information from you like this blog. Thankyou.
web designing training in chennai
web designing training in bangalore
web design training coimbatore
Web Designing Course in bangalore
web designing course in madurai
Web development training in bangalore
Web development training in chennai
Big Data Course in Coimbatore
Web Designing Course in bangalore
web designing course in madurai
I appreciate that you produced this wonderful article to help us get more knowledge about this topic.
ReplyDeleteI know, it is not an easy task to write such a big article in one day, I've tried that and I've failed. But, here you are, trying the big task and finishing it off and getting good comments and ratings.
IELTS Coaching in chennai
German Classes in Chennai
GRE Coaching Classes in Chennai
TOEFL Coaching in Chennai
spoken english classes in chennai | Communication training
The development of artificial intelligence (AI) has propelled more programming architects, information scientists, and different experts to investigate the plausibility of a vocation in machine learning. Notwithstanding, a few newcomers will in general spotlight a lot on hypothesis and insufficient on commonsense application. machine learning projects for final year In case you will succeed, you have to begin building machine learning projects in the near future.
ReplyDeleteProjects assist you with improving your applied ML skills rapidly while allowing you to investigate an intriguing point. Furthermore, you can include projects into your portfolio, making it simpler to get a vocation, discover cool profession openings, and Final Year Project Centers in Chennai even arrange a more significant compensation.
Data analytics is the study of dissecting crude data so as to make decisions about that data. Data analytics advances and procedures are generally utilized in business ventures to empower associations to settle on progressively Python Training in Chennai educated business choices. In the present worldwide commercial center, it isn't sufficient to assemble data and do the math; you should realize how to apply that data to genuine situations such that will affect conduct. In the program you will initially gain proficiency with the specialized skills, including R and Python dialects most usually utilized in data analytics programming and usage; Python Training in Chennai at that point center around the commonsense application, in view of genuine business issues in a scope of industry segments, for example, wellbeing, promoting and account.
The Nodejs Projects Angular Training covers a wide range of topics including Components, Angular Directives, Angular Services, Pipes, security fundamentals, Routing, and Angular programmability. The new Angular TRaining will lay the foundation you need to specialise in Single Page Application developer. Angular Training
Mua vé máy bay tại Aivivu, tham khảo
ReplyDeletebay từ hàn quốc về việt nam
vé máy bay hà nội sài gòn tháng 7
vé máy bay từ nha trang ra hà nội
đặt vé đi nha trang
vé máy bay đi Huế vietnam airline
taxi sân bay nội bài
Great attempts are made to publish content that are quite useful to know. I'm really liking your blog. And positive comments foster positive relationships. You're doing a fantastic job. Keep up the good work. custom erp development
ReplyDeleteNice article, Thanks for sharing.
ReplyDeletePython training in Pune