Apr 29, 2015

[Python] Find the key with most items in a dictionary

#suppose we have the following dictionary x
>>> x
defaultdict(None, {1: [2], 2: [1], 6: [9, 5], 7: [9], 8: [9], 9: [6, 7, 8]})
#we want to find the key with most items
#Python 2
>>> sorted(x.viewitems(), key=lambda k: len(k[1]), reverse=True)
[(9, [6, 7, 8]), (6, [9, 5]), (1, [2]), (2, [1]), (7, [9]), (8, [9])]
>>> sorted(x.viewitems(), key=lambda k: len(k[1]), reverse=True)[0][0]
9
#Python 3
>>> sorted(x.items(), key=lambda k: len(k[1]), reverse=True)[0][0]
9
Reference: http://stackoverflow.com/questions/9363670/sort-dictionary-by-number-of-values-under-each-key
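Sorting the whole dictionary just to read the first entry does more work than needed. As a minimal sketch (the sample data is copied from the example above), the built-in max() with a key function finds the same key in a single pass. When several keys tie for the longest list, max() returns the first one it encounters.

```python
from collections import defaultdict

# same sample data as in the example above
x = defaultdict(None, {1: [2], 2: [1], 6: [9, 5], 7: [9], 8: [9], 9: [6, 7, 8]})

# key whose list has the most items
most_key = max(x, key=lambda k: len(x[k]))

# the same search, returning the (key, items) pair
pair_key, most_items = max(x.items(), key=lambda kv: len(kv[1]))

print(most_key, most_items)
```

This works unchanged in Python 2 and Python 3, since it does not depend on viewitems().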

Apr 22, 2015

[Python] A simple way to sort a dictionary

>>> import operator
>>> x #x is a dictionary
{'cde': 456, 'abc': 789, 'efg': 123}
# only return keys
>>> sorted(x)
['abc', 'cde', 'efg']
# sort by key, "reverse=True" means from large to small
>>> sorted(x.items(), key=operator.itemgetter(0), reverse=True)
[('efg', 123), ('cde', 456), ('abc', 789)]
# sort by value, just change the parameter in the itemgetter()
>>> sorted(x.items(), key=operator.itemgetter(1), reverse=True)
[('abc', 789), ('cde', 456), ('efg', 123)]
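The same sort can be written without importing operator. The sketch below uses a lambda as the key function and then rebuilds a dictionary from the sorted pairs; it assumes Python 3.7 or later, where a plain dict keeps insertion order. The variable names are only for illustration.

```python
x = {'cde': 456, 'abc': 789, 'efg': 123}

# sort the (key, value) pairs by value, largest first
by_value_desc = sorted(x.items(), key=lambda kv: kv[1], reverse=True)

# a dict built from the sorted pairs keeps that order (Python 3.7+)
ordered = dict(by_value_desc)

# keys of the two largest values
top_two_keys = [k for k, _ in by_value_desc[:2]]

print(by_value_desc)
print(list(ordered))
print(top_two_keys)
```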

[Python] An example of scraping data from a website and writing to a csv file

To scrape data from a website, I used the "BeautifulSoup" module for Python. The data I want to get from the website (http://www.charitynavigator.org/index.cfm?bay=topten.detail&listid=24#.VTfpxa3BzGc) is the list of "10 Super-Sized Charities."



Sample code is shown below:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv

def outputCSV(dataSet, filename):
    print(dataSet)
    #newline='' lets the csv module control the line endings
    with open(filename, 'w', newline='') as csvfile:
        csvW = csv.writer(csvfile)
        csvW.writerows(dataSet)
    #the with block closes the file, so no explicit close() is needed

def main():
    html = urlopen('http://www.charitynavigator.org/index.cfm?bay=topten.detail&listid=24#.VTfpxa3BzGc')
    #name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(html.read(), 'html.parser')
    #all rows of the first table on the page
    tableRows = soup('table')[0].findAll('tr')
    dataSet = []
    data = []
    #get column name
    for cell in tableRows[0].findAll('th'):
        data.append(cell.contents[0])
    dataSet.append(data)
    #get data
    for tableRow in tableRows[1:]:
        data = []
        for cell in tableRow.findAll('td'):
            if cell.a is None:
                tmpString = cell.contents
            else:
                #remove hyperlink
                tmpString = cell.a.contents
            if tmpString != []:
                data.append(tmpString[0].strip())
        if data != []:
            dataSet.append(data)
    for row in dataSet:
        print(row)
    outputCSV(dataSet, 'web_scrape.csv')

if __name__ == '__main__':
    main()
The result is written to web_scrape.csv.
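To try the row-extraction and CSV-writing logic without a network connection, here is a rough sketch that swaps BeautifulSoup for the standard library's html.parser and writes to an in-memory buffer instead of a file. The sample markup, the charity names, and the TableRows class are all hypothetical stand-ins and are not taken from the real page.

```python
import csv
import io
from html.parser import HTMLParser

# hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Charity</th></tr>
  <tr><td>1</td><td><a href="/a">Example Charity A</a></td></tr>
  <tr><td>2</td><td><a href="/b">Example Charity B</a></td></tr>
</table>
"""

class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # text inside <a> is kept, so the hyperlink itself is dropped
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

parser = TableRows()
parser.feed(SAMPLE_HTML)
rows = parser.rows

# write the rows as CSV into a string buffer instead of a file
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing the buffer with open(filename, 'w', newline='') would give the same on-disk output as the outputCSV function above.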

Apr 16, 2015

[Python] defaultdict will not always generate default value

If you create a defaultdict without a default factory (for example, defaultdict() instead of defaultdict(int)), it will not generate a default value for a missing key, and x[key] raises KeyError just like a plain dict. In this case, we can use get(key) to look up the value; it returns None instead of raising an error if the key is missing. See the following example.
#defaultdict will not always generate a default value
#if no default factory is given, looking up a missing key with x[key] raises KeyError
>>> from collections import defaultdict
#defaultdict with a default factory
>>> x = defaultdict(int)
>>> x
defaultdict(<type 'int'>, {})
>>> x["abc"]
0
>>> x
defaultdict(<type 'int'>, {'abc': 0})
#defaultdict without a default factory
>>> x = defaultdict()
>>> x
defaultdict(None, {})
>>> x["abc"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'abc'
#to be safe, use get() to look up the key instead
>>> x.get("abc") #returns None (the REPL prints nothing) and does not raise an error
>>>
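One more detail worth knowing: get() never calls the default factory and never inserts the key, even when a factory is set. The short sketch below contrasts get() with indexing; the variable names are only for illustration.

```python
from collections import defaultdict

with_factory = defaultdict(int)
no_factory = defaultdict()            # default_factory is None

# get() does not call the factory and does not insert the key
value_from_get = with_factory.get("abc")
keys_after_get = list(with_factory)

# indexing calls the factory and stores the result under the key
value_from_index = with_factory["abc"]
keys_after_index = list(with_factory)

# without a factory, get() can still supply a fallback value
fallback = no_factory.get("abc", 0)
factory_is_none = no_factory.default_factory is None

print(value_from_get, keys_after_get)
print(value_from_index, keys_after_index)
print(fallback, factory_is_none)
```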

Apr 12, 2015

int is not an object in Java

In Java, int is a primitive type, not an object. So when we assign an existing int variable to a new int variable, the value is copied: the two variables do not share memory (they are independent of each other).
int nu1 = 13;
int nu2 = nu1;
nu2 = 12;
//now nu1 is still 13, and nu2 is 12

Unlike int, String is an object in Java. When we assign an existing String variable to a new String variable, both variables refer to the same object.
Note: String is immutable, so methods such as replace() return a new String instead of changing the original.
String str1 = "Hello World";
String str2 = str1;
str2.replace("H", "X");//replace is an accessor, not a mutator
System.out.println(str2);
//print "Hello World"
System.out.println(str1);
//print "Hello World"
String str3 = str1.replace("H", "X");
System.out.println(str3);
//print "Xello World"

Apr 4, 2015

[Python] Counter

The "Counter" class makes it very convenient to count the number of items that two lists have in common, including repeats. See the following example.
>>> from collections import Counter
>>> x = [1, 1, 2, 3, 3, 4, 5, 6, 6]
>>> y = [1, 2, 2, 2, 3, 3, 4, 6, 6, 6]
>>> x_counter = Counter(x)
>>> y_counter = Counter(y)
>>> x_counter
Counter({1: 2, 3: 2, 6: 2, 2: 1, 4: 1, 5: 1})
>>> y_counter
Counter({2: 3, 6: 3, 3: 2, 1: 1, 4: 1})
>>> x_counter & y_counter #find the overlapping numbers
Counter({3: 2, 6: 2, 1: 1, 2: 1, 4: 1})
>>> sum((x_counter & y_counter).values()) #total number of overlapping items
7
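The & operator keeps, for each item, the smaller of the two counts. As a small sketch using the same two lists, the related operators | (larger count) and - (counts left after subtraction, with non-positive results dropped) behave as follows:

```python
from collections import Counter

x_counter = Counter([1, 1, 2, 3, 3, 4, 5, 6, 6])
y_counter = Counter([1, 2, 2, 2, 3, 3, 4, 6, 6, 6])

overlap = x_counter & y_counter     # minimum of the two counts per item
union = x_counter | y_counter       # maximum of the two counts per item
only_in_x = x_counter - y_counter   # what is left of x after removing y's counts

# expand the overlap back into a sorted list of items
overlap_items = sorted(overlap.elements())

print(overlap_items)
print(only_in_x)
print(sum(union.values()))
```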

Apr 3, 2015

[Python] some operations on list

When using a list in Python, the items can be indexed from either end:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
x[0],   x[1],   x[2],   x[3],   x[4],   x[5],   x[6],   x[7],   x[8]
x[-9],  x[-8],  x[-7],  x[-6],  x[-5],  x[-4],  x[-3],  x[-2],  x[-1]   <= reverse direction
Note that x[-0] is the same as x[0].
Some uses of lists in Python are shown below:
>>> x = [0,1,2,3,4,5,6,7,8]
>>> x[1:5]
[1, 2, 3, 4] #print items from x[1] to x[4] (excluding x[5])
>>> x[1:]
[1, 2, 3, 4, 5, 6, 7, 8] #print the items start from x[1] to the end
>>> x[:5]
[0, 1, 2, 3, 4] #print the first 5 items
>>> x[1::3]
[1, 4, 7] #print items..x[1 + 0*3], x[1 + 1*3], x[1 + 2*3] ....
>>> x[-1]
8 #print the last item in x
>>> x[:-1]
[0, 1, 2, 3, 4, 5, 6, 7] #print items start from the head, x[0], to the one before x[-1] (excluding x[-1])
#note that y = x does not copy the list x; y becomes another reference to the same list
#example of reference operation
>>> x = [1, 2, 3, 4, 5]
>>> y = x
>>> y[1] = 7
>>> x
[1, 7, 3, 4, 5]
#example of copy operation
>>> y = x[:]
>>> y[0] = 7
>>> x
[1, 7, 3, 4, 5] #not affected by the operation on y
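One caveat about the slice copy: x[:] is a shallow copy. It creates a new outer list, but any nested lists inside it are still shared. The sketch below, with illustrative variable names, contrasts it with copy.deepcopy from the standard library.

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested[:]               # new outer list, same inner lists
deep = copy.deepcopy(nested)      # new outer list and new inner lists

shallow[0][0] = 99                # also visible through nested (shared inner list)
deep[1][1] = -1                   # changes only the deep copy
shallow.append([5, 6])            # outer lists are independent

print(nested)
print(shallow)
print(deep)
```

For a flat list of numbers or strings, as in the example above, the slice copy is enough.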