Pages

Mar 24, 2015

[Python] strip vs. replace

Here are some examples showing the differences between the function of "strip()" and "replace()" in Python.
#EX.1
>>> x = "citt""
>>> x.strip(""")
'ci'
>>> x.replace(""","")
'citt'
#EX.2
>>> x = ""city" is"
>>> x.strip(""")
'city" is'
>>> x.replace(""","")
'city is'
Be careful! strip() will delete any character in the beginning or end of the word that matches "any" character in the word we put in the strip function. For example, we put "& quot;" as a parameter into strip(), then any character matches ('&', 'q', 'u', 'o', 't', ';') will be deleted. It doesn't have to be exactly the same "string."

Mar 22, 2015

[Python] Find part of speech (POS) by using NLTK

>>> import nltk
#remember to use list
>>> x = nltk.pos_tag(['New'])
>>> print(x)
[('New', 'NNP')]
>>> print(x[0][1])
NNP
#if you don't use list, then the pos_tag function will find the pos for each character
>>> x = nltk.pos_tag('New')
>>> print(x)
[('N', 'NNP'), ('e', 'VBP'), ('w', 'NN')]
view raw pos.py hosted with ❤ by GitHub

Mar 15, 2015

[Python] Performance enhance by not using range() for comparison

When I was trying to enhance the performance of my Python code, the frist thing I tried is reducing the usage of functions. After I substituted the range() function, the performance had improved a lot, from 404ms to 106ms. The following is the difference between the two versions of the "ZigZag Conversion" code.

404ms
class Solution:
# @return a string
def convert(self, s, nRows):
row_len = 2 * nRows - 2
string = ["" for _ in range(nRows)]
if nRows <= 2:
for i in range(len(s)):
string[i % nRows] += s[i]
else:
for i in range(len(s)):
res = i % row_len
if res in range(nRows, row_len):
string[row_len - res] += s[i]
else:
string[res] += s[i]
output = ""
for i in range(nRows):
output += string[i]
return output
view raw ZigZag_404.py hosted with ❤ by GitHub

106ms
class Solution:
# @return a string
def convert(self, s, nRows):
row_len = 2 * nRows - 2
string = ["" for _ in range(nRows)]
if nRows <= 2:
for i in range(len(s)):
string[i % nRows] += s[i]
else:
for i in range(len(s)):
res = i % row_len
if res >= nRows and res < row_len: # I just replaced the range() function here
string[row_len - res] += s[i]
else:
string[res] += s[i]
output = ""
for i in range(nRows):
output += string[i]
return output
view raw ZigZag_106.py hosted with ❤ by GitHub
Although the performance result depends on the test data, this substitution really improves the entire performance.

Mar 13, 2015

[Python] Find synonyms by using WordNet, NLTK

WordNet is a great reference when you want to find synonyms of a word. For more information, please see the website: http://wordnet.princeton.edu/

The following Python code is a simple example.
#Python 3
#Find synonyms of a word
>>> from nltk.corpus import wordnet as wn
>>> syn = wn.synsets("car")
>>> syn
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
>>> syn[0].lemma_names()
['car', 'auto', 'automobile', 'machine', 'motorcar']
#Note that synonyms are always in singular form.
#If you want to compare words, need to deal with the words that are in plural form.
view raw WordNet.py hosted with ❤ by GitHub

WordNet Interface: http://www.nltk.org/howto/wordnet.html

[Python] try... except

#python
try: #prevent unexpected crash
ofile = open(filename)
except:
print "file cannot be openned:", filename
exit()
#read one line at a time
for line in ofile:
line = line.rstrip() #delete "/n" from original file line
print line
#read "whole" file at a time
wholeFile = ofile.read()
view raw read_file.py hosted with ❤ by GitHub

Mar 12, 2015

Apply BFS to special MST problems

Given a undirected and connected graph, and the edges have weights of either 1 or 2 (actually it is also applicable for weight larger than 2). How can we find a MST of the graph by using BFS?

Ans (I guess):
First, we can view an edge with weight of 2 as two edges with weight of 1 connected together. Then we can simply apply BFS to solve this problem. After finish the BFS algorithm, we check all the edges originally with weight of 2. If both parts of an edge are not both chosen as a path of the MST, then we delete this path. Finally, the result is a MST.

Use pdb to debug Python code

use pdb to debug python code:


$ python -m pdb filename.py
view raw pdb hosted with ❤ by GitHub
some (not all) commands of pdb:
h: see the entire help list

l: l(ist) [first [,last]]
List source code for the current file.
Without arguments, list 11 lines around the current line or continue the previous listing.
With one argument, list 11 lines starting at that line.
With two arguments, list the given range; if the second argument is less than the first, it is a count. 

s:
Execute the current line, stop at the first possible occasion
(either in a function that is called or in the current function).

c
Continue execution, only stop when a breakpoint is encountered.

b
b(reak) ([file:]lineno | function) [, condition]

p: p expression
Print the value of the expression.

for more information: https://docs.python.org/2/library/pdb.html