Posted: Friday, January 30, 2009
Due: Friday, February 6, 2009, 10pm (electronic submission via Blackboard)
Your code for each problem should be in a separate file. Name the files problem1.py and problem2.py and make sure to follow any additional instructions given within each problem.
Deoxyribonucleic acid (DNA) consists of two long chains of nucleotides twisted into a double helix and joined by hydrogen bonds between the complementary bases adenine and thymine or cytosine and guanine. The main role of DNA is the storage of information and DNA is often compared to code containing instructions. Each nucleotide has a complement: for 'a' it is 't' (and vice versa) and for 'c' it is 'g' (and vice versa). In this problem we view DNA as a string of characters over a four-letter alphabet, 'a', 'c', 'g', and 't'. Using operations on strings, write functions to do the following:
Use string operations to implement functions DNA_complement and DNA_heavy_max. Appendix A in Zelle contains a good summary of the string operations you may want to consider. Write and test each function individually. Then put the two functions into file problem1.py which should read strings from an input file (one string per line) and invoke each function on the string. Here is sample files: dnastrings1.txt, dnastrings2.txt.
Your code in file problem1.py should be organized as followed:
# CS190C: Spring 2009 # FirstName LastName # Problem Set 3, Problem 1 import string def DNA_complement(str): #returns complemented string #write code def DNA_heavy_max(str): #returns most frequently occurring nucleotide #write code if __name__ == '__main__': f = open('filename', 'r') # open a file containing DNA strings, one per line for line in f: print 'DNA sequence =', line # call function DNA complement # call function DNA_heavy_max # print complemented string and most frequently occurring nucleotide
The format
if __name__ == '__main__':
is generally used when writing code with functions (instead of using a main function). It allows one to have code act as either reusable modules or as standalone programs. You can find out more at http://effbot.org/pyfaq/tutor-what-is-if-name-main-for.htm.
Many applications dealing with data sets containing points in a given dimension need queries determining spatial relationships. This problem considers 2-dimensional points given by their (x,y) coordinates and asks you to write a number of functions implementing spatial queries. You are given code for a number of “helper” function related to I/O and data generation:
These functions are in file ps3_09_p2_lib.py. You can use them without making any changes or you can change them. In either case, you need to submit this file - without changing its name - along with file problem2.py.
A skeleton of problem2.py is available here: problem2.py and it is also shown below. The functions in the skeleton program have place-holders with return statements so you can run the program before completing the functions.
Write and test each function individually and start with small and simple data sets. Examples of data files are available here: testdata0.txt, testdata1.txt. Note that one can generate an input file easily using functions random_points and save_points.
Skeleton of problem2.py you can use:
# CS190C: Spring 2009 # FirstName LastName # Problem Set 3, Problem 2 from ps3_09_p2_lib import * #import helper functions def distance(p, q): return 0 #return distance; write code def closest(points, p): return [5,5] #return closest point; write code def nearest_neighbors(points): return [[0,0], [5,5]] #return pair of points; write code if __name__ == '__main__': # Create a list of 100 random points myPoints = random_points(100, 2) # Save the points to a file save_points(myPoints, "points.txt") # Plot the points show_points(myPoints, (0,1,0)) # Create three extra points pointA = [1.0, 0.0] pointB = [1.0, 1.0] pointC = [0.0, 0.0] # Print the distance between two points print "The distance between", pointA, "and", pointB,"is:", distance(pointA, pointB) # Find the point in myPoints closest to point B closeToB = closest(myPoints, pointB); print "The point closes to", pointB, "is:", closeToB # Find the nearest neighbors in myPoints pointX, pointY = nearest_neighbors(myPoints) print "The nearest neighbors are:", pointX, "and", pointY