Table of Contents

CS 190C, Spring 2009: Problem Set 3

Posted: Friday, January 30, 2009

Due: Friday, February 6, 2009, 10pm (electronic submission via Blackboard)

Your code for each problem should be in a separate file. Name the files problem1.py and problem2.py and make sure to follow any additional instructions given within each problem.

Problem 1: Manipulating DNA

Deoxyribonucleic acid (DNA) consists of two long chains of nucleotides twisted into a double helix and joined by hydrogen bonds between the complementary bases adenine and thymine or cytosine and guanine. The main role of DNA is the storage of information and DNA is often compared to code containing instructions. Each nucleotide has a complement: for 'a' it is 't' (and vice versa) and for 'c' it is 'g' (and vice versa). In this problem we view DNA as a string of characters over a four-letter alphabet, 'a', 'c', 'g', and 't'. Using operations on strings, write functions to do the following:

Use string operations to implement functions DNA_complement and DNA_heavy_max. Appendix A in Zelle contains a good summary of the string operations you may want to consider. Write and test each function individually. Then put the two functions into file problem1.py which should read strings from an input file (one string per line) and invoke each function on the string. Here is sample files: dnastrings1.txt, dnastrings2.txt.

Your code in file problem1.py should be organized as followed:

# CS190C: Spring 2009
# FirstName LastName
# Problem Set 3, Problem 1
 
import string
 
def DNA_complement(str):
    #returns complemented string
    #write code
 
def DNA_heavy_max(str):
    #returns most frequently occurring nucleotide
    #write code
 
if __name__ == '__main__':
 
    f = open('filename', 'r')
    # open a file containing DNA strings, one per line 
 
    for line in f:
        print 'DNA sequence =', line
        # call function DNA complement
        # call function DNA_heavy_max
        # print complemented string and most frequently occurring nucleotide

The format

 if  __name__ == '__main__': 

is generally used when writing code with functions (instead of using a main function). It allows one to have code act as either reusable modules or as standalone programs. You can find out more at http://effbot.org/pyfaq/tutor-what-is-if-name-main-for.htm.

Problem 2: Spatial Queries

Many applications dealing with data sets containing points in a given dimension need queries determining spatial relationships. This problem considers 2-dimensional points given by their (x,y) coordinates and asks you to write a number of functions implementing spatial queries. You are given code for a number of “helper” function related to I/O and data generation:

These functions are in file ps3_09_p2_lib.py. You can use them without making any changes or you can change them. In either case, you need to submit this file - without changing its name - along with file problem2.py.

The functions to be written

A skeleton of problem2.py is available here: problem2.py and it is also shown below. The functions in the skeleton program have place-holders with return statements so you can run the program before completing the functions.

Write and test each function individually and start with small and simple data sets. Examples of data files are available here: testdata0.txt, testdata1.txt. Note that one can generate an input file easily using functions random_points and save_points.

Skeleton of problem2.py you can use:

# CS190C: Spring 2009
# FirstName LastName
# Problem Set 3, Problem 2
 
from ps3_09_p2_lib import *  #import helper functions
 
def distance(p, q):
    return 0  #return distance; write code
 
def closest(points, p):
    return [5,5]  #return closest point; write code
 
def nearest_neighbors(points):
    return  [[0,0], [5,5]]  #return pair of points; write code 
 
if __name__ == '__main__':
    # Create a list of 100 random points
    myPoints = random_points(100, 2)
 
    # Save the points to a file
    save_points(myPoints, "points.txt")
 
    # Plot the points
    show_points(myPoints, (0,1,0))
 
    # Create three extra points
    pointA = [1.0, 0.0]
    pointB = [1.0, 1.0]
    pointC = [0.0, 0.0]
 
    # Print the distance between two points
    print "The distance between", pointA, "and", pointB,"is:", distance(pointA, pointB)
 
    # Find the point in myPoints closest to point B
    closeToB = closest(myPoints, pointB);
    print "The point closes to", pointB, "is:", closeToB
 
    # Find the nearest neighbors in myPoints
    pointX, pointY = nearest_neighbors(myPoints)
    print "The nearest neighbors are:", pointX, "and", pointY