Strings and Methods

Authors
Dr. Nicholas Del Grosso | Dr. Sangeetha Nandakumar | Dr. Ole Bialas | Dr. Atle E. Rimehaug

Section 1: Strings and DNA Sequences

String objects in Python represent text. They can be created in several ways:

>>> 'Hello'  # with apostrophes ("single-quotes")
'Hello'

>>> "Hello"  # with quotation marks ("double-quotes")
'Hello'

>>> """Hello,
... my name is
... Nick"""   # with triple-double-quotes (used for multi-line text, such as docstrings)
'Hello,\nmy name is\nNick'

>>> str(32)  # using the str() function to convert a value into a string
'32'

Nucleotide sequences are often represented as strings:

>>> seq = 'GCATTGGCT'

Built-In String Operations, Functions, and Methods

In the exercises below, modify each DNA sequence in a single line of code to match what is asked for. The following operations, functions, and methods may be useful:

Operations

The same operations we’ve used on numbers and lists work on strings!

  • 'GTC' * 3 # Repeats a string N times
  • 'GTC' + 'GTC' # Concatenates two strings
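  • 'GTC'[0] # Gets the first character
  • 'GTC'[-1] # Gets the last character
  • 'GTC'[1:] # Gets everything after the first character
  • 'GTC'[:-1] # Gets everything except the last character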
  • 'GTC'[::-1] # Reverses the sequence
  • 'GTC' == 'GTC' # If they are the same, then True
  • 'GTC' != 'GTC' # If they are different, then True
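
The operations listed above can be tried out together on one short sequence. This is a minimal sketch; the variable names (`codon`, `repeated`, and so on) are illustrative choices and do not come from the exercises.

```python
codon = 'GTC'  # illustrative variable name, not from the exercises

repeated = codon * 3          # 'GTCGTCGTC'
joined = codon + 'AAA'        # 'GTCAAA'
first = codon[0]              # 'G'
last = codon[-1]              # 'C'
without_first = codon[1:]     # 'TC'
without_last = codon[:-1]     # 'GT'
reversed_codon = codon[::-1]  # 'CTG'
same = codon == 'GTC'         # True
different = codon != 'GTC'    # False

print(repeated, joined, first, last)
print(without_first, without_last, reversed_codon)
print(same, different)
```

None of these operations change `codon` itself; each one produces a new value.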

Built-In Functions for Strings and their Methods

Strings also come with their own functions. Functions that belong to a type are called “methods”. Calling a method on a string automatically passes that string to the function as its first argument.

Function                       Method Syntax Equivalent   Description
str.count('GTC', 'A')          'GTC'.count('A')           Count how many times a substring occurs in the string.
str.upper('GtC')               'GtC'.upper()              Make all letters in the string uppercase.
str.lower('GTc')               'GTc'.lower()              Make all letters in the string lowercase.
str.isdigit('GTC')             'GTC'.isdigit()            Check whether all characters in the string are digits.
str.index('GTC', 'T')          'GTC'.index('T')           Return the index of the first occurrence of a substring.
str.replace('GTC', 'G', 'C')   'GTC'.replace('G', 'C')    Replace every occurrence of a substring with another string.
str.split('GTC-CCA', '-')      'GTC-CCA'.split('-')       Split the string into a list wherever the separator occurs.
len('GTC')                     (none)                     Return the length (number of characters) of the string.
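
Each row of the table pairs two spellings of the same call. The sketch below checks that the function form and the method form give the same result; the example string and variable names are made up for illustration.

```python
seq = 'GtC-cCa'  # made-up mixed-case sequence for illustration

# Function form: the string is passed explicitly as the first argument
upper_via_function = str.upper(seq)

# Method form: the string before the dot is passed automatically
upper_via_method = seq.upper()

print(upper_via_function)                      # GTC-CCA
print(upper_via_function == upper_via_method)  # True

# Methods return new strings, so calls can be chained one after another
parts = seq.upper().split('-')
print(parts)  # ['GTC', 'CCA']
```

In practice the method form is the one usually written, since it is shorter and reads left to right.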

Exercises

Example: Count the number of “G” in the following sequence:

seq = "GTGTCAGTCCCCATGAATCGATAG"
seq.count('G')
6

Exercise: Count the number of “C” in the following sequence:

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq.count('C')
6

Exercise: Count the number of “AT” repeats in the following sequence:

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq.count('AT')
3
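
One detail of `count()` is worth knowing: it counts non-overlapping occurrences, scanning from left to right. This makes no difference for “AT” in the sequence above, but it matters for patterns that can overlap with themselves. A small sketch with made-up sequences:

```python
poly_a = 'AAAA'  # made-up sequence for illustration

# 'AA' could start at index 0, 1, or 2, but count() skips past each match,
# so only the occurrences starting at index 0 and index 2 are counted.
print(poly_a.count('AA'))  # 2

# 'AT' cannot overlap with itself, so every occurrence is counted.
print('GATATAT'.count('AT'))  # 3
```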

Exercise: Concatenate the following two sequences (i.e. combine them into one sequence)

seq1 = "GTGTCAGT"
seq2 = "TGAATCGATAG"
Solution
seq1 = "GTGTCAGT"
seq2 = "TGAATCGATAG"
seq1 + seq2
'GTGTCAGTTGAATCGATAG'

Exercise: How long is the following sequence?

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
len(seq)
24

Exercise: What is the 2nd nucleotide in this sequence?

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[1]
'T'

Exercise: What is the 3rd-from-the-last nucleotide in this sequence?

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[-3]
'T'

Exercise: Repeat the following sequence 13 times

gc = "GC"
Solution
gc = "GC"
gc * 13
'GCGCGCGCGCGCGCGCGCGCGCGCGC'

Exercise: Replace the incorrect letter “X” with an empty string (i.e. delete the letter) (Hint: an empty string is just a pair of quotes, like '' or "")

seq = "GTGXXGTXCCXCCATGXAATCGXATA"
Solution
seq = "GTGXXGTXCCXCCATGXAATCGXATA"
seq.replace('X', '')
'GTGGTCCCCATGAATCGATA'

Exercise: Access only the first six nucleotides in this sequence

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[:6]
'GTGTCA'

Exercise: Standardize the formatting of this sequence by either upper- or lower-casing the letters

seq = "GtCGAaaCCgTaGcTAgc"
Solution
seq = "GtCGAaaCCgTaGcTAgc"
seq.upper()
'GTCGAAACCGTAGCTAGC'

Exercise: Split the following string at each space into a list of sequences (Hint: the string for a space is quotes with a space between them, like ' ' or " ")

seqs = "GTTCGAAAG GACCTGATTATAG AACCGATTTA"
Solution
seqs = "GTTCGAAAG GACCTGATTATAG AACCGATTTA"
seqs.split(' ')
['GTTCGAAAG', 'GACCTGATTATAG', 'AACCGATTTA']
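
If the sequences might be separated by more than one space, or by tabs or newlines, calling `split()` with no argument may be more robust, because it splits on any run of whitespace. A sketch with a made-up input string:

```python
# Made-up input with a double space, a tab, and a trailing newline
messy = "GTTCGAAAG  GACCTGATTATAG\tAACCGATTTA\n"

# Splitting on a single space leaves an empty string between the two spaces
# and keeps the tab and the newline inside the last item
print(messy.split(' '))  # ['GTTCGAAAG', '', 'GACCTGATTATAG\tAACCGATTTA\n']

# Splitting with no argument treats any run of whitespace as one separator
print(messy.split())     # ['GTTCGAAAG', 'GACCTGATTATAG', 'AACCGATTTA']
```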

Exercise: Reverse this sequence

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[::-1]
'GATAGCTAAGTACCCCTGACTGTG'

Exercise: What fraction of the nucleotides in this sequence are strong nucleotides (G and C)? (Hint: count the Gs and Cs, then divide by the total number of nucleotides; multiply by 100 to get a percentage)

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
(seq.count("G") + seq.count("C")) / len(seq)
0.5
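
The same calculation can be wrapped in a small function so it can be reused on other sequences. This is only a sketch: the name `gc_percent` is hypothetical, and the choice to upper-case the input first is an assumption made so that mixed-case sequences are handled.

```python
def gc_percent(seq):
    """Return the GC content of seq as a percentage (hypothetical helper)."""
    seq = seq.upper()  # assumption: treat 'g' and 'G' as the same nucleotide
    gc_count = seq.count('G') + seq.count('C')
    return 100 * gc_count / len(seq)


print(gc_percent("GTGTCAGTCCCCATGAATCGATAG"))  # 50.0
```

Note that an empty string would cause a division by zero here, so a real analysis would need to decide how to handle that case.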

Exercise: Is this sequence the same forwards and backwards (i.e. a palindrome)?

seq = "TCGATCTAGCGCGAATATCGGAGAAGAGGCTATAAGCGCGATCTAGCT"
Solution
seq = "TCGATCTAGCGCGAATATCGGAGAAGAGGCTATAAGCGCGATCTAGCT"
seq[::-1] == seq
True

Section 2: Text Files: Reading and Writing Strings to Files

Strings can be saved to text files by making a file object with the open() function and writing the string to it. Here is one way to do it:

my_file = open('myfile.txt', 'w')  # get a file object open in 'write' mode
my_file.write('This is my text')  # call the file.write() method
my_file.close()  # call the file.close() method

Reading works in a similar way:

my_file = open('myfile.txt')
text = my_file.read()
my_file.close()
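
A second common way uses a `with` block, which closes the file automatically when the block ends, even if an error occurs inside it. A minimal sketch, reusing the file name and text from the example above:

```python
# Write: the file is closed automatically at the end of the with block
with open('myfile.txt', 'w') as my_file:
    my_file.write('This is my text')

# Read the text back the same way
with open('myfile.txt') as my_file:
    text = my_file.read()

print(text)  # This is my text
```

Because no explicit close() call is needed, this form makes it harder to leave a file open by accident.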

Exercises

Exercise: Write the following sequence to a text file named “sequence.txt”:

seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
file = open('sequence.txt', 'w')
file.write(seq)
file.close()

Exercise: Read the sequence from the file back into Python.

Solution
file = open('sequence.txt')
seq = file.read()
file.close()
seq
'GTGTCAGTCCCCATGAATCGATAG'