Strings and Methods
Section 1: Strings and DNA Sequences
String (str) objects in Python represent text. They can be created in several ways:
>>> 'Hello' # with apostrophes ("single-quotes")
'Hello'
>>> "Hello" # with quotation marks ("double-quotes")
'Hello'
>>> """Hello,
... my name is
... Nick""" # with triple-double-quotes, for text that spans several lines
'Hello,\nmy name is\nNick'
>>> str(32) # using the str() function to convert a number into a string
'32'
Nucleotide sequences are often represented as strings:
>>> seq = 'GCATTGGCT'
Built-In String Operations, Functions, and Methods
In the exercises below, modify each DNA sequence with a single line of code to produce what is asked for. The operations, functions, and methods you may use are:
Operations
The same operations we’ve used on numbers and lists work on strings!
'GTC' * 3 # Repeats a string N times
'GTC' + 'GTC' # Concatenates two strings
'GTC'[0] # First character
'GTC'[-1] # Last character
'GTC'[1:] # All characters except the first
'GTC'[:-1] # All characters except the last
'GTC'[::-1] # Reverses the sequence
'GTC' == 'GTC' # If they are the same, then True
'GTC' != 'GTC' # If they are different, then True
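One way to confirm what each operation above evaluates to is to rewrite the operations as assertions. This is a minimal sketch; the variable name codon is an illustrative choice, not part of the exercises:

```python
codon = 'GTC'

assert codon * 3 == 'GTCGTCGTC'    # repetition
assert codon + codon == 'GTCGTC'   # concatenation
assert codon[0] == 'G'             # first character
assert codon[-1] == 'C'            # last character
assert codon[1:] == 'TC'           # drop the first character
assert codon[:-1] == 'GT'          # drop the last character
assert codon[::-1] == 'CTG'        # reversed
assert (codon == 'GTC') is True    # same text
assert (codon != 'GTC') is False   # not different text

print('All operations gave the expected results')
```

If any assertion were wrong, Python would stop with an AssertionError on that line, which makes this a handy way to test your own guesses.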
Built-In Functions for Strings and their Methods
Strings also have their own functions. Functions that belong to a type are called “methods”. Calling a method on a string automatically passes that string in as the first argument, so str.upper('GtC') and 'GtC'.upper() give the same result.
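A minimal sketch showing that the function form and the method form are interchangeable, and that string methods return a new string rather than changing the original (Python strings are immutable). The names as_function and as_method are illustrative:

```python
seq = 'GtC'

as_function = str.upper(seq)  # the string is passed in explicitly
as_method = seq.upper()       # the string is passed in automatically

print(as_function, as_method)  # GTC GTC

# upper() returned a NEW string; the original is untouched.
print(seq)  # GtC

# To keep the result, reassign the variable.
seq = seq.upper()
print(seq)  # GTC
```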
| Function | Method Syntax Equivalent | Description |
|---|---|---|
| str.count('GTC', 'A') | 'GTC'.count('A') | Count the non-overlapping occurrences of a substring in the string. |
| str.upper('GtC') | 'GtC'.upper() | Return a copy of the string with all letters uppercase. |
| str.lower('GTc') | 'GTc'.lower() | Return a copy of the string with all letters lowercase. |
| str.isdigit('GTC') | 'GTC'.isdigit() | Return True if the string is non-empty and every character is a digit. |
| str.index('GTC', 'T') | 'GTC'.index('T') | Return the index of the first occurrence of a substring (raises ValueError if it is not found). |
| str.replace('GTC', 'G', 'C') | 'GTC'.replace('G', 'C') | Replace every occurrence of one substring with another. |
| str.split('GTC-CCA', '-') | 'GTC-CCA'.split('-') | Split the string into a list of substrings wherever the separator occurs. |
| len('GTC') | (none: len() is a built-in function, not a method) | Return the length (number of characters) of the string. |
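A few of these methods have behavior worth seeing in action: count() works on whole substrings, index() reports only the first match, and finding every position of a letter takes a small loop. A minimal sketch using the exercise sequence; the name t_positions is illustrative:

```python
seq = 'GTGTCAGTCCCCATGAATCGATAG'

# count() counts non-overlapping occurrences of a substring
print(seq.count('AT'))       # 3
print('AAAA'.count('AA'))    # 2, because the two 'AA' matches do not overlap

# index() returns only the FIRST position where the substring starts
print(seq.index('T'))        # 1

# index() raises ValueError when the substring is absent
try:
    seq.index('X')
except ValueError:
    print("'X' is not in the sequence")

# To get every position of a letter, loop over the string with enumerate()
t_positions = [i for i, base in enumerate(seq) if base == 'T']
print(t_positions)           # [1, 3, 7, 13, 17, 21]

# split() returns a list of the pieces between separators
print('GTC-CCA'.split('-'))  # ['GTC', 'CCA']

# isdigit() is True only for non-empty strings made entirely of digits
print('GTC'.isdigit(), '32'.isdigit(), ''.isdigit())  # False True False
```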
Exercises
Example: Count the number of “G” nucleotides in the following sequence:
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq.count('G')
6
Exercise: Count the number of “C” nucleotides in the following sequence:
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq.count('C')
6
Exercise: Count the number of “AT” occurrences in the following sequence:
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq.count('AT')
3
Exercise: Concatenate the following two sequences (i.e., combine them into one sequence):
seq1 = "GTGTCAGT"
seq2 = "TGAATCGATAG"
Solution
seq1 = "GTGTCAGT"
seq2 = "TGAATCGATAG"
seq1 + seq2
'GTGTCAGTTGAATCGATAG'
Exercise: How long is the following sequence?
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
len(seq)
24
Exercise: What is the 2nd nucleotide in this sequence?
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[1]
'T'
Exercise: What is the 3rd-from-the-last nucleotide in this sequence?
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[-3]
'T'
Exercise: Repeat the following sequence 13 times:
gc = "GC"
Solution
gc = "GC"
gc * 13
'GCGCGCGCGCGCGCGCGCGCGCGCGC'
Exercise: Replace the incorrect letter with an empty string (i.e., delete the letter). Hint: an empty string is just a pair of quotes, like '' or "".
seq = "GTGXXGTXCCXCCATGXAATCGXATA"
Solution
seq = "GTGXXGTXCCXCCATGXAATCGXATA"
seq.replace('X', '')
'GTGGTCCCCATGAATCGATA'
Exercise: Access only the first six nucleotides in this sequence
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[:6]
'GTGTCA'
Exercise: Standardize the formatting of this sequence by either upper- or lower-casing the letters
seq = "GtCGAaaCCgTaGcTAgc"
Solution
seq = "GtCGAaaCCgTaGcTAgc"
seq.upper()
'GTCGAAACCGTAGCTAGC'
Exercise: Split the following string at each space into a list of sequences (Hint: the string for a space is a pair of quotes with a space between them, like ' ' or " ")
seqs = "GTTCGAAAG GACCTGATTATAG AACCGATTTA"
Solution
seqs = "GTTCGAAAG GACCTGATTATAG AACCGATTTA"
seqs.split(' ')
['GTTCGAAAG', 'GACCTGATTATAG', 'AACCGATTTA']
Exercise: Reverse this sequence
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
seq[::-1]
'GATAGCTAAGTACCCCTGACTGTG'
Exercise: What percentage of the nucleotides in this sequence are strong nucleotides (G and C)? (Hint: count the Gs and Cs, divide by the total number of nucleotides, then multiply by 100)
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
seq = "GTGTCAGTCCCCATGAATCGATAG"
100 * (seq.count("G") + seq.count("C")) / len(seq)
50.0
Exercise: Is this sequence the same forwards and backwards (i.e., a palindrome)?
seq = "TCGATCTAGCGCGAATATCGGAGAAGAGGCTATAAGCGCGATCTAGCT"
Solution
seq = "TCGATCTAGCGCGAATATCGGAGAAGAGGCTATAAGCGCGATCTAGCT"
seq[::-1] == seq
True
Section 2: Text Files: Reading and Writing Strings to Files
Strings can be saved to text files by creating a file object with the open() function and writing the string to it:
my_file = open('myfile.txt', 'w') # get a file object open in 'write' mode
my_file.write('This is my text') # call the file.write() method
my_file.close() # call the file.close() method
Reading works in a similar way:
my_file = open('myfile.txt') # open in the default 'read' mode
text = my_file.read() # read the whole file into one string
my_file.close()
Exercises
Exercise: Write the following sequence to a text file named “sequence.txt”:
seq = "GTGTCAGTCCCCATGAATCGATAG"
Solution
file = open('sequence.txt', 'w') # open the file in 'write' mode
file.write(seq) # write the sequence string to it
file.close()
Exercise: Read the sequence from the file back into Python.
Solution
file = open('sequence.txt') # open the file in 'read' mode
seq = file.read() # read the contents back into a string
file.close()
seq
'GTGTCAGTCCCCATGAATCGATAG'
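The open()/close() pattern above works, but if an error happens before close() runs, the file can be left open. A common Python idiom is the with statement, which closes the file automatically when its block ends. A minimal sketch of the same round trip, assuming the file name sequence.txt from the exercise (seq_from_file is an illustrative name):

```python
seq = "GTGTCAGTCCCCATGAATCGATAG"

# Write: the file is closed automatically when the with-block ends
with open('sequence.txt', 'w') as file:
    file.write(seq)

# Read it back into a new variable
with open('sequence.txt') as file:
    seq_from_file = file.read()

print(seq_from_file == seq)  # True
print(file.closed)           # True: no close() call was needed
```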