Chapter 2: Working with Text (Strings)
2.1 String
A string is Python’s way of storing and working with text. Think of it as a sequence of characters (letters, numbers, symbols, spaces) that represent words, names, labels, or any text information.
Simple Definition: A string is text data enclosed in quotes.
"hello world"
'hello world'
'hello world'
'hello world'
'Mark says "hello"'
'Mark says "hello"'
type(1.0)
float
type('1')
str
1 + 2
3
type('1' + '2')
str
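The cells above show that + adds numbers but joins strings. A minimal sketch of that difference, using the built-in int() and str() to convert between the two; the variable names are illustrative only:

```python
# '+' joins strings but adds numbers
joined = '1' + '2'            # string concatenation -> '12'
added = int('1') + int('2')   # convert to integers first, then add -> 3

print(joined, type(joined))
print(added, type(added))

# str() goes the other way: number -> text
as_text = str(added)
print(as_text + ' items')     # '3 items'
```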
2.1.1 Creating Strings
You can create strings using either single quotes (') or double quotes ("):
process_name = "distillation" # Double quotes
operator = 'Alice' # Single quotes
reaction = "2H2 + O2 → 2H2O" # Chemical equation as text
status_message = 'Reactor is running normally'
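Having two quote styles is handy when the text itself contains a quote character. The sketch below shows three common options; the example sentences are made up for illustration:

```python
# Use the other quote type when the text itself contains quotes
double_inside = 'Mark says "hello"'
apostrophe_inside = "The reactor's status"

# Or escape the quote with a backslash
escaped = 'The reactor\'s status'

# Triple quotes allow text that spans several lines
multi_line = """Step 1: start pump
Step 2: open valve"""

print(double_inside)
print(apostrophe_inside == escaped)   # True: both spell the same text
print(multi_line)
```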
string = 'python'
string[5]
'n'
2.1.2 Indexing & Slicing
Indexing lets you access individual characters in a string, while slicing lets you extract parts of a string. This is very useful when you need to work with specific portions of text data.
The key concept: Python starts counting from 0, not 1!
If we have the string "Python", here’s how the indexing works:
String: P y t h o n
Index: 0 1 2 3 4 5
First character "P" is at index 0
Second character "y" is at index 1
Last character "n" is at index 5
Remember: Python indexing starts at 0, not 1!
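One consequence of counting from 0 is that the last valid index is the length minus one. A short sketch of what happens when an index goes past the end:

```python
word = "Python"

last_index = len(word) - 1    # 5, because counting starts at 0
print(word[last_index])       # 'n'

# Asking for an index that does not exist raises IndexError
try:
    word[6]
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
```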
Finding Characters with .find()
The .find() method returns the index of the first occurrence of a character (or substring).
If the character is not found, it returns -1.
s = "Python"
s.find("t") # 2
s.find("n") # 5
s.find("z") # -1
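Because -1 is also a valid (negative) index, it is worth checking the result of .find() before using it. A minimal sketch, which also shows two related tools, the in operator and .index():

```python
s = "Python"

# Check the result before using it: -1 means "not found"
position = s.find("th")
if position != -1:
    print(f"'th' starts at index {position}")   # index 2

# The 'in' operator is enough when only a yes/no answer is needed
has_z = "z" in s
print(has_z)   # False

# .index() is similar to .find() but raises ValueError instead of returning -1
try:
    s.index("z")
    found = True
except ValueError:
    found = False
print(found)   # False
```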
string = 'Python'
# negative indices: -6 -5 -4 -3 -2 -1
string[-1]
'n'
# String Indexing Examples
print("=== Basic String Indexing ===")
# Create a simple string
word = "Python"
print(f"String: '{word}'")
print(f"Length: {len(word)} characters")
print(f"\n=== Positive Indexing (counting from start) ===")
print(f"word[0] = '{word[0]}' # First character")
print(f"word[1] = '{word[1]}' # Second character")
print(f"word[2] = '{word[2]}' # Third character")
print(f"word[3] = '{word[3]}' # Fourth character")
print(f"word[4] = '{word[4]}' # Fifth character")
print(f"word[5] = '{word[5]}' # Sixth character")
print(f"\n=== Negative Indexing (counting from end) ===")
print(f"word[-1] = '{word[-1]}' # Last character")
print(f"word[-2] = '{word[-2]}' # Second to last")
print(f"word[-3] = '{word[-3]}' # Third to last")
print(f"word[-4] = '{word[-4]}' # Fourth to last")
print(f"word[-5] = '{word[-5]}' # Fifth to last")
print(f"word[-6] = '{word[-6]}' # Sixth to last (first)")
print(f"\n=== Index Diagram ===")
print("String: P y t h o n")
print("Index: 0 1 2 3 4 5")
print("Negative:-6 -5 -4 -3 -2 -1")
print(f"\n=== Practical Example ===")
message = "Hello World"
print(f"Message: '{message}'")
print(f"First character: '{message[0]}'")
print(f"Last character: '{message[-1]}'")
print(f"Middle character: '{message[len(message)//2]}'")
=== Basic String Indexing ===
String: 'Python'
Length: 6 characters
=== Positive Indexing (counting from start) ===
word[0] = 'P' # First character
word[1] = 'y' # Second character
word[2] = 't' # Third character
word[3] = 'h' # Fourth character
word[4] = 'o' # Fifth character
word[5] = 'n' # Sixth character
=== Negative Indexing (counting from end) ===
word[-1] = 'n' # Last character
word[-2] = 'o' # Second to last
word[-3] = 'h' # Third to last
word[-4] = 't' # Fourth to last
word[-5] = 'y' # Fifth to last
word[-6] = 'P' # Sixth to last (first)
=== Index Diagram ===
String: P y t h o n
Index: 0 1 2 3 4 5
Negative:-6 -5 -4 -3 -2 -1
=== Practical Example ===
Message: 'Hello World'
First character: 'H'
Last character: 'd'
Middle character: ' '
Slicing lets you extract a portion (substring) of a string. The syntax is:
string[start:end]
start: Index where slice begins (included)
end: Index where slice ends (excluded)
Important: The end index is not included in the slice!
text = "Python"
text[0:2] #--> 0, 1
# print(text[1:4]) # Gets characters from index 1 to 3: "yth"
# print(text[0:3]) # Gets characters from index 0 to 2: "Pyt"
# print(text[2:]) # Gets from index 2 to end: "thon"
# print(text[:4]) # Gets from start to index 3: "Pyth"
'Py'
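The excluded end index has a few practical consequences. The sketch below checks three of them on the same string:

```python
text = "Python"

# Because the end index is excluded, the slice length is end - start
part = text[1:4]
print(part, len(part))          # yth 3

# Splitting at the same index gives two pieces that rebuild the original
i = 2
rebuilt = text[:i] + text[i:]
print(rebuilt == text)          # True

# Unlike indexing, slicing past the end does not raise an error
beyond = text[3:100]
print(beyond)                   # hon
```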
# String Slicing Examples
print("=== String Slicing ===")
text = "Programming"
print(f"Original string: '{text}'")
print(f"Length: {len(text)} characters")
print(f"\n=== Basic Slicing [start:end] ===")
print(f"text[0:4] = '{text[0:4]}' # Characters 0,1,2,3")
print(f"text[1:5] = '{text[1:5]}' # Characters 1,2,3,4")
print(f"text[3:7] = '{text[3:7]}' # Characters 3,4,5,6")
print(f"\n=== Slicing from Start or to End ===")
print(f"text[:5] = '{text[:5]}' # From beginning to index 4")
print(f"text[5:] = '{text[5:]}' # From index 5 to end")
print(f"text[:] = '{text[:]}' # Entire string (copy)")
print(f"\n=== Negative Slicing ===")
print(f"text[-4:] = '{text[-4:]}' # Last 4 characters")
print(f"text[:-3] = '{text[:-3]}' # All except last 3")
print(f"text[-7:-2] = '{text[-7:-2]}' # From -7 to -3")
text = "Programming"
# negative indices of the last four characters: -4 -3 -2 -1
text[-4:]
'ming'
print(f"\n=== Slicing with Step ===")
print(f"text[::2] = '{text[::2]}' # Every 2nd character")
print(f"text[1::2] = '{text[1::2]}' # Every 2nd, starting from index 1")
print(f"text[::3] = '{text[::3]}' # Every 3rd character")
print(f"\n=== Reverse String ===")
print(f"text[::-1] = '{text[::-1]}' # Reverse the entire string!")
print(f"\n=== Practical Examples ===")
email = "user@example.com"
print(f"Email: '{email}'")
print(f"Username: '{email[:email.index('@')]}'") # Before @
print(f"Domain: '{email[email.index('@')+1:]}'") # After @
filename = "data.txt"
print(f"\nFilename: '{filename}'")
print(f"Name part: '{filename[:-4]}'") # Without extension
print(f"Extension: '{filename[-4:]}'") # Just extension
=== String Slicing ===
Original string: 'Programming'
Length: 11 characters
=== Basic Slicing [start:end] ===
text[0:4] = 'Prog' # Characters 0,1,2,3
text[1:5] = 'rogr' # Characters 1,2,3,4
text[3:7] = 'gram' # Characters 3,4,5,6
=== Slicing from Start or to End ===
text[:5] = 'Progr' # From beginning to index 4
text[5:] = 'amming' # From index 5 to end
text[:] = 'Programming' # Entire string (copy)
=== Negative Slicing ===
text[-4:] = 'ming' # Last 4 characters
text[:-3] = 'Programm' # All except last 3
text[-7:-2] = 'rammi' # From -7 to -3
=== Slicing with Step ===
text[::2] = 'Pormig' # Every 2nd character
text[1::2] = 'rgamn' # Every 2nd, starting from index 1
text[::3] = 'Pgmn' # Every 3rd character
=== Reverse String ===
text[::-1] = 'gnimmargorP' # Reverse the entire string!
=== Practical Examples ===
Email: 'user@example.com'
Username: 'user'
Domain: 'example.com'
Filename: 'data.txt'
Name part: 'data'
Extension: '.txt'
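The filename example above relies on the extension being exactly four characters long. As an alternative sketch (not the approach used in the cell above), the string methods .partition() and .rpartition() split at a separator instead of at a fixed position; the second filename is a made-up example:

```python
email = "user@example.com"
username, _, domain = email.partition("@")       # split at the first '@'

filename = "results.final.csv"                   # hypothetical name with two dots
stem, _, extension = filename.rpartition(".")    # split at the last '.'

print(username, domain)    # user example.com
print(stem, extension)     # results.final csv
```

If the separator is missing, .rpartition() returns two empty strings followed by the whole text, so it is worth checking that case when filenames may have no extension.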
2.1.3 Common String Operations
Strings have many useful methods (functions) that help you work with text:
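Basic String Methods
| Method | Description | Example | Result |
|---|---|---|---|
| .upper() | Convert to uppercase | "pump a".upper() | "PUMP A" |
| .lower() | Convert to lowercase | "BENZENE".lower() | "benzene" |
| .title() | Title case (first letter of each word capitalized) | "normal operation".title() | "Normal Operation" |
| len() | Length of string | len("C6H12O6") | 7 |
| .replace() | Replace text | "Pump A".replace("A", "B") | "Pump B" |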
Real Examples: Standardization & Data Cleaning
Standardization Example: Imagine you have equipment data from different sources with inconsistent naming:
Raw data: "pump a", "PUMP B", "Pump-C", "centrifugal_pump_d"
Standardized: "Pump A", "Pump B", "Pump C", "Centrifugal Pump D"
examples = ['Pump a', 'Pump B', 'pump c', 'Pump E ', 'Pump f']
standardized_list = []
for ex in examples:
    standardized = ex.strip().title()                # trim spaces, capitalize each word
    standardized = standardized.replace(' ', '_')    # use underscores between words
    standardized_list.append(standardized)
standardized_list
['Pump_A', 'Pump_B', 'Pump_C', 'Pump_E', 'Pump_F']
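The cell above joins words with underscores. A minimal sketch of one way to reach the space-separated target names from the standardization example ("Pump A", "Pump B", "Pump C", "Centrifugal Pump D"), assuming hyphens and underscores are the only separators that need unifying:

```python
raw_names = ["pump a", "PUMP B", "Pump-C", "centrifugal_pump_d"]

standardized_names = []
for raw in raw_names:
    name = raw.strip()                                  # drop stray spaces
    name = name.replace("-", " ").replace("_", " ")     # unify separators
    name = name.title()                                 # capitalize each word
    standardized_names.append(name)

print(standardized_names)
# ['Pump A', 'Pump B', 'Pump C', 'Centrifugal Pump D']
```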
print("=== Basic String Methods ===")
# Equipment data that might need standardization
equipment_name = "centrifugal_pump_d"
chemical_name = "BENZENE"
process_status = "normal operation"
print(f"Original: '{equipment_name}'")
print(f"Uppercase: '{equipment_name.upper()}'")
print(f"Title Case: '{equipment_name.title()}'")
print(f"Replace underscores: '{equipment_name.replace('_', ' ')}'")
print(f"Replace underscores & Title Case: '{equipment_name.replace('_', ' ').title()}'")
print(f"\nOriginal: '{chemical_name}'")
print(f"Lowercase: '{chemical_name.lower()}'")
print(f"\nOriginal: '{process_status}'")
print(f"Title Case: '{process_status.title()}'")
=== Basic String Methods ===
Original: 'centrifugal_pump_d'
Uppercase: 'CENTRIFUGAL_PUMP_D'
Title Case: 'Centrifugal_Pump_D'
Replace underscores: 'centrifugal pump d'
Replace underscores & Title Case: 'Centrifugal Pump D'
Original: 'BENZENE'
Lowercase: 'benzene'
Original: 'normal operation'
Title Case: 'Normal Operation'
print(f"\n=== String Length ===")
formula = "C6H12O6" # Glucose
safety_message = "Wear safety goggles at all times"
print(f"Chemical formula '{formula}' has {len(formula)} characters")
print(f"Safety message has {len(safety_message)} characters")
print(f"\n=== Replacing Text ===")
old_equipment = "Pump A is running"
new_equipment = old_equipment.replace("A", "B")
print(f"Original: '{old_equipment}'")
print(f"Updated: '{new_equipment}'")
=== Replacing Text ===
Original: 'Pump A is running'
Updated: 'Pump B is running'
# Useful for updating equipment IDs
reactor_list = "R-101, R-102, R-103"
updated_list = reactor_list.replace("R-", "Reactor-")
print(f"\nEquipment list: '{reactor_list}'")
print(f"Formatted list: '{updated_list}'")
Equipment list: 'R-101, R-102, R-103'
Formatted list: 'Reactor-101, Reactor-102, Reactor-103'
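One thing to watch with .replace(): it changes every match, not just the first. The sketch below uses a made-up status message to show the effect and two ways to narrow the replacement:

```python
status = "Pump A is running near Area A"   # hypothetical message

# .replace() changes every match by default
all_replaced = status.replace("A", "B")

# A more specific search text avoids touching unrelated letters
targeted = status.replace("Pump A", "Pump B")

# The optional third argument limits the number of replacements
first_only = status.replace("A", "B", 1)

print(all_replaced)   # Pump B is running near Brea B
print(targeted)       # Pump B is running near Area A
print(first_only)     # Pump B is running near Area A
print(status)         # the original string is unchanged
```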
print(f"\n=== Practical Example: Data Cleaning ===")
# Messy data from a sensor log file
messy_data = " REACTOR temperature: 85.5 °C "
print(f"Raw data: '{messy_data}'")
# Clean it up
clean_data = messy_data.strip() # Remove extra spaces
clean_data = clean_data.lower() # Consistent case
clean_data = clean_data.title() # Make it look professional
print(f"Cleaned data: '{clean_data}'")
=== Practical Example: Data Cleaning ===
Raw data: ' REACTOR temperature: 85.5 °C '
Cleaned data: 'Reactor Temperature: 85.5 °C'
Data Cleaning Example: Sensor data files often have messy formatting:
Raw data: " TEMPERATURE:85.5°c ", "pressure: 2.3 BAR"
Cleaned: "Temperature: 85.5°C", "Pressure: 2.3 bar"
Let’s see these in action:
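A minimal sketch of how the two raw readings above could be turned into the cleaned versions. The helper clean_reading is hypothetical, and it assumes every reading has the form "name: value unit" and that only these two unit spellings need fixing:

```python
def clean_reading(raw):
    """Hypothetical helper: tidy one 'name: value unit' sensor string."""
    name, _, value = raw.strip().partition(":")   # split at the first colon
    name = name.strip().capitalize()              # 'TEMPERATURE' -> 'Temperature'
    value = value.strip()
    # Assumption: only these two unit spellings occur in the data
    value = value.replace("°c", "°C").replace("BAR", "bar")
    return f"{name}: {value}"

raw_readings = ["  TEMPERATURE:85.5°c  ", "pressure: 2.3 BAR"]
cleaned = [clean_reading(r) for r in raw_readings]
print(cleaned)
# ['Temperature: 85.5°C', 'Pressure: 2.3 bar']
```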
2.1.4 String Formatting
String formatting is the process of inserting variables and values into strings to create dynamic text. Instead of manually concatenating strings with the + operator, Python provides several elegant methods to combine text and data.
Example of the problem:
# Hard to read and maintain
name = "Reactor"
temperature = 85.5
pressure = 2.3
message = "The " + name + " temperature is " + str(temperature) + "°C and pressure is " + str(pressure) + " bar"
Python offers three main formatting methods:
Concatenation Operator (+) - Basic but limited
.format() Method - Powerful and flexible
f-strings - Modern and most readable (Python 3.6+)
temperature = 100 # Celsius
unit = '°C'
# print(f'temperature is {temperature} {unit}')
print('temperature is', temperature, unit)
temperature is 100 °C
Method 1: Concatenation Operator (+)
The basic method uses the + operator to join strings together. Important: You must first convert non-string values to strings using the str() function.
Advantages:
Simple and straightforward
Works in all Python versions
Disadvantages:
Can become messy with many variables
Must manually convert numbers to strings
Hard to read with complex formatting
print("=== Basic Concatenation Examples ===")
# Chemical data
MW = 63.21
result1 = "Molar mass = " + str(MW) + " g/mol"
print(result1)
# Equipment status
reactor_id = "R-101"
temperature = 85.5
status = "The reactor " + reactor_id + " is at " + str(temperature) + "°C"
print(status)
=== Basic Concatenation Examples ===
Molar mass = 63.21 g/mol
The reactor R-101 is at 85.5°C
print("\n=== Problems with Concatenation ===")
# This gets messy quickly!
reactor = "R-102"
temp = 92.3
pressure = 2.5
flow_rate = 125.7
complex_message = "Reactor " + reactor + " status: Temperature = " + str(temp) + "°C, Pressure = " + str(pressure) + " bar, Flow rate = " + str(flow_rate) + " L/min"
print("Complex message (hard to read):")
print(complex_message)
=== Problems with Concatenation ===
Complex message (hard to read):
Reactor R-102 status: Temperature = 92.3°C, Pressure = 2.5 bar, Flow rate = 125.7 L/min
print("\n=== Common Mistakes ===")
# This will cause an error - uncomment to see:
# temperature = 85.5
# error_message = "Temperature is " + temperature # TypeError!
print("Remember: Must convert numbers to strings with str()")
=== Common Mistakes ===
Remember: Must convert numbers to strings with str()
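To see the error without stopping the notebook, the failing line can be wrapped in try/except. A small sketch of the mistake and its fix:

```python
temperature = 85.5

try:
    message = "Temperature is " + temperature        # str + float is not allowed
except TypeError as err:
    print("TypeError:", err)
    message = "Temperature is " + str(temperature)   # convert first, then join

print(message)   # Temperature is 85.5
```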
Method 2: .format() Method
The .format() method provides a more powerful and flexible way to format strings. You use {} as placeholders and pass values to the .format() method.
Advantages:
No need to convert numbers to strings manually
Placeholders make the template clear
Can control number formatting (decimal places, etc.)
Can reuse and reorder variables
Basic Syntax:
"template with {} placeholders".format(value1, value2)
# Method 2: .format() Method
print("=== Basic .format() Examples ===")
# Same examples as before, but cleaner
MW = 63.21
result2 = "Molar mass = {} g/mol".format(MW)
print(result2)
=== Basic .format() Examples ===
Molar mass = 63.21 g/mol
# Multiple variables
reactor_id = "R-101"
temperature = 85.5
status = "The reactor {} is at {}°C".format(reactor_id, temperature)
print(status)
The reactor R-101 is at 85.5°C
# Complex example - much cleaner!
chemical = "Benzene"
formula = "C6H6"
bp = 80.1
description = "{} ({}) has a boiling point of {}°C".format(chemical, formula, bp)
print(description)
Benzene (C6H6) has a boiling point of 80.1°C
print("\n=== Comparing Methods ===")
reactor = "R-102"
temp = 92.3
pressure = 2.5
# Concatenation (messy)
concat_msg = "Reactor " + reactor + " is at " + str(temp) + "°C, " + str(pressure) + " bar"
# .format() method (cleaner)
format_msg = "Reactor {} is at {}°C, {} bar".format(reactor, temp, pressure)
print("Concatenation:", concat_msg)
print(".format():", format_msg)
=== Comparing Methods ===
Concatenation: Reactor R-102 is at 92.3°C, 2.5 bar
.format(): Reactor R-102 is at 92.3°C, 2.5 bar
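The advantages list mentions reusing and reordering variables and controlling number formatting, which the examples above do not show. A short sketch of those three features, with illustrative values:

```python
reactor_id = "R-101"
temperature = 85.4567

# Numbered placeholders can be reordered and reused
reordered = "{1}°C in {0} (reactor {0})".format(reactor_id, temperature)

# Named placeholders make longer templates easier to follow
named = "Reactor {rid} is at {temp}°C".format(rid=reactor_id, temp=temperature)

# A format spec after ':' controls the number of decimal places
rounded = "Reactor {} is at {:.1f}°C".format(reactor_id, temperature)

print(reordered)   # 85.4567°C in R-101 (reactor R-101)
print(named)       # Reactor R-101 is at 85.4567°C
print(rounded)     # Reactor R-101 is at 85.5°C
```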
Method 3: f-strings (Formatted String Literals)
f-strings are the most modern and readable way to format strings in Python (available in Python 3.6+). You put an f before the quotes and write variables directly inside {}.
Advantages:
Most readable and concise
Variables are directly visible in the string
Excellent performance
Can include expressions inside {}
Preferred method for new Python code
Basic Syntax:
f"template with {variable} directly inside"
# Method 3: f-strings (Formatted String Literals)
print("=== Basic f-string Examples ===")
# Same examples, now with f-strings (cleanest!)
MW = 63.21
unit = 'g/mol'
result3 = f"Molar mass = {MW} {unit}"
print(result3)
=== Basic f-string Examples ===
Molar mass = 63.21 g/mol
# Multiple variables - very readable
reactor_id = "R-101"
temperature = 85.5
status = f"The reactor {reactor_id} is at {temperature}°C"
print(status)
The reactor R-101 is at 85.5°C
# Complex example - super clean!
chemical = "Benzene"
formula = "C6H6"
bp = 80.1
description = f"{chemical} ({formula}) has a boiling point of {bp}°C"
print(description)
Benzene (C6H6) has a boiling point of 80.1°C
print("\n=== Advanced f-string Features ===")
# 1. Expressions inside braces
length = 5
width = 3
area_msg = f"Rectangle area = {length * width} square units"
print(area_msg)
=== Advanced f-string Features ===
Rectangle area = 15 square units
# 2. Function calls inside braces
name = "reactor temperature"
formatted_name = f"Sensor: {name.title()}"
print(formatted_name)
Sensor: Reactor Temperature
# 3. Number formatting (same as .format())
pi = 3.14159265359
pressure = 2.34567
print(f"Pi = {pi:.2f}") # 2 decimal places
print(f"Pi = {pi:.4f}") # 4 decimal places
print(f"Pressure = {pressure:.1f} bar")
Pi = 3.14
Pi = 3.1416
Pressure = 2.3 bar
# 4. Percentage formatting
efficiency = 0.854
print(f"Reactor efficiency: {efficiency:.1%}")
Reactor efficiency: 85.4%
# 5. Complex calculations
temp_celsius = 25
temp_fahrenheit = f"Temperature: {temp_celsius}°C = {temp_celsius * 9/5 + 32:.1f}°F"
print(temp_fahrenheit)
Temperature: 25°C = 77.0°F
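The same format specs can also set a column width and alignment, which helps when printing small tables. A sketch with made-up readings (< aligns left, > aligns right, and the number is the column width):

```python
readings = [("R-101", 85.5, 2.3), ("R-102", 92.3, 2.5)]  # hypothetical (id, °C, bar)

header = f"{'Reactor':<10}{'Temp (°C)':>10}{'P (bar)':>10}"
print(header)

rows = []
for reactor_id, temp, pressure in readings:
    # 10-character columns: text left-aligned, numbers right-aligned
    row = f"{reactor_id:<10}{temp:>10.1f}{pressure:>10.2f}"
    rows.append(row)
    print(row)

# A comma in the spec adds thousands separators
flow = 1234567.891
flow_text = f"{flow:,.1f}"
print(flow_text)   # 1,234,567.9
```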
print("\n=== Comparing All Three Methods ===")
reactor = "R-102"
temp = 92.3
pressure = 2.5
# Method 1: Concatenation (verbose)
concat_msg = "Reactor " + reactor + " is at " + str(temp) + "°C, " + str(pressure) + " bar"
# Method 2: .format() (good)
format_msg = "Reactor {} is at {}°C, {} bar".format(reactor, temp, pressure)
# Method 3: f-string (best!)
fstring_msg = f"Reactor {reactor} is at {temp}°C, {pressure} bar"
print("Concatenation:", concat_msg)
print(".format():", format_msg)
print("f-string:", fstring_msg)
=== Comparing All Three Methods ===
Concatenation: Reactor R-102 is at 92.3°C, 2.5 bar
.format(): Reactor R-102 is at 92.3°C, 2.5 bar
f-string: Reactor R-102 is at 92.3°C, 2.5 bar
Practical Summary: What You Should Use
For this course: Use f-strings!
f-strings are the modern, preferred way to format strings in Python. They’re more readable, faster, and easier to write.
Why we showed the other methods:
Concatenation (+): You might need this for simple cases
.format(): You’ll encounter this when reading older Python code or tutorials
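| Method | When to Use | Example |
|---|---|---|
| Concatenation (+) | ❌ Avoid (hard to read) | "Molar mass = " + str(MW) + " g/mol" |
| .format() | 📚 Understanding old code only | "Molar mass = {} g/mol".format(MW) |
| f-strings | ✅ USE THIS! | f"Molar mass = {MW} g/mol" |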
Bottom line: Always use f-strings in your assignments and projects!
Real-World Example: Laboratory Report
Imagine you’re writing a program to generate laboratory reports. Here’s how each method would look:
# Real-World Example: Laboratory Report Generator
print("=== LABORATORY REPORT GENERATOR ===")
print("Generating a report for chemical analysis results...\n")
# Laboratory data
sample_id = "CHEM-2024-001"
compound = "Ethanol"
purity = 0.9534
temperature = 25.0
analyst = "Dr. Smith"
date = "2024-11-19"
print("=== Method 1: Concatenation (Verbose) ===")
report1 = "Sample ID: " + sample_id + "\n" + \
"Compound: " + compound + "\n" + \
"Purity: " + str(purity * 100) + "%\n" + \
"Analysis Temperature: " + str(temperature) + "°C\n" + \
"Analyst: " + analyst + "\n" + \
"Date: " + date
print(report1)
print("\n=== Method 2: .format() (Better) ===")
report2 = """Sample ID: {}
Compound: {}
Purity: {:.1%}
Analysis Temperature: {}°C
Analyst: {}
Date: {}""".format(sample_id, compound, purity, temperature, analyst, date)
print(report2)
print("\n=== Method 3: f-strings (Best!) ===")
report3 = f"""Sample ID: {sample_id}
Compound: {compound}
Purity: {purity:.1%}
Analysis Temperature: {temperature}°C
Analyst: {analyst}
Date: {date}"""
print(report3)
print("\n=== Advanced f-string Features in Reports ===")
# Calculate derived values directly in the f-string
molecular_weight = 46.07 # g/mol for ethanol
concentration = 0.250 # mol/L
advanced_report = f"""
ADVANCED ANALYSIS REPORT
========================
Sample: {sample_id} ({compound})
Molecular Weight: {molecular_weight} g/mol
Concentration: {concentration} mol/L
Mass per Liter: {concentration * molecular_weight:.2f} g/L
Purity: {purity:.1%} ({purity:.4f} decimal)
Quality Grade: {'High' if purity > 0.95 else 'Standard'}
Temperature: {temperature}°C ({temperature * 9/5 + 32:.1f}°F)
"""
print(advanced_report)
=== LABORATORY REPORT GENERATOR ===
Generating a report for chemical analysis results...
=== Method 1: Concatenation (Verbose) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.34%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19
=== Method 2: .format() (Better) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.3%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19
=== Method 3: f-strings (Best!) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.3%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19
=== Advanced f-string Features in Reports ===
ADVANCED ANALYSIS REPORT
========================
Sample: CHEM-2024-001 (Ethanol)
Molecular Weight: 46.07 g/mol
Concentration: 0.25 mol/L
Mass per Liter: 11.52 g/L
Purity: 95.3% (0.9534 decimal)
Quality Grade: High
Temperature: 25.0°C (77.0°F)
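When the same report is needed for several samples, the f-string template can be wrapped in a function. This is only a sketch: build_report is a hypothetical helper, and the second sample is made up for illustration.

```python
def build_report(sample_id, compound, purity, temperature):
    """Hypothetical helper: return a short report as one string."""
    grade = "High" if purity > 0.95 else "Standard"
    return (
        f"Sample ID: {sample_id}\n"
        f"Compound: {compound}\n"
        f"Purity: {purity:.1%}\n"
        f"Quality Grade: {grade}\n"
        f"Analysis Temperature: {temperature}°C"
    )

samples = [
    ("CHEM-2024-001", "Ethanol", 0.9534, 25.0),
    ("CHEM-2024-002", "Methanol", 0.9120, 25.0),   # made-up second sample
]

reports = [build_report(*sample) for sample in samples]
print("\n\n".join(reports))
```

Keeping the template in one place means a change to the layout only has to be made once.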