Chapter 2: Working with Text (Strings)

2.1 String

A string is Python’s way of storing and working with text. Think of it as a sequence of characters (letters, digits, symbols, spaces) that represents words, names, labels, or any other text.

Simple Definition: A string is text data enclosed in quotes.

"hello world"
'hello world'
'hello world'
'hello world'
'Mark says "hello"'
'Mark says "hello"'
type(1.0)
float
type('1')
str
1 + 2
3
type('1' + '2')
str
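
The last two lines above are easy to misread: + adds numbers but joins strings end to end. A minimal sketch of the difference (the variable names are chosen only for illustration):

```python
# "+" means addition for numbers and concatenation for strings
number_sum = 1 + 2          # integer addition
text_join = '1' + '2'       # string concatenation

print(number_sum)           # 3
print(text_join)            # 12  (the string '12', not the number 3)
print(type(text_join))      # <class 'str'>

# Convert text to a number with int() before doing arithmetic
print(int('1') + int('2'))  # 3
```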

2.1.1 Creating Strings

You can create strings using either single quotes (') or double quotes ("):

process_name = "distillation"        # Double quotes
operator = 'Alice'                   # Single quotes
reaction = "2H2 + O2 → 2H2O"        # Chemical equation as text
status_message = 'Reactor is running normally'
string = 'python'

string[5]    # 'n'
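
Having two quote styles is useful when the text itself contains a quote character. The sketch below shows a few common options; the example strings are made up for illustration:

```python
# Pick the quote style that does not clash with the text inside
label = "Alice's reactor"       # apostrophe inside double quotes
quote = 'Mark says "hello"'     # double quotes inside single quotes
escaped = 'Alice\'s reactor'    # or escape the quote with a backslash

# Triple quotes allow a string to span several lines
note = """Line 1: start pump
Line 2: open valve"""

print(label == escaped)         # True
print(note)
```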

2.1.2 Indexing & Slicing

Indexing lets you access individual characters in a string, while slicing lets you extract parts of a string. This is very useful when you need to work with specific portions of text data.

The key concept: Python starts counting from 0, not 1!

If we have the string "Python", here’s how the indexing works:

String:  P  y  t  h  o  n
Index:   0  1  2  3  4  5
  • First character "P" is at index 0

  • Second character "y" is at index 1

  • Last character "n" is at index 5


Finding Characters with .find()

  • The .find() method returns the index of the first occurrence of a character (or substring).

  • If the character is not found, it returns -1.

s = "Python"
s.find("t")   # 2
s.find("n")   # 5
s.find("z")   # -1
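
.find() also works with longer substrings, and the -1 result should be checked before the index is used. A short sketch, assuming a made-up equipment tag:

```python
tag = "pump-101-feed"   # hypothetical equipment tag

# .find() accepts a whole substring and returns the index where it starts
print(tag.find("101"))     # 5
# An optional second argument sets where the search begins
print(tag.find("-", 5))    # 8

# -1 signals "not found", so test for it before using the result
position = tag.find("valve")
if position == -1:
    print("'valve' does not appear in the tag")

# .index() runs the same search but raises ValueError when nothing is found
try:
    tag.index("valve")
except ValueError:
    print(".index() raised ValueError")
```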
string = 'Python'
# Negative indices count backwards from the end of the string:
#   P   y   t   h   o   n
#  -6  -5  -4  -3  -2  -1
string[-1]   # 'n'
# String Indexing Examples

print("=== Basic String Indexing ===")

# Create a simple string
word = "Python"
print(f"String: '{word}'")
print(f"Length: {len(word)} characters")

print(f"\n=== Positive Indexing (counting from start) ===")
print(f"word[0] = '{word[0]}'  # First character")
print(f"word[1] = '{word[1]}'  # Second character") 
print(f"word[2] = '{word[2]}'  # Third character")
print(f"word[3] = '{word[3]}'  # Fourth character")
print(f"word[4] = '{word[4]}'  # Fifth character")
print(f"word[5] = '{word[5]}'  # Sixth character")
print(f"\n=== Negative Indexing (counting from end) ===")
print(f"word[-1] = '{word[-1]}'  # Last character")
print(f"word[-2] = '{word[-2]}'  # Second to last")
print(f"word[-3] = '{word[-3]}'  # Third to last")
print(f"word[-4] = '{word[-4]}'  # Fourth to last")
print(f"word[-5] = '{word[-5]}'  # Fifth to last")
print(f"word[-6] = '{word[-6]}'  # Sixth to last (first)")
print(f"\n=== Index Diagram ===")
print("String:  P  y  t  h  o  n")
print("Index:   0  1  2  3  4  5")
print("Negative:-6 -5 -4 -3 -2 -1")
print(f"\n=== Practical Example ===")
message = "Hello World"
print(f"Message: '{message}'")
print(f"First character: '{message[0]}'")
print(f"Last character: '{message[-1]}'")
print(f"Middle character: '{message[len(message)//2]}'")
=== Basic String Indexing ===
String: 'Python'
Length: 6 characters

=== Positive Indexing (counting from start) ===
word[0] = 'P'  # First character
word[1] = 'y'  # Second character
word[2] = 't'  # Third character
word[3] = 'h'  # Fourth character
word[4] = 'o'  # Fifth character
word[5] = 'n'  # Sixth character

=== Negative Indexing (counting from end) ===
word[-1] = 'n'  # Last character
word[-2] = 'o'  # Second to last
word[-3] = 'h'  # Third to last
word[-4] = 't'  # Fourth to last
word[-5] = 'y'  # Fifth to last
word[-6] = 'P'  # Sixth to last (first)

=== Index Diagram ===
String:  P  y  t  h  o  n
Index:   0  1  2  3  4  5
Negative:-6 -5 -4 -3 -2 -1

=== Practical Example ===
Message: 'Hello World'
First character: 'H'
Last character: 'd'
Middle character: ' '
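
One thing the examples above do not show is what happens when an index falls outside the string. A small sketch of the valid range and the resulting error:

```python
word = "Python"

# Valid indices run from 0 to len(word) - 1, or from -len(word) to -1
last_index = len(word) - 1
print(word[last_index])        # n

# Anything outside that range raises IndexError
try:
    word[6]
except IndexError as error:
    print(f"IndexError: {error}")

# The middle-character trick uses integer division to get a valid index
middle = len(word) // 2        # 3
print(word[middle])            # h
```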

💡 Remember: Python indexing starts at 0, not 1!

Slicing lets you extract a portion (substring) of a string. The syntax is:

string[start:end]
  • start: Index where slice begins (included)

  • end: Index where slice ends (excluded)

Important: The end index is not included in the slice!

text = "Python"

text[0:2]    # 'Py'    (indices 0 and 1)
text[1:4]    # 'yth'   (indices 1 to 3)
text[0:3]    # 'Pyt'   (indices 0 to 2)
text[2:]     # 'thon'  (index 2 to the end)
text[:4]     # 'Pyth'  (start to index 3)
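
Because the end index is excluded, a few handy properties follow. The sketch below illustrates them with the same string; it is only a demonstration, not an exhaustive list of slicing rules:

```python
text = "Python"

# A slice text[start:end] contains end - start characters
part = text[1:4]
print(part, len(part))         # yth 3

# Splitting at the same index gives two pieces that rebuild the string
left, right = text[:2], text[2:]
print(left + right == text)    # True

# Unlike indexing, slicing past the end does not raise an error
print(text[3:100])             # hon
print(text[10:] == "")         # True (an empty string)
```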
# String Slicing Examples

print("=== String Slicing ===")

text = "Programming"
print(f"Original string: '{text}'")
print(f"Length: {len(text)} characters")
print(f"\n=== Basic Slicing [start:end] ===")
print(f"text[0:4] = '{text[0:4]}'  # Characters 0,1,2,3")
print(f"text[1:5] = '{text[1:5]}'  # Characters 1,2,3,4")
print(f"text[3:7] = '{text[3:7]}'  # Characters 3,4,5,6")
print(f"\n=== Slicing from Start or to End ===")
print(f"text[:5] = '{text[:5]}'    # From beginning to index 4")
print(f"text[5:] = '{text[5:]}'    # From index 5 to end")
print(f"text[:] = '{text[:]}'      # Entire string (copy)")
print(f"\n=== Negative Slicing ===")
print(f"text[-4:] = '{text[-4:]}'   # Last 4 characters")
print(f"text[:-3] = '{text[:-3]}'   # All except last 3")
print(f"text[-7:-2] = '{text[-7:-2]}' # From -7 to -3")
text = "Programming"
# The last four characters, "ming", sit at indices -4, -3, -2, -1
text[-4:]    # 'ming'
print(f"\n=== Slicing with Step ===")
print(f"text[::2] = '{text[::2]}'   # Every 2nd character")
print(f"text[1::2] = '{text[1::2]}' # Every 2nd, starting from index 1")
print(f"text[::3] = '{text[::3]}'   # Every 3rd character")
print(f"\n=== Reverse String ===")
print(f"text[::-1] = '{text[::-1]}' # Reverse the entire string!")
print(f"\n=== Practical Examples ===")
email = "user@example.com"
print(f"Email: '{email}'")
print(f"Username: '{email[:email.index('@')]}'")  # Before @
print(f"Domain: '{email[email.index('@')+1:]}'")  # After @

filename = "data.txt"
print(f"\nFilename: '{filename}'")
print(f"Name part: '{filename[:-4]}'")  # Without extension
print(f"Extension: '{filename[-4:]}'")  # Just extension
=== String Slicing ===
Original string: 'Programming'
Length: 11 characters

=== Basic Slicing [start:end] ===
text[0:4] = 'Prog'  # Characters 0,1,2,3
text[1:5] = 'rogr'  # Characters 1,2,3,4
text[3:7] = 'gram'  # Characters 3,4,5,6

=== Slicing from Start or to End ===
text[:5] = 'Progr'    # From beginning to index 4
text[5:] = 'amming'    # From index 5 to end
text[:] = 'Programming'      # Entire string (copy)

=== Negative Slicing ===
text[-4:] = 'ming'   # Last 4 characters
text[:-3] = 'Programm'   # All except last 3
text[-7:-2] = 'rammi' # From -7 to -3

=== Slicing with Step ===
text[::2] = 'Pormig'   # Every 2nd character
text[1::2] = 'rgamn' # Every 2nd, starting from index 1
text[::3] = 'Pgmn'   # Every 3rd character

=== Reverse String ===
text[::-1] = 'gnimmargorP' # Reverse the entire string!

=== Practical Examples ===
Email: 'user@example.com'
Username: 'user'
Domain: 'example.com'

Filename: 'data.txt'
Name part: 'data'
Extension: '.txt'
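
The filename example above works because the extension happens to be exactly four characters long (".txt"). A more general sketch finds the last dot with .rfind() and slices around it; split_extension and the extra filenames are hypothetical, introduced only for illustration:

```python
def split_extension(filename):
    """Split at the last dot (illustrative helper, not a standard function)."""
    dot = filename.rfind(".")          # index of the last dot, or -1 if none
    if dot == -1:
        return filename, ""
    return filename[:dot], filename[dot:]

for filename in ["data.txt", "report.xlsx", "run.2024.csv", "README"]:
    name, extension = split_extension(filename)
    print(f"'{filename}' -> name '{name}', extension '{extension}'")
```

For real file paths, the standard library function os.path.splitext handles more edge cases than this sketch.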

2.1.3 Common String Operations

Strings have many useful methods (functions) that help you work with text:

Basic String Methods

Method        Description                                 Example                        Result
.upper()      Convert to uppercase                        "reactor".upper()              "REACTOR"
.lower()      Convert to lowercase                        "REACTOR".lower()              "reactor"
.title()      Capitalize the first letter of each word    "batch reactor".title()        "Batch Reactor"
len()         Length of string (a built-in function,      len("benzene")                 7
              not a method)
.replace()    Replace text                                "pump A".replace("A", "B")     "pump B"

Real Examples: Standardization & Data Cleaning

Standardization Example: Imagine you have equipment data from different sources with inconsistent naming:

  • Raw data: "pump a", "PUMP B", "Pump-C", "centrifugal_pump_d"

  • Standardized: "Pump A", "Pump B", "Pump C", "Centrifugal Pump D"

examples = ['Pump a', 'Pump B', 'pump c', 'Pump E ', 'Pump f']


standardized_list = []

for ex in examples:
    standardized = ex.strip().title()
    standardized = standardized.replace(' ', '_')
    standardized_list.append(standardized)

standardized_list
['Pump_A', 'Pump_B', 'Pump_C', 'Pump_E', 'Pump_F']
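
The loop above joins words with underscores, while the bullet list at the start of this section asks for space-separated names such as "Centrifugal Pump D". One possible way to get that form is sketched below; standardize is a hypothetical helper written for this example:

```python
def standardize(name):
    """Illustrative helper: turn a raw equipment name into Title Case words."""
    cleaned = name.strip()                                  # trim the ends
    cleaned = cleaned.replace("-", " ").replace("_", " ")   # separators -> spaces
    return cleaned.title()

raw_names = ["pump a", "PUMP B", "Pump-C", "centrifugal_pump_d"]

standardized_names = []
for raw in raw_names:
    standardized_names.append(standardize(raw))

print(standardized_names)
# ['Pump A', 'Pump B', 'Pump C', 'Centrifugal Pump D']
```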
print("=== Basic String Methods ===")

# Equipment data that might need standardization
equipment_name = "centrifugal_pump_d"
chemical_name = "BENZENE"
process_status = "normal operation"

print(f"Original: '{equipment_name}'")
print(f"Uppercase: '{equipment_name.upper()}'")
print(f"Title Case: '{equipment_name.title()}'")
print(f"Replace underscores: '{equipment_name.replace('_', ' ')}'")
print(f"Replace underscores & Title Case: '{equipment_name.replace('_', ' ').title()}'")

print(f"\nOriginal: '{chemical_name}'")
print(f"Lowercase: '{chemical_name.lower()}'")

print(f"\nOriginal: '{process_status}'")
print(f"Title Case: '{process_status.title()}'")
=== Basic String Methods ===
Original: 'centrifugal_pump_d'
Uppercase: 'CENTRIFUGAL_PUMP_D'
Title Case: 'Centrifugal_Pump_D'
Replace underscores: 'centrifugal pump d'
Replace underscores & Title Case: 'Centrifugal Pump D'

Original: 'BENZENE'
Lowercase: 'benzene'

Original: 'normal operation'
Title Case: 'Normal Operation'
print(f"\n=== String Length ===")
formula = "C6H12O6"  # Glucose
safety_message = "Wear safety goggles at all times"

print(f"Chemical formula '{formula}' has {len(formula)} characters")
print(f"Safety message has {len(safety_message)} characters")
=== String Length ===
Chemical formula 'C6H12O6' has 7 characters
Safety message has 32 characters
print(f"\n=== Replacing Text ===")
old_equipment = "Pump A is running"
new_equipment = old_equipment.replace("A", "B")

print(f"Original: '{old_equipment}'")
print(f"Updated: '{new_equipment}'")
=== Replacing Text ===
Original: 'Pump A is running'
Updated: 'Pump B is running'
# Useful for updating equipment IDs
reactor_list = "R-101, R-102, R-103"
updated_list = reactor_list.replace("R-", "Reactor-")
print(f"\nEquipment list: '{reactor_list}'")
print(f"Formatted list: '{updated_list}'")
Equipment list: 'R-101, R-102, R-103'
Formatted list: 'Reactor-101, Reactor-102, Reactor-103'
print(f"\n=== Practical Example: Data Cleaning ===")
# Messy data from a sensor log file
messy_data = "  REACTOR temperature:  85.5 °C  "
print(f"Raw data: '{messy_data}'")

# Clean it up
clean_data = messy_data.strip()  # Remove leading and trailing spaces
clean_data = clean_data.lower()  # Consistent case
clean_data = clean_data.title()  # Capitalize each word

print(f"Cleaned data: '{clean_data}'")
=== Practical Example: Data Cleaning ===
Raw data: '  REACTOR temperature:  85.5 °C  '
Cleaned data: 'Reactor Temperature:  85.5 °C'
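
Note that the cleaned result still has two spaces before "85.5": .strip() only trims the ends of a string. One common way to collapse repeated spaces inside the text is to split it into words and join them again. This is a sketch using .split() and .join(), two string methods not covered above:

```python
messy_data = "  REACTOR temperature:  85.5 °C  "

# .split() with no arguments splits on any run of whitespace,
# and " ".join(...) glues the pieces back together with single spaces
collapsed = " ".join(messy_data.split())
print(f"'{collapsed}'")          # 'REACTOR temperature: 85.5 °C'

tidy = collapsed.title()
print(f"'{tidy}'")               # 'Reactor Temperature: 85.5 °C'
```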

Data Cleaning Example: Sensor data files often have messy formatting:

  • Raw data: "  TEMPERATURE:85.5°c  ", "pressure: 2.3 BAR"

  • Cleaned: "Temperature: 85.5°C", "Pressure: 2.3 bar"

Let’s see these in action:
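
A minimal sketch of how the two raw readings above could be cleaned. clean_reading is a hypothetical helper, and it assumes every reading has the form "label: value" and uses only the two unit spellings shown here:

```python
def clean_reading(raw):
    """Normalize a 'label: value unit' sensor string (illustrative helper)."""
    # Split once at the first colon into the label and the value part
    label, value = raw.strip().split(":", 1)
    value = value.strip()
    # Fix only the two unit spellings that appear in this example
    value = value.replace("°c", "°C").replace("BAR", "bar")
    # .capitalize() uppercases the first letter and lowercases the rest
    return f"{label.strip().capitalize()}: {value}"

for raw in ["  TEMPERATURE:85.5°c  ", "pressure: 2.3 BAR"]:
    print(clean_reading(raw))
# Temperature: 85.5°C
# Pressure: 2.3 bar
```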

2.1.4 String Formatting

String formatting is the process of inserting variables and values into strings to create dynamic text. Instead of manually concatenating strings with the + operator, Python provides several elegant methods to combine text and data.

Example of the problem:

# Hard to read and maintain
name = "Reactor"
temperature = 85.5
pressure = 2.3
message = "The " + name + " temperature is " + str(temperature) + "°C and pressure is " + str(pressure) + " bar"

Python offers three main formatting methods:

  1. Concatenation Operator (+) - Basic but limited

  2. .format() Method - Powerful and flexible

  3. f-strings - Modern and most readable (Python 3.6+)

temperature = 100  # Celsius
unit = '°C'

# print(f'temperature is {temperature} {unit}')

print('temperature is', temperature, unit)
temperature is 100 °C

Method 1: Concatenation Operator (+)

The basic method uses the + operator to join strings together. Important: You must first convert non-string values to strings using the str() function.

Advantages:

  • Simple and straightforward

  • Works in all Python versions

Disadvantages:

  • Can become messy with many variables

  • Must manually convert numbers to strings

  • Hard to read with complex formatting

print("=== Basic Concatenation Examples ===")

# Chemical data
MW = 63.21
result1 = "Molar mass = " + str(MW) + " g/mol"
print(result1)

# Equipment status
reactor_id = "R-101"
temperature = 85.5
status = "The reactor " + reactor_id + " is at " + str(temperature) + "°C"
print(status)
=== Basic Concatenation Examples ===
Molar mass = 63.21 g/mol
The reactor R-101 is at 85.5°C
print("\n=== Problems with Concatenation ===")
# This gets messy quickly!
reactor = "R-102"
temp = 92.3
pressure = 2.5
flow_rate = 125.7
complex_message = "Reactor " + reactor + " status: Temperature = " + str(temp) + "°C, Pressure = " + str(pressure) + " bar, Flow rate = " + str(flow_rate) + " L/min"
print("Complex message (hard to read):")
print(complex_message)
=== Problems with Concatenation ===
Complex message (hard to read):
Reactor R-102 status: Temperature = 92.3°C, Pressure = 2.5 bar, Flow rate = 125.7 L/min
print("\n=== Common Mistakes ===")
# This will cause an error - uncomment to see:
# temperature = 85.5
# error_message = "Temperature is " + temperature  # TypeError!
print("Remember: Must convert numbers to strings with str()")
=== Common Mistakes ===
Remember: Must convert numbers to strings with str()
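
To see the error without stopping the program, the failing line can be wrapped in try/except. This is a small sketch for demonstration purposes only:

```python
temperature = 85.5

try:
    message = "Temperature is " + temperature   # str + float is not allowed
except TypeError as error:
    print(f"TypeError: {error}")

# Converting the number first makes the concatenation work
message = "Temperature is " + str(temperature) + "°C"
print(message)    # Temperature is 85.5°C
```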

Method 2: .format() Method

The .format() method provides a more powerful and flexible way to format strings. You use {} as placeholders and pass values to the .format() method.

Advantages:

  • No need to convert numbers to strings manually

  • Placeholders make the template clear

  • Can control number formatting (decimal places, etc.)

  • Can reuse and reorder variables

Basic Syntax:

"template with {} placeholders".format(value1, value2)
# Method 2: .format() Method
print("=== Basic .format() Examples ===")
# Same examples as before, but cleaner
MW = 63.21
result2 = "Molar mass = {} g/mol".format(MW)
print(result2)
=== Basic .format() Examples ===
Molar mass = 63.21 g/mol
# Multiple variables
reactor_id = "R-101"
temperature = 85.5
status = "The reactor {} is at {}°C".format(reactor_id, temperature)
print(status)
The reactor R-101 is at 85.5°C
# Complex example - much cleaner!
chemical = "Benzene"
formula = "C6H6"
bp = 80.1
description = "{} ({}) has a boiling point of {}°C".format(chemical, formula, bp)
print(description)
Benzene (C6H6) has a boiling point of 80.1°C
print("\n=== Comparing Methods ===")
reactor = "R-102"
temp = 92.3
pressure = 2.5

# Concatenation (messy)
concat_msg = "Reactor " + reactor + " is at " + str(temp) + "°C, " + str(pressure) + " bar"

# .format() method (cleaner)
format_msg = "Reactor {} is at {}°C, {} bar".format(reactor, temp, pressure)

print("Concatenation:", concat_msg)
print(".format():", format_msg)
=== Comparing Methods ===
Concatenation: Reactor R-102 is at 92.3°C, 2.5 bar
.format(): Reactor R-102 is at 92.3°C, 2.5 bar
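
Two of the advantages listed for .format(), number formatting and reusing or reordering values, are not shown in the examples above. A brief sketch with made-up readings:

```python
reactor = "R-102"
temp = 92.3456
pressure = 2.5

# A format spec goes after a colon inside the braces
line1 = "Temperature: {:.1f}°C".format(temp)
print(line1)      # Temperature: 92.3°C

# Numbered placeholders can be reordered and reused
line2 = "{1:.2f} bar in {0}; {0} is stable".format(reactor, pressure)
print(line2)      # 2.50 bar in R-102; R-102 is stable

# Named placeholders make long templates easier to follow
line3 = "{name} at {t:.0f}°C".format(name=reactor, t=temp)
print(line3)      # R-102 at 92°C
```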

Method 3: f-strings (Formatted String Literals)

f-strings are the most modern and readable way to format strings in Python (available in Python 3.6+). You put an f before the quotes and write variables directly inside {}.

Advantages:

  • Most readable and concise

  • Variables are directly visible in the string

  • Excellent performance

  • Can include expressions inside {}

  • Preferred method for new Python code

Basic Syntax:

f"template with {variable} directly inside"
# Method 3: f-strings (Formatted String Literals)
print("=== Basic f-string Examples ===")
# Same examples, now with f-strings (cleanest!)
MW = 63.21
unit = 'g/mol'
result3 = f"Molar mass = {MW} {unit}"
print(result3)
=== Basic f-string Examples ===
Molar mass = 63.21 g/mol
# Multiple variables - very readable
reactor_id = "R-101"
temperature = 85.5
status = f"The reactor {reactor_id} is at {temperature}°C"
print(status)
The reactor R-101 is at 85.5°C
# Complex example - super clean!
chemical = "Benzene"
formula = "C6H6"
bp = 80.1
description = f"{chemical} ({formula}) has a boiling point of {bp}°C"
print(description)
Benzene (C6H6) has a boiling point of 80.1°C
print("\n=== Advanced f-string Features ===")
# 1. Expressions inside braces
length = 5
width = 3
area_msg = f"Rectangle area = {length * width} square units"
print(area_msg)
=== Advanced f-string Features ===
Rectangle area = 15 square units
# 2. Function calls inside braces
name = "reactor temperature"
formatted_name = f"Sensor: {name.title()}"
print(formatted_name)
Sensor: Reactor Temperature
# 3. Number formatting (same as .format())
pi = 3.14159265359
pressure = 2.34567
print(f"Pi = {pi:.2f}")           # 2 decimal places
print(f"Pi = {pi:.4f}")           # 4 decimal places
print(f"Pressure = {pressure:.1f} bar")
Pi = 3.14
Pi = 3.1416
Pressure = 2.3 bar
# 4. Percentage formatting
efficiency = 0.854
print(f"Reactor efficiency: {efficiency:.1%}")
Reactor efficiency: 85.4%
# 5. Complex calculations
temp_celsius = 25
temp_fahrenheit = f"Temperature: {temp_celsius}°C = {temp_celsius * 9/5 + 32:.1f}°F"
print(temp_fahrenheit)
Temperature: 25°C = 77.0°F
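
The same format specs can also set a column width, which helps when several readings are printed as a table. A sketch with made-up sensor values; the widths 12 and 8 are arbitrary choices:

```python
readings = [
    ("Temperature", 85.5, "°C"),
    ("Pressure", 2.34567, "bar"),
    ("Flow rate", 125.7, "L/min"),
]

lines = []
for name, value, unit in readings:
    # <12 left-aligns the name in 12 characters;
    # >8.2f right-aligns the number in 8 characters with 2 decimals
    lines.append(f"{name:<12}{value:>8.2f} {unit}")

for line in lines:
    print(line)
# Temperature    85.50 °C
# Pressure        2.35 bar
# Flow rate     125.70 L/min
```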
print("\n=== Comparing All Three Methods ===")
reactor = "R-102"
temp = 92.3
pressure = 2.5

# Method 1: Concatenation (verbose)
concat_msg = "Reactor " + reactor + " is at " + str(temp) + "°C, " + str(pressure) + " bar"

# Method 2: .format() (good)
format_msg = "Reactor {} is at {}°C, {} bar".format(reactor, temp, pressure)

# Method 3: f-string (best!)
fstring_msg = f"Reactor {reactor} is at {temp}°C, {pressure} bar"

print("Concatenation:", concat_msg)
print(".format():", format_msg)
print("f-string:", fstring_msg)
=== Comparing All Three Methods ===
Concatenation: Reactor R-102 is at 92.3°C, 2.5 bar
.format(): Reactor R-102 is at 92.3°C, 2.5 bar
f-string: Reactor R-102 is at 92.3°C, 2.5 bar

Practical Summary: What You Should Use

For this course: Use f-strings!

f-strings are the modern, preferred way to format strings in Python. They’re more readable, faster, and easier to write.

Why we showed the other methods:

  • Concatenation (+): You might need this for simple cases

  • .format(): You’ll encounter this when reading older Python code or tutorials

Method               When to Use                         Example
Concatenation (+)    ❌ Avoid (hard to read)              "Hello " + name
.format()            📚 Understanding old code only       "Value: {:.2f}".format(num)
f-strings            USE THIS!                           f"Value: {num:.2f}"

Bottom line: Always use f-strings in your assignments and projects!

Real-World Example: Laboratory Report

Imagine you’re writing a program to generate laboratory reports. Here’s how each method would look:

# Real-World Example: Laboratory Report Generator

print("=== LABORATORY REPORT GENERATOR ===")
print("Generating a report for chemical analysis results...\n")

# Laboratory data
sample_id = "CHEM-2024-001"
compound = "Ethanol"
purity = 0.9534
temperature = 25.0
analyst = "Dr. Smith"
date = "2024-11-19"

print("=== Method 1: Concatenation (Verbose) ===")
report1 = "Sample ID: " + sample_id + "\n" + \
          "Compound: " + compound + "\n" + \
          "Purity: " + str(purity * 100) + "%\n" + \
          "Analysis Temperature: " + str(temperature) + "°C\n" + \
          "Analyst: " + analyst + "\n" + \
          "Date: " + date
print(report1)

print("\n=== Method 2: .format() (Better) ===")
report2 = """Sample ID: {}
Compound: {}
Purity: {:.1%}
Analysis Temperature: {}°C
Analyst: {}
Date: {}""".format(sample_id, compound, purity, temperature, analyst, date)
print(report2)

print("\n=== Method 3: f-strings (Best!) ===")
report3 = f"""Sample ID: {sample_id}
Compound: {compound}
Purity: {purity:.1%}
Analysis Temperature: {temperature}°C
Analyst: {analyst}
Date: {date}"""
print(report3)

print("\n=== Advanced f-string Features in Reports ===")
# Calculate derived values directly in the f-string
molecular_weight = 46.07  # g/mol for ethanol
concentration = 0.250     # mol/L

advanced_report = f"""
ADVANCED ANALYSIS REPORT
========================
Sample: {sample_id} ({compound})
Molecular Weight: {molecular_weight} g/mol
Concentration: {concentration} mol/L
Mass per Liter: {concentration * molecular_weight:.2f} g/L
Purity: {purity:.1%} ({purity:.4f} decimal)
Quality Grade: {'High' if purity > 0.95 else 'Standard'}
Temperature: {temperature}°C ({temperature * 9/5 + 32:.1f}°F)
"""
print(advanced_report)
=== LABORATORY REPORT GENERATOR ===
Generating a report for chemical analysis results...

=== Method 1: Concatenation (Verbose) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.34%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19

=== Method 2: .format() (Better) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.3%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19

=== Method 3: f-strings (Best!) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.3%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19

=== Advanced f-string Features in Reports ===

ADVANCED ANALYSIS REPORT
========================
Sample: CHEM-2024-001 (Ethanol)
Molecular Weight: 46.07 g/mol
Concentration: 0.25 mol/L
Mass per Liter: 11.52 g/L
Purity: 95.3% (0.9534 decimal)
Quality Grade: High
Temperature: 25.0°C (77.0°F)
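
When the same report is needed for several samples, the f-string template can be placed in a function. A possible sketch follows; build_report and the second sample are hypothetical, added only to illustrate reuse:

```python
def build_report(sample_id, compound, purity):
    """Hypothetical helper that fills a short report template for one sample."""
    grade = "High" if purity > 0.95 else "Standard"
    return (
        f"Sample ID: {sample_id}\n"
        f"Compound: {compound}\n"
        f"Purity: {purity:.1%}\n"
        f"Quality Grade: {grade}"
    )

report_a = build_report("CHEM-2024-001", "Ethanol", 0.9534)
report_b = build_report("CHEM-2024-002", "Ethanol", 0.912)  # made-up second sample

print(report_a)
print()
print(report_b)
```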