Chapter 2: Working with Text (Strings)

2.1 String

A string is Python’s way of storing and working with text. Think of it as a sequence of characters (letters, digits, symbols, spaces) that represents a word, a name, a label, or any other piece of text.

Simple Definition: A string is text data enclosed in quotes.

"hello world"

'hello world'

'hello world'

'hello world'

'Mark says "hello"'

'Mark says "hello"'

type(1.0)

float

type('1')

str

1 + 2

type('1' + '2')

str
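The cells above show that `'1' + '2'` is still a `str`. As a minimal sketch of what that means in practice: `+` joins strings but adds numbers, and `int()` converts text to a number when arithmetic is intended. The variable names are illustrative only.

```python
# '+' on two strings concatenates them; on two numbers it adds them.
joined = '1' + '2'
added = 1 + 2

print(joined)   # 12  (text, not the number twelve)
print(added)    # 3

# Convert text to a number first when arithmetic is intended.
total = int('1') + int('2')
print(total)    # 3
```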

2.1.1 Creating Strings

You can create strings using either single quotes (') or double quotes ("):

process_name = "distillation"        # Double quotes
operator = 'Alice'                   # Single quotes
reaction = "2H2 + O2 → 2H2O"        # Chemical equation as text
status_message = 'Reactor is running normally'
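Both quote styles produce the same kind of string, so the choice mostly matters when the text itself contains a quote character. A short sketch, using made-up values, of the two usual ways to handle that:

```python
# Single and double quotes create equal strings.
a = "distillation"
b = 'distillation'
print(a == b)             # True

# Use one quote style outside to keep the other inside the text.
note = 'Mark says "hello"'
label = "Alice's reactor"

# Alternatively, escape the quote character with a backslash.
escaped = 'Alice\'s reactor'
print(label == escaped)   # True
```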

string = 'python'

string[5]

'n'

2.1.2 Indexing & Slicing

Indexing lets you access individual characters in a string, while slicing lets you extract parts of a string. This is very useful when you need to work with specific portions of text data.

The key concept: Python starts counting from 0, not 1!

If we have the string "Python", here’s how the indexing works:

String:  P  y  t  h  o  n
Index:   0  1  2  3  4  5

First character "P" is at index 0
Second character "y" is at index 1
Last character "n" is at index 5

Remember: Python indexing starts at 0, not 1!

Finding Characters with .find()

The .find() method returns the index of the first occurrence of a character (or substring).
If the character is not found, it returns -1.

s = "Python"
s.find("t")   # 2
s.find("n")   # 5
s.find("z")   # -1
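Because `.find()` reports a missing character with `-1` rather than an error, the result is worth checking before it is used as an index. The sketch below shows that check, a substring search, and the related `.index()` method, which raises `ValueError` instead of returning `-1`.

```python
s = "Python"

# .find() signals "not found" with -1, so check before using the result.
position = s.find("z")
if position == -1:
    print("'z' is not in the string")
else:
    print(f"'z' is at index {position}")

# .find() also accepts substrings and returns where the match starts.
start = s.find("th")
print(start)   # 2

# .index() behaves like .find() but raises ValueError when nothing matches.
try:
    s.index("z")
except ValueError:
    print("index() raised ValueError")
```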

string = 'Python'
# Negative indices: P=-6, y=-5, t=-4, h=-3, o=-2, n=-1
string[-1]

'n'

# String Indexing Examples

print("=== Basic String Indexing ===")

# Create a simple string
word = "Python"
print(f"String: '{word}'")
print(f"Length: {len(word)} characters")

print(f"\n=== Positive Indexing (counting from start) ===")
print(f"word[0] = '{word[0]}'  # First character")
print(f"word[1] = '{word[1]}'  # Second character") 
print(f"word[2] = '{word[2]}'  # Third character")
print(f"word[3] = '{word[3]}'  # Fourth character")
print(f"word[4] = '{word[4]}'  # Fifth character")
print(f"word[5] = '{word[5]}'  # Sixth character")

print(f"\n=== Negative Indexing (counting from end) ===")
print(f"word[-1] = '{word[-1]}'  # Last character")
print(f"word[-2] = '{word[-2]}'  # Second to last")
print(f"word[-3] = '{word[-3]}'  # Third to last")
print(f"word[-4] = '{word[-4]}'  # Fourth to last")
print(f"word[-5] = '{word[-5]}'  # Fifth to last")
print(f"word[-6] = '{word[-6]}'  # Sixth to last (first)")

print(f"\n=== Index Diagram ===")
print("String:  P  y  t  h  o  n")
print("Index:   0  1  2  3  4  5")
print("Negative:-6 -5 -4 -3 -2 -1")

print(f"\n=== Practical Example ===")
message = "Hello World"
print(f"Message: '{message}'")
print(f"First character: '{message[0]}'")
print(f"Last character: '{message[-1]}'")
print(f"Middle character: '{message[len(message)//2]}'")

=== Basic String Indexing ===
String: 'Python'
Length: 6 characters

=== Positive Indexing (counting from start) ===
word[0] = 'P'  # First character
word[1] = 'y'  # Second character
word[2] = 't'  # Third character
word[3] = 'h'  # Fourth character
word[4] = 'o'  # Fifth character
word[5] = 'n'  # Sixth character

=== Negative Indexing (counting from end) ===
word[-1] = 'n'  # Last character
word[-2] = 'o'  # Second to last
word[-3] = 'h'  # Third to last
word[-4] = 't'  # Fourth to last
word[-5] = 'y'  # Fifth to last
word[-6] = 'P'  # Sixth to last (first)

=== Index Diagram ===
String:  P  y  t  h  o  n
Index:   0  1  2  3  4  5
Negative:-6 -5 -4 -3 -2 -1

=== Practical Example ===
Message: 'Hello World'
First character: 'H'
Last character: 'd'
Middle character: ' '

Slicing lets you extract a portion (substring) of a string. The syntax is:

string[start:end]

start: Index where slice begins (included)
end: Index where slice ends (excluded)

Important: The end index is not included in the slice!

text = "Python"


text[0:2]  # indices 0 and 1; index 2 is excluded

# print(text[1:4])    # Gets characters from index 1 to 3: "yth"
# print(text[0:3])    # Gets characters from index 0 to 2: "Pyt"
# print(text[2:])     # Gets from index 2 to end: "thon"
# print(text[:4])     # Gets from start to index 3: "Pyth"

'Py'
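One difference between indexing and slicing is worth seeing once: an index past the end of the string raises `IndexError`, while a slice with out-of-range bounds is simply cut off at the string's end. A minimal sketch:

```python
text = "Python"

# Indexing past the end raises IndexError.
try:
    text[10]
except IndexError:
    print("text[10] is out of range")

# Slicing is more forgiving: out-of-range bounds are clamped.
tail = text[2:100]
empty = text[10:20]
print(tail)          # thon
print(repr(empty))   # ''
```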

# String Slicing Examples

print("=== String Slicing ===")

text = "Programming"
print(f"Original string: '{text}'")
print(f"Length: {len(text)} characters")

print(f"\n=== Basic Slicing [start:end] ===")
print(f"text[0:4] = '{text[0:4]}'  # Characters 0,1,2,3")
print(f"text[1:5] = '{text[1:5]}'  # Characters 1,2,3,4")
print(f"text[3:7] = '{text[3:7]}'  # Characters 3,4,5,6")

print(f"\n=== Slicing from Start or to End ===")
print(f"text[:5] = '{text[:5]}'    # From beginning to index 4")
print(f"text[5:] = '{text[5:]}'    # From index 5 to end")
print(f"text[:] = '{text[:]}'      # Entire string (copy)")

print(f"\n=== Negative Slicing ===")
print(f"text[-4:] = '{text[-4:]}'   # Last 4 characters")
print(f"text[:-3] = '{text[:-3]}'   # All except last 3")
print(f"text[-7:-2] = '{text[-7:-2]}' # From -7 to -3")

text = "Programming"
# Negative indices of the last four characters: m=-4, i=-3, n=-2, g=-1

text[-4:]

'ming'

print(f"\n=== Slicing with Step ===")
print(f"text[::2] = '{text[::2]}'   # Every 2nd character")
print(f"text[1::2] = '{text[1::2]}' # Every 2nd, starting from index 1")
print(f"text[::3] = '{text[::3]}'   # Every 3rd character")

print(f"\n=== Reverse String ===")
print(f"text[::-1] = '{text[::-1]}' # Reverse the entire string!")

print(f"\n=== Practical Examples ===")
email = "user@example.com"
print(f"Email: '{email}'")
print(f"Username: '{email[:email.index('@')]}'")  # Before @
print(f"Domain: '{email[email.index('@')+1:]}'")  # After @

filename = "data.txt"
print(f"\nFilename: '{filename}'")
print(f"Name part: '{filename[:-4]}'")  # Without extension
print(f"Extension: '{filename[-4:]}'")  # Just extension

=== String Slicing ===
Original string: 'Programming'
Length: 11 characters

=== Basic Slicing [start:end] ===
text[0:4] = 'Prog'  # Characters 0,1,2,3
text[1:5] = 'rogr'  # Characters 1,2,3,4
text[3:7] = 'gram'  # Characters 3,4,5,6

=== Slicing from Start or to End ===
text[:5] = 'Progr'    # From beginning to index 4
text[5:] = 'amming'    # From index 5 to end
text[:] = 'Programming'      # Entire string (copy)

=== Negative Slicing ===
text[-4:] = 'ming'   # Last 4 characters
text[:-3] = 'Programm'   # All except last 3
text[-7:-2] = 'rammi' # From -7 to -3

=== Slicing with Step ===
text[::2] = 'Pormig'   # Every 2nd character
text[1::2] = 'rgamn' # Every 2nd, starting from index 1
text[::3] = 'Pgmn'   # Every 3rd character

=== Reverse String ===
text[::-1] = 'gnimmargorP' # Reverse the entire string!

=== Practical Examples ===
Email: 'user@example.com'
Username: 'user'
Domain: 'example.com'

Filename: 'data.txt'
Name part: 'data'
Extension: '.txt'
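The filename example above works because the extension `.txt` is exactly four characters long. A minimal sketch, assuming the extension is whatever follows the last dot, combines the string method `.rfind()` (like `.find()`, but searching from the right) with slicing. The helper name `split_extension` and the filenames are made up for illustration.

```python
def split_extension(filename):
    """Split at the last dot and return (name, extension)."""
    dot = filename.rfind(".")   # index of the last ".", or -1 if there is none
    if dot == -1:
        return filename, ""     # no extension present
    return filename[:dot], filename[dot:]

print(split_extension("data.txt"))      # ('data', '.txt')
print(split_extension("report.xlsx"))   # ('report', '.xlsx')
print(split_extension("README"))        # ('README', '')
```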

2.1.3 Common String Operations

Strings have many useful methods (functions) that help you work with text:

Basic String Methods

| Method | Description | Example | Result |
| --- | --- | --- | --- |
| `.upper()` | Convert to uppercase | `"reactor".upper()` | `"REACTOR"` |
| `.lower()` | Convert to lowercase | `"REACTOR".lower()` | `"reactor"` |
| `.title()` | Title case (capitalize the first letter of each word) | `"batch reactor".title()` | `"Batch Reactor"` |
| `len()` | Length of string (a built-in function, not a method) | `len("benzene")` | `7` |
| `.replace()` | Replace text | `"pump A".replace("A", "B")` | `"pump B"` |
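One detail the table does not show: these methods return a new string and leave the original untouched, so the result has to be stored if it is needed later. A short sketch:

```python
name = "reactor"
shouted = name.upper()

print(name)      # reactor  (unchanged)
print(shouted)   # REACTOR

# To keep the result, assign it back to a variable.
name = name.upper()
print(name)      # REACTOR

# len() is a built-in function called on the string, not a method of it.
print(len(name))   # 7
```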

Real Examples: Standardization & Data Cleaning

Standardization Example: Imagine you have equipment data from different sources with inconsistent naming:

Raw data: "pump a", "PUMP B", "Pump-C", "centrifugal_pump_d"
Standardized: "Pump A", "Pump B", "Pump C", "Centrifugal Pump D"

examples = ['Pump a', 'Pump B', 'pump c', 'Pump E ', 'Pump f']


standardized_list = []

for ex in examples:
    standardized = ex.strip().title()
    standardized = standardized.replace(' ', '_')
    standardized_list.append(standardized)

standardized_list

['Pump_A', 'Pump_B', 'Pump_C', 'Pump_E', 'Pump_F']
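The raw names listed at the start of this example ("pump a", "PUMP B", "Pump-C", "centrifugal_pump_d") can be handled in a similar way. The sketch below is one possible approach, assuming hyphens and underscores should both become spaces; the variable names are illustrative.

```python
raw_names = ["pump a", "PUMP B", "Pump-C", "centrifugal_pump_d"]

standardized_names = []
for raw in raw_names:
    # Treat hyphens and underscores as word separators.
    cleaned = raw.strip().replace("-", " ").replace("_", " ")
    # .title() capitalizes each word and lowercases the remaining letters.
    standardized_names.append(cleaned.title())

print(standardized_names)
# ['Pump A', 'Pump B', 'Pump C', 'Centrifugal Pump D']
```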

print("=== Basic String Methods ===")

# Equipment data that might need standardization
equipment_name = "centrifugal_pump_d"
chemical_name = "BENZENE"
process_status = "normal operation"

print(f"Original: '{equipment_name}'")
print(f"Uppercase: '{equipment_name.upper()}'")
print(f"Title Case: '{equipment_name.title()}'")
print(f"Replace underscores: '{equipment_name.replace('_', ' ')}'")
print(f"Replace underscores & Title Case: '{equipment_name.replace('_', ' ').title()}'")

print(f"\nOriginal: '{chemical_name}'")
print(f"Lowercase: '{chemical_name.lower()}'")

print(f"\nOriginal: '{process_status}'")
print(f"Title Case: '{process_status.title()}'")

=== Basic String Methods ===
Original: 'centrifugal_pump_d'
Uppercase: 'CENTRIFUGAL_PUMP_D'
Title Case: 'Centrifugal_Pump_D'
Replace underscores: 'centrifugal pump d'
Replace underscores & Title Case: 'Centrifugal Pump D'

Original: 'BENZENE'
Lowercase: 'benzene'

Original: 'normal operation'
Title Case: 'Normal Operation'

print(f"\n=== String Length ===")
formula = "C6H12O6"  # Glucose
safety_message = "Wear safety goggles at all times"

print(f"Chemical formula '{formula}' has {len(formula)} characters")
print(f"Safety message has {len(safety_message)} characters")

print(f"\n=== Replacing Text ===")
old_equipment = "Pump A is running"
new_equipment = old_equipment.replace("A", "B")

print(f"Original: '{old_equipment}'")
print(f"Updated: '{new_equipment}'")

=== Replacing Text ===
Original: 'Pump A is running'
Updated: 'Pump B is running'

# Useful for updating equipment IDs
reactor_list = "R-101, R-102, R-103"
updated_list = reactor_list.replace("R-", "Reactor-")
print(f"\nEquipment list: '{reactor_list}'")
print(f"Formatted list: '{updated_list}'")

Equipment list: 'R-101, R-102, R-103'
Formatted list: 'Reactor-101, Reactor-102, Reactor-103'

print(f"\n=== Practical Example: Data Cleaning ===")
# Messy data from a sensor log file
messy_data = "  REACTOR temperature:  85.5 °C  "
print(f"Raw data: '{messy_data}'")

# Clean it up
clean_data = messy_data.strip()  # Remove extra spaces
clean_data = clean_data.lower()  # Consistent case
clean_data = clean_data.title()  # Make it look professional

print(f"Cleaned data: '{clean_data}'")

=== Practical Example: Data Cleaning ===
Raw data: '  REACTOR temperature:  85.5 °C  '
Cleaned data: 'Reactor Temperature:  85.5 °C'

Data Cleaning Example: Sensor data files often have messy formatting:

Raw data: " TEMPERATURE:85.5°c ", "pressure: 2.3 BAR"
Cleaned: "Temperature: 85.5°C", "Pressure: 2.3 bar"

Let’s see these in action:
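A minimal sketch of that cleanup, assuming every reading has the form `label:value unit` with a non-negative number. The `UNIT_SPELLING` lookup and the `clean_reading` helper are hypothetical names introduced here for illustration, not part of any library.

```python
# Hypothetical lookup: preferred spelling for each unit, keyed by lowercase text.
UNIT_SPELLING = {"°c": "°C", "bar": "bar"}

def clean_reading(raw):
    """Return a reading as 'Label: value unit' with consistent case and spacing."""
    label, _, rest = raw.strip().partition(":")
    rest = rest.strip()

    # Walk past the digits and decimal point to separate number from unit.
    i = 0
    while i < len(rest) and (rest[i].isdigit() or rest[i] == "."):
        i += 1
    number = rest[:i]
    unit = rest[i:].strip()

    # Normalize the unit; unknown units are kept as written.
    unit = UNIT_SPELLING.get(unit.lower(), unit)
    separator = "" if unit.startswith("°") else " "
    return f"{label.strip().title()}: {number}{separator}{unit}"

print(clean_reading(" TEMPERATURE:85.5°c "))   # Temperature: 85.5°C
print(clean_reading("pressure: 2.3 BAR"))      # Pressure: 2.3 bar
```

Keeping the unit spellings in a dictionary means a new unit can be supported by adding one entry rather than another `if` branch.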

2.1.4 String Formatting

String formatting is the process of inserting variables and values into strings to create dynamic text. Instead of manually concatenating strings with the + operator, Python provides several elegant methods to combine text and data.

Example of the problem:

# Hard to read and maintain
name = "Reactor"
temperature = 85.5
pressure = 2.3
message = "The " + name + " temperature is " + str(temperature) + "°C and pressure is " + str(pressure) + " bar"

Python offers three main formatting methods:

Concatenation Operator (+) - Basic but limited
.format() Method - Powerful and flexible
f-strings - Modern and most readable (Python 3.6+)

temperature = 100  # Celsius
unit = '°C'

# print(f'temperature is {temperature} {unit}')

print('temperature is', temperature, unit)

temperature is 100 °C

Method 1: Concatenation Operator (`+`)

The basic method uses the + operator to join strings together. Important: You must first convert non-string values to strings using the str() function.

Advantages:

Simple and straightforward
Works in all Python versions

Disadvantages:

Can become messy with many variables
Must manually convert numbers to strings
Hard to read with complex formatting

print("=== Basic Concatenation Examples ===")

# Chemical data
MW = 63.21
result1 = "Molar mass = " + str(MW) + " g/mol"
print(result1)

# Equipment status
reactor_id = "R-101"
temperature = 85.5
status = "The reactor " + reactor_id + " is at " + str(temperature) + "°C"
print(status)

=== Basic Concatenation Examples ===
Molar mass = 63.21 g/mol
The reactor R-101 is at 85.5°C

print("\n=== Problems with Concatenation ===")
# This gets messy quickly!
reactor = "R-102"
temp = 92.3
pressure = 2.5
flow_rate = 125.7
complex_message = "Reactor " + reactor + " status: Temperature = " + str(temp) + "°C, Pressure = " + str(pressure) + " bar, Flow rate = " + str(flow_rate) + " L/min"
print("Complex message (hard to read):")
print(complex_message)

=== Problems with Concatenation ===
Complex message (hard to read):
Reactor R-102 status: Temperature = 92.3°C, Pressure = 2.5 bar, Flow rate = 125.7 L/min

print("\n=== Common Mistakes ===")
# This will cause an error - uncomment to see:
# temperature = 85.5
# error_message = "Temperature is " + temperature  # TypeError!
print("Remember: Must convert numbers to strings with str()")

=== Common Mistakes ===
Remember: Must convert numbers to strings with str()
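To see the mistake without stopping the program, the failing line can be wrapped in `try`/`except`. This is only a sketch for demonstration; in normal code the fix is simply to call `str()` (or use one of the formatting methods below).

```python
temperature = 85.5

try:
    message = "Temperature is " + temperature   # str + float is not allowed
except TypeError as error:
    message = None
    print("TypeError:", error)

# Fix: convert the number to text first.
fixed = "Temperature is " + str(temperature)
print(fixed)   # Temperature is 85.5
```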

Method 2: `.format()` Method

The .format() method provides a more powerful and flexible way to format strings. You use {} as placeholders and pass values to the .format() method.

Advantages:

No need to convert numbers to strings manually
Placeholders make the template clear
Can control number formatting (decimal places, etc.)
Can reuse and reorder variables

Basic Syntax:

"template with {} placeholders".format(value1, value2)

# Method 2: .format() Method
print("=== Basic .format() Examples ===")
# Same examples as before, but cleaner
MW = 63.21
result2 = "Molar mass = {} g/mol".format(MW)
print(result2)

=== Basic .format() Examples ===
Molar mass = 63.21 g/mol

# Multiple variables
reactor_id = "R-101"
temperature = 85.5
status = "The reactor {} is at {}°C".format(reactor_id, temperature)
print(status)

The reactor R-101 is at 85.5°C

# Complex example - much cleaner!
chemical = "Benzene"
formula = "C6H6"
bp = 80.1
description = "{} ({}) has a boiling point of {}°C".format(chemical, formula, bp)
print(description)

Benzene (C6H6) has a boiling point of 80.1°C

print("\n=== Comparing Methods ===")
reactor = "R-102"
temp = 92.3
pressure = 2.5

# Concatenation (messy)
concat_msg = "Reactor " + reactor + " is at " + str(temp) + "°C, " + str(pressure) + " bar"

# .format() method (cleaner)
format_msg = "Reactor {} is at {}°C, {} bar".format(reactor, temp, pressure)

print("Concatenation:", concat_msg)
print(".format():", format_msg)

=== Comparing Methods ===
Concatenation: Reactor R-102 is at 92.3°C, 2.5 bar
.format(): Reactor R-102 is at 92.3°C, 2.5 bar
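Two of the advantages listed for `.format()`, number formatting and reusing or reordering values, are not exercised by the examples above. The following sketch illustrates them with made-up readings:

```python
reactor = "R-102"
temp = 92.347

# Format specifiers go after a colon inside the braces.
rounded = "Temperature: {:.1f}°C".format(temp)
print(rounded)   # Temperature: 92.3°C

# Numbered placeholders can reorder and reuse arguments.
reused = "{0} reading: {1:.1f}°C (logged for {0})".format(reactor, temp)
print(reused)    # R-102 reading: 92.3°C (logged for R-102)

# Named placeholders make long templates easier to follow.
named = "{unit_id} is at {t:.2f}°C".format(unit_id=reactor, t=temp)
print(named)     # R-102 is at 92.35°C
```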

Method 3: f-strings (Formatted String Literals)

f-strings are the most modern and readable way to format strings in Python (available in Python 3.6+). You put an f before the quotes and write variables directly inside {}.

Advantages:

Most readable and concise
Variables are directly visible in the string
Excellent performance
Can include expressions inside {}
Preferred method for new Python code

Basic Syntax:

f"template with {variable} directly inside"

# Method 3: f-strings (Formatted String Literals)
print("=== Basic f-string Examples ===")
# Same examples, now with f-strings (cleanest!)
MW = 63.21
unit = 'g/mol'
result3 = f"Molar mass = {MW} {unit}"
print(result3)

=== Basic f-string Examples ===
Molar mass = 63.21 g/mol

# Multiple variables - very readable
reactor_id = "R-101"
temperature = 85.5
status = f"The reactor {reactor_id} is at {temperature}°C"
print(status)

The reactor R-101 is at 85.5°C

# Complex example - super clean!
chemical = "Benzene"
formula = "C6H6"
bp = 80.1
description = f"{chemical} ({formula}) has a boiling point of {bp}°C"
print(description)

Benzene (C6H6) has a boiling point of 80.1°C

print("\n=== Advanced f-string Features ===")
# 1. Expressions inside braces
length = 5
width = 3
area_msg = f"Rectangle area = {length * width} square units"
print(area_msg)

=== Advanced f-string Features ===
Rectangle area = 15 square units

# 2. Function calls inside braces
name = "reactor temperature"
formatted_name = f"Sensor: {name.title()}"
print(formatted_name)

Sensor: Reactor Temperature

# 3. Number formatting (same as .format())
pi = 3.14159265359
pressure = 2.34567
print(f"Pi = {pi:.2f}")           # 2 decimal places
print(f"Pi = {pi:.4f}")           # 4 decimal places
print(f"Pressure = {pressure:.1f} bar")

Pi = 3.14
Pi = 3.1416
Pressure = 2.3 bar

# 4. Percentage formatting
efficiency = 0.854
print(f"Reactor efficiency: {efficiency:.1%}")

Reactor efficiency: 85.4%

# 5. Complex calculations
temp_celsius = 25
temp_fahrenheit = f"Temperature: {temp_celsius}°C = {temp_celsius * 9/5 + 32:.1f}°F"
print(temp_fahrenheit)

Temperature: 25°C = 77.0°F
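The same format specifiers cover a few other cases that come up with engineering data: scientific notation, thousands separators, and fixed-width columns. A brief sketch with illustrative values (the production figure is made up):

```python
avogadro = 6.02214076e23
annual_output = 1234567.891   # made-up production figure, kg

print(f"{avogadro:.3e}")         # 6.022e+23
print(f"{annual_output:,.2f}")   # 1,234,567.89

# A width after the colon pads the value; '<' and '>' set the alignment.
row = f"{'R-101':<8}{85.5:>8.1f}"
print(repr(row))                 # 'R-101       85.5'
```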

print("\n=== Comparing All Three Methods ===")
reactor = "R-102"
temp = 92.3
pressure = 2.5

# Method 1: Concatenation (verbose)
concat_msg = "Reactor " + reactor + " is at " + str(temp) + "°C, " + str(pressure) + " bar"

# Method 2: .format() (good)
format_msg = "Reactor {} is at {}°C, {} bar".format(reactor, temp, pressure)

# Method 3: f-string (best!)
fstring_msg = f"Reactor {reactor} is at {temp}°C, {pressure} bar"

print("Concatenation:", concat_msg)
print(".format():", format_msg)
print("f-string:", fstring_msg)

=== Comparing All Three Methods ===
Concatenation: Reactor R-102 is at 92.3°C, 2.5 bar
.format(): Reactor R-102 is at 92.3°C, 2.5 bar
f-string: Reactor R-102 is at 92.3°C, 2.5 bar

Practical Summary: What You Should Use

For this course: Use f-strings!

f-strings are the modern, preferred way to format strings in Python. They’re more readable, faster, and easier to write.

Why we showed the other methods:

Concatenation (+): You might need this for simple cases
.format(): You’ll encounter this when reading older Python code or tutorials

| Method | When to Use | Example |
| --- | --- | --- |
| Concatenation (`+`) | ❌ Avoid (hard to read) | `"Hello " + name` |
| `.format()` | 📚 Understanding old code only | `"Value: {:.2f}".format(num)` |
| f-strings | ✅ USE THIS! | `f"Value: {num:.2f}"` |

Bottom line: Always use f-strings in your assignments and projects!

Real-World Example: Laboratory Report

Imagine you’re writing a program to generate laboratory reports. Here’s how each method would look:

# Real-World Example: Laboratory Report Generator

print("=== LABORATORY REPORT GENERATOR ===")
print("Generating a report for chemical analysis results...\n")

# Laboratory data
sample_id = "CHEM-2024-001"
compound = "Ethanol"
purity = 0.9534
temperature = 25.0
analyst = "Dr. Smith"
date = "2024-11-19"

print("=== Method 1: Concatenation (Verbose) ===")
report1 = "Sample ID: " + sample_id + "\n" + \
          "Compound: " + compound + "\n" + \
          "Purity: " + str(purity * 100) + "%\n" + \
          "Analysis Temperature: " + str(temperature) + "°C\n" + \
          "Analyst: " + analyst + "\n" + \
          "Date: " + date
print(report1)

print("\n=== Method 2: .format() (Better) ===")
report2 = """Sample ID: {}
Compound: {}
Purity: {:.1%}
Analysis Temperature: {}°C
Analyst: {}
Date: {}""".format(sample_id, compound, purity, temperature, analyst, date)
print(report2)

print("\n=== Method 3: f-strings (Best!) ===")
report3 = f"""Sample ID: {sample_id}
Compound: {compound}
Purity: {purity:.1%}
Analysis Temperature: {temperature}°C
Analyst: {analyst}
Date: {date}"""
print(report3)

print("\n=== Advanced f-string Features in Reports ===")
# Calculate derived values directly in the f-string
molecular_weight = 46.07  # g/mol for ethanol
concentration = 0.250     # mol/L

advanced_report = f"""
ADVANCED ANALYSIS REPORT
========================
Sample: {sample_id} ({compound})
Molecular Weight: {molecular_weight} g/mol
Concentration: {concentration} mol/L
Mass per Liter: {concentration * molecular_weight:.2f} g/L
Purity: {purity:.1%} ({purity:.4f} decimal)
Quality Grade: {'High' if purity > 0.95 else 'Standard'}
Temperature: {temperature}°C ({temperature * 9/5 + 32:.1f}°F)
"""
print(advanced_report)

=== LABORATORY REPORT GENERATOR ===
Generating a report for chemical analysis results...

=== Method 1: Concatenation (Verbose) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.34%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19

=== Method 2: .format() (Better) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.3%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19

=== Method 3: f-strings (Best!) ===
Sample ID: CHEM-2024-001
Compound: Ethanol
Purity: 95.3%
Analysis Temperature: 25.0°C
Analyst: Dr. Smith
Date: 2024-11-19

=== Advanced f-string Features in Reports ===

ADVANCED ANALYSIS REPORT
========================
Sample: CHEM-2024-001 (Ethanol)
Molecular Weight: 46.07 g/mol
Concentration: 0.25 mol/L
Mass per Liter: 11.52 g/L
Purity: 95.3% (0.9534 decimal)
Quality Grade: High
Temperature: 25.0°C (77.0°F)
