Chapter 6: File Input/Output

In real-world applications, you rarely type all your data directly into code. Instead, you:

Read data from files (sensor logs, experimental results, configuration files)
Write results to files (analysis outputs, reports, processed data)
Save and load data between sessions

File I/O (Input/Output) is essential for:

Loading experimental data
Saving calculation results
Reading configuration parameters
Creating data analysis pipelines
Sharing data with other programs

In this chapter, we’ll cover:

Reading files
Writing files
File paths and directories

6.1 Reading Text Files

Python provides built-in functions to read text files. The basic process is:

Open the file
Read the contents
Close the file

Basic syntax:

file = open('filename.txt', 'r')  # 'r' means read mode
content = file.read()
file.close()

File modes:

'r' - Read (default)
'w' - Write (overwrites existing file)
'a' - Append (adds to end of file)
'r+' - Read and write
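The following is a minimal sketch of how these modes behave. It runs inside a throwaway temporary directory so that no real files are touched, and the file name `modes_demo.txt` is hypothetical:

```python
import tempfile
from pathlib import Path

# Work in a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'modes_demo.txt'  # hypothetical file name

    # 'w' creates the file (or empties an existing one) and writes
    with open(path, 'w') as f:
        f.write('first\n')

    # 'w' again: the previous contents are discarded
    with open(path, 'w') as f:
        f.write('second\n')

    # 'a' keeps the existing contents and adds to the end
    with open(path, 'a') as f:
        f.write('third\n')

    # 'r' (the default mode) reads the file
    with open(path) as f:
        after_append = f.read()
    print(repr(after_append))  # 'second\nthird\n'

    # 'r+' reads and writes without emptying the file first;
    # writing starts at the current position (the beginning)
    with open(path, 'r+') as f:
        f.write('SECOND')  # overwrites the first 6 characters in place
    with open(path) as f:
        after_update = f.read()
    print(repr(after_update))  # 'SECOND\nthird\n'
```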

# Basic File Reading - Method 1: Read entire file

print("=== Reading Entire File ===")

# Open file in read mode
file = open('06_temperature_data.txt', 'r')

# Read entire contents
content = file.read()

# Close the file (important!)
file.close()


# repr() makes the hidden newline characters (\n) visible
print(repr(content))

=== Reading Entire File ===
'Time,Temperature\n0,25.0\n1,27.5\n2,30.2\n3,32.8\n4,35.1\n'

# Basic File Reading - Method 2: Read line by line

print("=== Reading Line by Line ===")

file = open('06_temperature_data.txt', 'r')

# Read lines into a list
lines = file.readlines()

file.close()

# Each list element is one line of the file, including its trailing '\n'
for i, line in enumerate(lines):
    print(f"Line {i}: {repr(line)}")

=== Reading Line by Line ===
Line 0: 'Time,Temperature\n'
Line 1: '0,25.0\n'
Line 2: '1,27.5\n'
Line 3: '2,30.2\n'
Line 4: '3,32.8\n'
Line 5: '4,35.1\n'

6.1.1 Using Context Managers (Best Practice)

Context managers automatically close files, even if errors occur. This is the recommended way to work with files!

Syntax:

with open('filename.txt', 'r') as file:
    content = file.read()
# The file is closed automatically once the indented block ends

Advantages:

Automatically closes the file when the block ends
Closes the file even if an error occurs inside the block (the error itself is still raised)
Prevents resource leaks
Cleaner code

# Using Context Manager (Recommended!)

print("=== Reading with Context Manager ===")

# File opens and automatically closes
with open('06_temperature_data.txt', 'r') as f:
    content = f.read()
    print(content)

# File is already closed here
print("\nFile has been automatically closed")

=== Reading with Context Manager ===
Time,Temperature
0,25.0
1,27.5
2,30.2
3,32.8
4,35.1


File has been automatically closed
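You can verify that the file is closed even when an error occurs by checking the file object's `closed` attribute. This sketch deliberately puts a non-numeric line in a temporary file, which makes `float()` raise a `ValueError` inside the `with` block. The file name `demo.txt` is hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'demo.txt'  # hypothetical file for the demo
    path.write_text('25.0\nnot-a-number\n')

    try:
        with open(path, 'r') as f:
            values = [float(line) for line in f]  # fails on the second line
    except ValueError as err:
        # The context manager does not swallow the error; try/except handles it
        print(f"Error inside the with block: {err}")

    # The with statement still closed the file before the error propagated
    print(f"File closed? {f.closed}")
```

The context manager only guarantees cleanup. Handling the error itself still requires `try`/`except`, as covered in Section 6.5.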

# Processing File Line by Line

print("=== Processing Data Line by Line ===")

with open('06_temperature_data.txt', 'r') as file:
    # Skip header line
    header = file.readline()
    print(f"Header: {header.strip()}\n")  # .strip() removes surrounding whitespace, including the newline
    
    print("Data:")
    for line in file:
        # Remove whitespace and split by comma
        line = line.strip()
        if line:  # Skip empty lines
            time, temp = line.split(',')
            print(f"  Time: {time} min, Temperature: {temp}°C")

=== Processing Data Line by Line ===
Header: Time,Temperature

Data:
  Time: 0 min, Temperature: 25.0°C
  Time: 1 min, Temperature: 27.5°C
  Time: 2 min, Temperature: 30.2°C
  Time: 3 min, Temperature: 32.8°C
  Time: 4 min, Temperature: 35.1°C
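The values returned by `split(',')` are strings, so any arithmetic needs an explicit conversion. The sketch below assumes the same two-column layout as 06_temperature_data.txt. It recreates that data in a temporary file so the block runs on its own, converts each field with `float()`, and computes an average heating rate:

```python
import tempfile
from pathlib import Path

# Same layout as 06_temperature_data.txt, written to a temporary file
sample = "Time,Temperature\n0,25.0\n1,27.5\n2,30.2\n3,32.8\n4,35.1\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'temperature_data.txt'
    path.write_text(sample)

    times, temps = [], []
    with open(path, 'r') as file:
        header = file.readline()        # skip the header line
        for line in file:
            line = line.strip()
            if not line:
                continue                # ignore blank lines
            t, temp = line.split(',')
            times.append(float(t))      # split() gives strings; convert to numbers
            temps.append(float(temp))

# Average heating rate between the first and last readings
rate = (temps[-1] - temps[0]) / (times[-1] - times[0])
print(f"Average heating rate: {rate:.3f} °C/min")  # Average heating rate: 2.525 °C/min
```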

6.2 Writing Text Files

Writing files is just as important as reading them. You can:

Save calculation results
Create data logs
Generate reports
Export data for other programs

Write modes:

'w' - Write (creates new file or overwrites existing)
'a' - Append (adds to end of existing file)
'x' - Exclusive creation (fails if file exists)
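Mode 'x' is a simple guard against accidentally overwriting earlier results, which 'w' would do without warning. The following sketch uses a hypothetical file name in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'run_001.txt'  # hypothetical output file

    # First time: the file does not exist yet, so 'x' creates it
    with open(path, 'x') as f:
        f.write('run 1 results\n')

    # Second time: 'x' refuses to overwrite and raises FileExistsError
    try:
        with open(path, 'x') as f:
            f.write('run 2 results\n')
        overwritten = True
    except FileExistsError:
        overwritten = False
        print(f"{path.name} already exists; not overwritten")

    contents = path.read_text()

print(repr(contents))  # 'run 1 results\n'
```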

# Writing to a File

print("=== Writing Data to File ===")

# Data to write
reactor_data = [
    ("R-101", 85.0, 2.5),
    ("R-102", 92.0, 2.8),
    ("R-103", 78.5, 2.3),
]

# Write to file
with open('06_reactor_report.txt', 'w') as file:
    # Write header
    file.write("Reactor Status Report\n")
    file.write("=" * 40 + "\n\n")
    
    # Write data
    for reactor_id, temp, pressure in reactor_data:
        line = f"{reactor_id}: Temp={temp}°C, Pressure={pressure} bar\n"
        file.write(line)

print("Report written to 06_reactor_report.txt")

# Read it back to verify
print("\nFile contents:")
with open('06_reactor_report.txt', 'r') as file:
    print(file.read())

=== Writing Data to File ===
Report written to 06_reactor_report.txt

File contents:
Reactor Status Report
========================================

R-101: Temp=85.0°C, Pressure=2.5 bar
R-102: Temp=92.0°C, Pressure=2.8 bar
R-103: Temp=78.5°C, Pressure=2.3 bar
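The report above contains the non-ASCII character `°`. When no `encoding` is given, `open()` uses a platform-dependent default, so a file written on one operating system may not read back correctly on another. Passing `encoding='utf-8'` on both the write and the read side avoids that problem. The following sketch uses a hypothetical file name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'report_utf8.txt'  # hypothetical file name

    # Write and read with the same explicit encoding
    with open(path, 'w', encoding='utf-8') as f:
        f.write("R-101: Temp=85.0°C\n")

    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

    n_bytes = path.stat().st_size

print(text, end='')
# Byte count exceeds character count: '°' takes 2 bytes in UTF-8
print(f"{len(text)} characters, {n_bytes} bytes on disk")
```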

# Appending to a File

print("=== Appending to Existing File ===")

# Add more data
new_data = ("R-104", 88.5, 2.6)

with open('06_reactor_report.txt', 'a') as file:
    reactor_id, temp, pressure = new_data
    line = f"{reactor_id}: Temp={temp}°C, Pressure={pressure} bar\n"
    file.write(line)

print("Data appended to 06_reactor_report.txt")

# Read updated file
print("\nUpdated file contents:")
with open('06_reactor_report.txt', 'r') as file:
    print(file.read())

=== Appending to Existing File ===
Data appended to 06_reactor_report.txt

Updated file contents:
Reactor Status Report
========================================

R-101: Temp=85.0°C, Pressure=2.5 bar
R-102: Temp=92.0°C, Pressure=2.8 bar
R-103: Temp=78.5°C, Pressure=2.3 bar
R-104: Temp=88.5°C, Pressure=2.6 bar
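The chapter summary also lists `.writelines()`, which writes a whole list of strings in one call. Unlike `print()`, it adds no newline characters of its own. The following sketch uses hypothetical readings written to a temporary file:

```python
import tempfile
from pathlib import Path

more_data = [("R-105", 90.0, 2.7), ("R-106", 87.5, 2.6)]  # hypothetical readings
lines = [f"{rid}: Temp={t}°C, Pressure={p} bar\n" for rid, t, p in more_data]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'more_readings.txt'  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        # writelines() writes each string as-is and adds NO newlines,
        # so every string must already end with '\n'
        f.writelines(lines)
    text = path.read_text(encoding='utf-8')

print(text, end='')
```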

6.3 (Optional) Working with CSV Files

CSV (Comma-Separated Values) files are one of the most common data formats in science and engineering.

Why CSV?

Simple, universal format
Works with Excel, Google Sheets, MATLAB, etc.
Human-readable
Easy to parse

Python has a built-in csv module for working with CSV files.

Other Python packages can also read and write CSV files, depending on your needs:

pandas → best for data analysis, tables, and large datasets
numpy → useful for purely numerical data stored in CSV format
openpyxl / xlsxwriter → for Excel (.xlsx) workbooks rather than plain CSV (xlsxwriter can only write files)

For simple tasks, the built-in csv module is often sufficient.
For more complex data processing and analysis, pandas is commonly preferred.
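One reason to prefer the `csv` module over `line.split(',')` is quoting. A field that itself contains a comma is wrapped in double quotes, and only a real CSV parser keeps it in one piece. The following sketch uses a made-up sensor line, with `io.StringIO` standing in for an open file:

```python
import csv
import io

# A made-up CSV line whose text field contains a comma, so it is quoted
line = 'S001,"Reactor A, inlet",25.3\n'

naive = line.strip().split(',')
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['S001', '"Reactor A', ' inlet"', '25.3']
print(parsed)  # ['S001', 'Reactor A, inlet', '25.3']
```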

import csv

# Writing CSV Files

print("=== Writing CSV File ===")

# Sample experimental data
experimental_data = [
    ["Time (min)", "Temperature (C)", "Pressure (bar)", "Flow Rate (L/min)"],
    [0, 25.0, 1.0, 50.0],
    [5, 45.2, 1.5, 55.3],
    [10, 65.8, 2.0, 60.1],
    [15, 85.3, 2.5, 65.7],
    [20, 95.1, 2.8, 70.2],
]

# Write to CSV
with open('06_experiment_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(experimental_data)

print("Data written to 06_experiment_data.csv")
# Display what we wrote
print("\nCSV contents:")
with open('06_experiment_data.csv', 'r') as file:
    print(file.read())

=== Writing CSV File ===
Data written to 06_experiment_data.csv

CSV contents:
Time (min),Temperature (C),Pressure (bar),Flow Rate (L/min)
0,25.0,1.0,50.0
5,45.2,1.5,55.3
10,65.8,2.0,60.1
15,85.3,2.5,65.7
20,95.1,2.8,70.2

# Reading CSV Files

print("=== Reading CSV File ===")

with open('06_experiment_data.csv', 'r') as file:
    reader = csv.reader(file)
    
    # Read header
    header = next(reader)
    print("Header:", header)
    print()
    
    # Read data rows
    print("Data:")
    for row in reader:
        time, temp, pressure, flow = row
        print(f"  t={time} min: T={temp}°C, P={pressure} bar, F={flow} L/min")

=== Reading CSV File ===
Header: ['Time (min)', 'Temperature (C)', 'Pressure (bar)', 'Flow Rate (L/min)']

Data:
  t=0 min: T=25.0°C, P=1.0 bar, F=50.0 L/min
  t=5 min: T=45.2°C, P=1.5 bar, F=55.3 L/min
  t=10 min: T=65.8°C, P=2.0 bar, F=60.1 L/min
  t=15 min: T=85.3°C, P=2.5 bar, F=65.7 L/min
  t=20 min: T=95.1°C, P=2.8 bar, F=70.2 L/min
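Every field that `csv.reader` returns is a string (note the quotes in the printed header), so numbers must be converted before any calculation. `csv.DictReader`, used later in this chapter, also lets you select columns by header name instead of by position. The sketch below works on the same data, held in an `io.StringIO` so the block runs without the file:

```python
import csv
import io

# Same contents as 06_experiment_data.csv, held in memory for a self-contained demo
csv_text = """Time (min),Temperature (C),Pressure (bar),Flow Rate (L/min)
0,25.0,1.0,50.0
5,45.2,1.5,55.3
10,65.8,2.0,60.1
15,85.3,2.5,65.7
20,95.1,2.8,70.2
"""

temperatures = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # DictReader keys each value by its header; the values are still strings
    temperatures.append(float(row['Temperature (C)']))

avg_temp = sum(temperatures) / len(temperatures)
print(f"Average temperature: {avg_temp:.2f} °C")  # Average temperature: 63.28 °C
```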

# Read the same CSV with pandas
import pandas as pd

df = pd.read_csv('06_experiment_data.csv')
print("\nData read with pandas:")
print(df)

Data read with pandas:
   Time (min)  Temperature (C)  Pressure (bar)  Flow Rate (L/min)
0           0             25.0             1.0               50.0
1           5             45.2             1.5               55.3
2          10             65.8             2.0               60.1
3          15             85.3             2.5               65.7
4          20             95.1             2.8               70.2

6.4 File Paths and Directories

Understanding file paths is crucial for working with files in different locations.

Path types:

Absolute path: Full path from root directory
- /Users/username/data/experiment.csv (Mac/Linux)
- C:\Users\username\data\experiment.csv (Windows)
Relative path: Path relative to current directory
- data/experiment.csv
- ../data/experiment.csv (.. means parent directory)

Assume your current working directory is project/, organized as follows:

project/
├── main.py
├── data/
│   ├── experiment.csv
│   └── results.csv
└── output/

data/experiment.csv
→ accesses the file inside the data directory
output/
→ refers to the output folder in the current directory
../project/data/experiment.csv
→ moves up one directory, then navigates into project/data
/Users/username/project/data/experiment.csv
→ absolute path that works regardless of the current directory
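You can confirm that the relative paths above point to the same file as the absolute one by joining each to the working directory and then collapsing the `..` segments. This sketch uses the POSIX flavor of `os.path` (the `posixpath` module) so that it gives the same answer on any OS. The working directory `/Users/username/project` is the one assumed from the tree above:

```python
import posixpath

cwd = '/Users/username/project'  # assumed working directory from the tree above

for rel in ['data/experiment.csv', '../project/data/experiment.csv']:
    # Join with the working directory, then collapse '..' segments
    full = posixpath.normpath(posixpath.join(cwd, rel))
    print(f"{rel:32} -> {full}")
```

Both lines should print the same absolute path, /Users/username/project/data/experiment.csv.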

Python’s pathlib module provides a modern, cross-platform way to work with paths.
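The `/` operator joins path pieces, and a few attributes take a path apart again. The following sketch uses the project/ layout from above. It uses `PurePosixPath`, which manipulates path strings without touching the disk, so the printed separators are `/` on every OS (with `Path` on Windows they would print as `\`). The file summary.csv is hypothetical:

```python
from pathlib import PurePosixPath

# PurePosixPath manipulates path strings only; it never touches the disk
data_file = PurePosixPath('project') / 'data' / 'experiment.csv'

print(data_file)                      # project/data/experiment.csv
print(data_file.parent)               # project/data
print(data_file.name)                 # experiment.csv
print(data_file.suffix)               # .csv
print(data_file.with_suffix('.txt'))  # project/data/experiment.txt

# Build a sibling path: hypothetical summary file in project/output
print(data_file.parent.parent / 'output' / 'summary.csv')  # project/output/summary.csv
```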

from pathlib import Path

print("=== Working with Paths ===")

# Get current working directory
current_dir = Path.cwd()
print(f"Current directory: {current_dir}")

=== Working with Paths ===
Current directory: /Users/hoon/CHME212/cheme_comp_book/docs/chapter06

# Create a Path object for the data file read earlier in this chapter
data_file = Path('06_temperature_data.txt')

print(f"\nFile path: {data_file}")
print(f"Absolute path: {data_file.absolute()}")
print(f"File exists: {data_file.exists()}")
print(f"Is file: {data_file.is_file()}")

File path: 06_temperature_data.txt
Absolute path: /Users/hoon/CHME212/cheme_comp_book/docs/chapter06/06_temperature_data.txt
File exists: True
Is file: True

# Get file information
if data_file.exists():
    print(f"File name: {data_file.name}")
    print(f"File stem (no extension): {data_file.stem}")
    print(f"File extension: {data_file.suffix}")
    print(f"File size: {data_file.stat().st_size} bytes")

File name: 06_temperature_data.txt
File stem (no extension): 06_temperature_data
File extension: .txt
File size: 52 bytes

# Creating Directories and Organizing Files

print("=== Creating Directory Structure ===")

# Create a data directory
data_dir = Path('project_data')
data_dir.mkdir(exist_ok=True)  # exist_ok=True: no error if the directory already exists

print(f"Created directory: {data_dir}")

=== Creating Directory Structure ===
Created directory: project_data

# Create subdirectories
raw_dir = data_dir / 'raw'
processed_dir = data_dir / 'processed'

raw_dir.mkdir(exist_ok=True)
processed_dir.mkdir(exist_ok=True)

print(f"Created: {raw_dir}")
print(f"Created: {processed_dir}")

Created: project_data/raw
Created: project_data/processed
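Creating `raw` and `processed` one at a time works because `project_data` already exists. For deeper trees, `mkdir(parents=True)` also creates any missing parent directories. The following sketch uses a hypothetical nested layout in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nested = root / 'project_data' / 'processed' / '2024' / 'run_01'  # hypothetical layout

    # Without parents=True, mkdir() raises FileNotFoundError when a parent is missing
    try:
        nested.mkdir()
    except FileNotFoundError:
        print("mkdir() failed: parent directories do not exist yet")

    # parents=True creates every missing level in one call
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

print(f"Nested directory created: {created}")
```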

# Write a file in the subdirectory
output_file = processed_dir / 'results.txt'
output_file.write_text("Analysis complete: All tests passed\n")

print(f"\nWrote file: {output_file}")
print(f"Contents: {output_file.read_text()}")

Wrote file: project_data/processed/results.txt
Contents: Analysis complete: All tests passed
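Once files are organized into directories, you often need to find them again. `Path.iterdir()` lists a directory's contents, and `Path.glob()` filters them by a pattern (the batch-processing example later in this chapter relies on `glob()`). The following sketch uses hypothetical file names in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / 'project_data'
    (data_dir / 'processed').mkdir(parents=True)
    # Hypothetical result files
    for name in ['results.txt', 'summary.csv', 'notes.txt']:
        (data_dir / 'processed' / name).write_text('placeholder\n')

    # iterdir() lists everything directly inside a directory (order not guaranteed)
    everything = sorted(p.name for p in (data_dir / 'processed').iterdir())
    # glob() keeps only paths matching a pattern
    text_files = sorted(p.name for p in data_dir.glob('processed/*.txt'))
    # rglob() searches all subdirectories recursively
    all_csv = sorted(p.name for p in data_dir.rglob('*.csv'))

print(everything)  # ['notes.txt', 'results.txt', 'summary.csv']
print(text_files)  # ['notes.txt', 'results.txt']
print(all_csv)     # ['summary.csv']
```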

6.5 (Optional) Error Handling with Files

File operations can fail for many reasons:

File doesn’t exist
No permission to read/write
Disk full
File is locked by another program

Good practice: Always handle potential errors!

# Example 1: File doesn't exist
print("\n1. Trying to read non-existent file:")
try:
    with open('nonexistent_file.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("   Error: File not found!")
    print("   Creating a default file instead...")
    with open('nonexistent_file.txt', 'w') as file:
        file.write("Default content\n")
    print("   Default file created.")

1. Trying to read non-existent file:
   Error: File not found!
   Creating a default file instead...
   Default file created.

# Example 2: Check if file exists before reading
print("\n2. Checking file existence first:")
filename = 'data_file.txt'
if Path(filename).exists():
    with open(filename, 'r') as file:
        content = file.read()
    print(f"   File read successfully: {len(content)} characters")
else:
    print(f"   File '{filename}' does not exist")

2. Checking file existence first:
   File 'data_file.txt' does not exist

print("\n3. Comprehensive error handling:")

def safe_read_file(filename):
    """Read a file and return its contents, or None if it cannot be read."""
    try:
        with open(filename, 'r') as file:
            return file.read()
    except FileNotFoundError:
        print(f"   Error: '{filename}' not found")
        return None
    except PermissionError:
        print(f"   Error: No permission to read '{filename}'")
        return None
    except Exception as e:
        # Catch-all for anything else (for example, a UnicodeDecodeError)
        print(f"   Unexpected error: {e}")
        return None

content = safe_read_file('06_experiment_data.csv')  # file written in Section 6.3
if content is not None:  # an empty file ("") still counts as a successful read
    print(f"   Successfully read {len(content)} characters")

3. Comprehensive error handling:
   Successfully read 143 characters

6.6 Practical Applications

Let’s put everything together with realistic examples.

# Comprehensive Example 1: Processing Sensor Log Files
import csv  # also imported in Section 6.3; repeated here in case that section was skipped

print("=" * 60)
print("SENSOR DATA ANALYSIS PIPELINE")
print("=" * 60)

# Step 1: Create sample sensor log
print("\n[1/4] Creating sample sensor log...")

sensor_log = """timestamp,sensor_id,temperature,pressure,status
2024-01-15 08:00:00,S001,25.3,1.01,OK
2024-01-15 08:05:00,S001,28.7,1.05,OK
2024-01-15 08:10:00,S001,95.2,2.85,WARNING
2024-01-15 08:15:00,S001,105.3,3.15,CRITICAL
2024-01-15 08:20:00,S001,98.1,2.95,WARNING
2024-01-15 08:25:00,S001,85.4,2.50,OK
"""

with open('sensor_log.csv', 'w') as f:
    f.write(sensor_log)
print("   sensor_log.csv created")

# Step 2: Read and analyze data
print("\n[2/4] Analyzing sensor data...")

warnings = []
critical = []
temperatures = []

with open('sensor_log.csv', 'r') as file:
    reader = csv.DictReader(file)
    
    for row in reader:
        temp = float(row['temperature'])
        temperatures.append(temp)
        
        if row['status'] == 'WARNING':
            warnings.append(row)
        elif row['status'] == 'CRITICAL':
            critical.append(row)

print(f"   Total readings: {len(temperatures)}")
print(f"   Warnings: {len(warnings)}")
print(f"   Critical alerts: {len(critical)}")
print(f"   Avg temperature: {sum(temperatures)/len(temperatures):.1f}°C")
print(f"   Max temperature: {max(temperatures):.1f}°C")

# Step 3: Generate report
print("\n[3/4] Generating analysis report...")

with open('sensor_analysis_report.txt', 'w') as file:
    file.write("SENSOR ANALYSIS REPORT\n")
    file.write("=" * 50 + "\n\n")
    
    file.write(f"Total readings: {len(temperatures)}\n")
    file.write(f"Average temperature: {sum(temperatures)/len(temperatures):.1f}°C\n")
    file.write(f"Maximum temperature: {max(temperatures):.1f}°C\n")
    file.write(f"Minimum temperature: {min(temperatures):.1f}°C\n\n")
    
    file.write(f"Warnings: {len(warnings)}\n")
    file.write(f"Critical alerts: {len(critical)}\n\n")
    
    if critical:
        file.write("CRITICAL EVENTS:\n")
        for event in critical:
            file.write(f"  {event['timestamp']}: "
                      f"T={event['temperature']}°C, "
                      f"P={event['pressure']} bar\n")

print("   Report saved to sensor_analysis_report.txt")

# Step 4: Display the report
print("\n[4/4] Report contents:")
print("=" * 60)
with open('sensor_analysis_report.txt', 'r') as file:
    print(file.read())

============================================================
SENSOR DATA ANALYSIS PIPELINE
============================================================

[1/4] Creating sample sensor log...
   sensor_log.csv created

[2/4] Analyzing sensor data...
   Total readings: 6
   Warnings: 2
   Critical alerts: 1
   Avg temperature: 73.0°C
   Max temperature: 105.3°C

[3/4] Generating analysis report...
   Report saved to sensor_analysis_report.txt

[4/4] Report contents:
============================================================
SENSOR ANALYSIS REPORT
==================================================

Total readings: 6
Average temperature: 73.0°C
Maximum temperature: 105.3°C
Minimum temperature: 25.3°C

Warnings: 2
Critical alerts: 1

CRITICAL EVENTS:
  2024-01-15 08:15:00: T=105.3°C, P=3.15 bar

# Comprehensive Example 2: Batch Processing Multiple Files

print("=" * 60)
print("BATCH DATA QUALITY CONTROL SYSTEM")
print("=" * 60)

# Step 1: Create sample batch data files
print("\n[1/3] Creating sample batch data...")

batch_dir = Path('batch_data')
batch_dir.mkdir(exist_ok=True)

# Create multiple batch files
batches = {
    'batch_001.csv': [["purity", "yield"], [0.965, 0.88], [0.962, 0.89]],
    'batch_002.csv': [["purity", "yield"], [0.948, 0.91], [0.945, 0.90]],
    'batch_003.csv': [["purity", "yield"], [0.978, 0.85], [0.975, 0.86]],
}

for filename, data in batches.items():
    filepath = batch_dir / filename
    with open(filepath, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(data)
    print(f"   Created: {filename}")

# Step 2: Process all batch files
print("\n[2/3] Processing all batches...")

min_purity = 0.95
min_yield = 0.85

results = []

for batch_file in sorted(batch_dir.glob('batch_*.csv')):
    print(f"\n   Processing {batch_file.name}...")
    
    with open(batch_file, 'r') as file:
        reader = csv.DictReader(file)
        
        batch_purities = []
        batch_yields = []
        
        for row in reader:
            batch_purities.append(float(row['purity']))
            batch_yields.append(float(row['yield']))
        
        avg_purity = sum(batch_purities) / len(batch_purities)
        avg_yield = sum(batch_yields) / len(batch_yields)
        
        # Quality control check
        if avg_purity >= min_purity and avg_yield >= min_yield:
            status = "PASS"
        else:
            status = "FAIL"
        
        print(f"     Avg purity: {avg_purity:.1%}")
        print(f"     Avg yield: {avg_yield:.1%}")
        print(f"     Status: {status}")
        
        results.append({
            'batch': batch_file.stem,
            'purity': avg_purity,
            'yield': avg_yield,
            'status': status
        })

# Step 3: Write summary report
print("\n[3/3] Generating summary report...")

summary_file = batch_dir / 'quality_control_summary.csv'
fieldnames = ['batch', 'purity', 'yield', 'status']

with open(summary_file, 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(results)

print(f"   Summary saved to {summary_file}")

# Display summary
print("\nQUALITY CONTROL SUMMARY:")
print("=" * 60)
passed = sum(1 for r in results if r['status'] == 'PASS')
failed = len(results) - passed
print(f"Total batches: {len(results)}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"\nPass rate: {passed/len(results):.1%}")

============================================================
BATCH DATA QUALITY CONTROL SYSTEM
============================================================

[1/3] Creating sample batch data...
   Created: batch_001.csv
   Created: batch_002.csv
   Created: batch_003.csv

[2/3] Processing all batches...

   Processing batch_001.csv...
     Avg purity: 96.4%
     Avg yield: 88.5%
     Status: PASS

   Processing batch_002.csv...
     Avg purity: 94.6%
     Avg yield: 90.5%
     Status: FAIL

   Processing batch_003.csv...
     Avg purity: 97.6%
     Avg yield: 85.5%
     Status: PASS

[3/3] Generating summary report...
   Summary saved to batch_data/quality_control_summary.csv

QUALITY CONTROL SUMMARY:
============================================================
Total batches: 3
Passed: 2
Failed: 1

Pass rate: 66.7%

Summary

In this chapter, you learned how to work with files in Python:

Reading Files

open() function: Basic file operations
Context managers: with statement (best practice)
Methods: .read(), .readline(), .readlines()
Iterate line by line: Memory-efficient for large files

Writing Files

Write mode: 'w' (overwrites)
Append mode: 'a' (adds to end)
Methods: .write(), .writelines()

CSV Files

csv.reader() and csv.writer(): Basic CSV operations
csv.DictReader() and csv.DictWriter(): Dictionary-based (more readable)
Common format for data exchange

File Paths

pathlib.Path: Modern path handling
Absolute vs relative paths
Directory operations: Create, check existence

Best Practices

Always use context managers (with statement)
Handle errors (file not found, permissions)
Close files (automatic with context managers)
Check file existence before reading
Use meaningful file names
Organize data in directories

Quick Reference

# Reading a file
with open('data.txt', 'r') as file:
    content = file.read()

# Writing a file
with open('output.txt', 'w') as file:
    file.write("Results\n")

# Reading CSV
import csv
with open('data.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row['column_name'])

# Writing CSV
with open('output.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['name', 'value'])
    writer.writeheader()
    writer.writerow({'name': 'test', 'value': 123})

# Working with paths
from pathlib import Path
data_file = Path('data') / 'experiment.csv'
if data_file.exists():
    content = data_file.read_text()

File I/O is fundamental for real-world data analysis and scientific computing!