AWS Course

Module 0: Introduction, Prerequisites & The Cloud Mindset

Welcome to the "AWS for Web Developers" masterclass. This is not just a tutorial on clicking buttons; it is a comprehensive restructuring of how you approach software engineering. Moving from on-premises servers or a simple VPS/PaaS (like DigitalOcean or Heroku) to AWS requires a fundamental shift in philosophy.

In this module, we will establish the "Cloud Mindset," configure your local development environment, secure your wallet against unexpected bills, and understand the physical layout of the AWS global network.


0.1 The Paradigm Shift: Cattle vs. Pets

If you come from a background of managing a single Linux server (VPS), you likely treat that server like a Pet: it has a name, you SSH in to nurse it back to health when it gets sick, and losing it would be a disaster. In the cloud, you treat servers like Cattle: they are numbered, identical, and when one fails, you replace it without ceremony.

The Goal: By the end of this course, you will build systems where you can delete your web server, and your application will automatically recover within seconds without human intervention. This is called Disposable Infrastructure.

0.2 Required Technical Prerequisites

AWS is an advanced platform. To succeed, you must be comfortable with the following web fundamentals:

1. Networking Basics

AWS requires you to build your own network (VPC). You must understand:

- IP addresses and ports
- DNS (how a domain name resolves to an IP)
- HTTP vs. HTTPS
- The difference between public and private networks

2. The Terminal (CLI)

While the AWS Console (GUI) is great for learning, professionals use the Command Line Interface (CLI). You should know how to:

- Navigate the filesystem (cd, ls) and edit files
- Run scripts and read their output
- Connect to a remote machine over SSH

0.3 AWS Global Infrastructure

You cannot deploy code if you don't know where it lives. AWS is divided into three physical layers:

1. Regions

A Region is a physical location in the world where AWS clusters data centers. Example: us-east-1 (Northern Virginia), eu-west-2 (London).

Why it matters:

- Latency: Deploy close to your users.
- Compliance: Some data (e.g., under GDPR) must legally stay in a specific country.
- Pricing and features: Costs vary by Region, and new services usually launch in us-east-1 first.

2. Availability Zones (AZs)

Inside every Region, there are typically 3 to 6 Availability Zones. An AZ is one or more discrete data centers with redundant power, networking, and connectivity.

The Golden Rule of High Availability: If you only run one server in us-east-1a, and that data center floods, your site is down. To be "High Availability," you must run servers in at least two AZs (e.g., us-east-1a AND us-east-1b).

3. Edge Locations

There are 400+ Edge Locations globally. These are not data centers for servers; they are caching endpoints for CloudFront (CDN). They sit very close to users to serve static content (images/video) quickly.

0.4 The Shared Responsibility Model

Who is responsible when a hack happens? It depends.

AWS is responsible for "Security OF the Cloud"

They protect the physical hardware, the concrete walls, the guards, the power, and the virtualization (hypervisor) layer that runs your instances.

You are responsible for "Security IN the Cloud"

If you launch a server and leave the password as `admin/admin`, or leave Port 22 open to the world, and you get hacked—that is YOUR FAULT. AWS does not patch your OS or secure your code.

0.5 Setup & Safety: Avoiding "Bill Shock"

The Horror Story: A student leaves a large EC2 instance running, forgets about it, and wakes up to a $2,000 bill. Follow these steps immediately.

Step 1: Understand the Free Tier

For your first 12 months, AWS gives you a monthly allowance at no charge, including roughly 750 hours of a t2.micro/t3.micro EC2 instance, 5 GB of S3 Standard storage, and 750 hours of a db.t2.micro RDS instance. Note that 750 hours covers one instance running 24/7, not two. Anything outside the Free Tier is billed normally. (AWS revises these terms periodically; check the current Free Tier page.)

Step 2: Set Up AWS Budgets (Do this NOW)

  1. Log into your Root Account.
  2. Search for "Budgets" in the console.
  3. Click "Create Budget" -> "Cost Budget".
  4. Set the amount to $10.00 (or $0.01).
  5. Configure "Thresholds": Send an email when actual costs reach 80% of the budget.

0.6 Installing the Tooling

We will interact with AWS programmatically. Install the AWS Command Line Interface (CLI).

Installation

Mac:

brew install awscli

Windows: Download the MSI installer from AWS.

Linux:

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

Configuration

Once we create credentials in Module 1, you will run:

aws configure
# AWS Access Key ID: [Paste Key]
# AWS Secret Access Key: [Paste Secret]
# Default region name: us-east-1
# Default output format: json

You are now ready to begin. In the next module, we will secure your account using IAM.

Module 1: Identity and Access Management (IAM)

Identity and Access Management (IAM) is the backbone of AWS security. It is a global service (not bound to a specific region like EC2). It answers two fundamental questions:

  1. Authentication: Who are you? (Login)
  2. Authorization: What are you allowed to do? (Permissions)

Before we launch servers, we must secure the perimeter. If a hacker gains access to your IAM credentials, they don't need to hack your firewall—they own the firewall.


1.1 The Root Account: The "God Mode"

When you first sign up for AWS with your email and credit card, you are logged in as the Root User. This account has unlimited privileges. It can close your account, change your billing, and delete every server you own.

CRITICAL SECURITY PROTOCOL:
  1. Log in as Root one last time.
  2. Enable MFA (Multi-Factor Authentication) immediately. Use Google Authenticator or Authy.
  3. Create an IAM User for yourself with Administrator permissions.
  4. Log out of Root.
  5. Lock the Root credentials (email/password) in a physical safe or password manager.
  6. NEVER use the Root account for daily tasks or API calls.

1.2 The Four IAM Identities

You must understand the difference between these four objects to architect securely.

1. IAM Users

Represents a person (e.g., `alice`, `bob`) or a specific legacy application. Users have permanent long-term credentials.

2. IAM Groups

A collection of Users. You should never attach permissions directly to a User. Always attach permissions to a Group, and add Users to that Group.

Example: Create a "Developers" group. Give the group access to S3. Add `alice` to the group. If `alice` leaves the company, simply remove her from the group.

3. IAM Roles (The Most Important Concept for Devs)

A Role is an identity that you can assume temporarily. It does not have a password or permanent keys.

Use Case: You have a Python script running on an EC2 server that needs to upload files to S3. Instead of hardcoding access keys on the server, you attach an IAM Role to the instance. The AWS SDK automatically fetches short-lived credentials from the Role, so there is nothing to leak or rotate.

4. Policies

Policies are JSON documents that define permissions. They are attached to Users, Groups, or Roles.

1.3 Anatomy of a JSON Policy

You will be writing and reading a lot of JSON. Here is the structure of a policy implementing the Principle of Least Privilege.

{
  "Version": "2012-10-17",
  "Id": "S3-Restricted-Access",
  "Statement": [
    {
      "Sid": "AllowListBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::my-company-data"
    },
    {
      "Sid": "AllowUploadFiles",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::my-company-data/*"
    }
  ]
}

Breaking it down:

- Version: The policy language version. This is always the literal string "2012-10-17"; it is not today's date.
- Sid: An optional, human-readable statement ID.
- Effect: Allow or Deny. An explicit Deny always wins over an Allow.
- Action: The API calls being permitted, in `service:Operation` form.
- Resource: The ARN(s) the actions apply to. Note that ListBucket targets the bucket itself, while GetObject/PutObject target the objects inside it (the `/*` suffix).

Understanding ARNs:
The syntax is: arn:partition:service:region:account-id:resource-type/resource-id (some services, like S3 and IAM, omit the region and/or account ID).
Example: arn:aws:ec2:us-east-1:123456789012:instance/i-0a1b2c3d
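Because the resource portion may itself contain slashes or colons, splitting an ARN naively on every colon is a common bug. A minimal sketch in Python (the helper name is ours, not an AWS API):

```python
def parse_arn(arn):
    # An ARN has six colon-separated parts; the final resource part may
    # contain colons or slashes itself, so split at most 5 times.
    parts = arn.split(":", 5)
    keys = ["prefix", "partition", "service", "region", "account_id", "resource"]
    return dict(zip(keys, parts))

print(parse_arn("arn:aws:ec2:us-east-1:123456789012:instance/i-0a1b2c3d"))
print(parse_arn("arn:aws:s3:::my-company-data/*"))  # S3 omits region and account
```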

1.4 The Principle of Least Privilege

This is the golden rule of IAM. A user/service should only have the minimum permissions necessary to do their job, and nothing more.

Scenario: A Junior Dev needs to restart a specific web server. Do not give them AdministratorAccess or even full EC2 access. Grant only the ec2:RebootInstances action, scoped to that one instance's ARN.
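Applied to that scenario, a least-privilege policy grants a single action on a single resource (the account and instance IDs below are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRebootOneInstance",
      "Effect": "Allow",
      "Action": "ec2:RebootInstances",
      "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/i-0a1b2c3d"
    }
  ]
}
```

If the Junior Dev tries to reboot any other instance, or terminate this one, the request is implicitly denied.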

1.5 IAM Credential Safety & Best Practices

1. Never Commit Keys to Git

If you commit an `access_key_id` to a public GitHub repo, AWS bots (and hacker bots) scan GitHub continuously. You will often receive an email from AWS within 5 minutes telling you your account is compromised.

Solution: Use `.env` files and add them to `.gitignore`. Better yet, use IAM Roles.

2. Password Policy

In IAM settings, enforce a strong password policy: Minimum length 12, require special characters, expire passwords every 90 days.

3. Access Analyzer & Credential Report

Run the Credential Report (IAM -> Credential report) regularly: it is a CSV listing every user, the age of their access keys, and whether MFA is enabled. IAM Access Analyzer flags resources (S3 buckets, roles) that are shared with external accounts or the public.

1.6 Hands-on Lab: Setting up your Admin User

Before moving to Module 2, perform these actions:

  1. Log in as Root.
  2. Go to IAM -> Users -> "Add User".
  3. Name: YourName-Admin.
  4. Select "AWS Management Console access".
  5. Create a Group called SysAdmins.
  6. Attach the managed policy AdministratorAccess to the Group.
  7. Add your user to the Group.
  8. Finish and Log out.
  9. Log back in using the IAM User credentials.

Module 2: Networking & Virtual Private Cloud (VPC)

If IAM is the gatekeeper, the VPC (Virtual Private Cloud) is the castle walls. A VPC is a logically isolated section of the AWS Cloud where you can launch resources in a virtual network that you define.

Many developers skip this and use the "Default VPC" AWS provides. This is fine for testing, but in production, you must build a custom network to separate public-facing servers from sensitive databases.


2.1 The Address Space: CIDR Blocks

When you create a VPC, you must assign it an IP address range using CIDR (Classless Inter-Domain Routing) notation.

Understanding the Notation

An IPv4 address is 32 bits (e.g., 192.168.0.1). The CIDR suffix tells AWS how many of those bits are fixed (the network portion). A /16 fixes 16 bits, leaving 16 bits for hosts: 2^16 = 65,536 addresses. A /24 fixes 24 bits, leaving 8: 256 addresses.

Recommendation: For your main VPC, always use a large range like 10.0.0.0/16. This gives you plenty of room to grow. You cannot easily change the size of a VPC after creation.

Reserved IPs: In every subnet, AWS reserves 5 IP addresses. If you have a /24 subnet (256 IPs), you can only use 251.
  1. x.x.x.0: Network address.
  2. x.x.x.1: Reserved by AWS for the VPC router.
  3. x.x.x.2: Reserved by AWS for DNS.
  4. x.x.x.3: Reserved by AWS for future use.
  5. x.x.x.255: Network broadcast address (VPCs do not support broadcast, but the address is still reserved).
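You can sanity-check this subnet arithmetic with Python's standard `ipaddress` module:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")

print(vpc.num_addresses)         # 65536 total addresses in the /16 VPC
print(subnet.num_addresses)      # 256 addresses in a /24 subnet
print(subnet.num_addresses - 5)  # 251 usable after AWS reserves 5
print(subnet.network_address)    # 10.0.1.0 (reserved: network address)
print(subnet.broadcast_address)  # 10.0.1.255 (reserved: broadcast)
```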

2.2 Subnets: Slicing the Pie

You cannot launch servers directly into a VPC. You must launch them into Subnets. A subnet is a sub-section of your VPC range found within a specific Availability Zone.

The Architecture Pattern

To ensure High Availability, we create pairs of subnets across two Availability Zones (e.g., us-east-1a and us-east-1b): a Public and a Private subnet in each AZ, so that every tier of the application survives the loss of one AZ.

2.3 Public vs. Private: The Route Table

A subnet is not inherently "Public" or "Private". The distinction is defined by its Route Table.

1. Internet Gateway (IGW)

This is a virtual modem that connects your VPC to the real internet.

2. Public Subnet Definition

A subnet is Public if its Route Table has a route to the Internet Gateway.

Destination: 0.0.0.0/0   Target: igw-123456

Use Case: Load Balancers, Bastion Hosts, Web Servers (though Web Servers are safer in private subnets behind a Load Balancer).

3. Private Subnet Definition

A subnet is Private if it does NOT have a route to the IGW.

Use Case: App Servers, Databases, Back-end Logic. No one from the outside world can directly ping these servers.

2.4 The NAT Gateway: Private Internet Access

Problem: Your database is in a Private Subnet (secure). But your database needs to download a security patch from `update.mysql.com`. How does it reach the internet if it has no IGW route?

Solution: The NAT Gateway (Network Address Translation).

  1. You launch a NAT Gateway in the Public Subnet.
  2. You edit the Private Subnet's Route Table: 0.0.0.0/0 -> nat-123456.
  3. Flow: Database -> Private Route Table -> NAT Gateway (Public Subnet) -> Internet Gateway -> Internet.

Cost Alert: NAT Gateways are expensive. They cost roughly $0.045/hour (~$32/month) PLUS data processing fees. For personal projects, you might skip the NAT Gateway, but you won't be able to run `yum update` on private servers.

2.5 Security: Firewalls

AWS has two layers of firewalls. You must understand the difference.

Layer 1: Security Groups (The Instance Level)

A Security Group is a stateful firewall attached to an instance (technically, to its network interface). You write Allow rules only; everything else is implicitly denied. Because it is stateful, if you allow inbound Port 443, the response traffic is automatically allowed back out.

Layer 2: Network ACLs (The Subnet Level)

A Network ACL is a stateless firewall at the subnet boundary. It supports both Allow and Deny rules, evaluated in rule-number order, and you must explicitly allow return traffic (the ephemeral ports). In practice, most teams leave the default NACL open and do their filtering with Security Groups.

2.6 VPC Peering & Endpoints

VPC Peering connects two VPCs so they can communicate over private IPs. Note that peering is not transitive: A-B plus B-C does not give A-C. VPC Endpoints let resources in private subnets reach AWS services without touching the public internet: Gateway Endpoints (S3 and DynamoDB, free) and Interface Endpoints (PrivateLink, for most other services, with an hourly fee).

2.7 Lab: The "Life of a Packet" Challenge

Before moving to Compute, visualize a user loading your website:

  1. Public Internet: User types `www.example.com`. DNS resolves to the Load Balancer IP.
  2. IGW: Packet hits the Internet Gateway.
  3. Public Subnet: Packet routed to the Load Balancer (ALB). ALB checks Security Group (Allow Port 443 from 0.0.0.0/0?).
  4. ALB Routing: ALB picks a target EC2 instance in the Private Subnet.
  5. Private Subnet: Packet arrives at EC2. EC2 checks Security Group (Allow Port 80 from ALB Security Group?).
  6. Database: EC2 queries RDS in a private data subnet. RDS checks Security Group (Allow Port 5432 from App Security Group?).
  7. Return Trip: Data flows back up the chain. Because Security Groups are stateful, the return traffic is allowed automatically.

Module 3: Elastic Compute Cloud (EC2)

Amazon EC2 is the service that started the cloud revolution. It allows you to rent virtual computers (instances) on demand. While modern development often favors "Serverless" (Lambda), EC2 remains the workhorse of the internet. If you are migrating a legacy app, running a long-lived worker process, or hosting a database, you need EC2.


3.1 The Instance Naming Convention

When you see m5.large or c6g.xlarge, it isn't random. It follows a strict syntax:

[Family][Generation][Attribute].[Size]

Example: in c6g.xlarge, "c" is the family (Compute optimized), "6" is the generation, "g" means Graviton (AWS's ARM processor), and "xlarge" is the size. Likewise, m5.large is a 5th-generation General Purpose instance, size large.

Pro Tip: For most web apps, start with T3 or T3a instances. They use "Burstable Performance" credits, making them incredibly cheap for traffic that isn't constant 100% CPU usage.
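As an illustration, the naming convention can be parsed mechanically. This sketch (our own helper, not an AWS API) handles common names like m5.large and c6g.xlarge, though not every exotic type (e.g., bare-metal families containing dashes):

```python
import re

# [Family][Generation][Attribute].[Size], e.g. m5.large, c6g.xlarge
PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.(\w+)$")

def parse_instance_type(name):
    family, generation, attribute, size = PATTERN.match(name).groups()
    return {"family": family, "generation": int(generation),
            "attribute": attribute or None, "size": size}

print(parse_instance_type("m5.large"))
print(parse_instance_type("c6g.xlarge"))  # 'g' = Graviton (ARM) processor
```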

3.2 AMIs (Amazon Machine Images)

An AMI is the "template" for the root volume (C: Drive) of your instance.

3.3 Bootstrapping with User Data

User Data is a script responsible for automating the setup of a server immediately after it launches. This is the first step toward "Infrastructure as Code." You should never SSH into a server to manually install software if you can avoid it.

Example: Launching a Web Server automatically

#!/bin/bash
# Update the OS
yum update -y

# Install Apache
yum install -y httpd

# Start the service
systemctl start httpd
systemctl enable httpd

# Create a landing page
echo "<h1>Hello from Server $(hostname -f)</h1>" > /var/www/html/index.html

3.4 Purchasing Options: Optimizing Costs

The exact same server can cost $100 or $10 depending on how you buy it.

  1. On-Demand: Pay by the second. Most expensive. No commitment. Use for short-term, irregular workloads.
  2. Reserved Instances (RI): Commit to 1 or 3 years. Discount: Up to ~72%. Use for databases and steady-state web servers.
  3. Savings Plans: The modern version of RIs. Commit to spending $X/hour for 1-3 years. More flexible than RIs (applies across different instance families).
  4. Spot Instances: Bid on unused AWS capacity. Discount: Up to ~90%.
    The Catch: AWS can terminate your instance with a 2-minute warning if they need the capacity back.
    Use Case: Stateless web servers behind an Auto Scaling Group, Image processing jobs, CI/CD pipelines.

3.5 EBS (Elastic Block Store) - The Hard Drive

EC2 is the CPU/RAM; EBS is the Disk. EBS volumes are network drives attached to your instance.

Volume Types

- gp3 (General Purpose SSD): The default. A good balance for boot volumes and most applications.
- io2 (Provisioned IOPS SSD): For I/O-hungry databases; you pay for guaranteed IOPS.
- st1 (Throughput Optimized HDD): Cheap, for large sequential workloads (logs, big data).
- sc1 (Cold HDD): Cheapest, for rarely accessed data.

Delete on Termination: When launching an EC2 instance, there is a checkbox for the Root Volume called "Delete on Termination".
- If Checked (Default): When you terminate the EC2, the data is lost.
- If Unchecked: The EC2 dies, but the EBS volume remains (and you keep paying for it!).

3.6 Security & Access (SSH vs SSM)

How do you log into your Linux server?

The Old Way: SSH

Requires Port 22 to be open (0.0.0.0/0 is a security risk). Requires managing `.pem` private key files. If you lose the key, you lose the server.

The Modern Way: SSM Session Manager

Uses the AWS Systems Manager Agent (pre-installed on Amazon Linux/Ubuntu).
Benefits:

- No inbound ports open at all (not even 22).
- No SSH keys to manage or lose.
- Access is controlled by IAM policies, and every session can be logged to CloudWatch/S3 for auditing.

3.7 Instance Metadata

From inside the EC2 instance, the OS can learn about itself by querying a special local IP address. On modern instances (IMDSv2, the default for new launches), you must first request a session token:

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

This returns the Instance ID, Public IP, Private IP, and IAM Role credentials. This is how scripts running on your server automatically get the permissions assigned to the IAM Role.

Module 4: Storage & Content Delivery (S3 & CloudFront)

In the old days, if a user uploaded a profile picture, you saved it to the /var/www/uploads folder on your server. In the cloud, this is an anti-pattern. If your server crashes or if you autoscale to add a new server, that file is gone or missing from the new server.

The Solution: Amazon S3 (Simple Storage Service). It provides infinite storage, 99.999999999% (11 9's) durability, and serves as the backbone of the modern web.


4.1 S3 Fundamentals: Buckets and Objects

A Bucket is a container for files. Bucket names must be globally unique across all AWS accounts, and each bucket lives in one Region. An Object is a file plus metadata, identified by a Key (e.g., `images/2024/profile.jpg`). There are no real folders; the slashes in the key are just a naming convention. A single object can be up to 5 TB.

4.2 S3 Storage Classes (Cost Optimization)

Not all data needs to be instantly accessible. S3 offers "Lifecycle Policies" to move data between tiers automatically to save money.

- S3 Standard: Hot data (user profile pics, static website assets). Retrieval: milliseconds.
- S3 Intelligent-Tiering: Unknown access patterns; AWS moves data between tiers for you. Retrieval: milliseconds.
- S3 Standard-IA: Infrequent Access (backups accessed once a month). Lower storage cost, higher retrieval fee. Retrieval: milliseconds.
- S3 Glacier Deep Archive: Regulatory archives (keep for 7 years). Cheapest storage (~$1/TB per month). Retrieval: 12 to 48 hours.
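A Lifecycle Policy tying these tiers together is just a JSON rule set. The sketch below (prefix and day counts are our own choices) is roughly the shape accepted by `aws s3api put-bucket-lifecycle-configuration`:

```json
{
  "Rules": [
    {
      "ID": "ArchiveOldBackups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```

With this rule, a backup sits in Standard for 30 days, moves to Standard-IA, and after a year is shipped to Deep Archive automatically.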

4.3 S3 Security

By default, all S3 buckets are Private. The infamous "S3 Data Leaks" in the news are almost always due to user error (turning off "Block Public Access").

Bucket Policies (The "Who" and "What")

A JSON document attached to the bucket itself to control access.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-public-website/*"
        }
    ]
}

Note: Only use the policy above for hosting public websites. For private data, use IAM Roles or Presigned URLs.

Presigned URLs

How do you let a user upload a file directly to a private bucket without giving them AWS credentials?

  1. User app requests a URL from your Backend API.
  2. Backend generates a "Presigned URL" using its secret credentials (valid for only 5 minutes).
  3. Backend returns URL to User app.
  4. User app PUTs the file directly to that S3 URL.
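In production you would call the SDK's presigning helper (e.g., boto3's `generate_presigned_url`); the real scheme is AWS Signature Version 4. The self-contained sketch below only illustrates the core idea: an expiry timestamp plus an HMAC signature that only the backend can produce:

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlparse

SECRET = b"backend-only-secret"  # stands in for the backend's credentials

def presign(key, expires_in=300, now=None):
    # Embed an expiry time and a signature in the URL so the server can
    # later verify it issued the link and that it has not expired.
    # NOTE: this is a teaching sketch, NOT the real AWS SigV4 algorithm.
    expires = (now if now is not None else int(time.time())) + expires_in
    sig = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return f"https://my-bucket.s3.amazonaws.com/{key}?Expires={expires}&Signature={sig}"

def verify(key, expires, sig, now=None):
    if (now if now is not None else int(time.time())) > expires:
        return False  # link expired
    expected = hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

url = presign("profile.jpg", expires_in=300, now=1000)
q = parse_qs(urlparse(url).query)
print(verify("profile.jpg", int(q["Expires"][0]), q["Signature"][0], now=1200))  # True
print(verify("profile.jpg", int(q["Expires"][0]), q["Signature"][0], now=1400))  # False
```

Because the user never sees SECRET, they can use the URL but cannot mint new ones or extend the expiry.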

4.4 Hosting Static Websites

S3 can act as a web server for HTML/CSS/JS. It supports index documents and error documents. This is the standard way to host React, Vue, and Angular apps.

S3 Website vs. HTTPS: S3 Static Website hosting does NOT support HTTPS (SSL) with a custom domain. To get the green lock icon, you MUST use CloudFront.

4.5 Amazon CloudFront (CDN)

CloudFront is a Content Delivery Network. It caches your content at 400+ "Edge Locations" around the world.

How it works:

  1. Origin: Your S3 Bucket (or EC2 Load Balancer).
  2. Distribution: The CloudFront configuration.
  3. Edge Location: The physical server near the user.

When User A in Paris requests `logo.png`, CloudFront checks the Paris Edge Location. If it's there (Cache Hit), it returns it instantly. If not (Cache Miss), it fetches it from your S3 bucket in Virginia, returns it to the user, and saves a copy in Paris for the next user.

Origin Access Control (OAC)

To secure your content:

  1. Create an S3 Bucket and block all public access.
  2. Create a CloudFront Distribution.
  3. Enable Origin Access Control (OAC).
  4. Update the S3 Bucket Policy to allow only the CloudFront OAC to read files.

Now, users cannot bypass the CDN to access S3 directly.

4.6 The Storage Gateway (Hybrid Cloud)

If you are a corporate developer, you might see Storage Gateway. This connects on-premise servers to S3.

Module 5: Scalability & High Availability

A "Scalable" system is one that can handle increased load by adding resources. A "High Availability" (HA) system is one that remains operational even if some components fail. On AWS, we achieve both using Load Balancers and Auto Scaling Groups.


5.1 The Concept: Vertical vs. Horizontal

Vertical Scaling (Scaling Up)

Make one machine bigger. Example: Upgrade your EC2 instance from `t2.micro` (1 CPU) to `c5.4xlarge` (16 CPUs). Simple, but there is a hardware ceiling, the resize requires downtime, and the single machine remains a single point of failure.

Horizontal Scaling (Scaling Out)

Add more machines. Example: Add three more `t2.micro` instances. Nearly limitless and no downtime, and a failure only removes a fraction of capacity; but it requires a Load Balancer and a stateless application design.

The Golden Rule: Always design for Horizontal Scaling. Treat servers as disposable.

5.2 Elastic Load Balancer (ELB)

An ELB is a managed service that distributes incoming traffic across multiple targets (EC2 instances, Containers, IP addresses).

Types of Load Balancers

- ALB (Application Load Balancer): Layer 7 (HTTP/HTTPS). Routes on path, hostname, and headers. The default choice for web apps.
- NLB (Network Load Balancer): Layer 4 (TCP/UDP). Millions of requests per second, ultra-low latency, static IP support.
- GWLB (Gateway Load Balancer): For deploying third-party network appliances (firewalls, intrusion detection).
- CLB (Classic Load Balancer): Legacy. Do not use for new projects.

ALB Components

  1. Listener: Checks for connection requests (e.g., Port 443 HTTPS).
  2. Rules: "If path is /admin, send to Admin Target Group."
  3. Target Group: A logical group of EC2 instances. The ALB performs Health Checks on these targets.

5.3 Auto Scaling Groups (ASG)

The ASG is the automation engine. It adds or removes EC2 instances from your Target Group based on demand.

Components of an ASG

- Launch Template: What to launch (AMI, instance type, User Data, Security Groups).
- Capacity settings: Min, Max, and Desired number of instances.
- Scaling Policies: When to scale (e.g., Target Tracking: "keep average CPU at 50%").
- Health Checks: Replace instances that fail EC2 or ALB health checks.

5.4 The "Stateless" Application Challenge

Horizontal scaling breaks traditional web apps that store user sessions (login cookies) or uploaded files on the local disk.

Scenario: User John logs in. The ALB sends him to Server A. Server A saves his session to `/tmp/sessions`.
Next request: ALB sends John to Server B. Server B checks `/tmp/sessions`, finds nothing, and kicks John out to the login screen.

The Solution: Externalize State

  1. Sessions: Store session IDs in ElastiCache (Redis) or DynamoDB. All servers query the same Redis cluster.
  2. Files: Store user uploads in S3. Store the file path (URL) in the database.
  3. Logs: Stream logs to CloudWatch Logs. Do not just keep them in `/var/log`.
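The session half of this pattern can be sketched in a few lines; the shared dict below is a stand-in for a Redis or DynamoDB table:

```python
# Anti-pattern: each server keeps sessions in its own local storage.
server_a_sessions = {"john": {"logged_in": True}}
server_b_sessions = {}  # Server B has never seen John

print("john" in server_b_sessions)  # False -> John is bounced to the login page

# Pattern: every server reads the same external store (Redis/DynamoDB stand-in).
shared_sessions = {"john": {"logged_in": True}}

def is_logged_in(session_store, user):
    # Any server can answer, because state lives outside the servers.
    return session_store.get(user, {}).get("logged_in", False)

print(is_logged_in(shared_sessions, "john"))  # True, whichever server handles it
```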

5.5 Connection Draining & Stickiness

Connection Draining (AWS calls it "Deregistration Delay"): When an instance is removed from a Target Group, the ALB stops sending it new requests but gives in-flight requests a grace period (default 300 seconds) to finish.

Stickiness: The ALB can pin a user to one instance using a cookie. Avoid it when you can: it defeats even load distribution and hides statefulness bugs. Externalize the session instead.

5.6 Lab: Building the Scalable Web Tier

Steps to achieve High Availability:

  1. Create a Security Group for ALB (Allow HTTP/HTTPS from Anywhere).
  2. Create a Security Group for EC2 (Allow HTTP from ALB Security Group ONLY).
  3. Create a Launch Template (Apache Web Server bootstrap).
  4. Create an ASG with the Launch Template spanning multiple Availability Zones.
  5. Attach an ALB to the ASG.
  6. Chaos Engineering: Manually terminate one of the EC2 instances. Watch the ALB fail health checks, and watch the ASG automatically spin up a replacement instance.

Module 6: Databases & Caching

In traditional hosting, you install MySQL on the same server as your PHP/Node code. In AWS, we decouple the database. This allows the database to scale independently and survive even if your application servers are wiped out.


6.1 RDS (Relational Database Service)

RDS is a managed service for SQL databases. It supports: PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and Amazon Aurora.

"Managed" means AWS handles:

Architecture: Multi-AZ vs Read Replicas

You must understand the difference between these two features:

Multi-AZ (Disaster Recovery / High Availability):
- Mechanism: Synchronous replication to a standby instance in a different AZ.
- Access: You cannot connect to the standby. It is passive.
- Failover: Automatic; DNS switches to the standby.

Read Replicas (Performance / Scalability):
- Mechanism: Asynchronous replication to up to 5 read-only copies.
- Access: You can connect to replicas for SELECT queries.
- Failover: Manual; you must promote a replica to a standalone DB.

6.2 Amazon Aurora

Aurora is AWS's proprietary database engine. It is wire-compatible with MySQL and PostgreSQL but delivers up to 5x the throughput of standard MySQL (roughly 3x for PostgreSQL). Storage grows automatically, and data is replicated six ways across three AZs, which often makes it cheaper than equivalent self-managed setups at scale.

6.3 DynamoDB (NoSQL)

RDS is great for complex joins. DynamoDB is great for massive scale and simple Key-Value data.

Key Characteristics

- Serverless: No instances to size; pay per request (On-Demand) or provision capacity up front.
- Single-digit millisecond latency at virtually any scale.
- Data model: Tables with a Partition Key (and optional Sort Key). No joins, no foreign keys.
- Limits: A single item can be at most 400 KB.

Access Patterns

In SQL, you normalize data. In DynamoDB, you design around your access patterns up front. You often use Single Table Design, putting Users, Orders, and Products in one table so that a single request can fetch everything a screen needs.

6.4 ElastiCache (Caching)

The fastest database query is the one you never make. ElastiCache manages Redis or Memcached.

The Strategy: Lazy Loading (Cache-Aside)

// Pseudocode for Lazy Loading
function getUser(userId) {
    // 1. Check Cache
    record = cache.get(userId);
    if (record) {
        return record; // Cache Hit
    }

    // 2. Query DB (Cache Miss)
    record = db.query("SELECT * FROM users WHERE id = ?", userId);

    // 3. Write to Cache with TTL (Time To Live)
    cache.set(userId, record, 3600); // Expire in 1 hour

    return record;
}

When to use Caching?

Cache data that is read often, written rarely, and tolerant of being slightly stale (product catalogs, rendered pages, session lookups). Do not cache data that must be strongly consistent (account balances) without a careful invalidation strategy.

6.5 Database Security (Security Groups)

Databases should NEVER be accessible from the public internet. They belong in Private Subnets.

The Security Group Chain:

  1. ALB SG: Allow 443 from 0.0.0.0/0.
  2. App SG: Allow 80 from ALB SG.
  3. DB SG: Allow 5432 (Postgres) from App SG.

This ensures that even if someone has the database password, they cannot connect unless they have hacked your Application Server first.

Module 7: Serverless Architecture

"Serverless" is a misnomer. There are still servers, but they are abstracted away. You do not provision, patch, or scale them. You simply upload code, and AWS runs it in response to events.

Benefits: No idle costs (pay for value), automatic scaling, reduced operational overhead.
Drawbacks: Cold starts, execution time limits, vendor lock-in.


7.1 AWS Lambda: Functions as a Service (FaaS)

Lambda is the core of serverless compute. It supports Node.js, Python, Java, Go, Ruby, and .NET.

Key Constraints

- Maximum execution time: 15 minutes.
- Memory: 128 MB to 10,240 MB (CPU scales with memory).
- Deployment package: 50 MB zipped / 250 MB unzipped (container images up to 10 GB).
- Ephemeral `/tmp` storage: 512 MB by default (configurable up to 10 GB).

The Billing Model

You are charged based on:

  1. Requests: $0.20 per 1 million requests.
  2. Duration: Billed in GB-seconds (memory allocated in GB x execution time in seconds), at roughly $0.0000166667 per GB-second.
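To build intuition for GB-second billing, here is a back-of-the-envelope calculator using the published x86 rates at the time of writing (it ignores the monthly Free Tier, which real bills subtract):

```python
PRICE_PER_REQUEST = 0.20 / 1_000_000   # $0.20 per 1M requests
PRICE_PER_GB_SECOND = 0.0000166667     # x86 rate; check current pricing

def monthly_cost(invocations, memory_mb, avg_duration_ms):
    # GB-seconds = memory in GB x execution time in seconds, over all calls
    gb_seconds = invocations * (memory_mb / 1024) * (avg_duration_ms / 1000)
    return invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

# 5 million requests/month at 512 MB and 200 ms average:
print(round(monthly_cost(5_000_000, 512, 200), 2))  # 9.33
```

Notice how memory and duration multiply: halving either roughly halves the duration charge, which is why right-sizing memory matters.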

The "Cold Start" Problem

When a Lambda function hasn't been used for a while, AWS spins down the container. The next request must wait for AWS to initialize the environment (adding ~100ms to ~1s latency). Solutions include "Provisioned Concurrency" or using lighter runtimes (like Node.js or Python) over heavy ones (like Java).

7.2 Triggers: The Event-Driven World

Lambda is not a daemon process; it does not sit and wait. It must be triggered. Common triggers include: API Gateway (HTTP requests), S3 (object uploads), SQS (queue messages), DynamoDB Streams (table changes), and EventBridge (cron schedules).

7.3 Amazon API Gateway

If you want to build a Serverless API, you need a front door. API Gateway handles the HTTP traffic and invokes Lambda.

Features

- Routing: Map paths and methods (e.g., GET /users) to Lambda functions.
- Auth: Validate requests with IAM, Cognito User Pools, or custom Lambda authorizers.
- Throttling: Rate-limit clients to protect your backend.
- Caching and request/response validation.

7.4 The "Decoupling" Services (SQS & SNS)

To build resilient serverless apps, you must decouple components.

SQS (Simple Queue Service)

A buffer between producers and consumers.
Scenario: 1000 users order a ticket at the same exact second. Without a queue, 1000 simultaneous writes hammer your database. With SQS, the API instantly drops 1000 messages into the queue, and a pool of workers drains it at a pace the database can handle. Messages persist (for up to 14 days) until a consumer processes and deletes them.
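The buffering idea can be sketched with an in-memory queue. This toy stands in for SQS, where consumers poll up to 10 messages per request:

```python
from collections import deque

# 1,000 orders arrive in the same instant; the queue absorbs the burst
# so the slow consumer never sees more than one small batch at a time.
queue = deque(f"order-{i}" for i in range(1000))

processed = []
BATCH_SIZE = 10  # SQS delivers at most 10 messages per ReceiveMessage call

while queue:
    batch = [queue.popleft() for _ in range(min(BATCH_SIZE, len(queue)))]
    # A real worker would charge the card, write to the DB, and only
    # then delete the messages from the queue.
    processed.extend(batch)

print(len(processed))  # 1000 -> nothing was dropped
```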

SNS (Simple Notification Service)

Pub/Sub messaging. One message -> Many subscribers (Fan-out pattern).

Example: "Order Placed" event published to SNS Topic.
-> Subscriber A (Lambda): Updates Inventory.
-> Subscriber B (SQS): Queues email to user.
-> Subscriber C (HTTPS): Sends webhook to Slack.

7.5 Lab: Serverless Image Processor

This is a classic AWS pattern:

  1. S3 Bucket A (Source): User uploads `profile.jpg`.
  2. Event Notification: S3 sends event to Lambda.
  3. Lambda:
    - Downloads image.
    - Resizes using `sharp` library.
    - Uploads to S3 Bucket B (Destination).

// Example: Lambda Handler (Node.js)
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const sharp = require('sharp');

exports.handler = async (event) => {
    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

    // 1. Get Image
    const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();

    // 2. Resize
    const resized = await sharp(original.Body).resize(200).toBuffer();

    // 3. Save to new bucket
    await s3.putObject({
        Bucket: bucket + '-resized',
        Key: 'thumb-' + key,
        Body: resized
    }).promise();
};

Module 7.5: Containers (ECS & Fargate)

You have learned about EC2 (Managing OS + App) and Lambda (Managing Code only). But what if you have a legacy application that takes 30 seconds to start? Lambda will time out. What if you need to install complex system dependencies? EC2 is too much maintenance.

The Solution: Docker Containers on AWS.
Containers package your code and dependencies together. AWS provides the tools to run them at scale.


7.5.1 The Ecosystem

Containerization on AWS consists of three main parts:

  1. ECR (Elastic Container Registry): Where you store your images (The "DockerHub" of AWS).
  2. ECS (Elastic Container Service): The Orchestrator. It decides where to run your containers and keeps them alive.
  3. Compute Engine (Launch Type): The infrastructure that actually executes the container (Fargate or EC2).

7.5.2 ECR: Storing Images

ECR is a secure, private registry. It is integrated with IAM.

The Workflow

To push an image from your laptop to ECR, you must authenticate the Docker CLI with AWS.

# 1. Login to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# 2. Build your image
docker build -t my-app .

# 3. Tag it
docker tag my-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest

# 4. Push
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest

Security Feature: ECR has "Image Scanning". Turn this on to automatically detect CVEs (Common Vulnerabilities and Exposures) inside your Docker images (e.g., "Your Node.js version has a critical security flaw").

7.5.3 ECS Concepts: The Hierarchy

Understanding ECS requires learning its specific vocabulary:

  1. Task Definition: The blueprint (JSON). Which image, how much CPU/RAM, which ports, which environment variables.
  2. Task: A running instance of a Task Definition (one or more containers).
  3. Service: Keeps N copies of a Task running, replaces dead ones, and registers them with a Load Balancer.
  4. Cluster: The logical grouping of Services and Tasks.

7.5.4 The Launch Type: Fargate vs. EC2

This is the most critical architectural decision you will make.

EC2 Launch Type:
- Management: High. You manage the EC2 instances (patching, scaling the cluster).
- Pricing: Pay for the EC2 instance uptime (even if empty). Cheaper at massive scale.
- Use Case: Machine Learning, legacy apps requiring specific kernel flags.

AWS Fargate (Serverless):
- Management: Zero. No servers to manage; AWS runs the container on their fleet.
- Pricing: Pay for vCPU/RAM used by the running container. Slightly more expensive per hour.
- Use Case: Web APIs, Microservices, Batch Jobs. (Recommended for 90% of users.)

7.5.5 Networking with ECS

How do users reach your container?

The Fargate Pattern

  1. ALB (Public Subnet): Listens on Port 80/443.
  2. ECS Tasks (Private Subnet): The container gets a Private IP.
  3. Target Group: The Service automatically registers the container's Private IP with the Load Balancer Target Group.

Port Mapping:
In Fargate, you use the "awsvpc" network mode. Every task gets its own ENI (Elastic Network Interface), so the container port and host port are the same: you typically map Container Port 3000 -> Host Port 3000.

7.5.6 EKS (Elastic Kubernetes Service) - A Note

You will hear about Kubernetes. EKS is AWS's managed Kubernetes service.

Should you use it? Only if your team already knows Kubernetes, you need multi-cloud portability, or you depend on the Kubernetes ecosystem (Helm charts, operators). Otherwise the extra complexity buys you little.

For this course, stick to ECS. It is significantly easier to learn.

7.5.7 Lab: The "Hello World" Fargate Service

To deploy a container without servers:

  1. Create ECR Repo: Upload your Docker image.
  2. Create ECS Cluster: Select "Networking Only" (Fargate).
  3. Create Task Definition:
    • Launch Type: Fargate.
    • OS: Linux.
    • Task Memory: 1GB. Task CPU: 0.5 vCPU.
    • Add Container: Point to ECR Image URI. Map Port 80.
  4. Create Service:
    • Launch Type: Fargate.
    • Desired Tasks: 2.
    • VPC/Subnets: Select Private Subnets.
    • Load Balancer: Create new Application Load Balancer.

Once deployed, grab the DNS name of the Load Balancer. You now have a scalable, containerized application.

Module 8: Infrastructure as Code (IaC)

Clicking around the AWS Console (ClickOps) is fine for learning, but it is forbidden in production. It is error-prone, unrepeatable, and impossible to audit. The industry standard is Infrastructure as Code (IaC).

IaC allows you to define your VPCs, EC2s, and Databases in text files. You commit these files to Git. You deploy your entire datacenter with one command.


8.1 The Two Giants: CloudFormation vs. Terraform

AWS CloudFormation

The native AWS option. You write templates in YAML or JSON, upload them as "Stacks", and AWS executes the changes. Free to use and deeply integrated, but AWS-only and notoriously verbose.

Terraform (HashiCorp) - *Recommended*

An open-source, cloud-agnostic tool. You write HCL (HashiCorp Configuration Language), which is more concise than raw templates, and the same workflow covers AWS, GCP, Azure, and more. It has become the de facto industry standard.

8.2 Terraform Basics

Terraform works in three stages:

  1. Write: Define resources in `.tf` files.
  2. Plan: Run `terraform plan`. It compares your code to the live AWS infrastructure and tells you exactly what it will add, change, or destroy.
  3. Apply: Run `terraform apply`. It executes the API calls to make the changes happen.

Example: Provisioning an EC2 Instance

# main.tf

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main_vpc" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "Production-VPC"
  }
}

resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0" # Amazon Linux 2
  instance_type = "t2.micro"

  # Reference the VPC created above
  subnet_id = aws_subnet.public_subnet.id

  tags = {
    Name = "MyWebServer"
  }
}

8.3 The State File

Terraform must remember what it created. It stores this in a JSON file called the State File.

DANGER: Never commit the `terraform.tfstate` file to Git. It may contain sensitive data (database passwords) in plain text.
Solution: Use a "Remote Backend". Store the state file in an encrypted S3 bucket and use DynamoDB for state locking (to prevent two developers from deploying at the same time).
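A minimal remote backend block might look like the following (the bucket and table names are placeholders you would replace with your own):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-company-tfstate"            # placeholder bucket name
    key            = "production/terraform.tfstate"  # path inside the bucket
    region         = "us-east-1"
    encrypt        = true                            # server-side encryption at rest
    dynamodb_table = "terraform-locks"               # placeholder table for state locking
  }
}
```

Note that the DynamoDB table must have a partition key named `LockID` for state locking to work.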

8.4 AWS CDK (Cloud Development Kit)

For developers who hate writing YAML or HCL configuration, AWS created the CDK. It allows you to define infrastructure using real programming languages (TypeScript, Python, Java).

Why CDK?

You get loops, conditionals, type-checking, and IDE autocomplete, and high-level "constructs" apply sensible defaults (the 2-line VPC below expands into dozens of CloudFormation resources). Under the hood, CDK "synthesizes" your code into a CloudFormation template and deploys it.

Example: CDK (TypeScript)

import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as cdk from 'aws-cdk-lib';

export class MyStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create a VPC with 2 lines of code
    const vpc = new ec2.Vpc(this, 'TheVPC', {
      maxAzs: 2
    });

    // Create a Web Server
    const instance = new ec2.Instance(this, 'WebServer', {
      vpc,
      instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
      machineImage: new ec2.AmazonLinuxImage(),
    });
  }
}

8.5 Immutable Infrastructure

IaC enables the concept of Immutability: you never patch or SSH into a running server. To change anything, you change the code (a new AMI, a new instance type), destroy the old resources, and create fresh ones from the template. Servers become disposable artifacts of your code.

Module 8.5: Continuous Integration & Deployment (CI/CD)

In a professional environment, developers never deploy from their local machines. Manual deployments are slow, unrepeatable, and prone to human error ("Oops, I deployed the wrong folder").

The Solution: A CI/CD Pipeline.
CI (Continuous Integration): "I push code, the server automatically tests and builds it."
CD (Continuous Deployment): "If the build passes, the server automatically pushes it to Production."


8.5.1 The AWS Code Suite

AWS provides a set of tools that mirror the functionality of Jenkins, GitHub, and CircleCI.

1. AWS CodeCommit (Source)

A private Git repository hosted by AWS.
Reality Check: Most companies use GitHub or GitLab instead. CodeCommit was mostly adopted in highly regulated industries (banking/government) where code cannot leave the AWS ecosystem, and AWS stopped onboarding new CodeCommit customers in 2024.

2. AWS CodeBuild (Build)

A fully managed build service. It spins up a temporary Docker container, downloads your code, runs your commands (e.g., `npm install`, `npm run build`, `docker build`), generates artifacts (zip files/images), and then destroys the container.

Pricing: You pay per minute of build time.

3. AWS CodeDeploy (Deploy)

Automates the deployment to compute services.

4. AWS CodePipeline (Orchestrate)

The "Conductor". It visualizes the workflow: Source -> Build -> Staging -> Manual Approval -> Production.

8.5.2 The `buildspec.yml` File

CodeBuild looks for a file named `buildspec.yml` in the root of your repository to know what to do. It is the equivalent of `.github/workflows/main.yml` in GitHub Actions.

Example: React App Build

version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 18
    commands:
      - echo Installing dependencies...
      - npm ci
  pre_build:
    commands:
      - echo Running tests...
      - npm test
  build:
    commands:
      - echo Building the React app...
      - npm run build
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Syncing to S3...
      - aws s3 sync build/ s3://my-production-bucket --delete

artifacts:
  files:
    - '**/*'
  base-directory: build

8.5.3 Deployment Strategies

How do you update a running application without crashing it?

  • In-Place: Stop the app on Server A, update the code, restart the app. Risk: High (downtime occurs during the restart).
  • Rolling: Update Server 1, then Server 2, then Server 3. Risk: Medium (reduced capacity during the update).
  • Blue/Green: Spin up a completely new set of servers (Green) alongside the old ones (Blue), then switch the Load Balancer to Green instantly. Risk: Low; instant rollback is possible, but you briefly pay for double infrastructure.
  • Canary: Send 10% of traffic to the new version; if there are no errors after 5 minutes, send 100%. Risk: Lowest (real users validate the code on a small slice of traffic).
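The Rolling strategy lends itself to a tiny sketch: a pure helper (hypothetical, not an AWS API) that splits the fleet into sequential update batches:

```javascript
// Split a server fleet into sequential update batches (Rolling strategy).
// While one batch is being updated, the remaining servers keep serving traffic.
function rollingBatches(servers, batchSize) {
  const batches = [];
  for (let i = 0; i < servers.length; i += batchSize) {
    batches.push(servers.slice(i, i + batchSize));
  }
  return batches;
}

// Updating 2 servers at a time means capacity never drops below (n - 2) / n.
console.log(rollingBatches(["s1", "s2", "s3", "s4", "s5"], 2));
// three batches: [s1, s2], then [s3, s4], then [s5]
```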

8.5.4 The Modern Way: GitHub Actions + OIDC

If you prefer GitHub Actions over AWS CodePipeline, you must secure the connection. Do not create an IAM User with long-term Access Keys and paste them into GitHub Secrets.

Use OpenID Connect (OIDC)

OIDC allows a GitHub Actions job to exchange a signed identity token for temporary AWS credentials by assuming an IAM Role. The credentials are short-lived and exist only for the duration of the job, so there is nothing long-term to leak.

Configuration Steps:

  1. AWS: Create an "Identity Provider" in IAM for `token.actions.githubusercontent.com`.
  2. AWS: Create an IAM Role with a Trust Policy allowing the specific GitHub Repo (`repo:user/my-repo:*`) to assume it.
  3. GitHub: Use the `aws-actions/configure-aws-credentials` action.

# .github/workflows/deploy.yml
name: Deploy to AWS
on: [push]
permissions:
  id-token: write # Required for OIDC
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionRole
          aws-region: us-east-1
      - name: Deploy
        run: |
          aws s3 sync ./build s3://my-bucket

8.5.5 Infrastructure as Code in CI/CD

The ultimate goal is GitOps. Your pipeline should not just deploy app code; it should deploy the infrastructure.

The Workflow:

  1. Dev changes `instance_type` from `t2.micro` to `t3.medium` in Terraform code.
  2. Push to Git.
  3. Pipeline runs `terraform plan`.
  4. Pipeline posts the Plan to the Pull Request as a comment.
  5. Team Lead reviews and merges PR.
  6. Pipeline runs `terraform apply` automatically.

Module 9: Capstone Project - "CloudGram"

Congratulations on reaching the end. To graduate from "Tutorial Hell" to "Cloud Architect," you must build something real. We will design and architect an Instagram clone called CloudGram.

The Constraint: You cannot click buttons in the console. You must define this entire infrastructure using Terraform or AWS CDK. This ensures your project is reproducible and professional.


9.1 The Architecture Specification

We will use a Hybrid Architecture. This demonstrates mastery of both server-based networking (VPC/EC2) and modern serverless event processing (Lambda).

The Stack

  • Frontend: React SPA (Single Page App) hosted on S3 + CloudFront.
  • Auth: Amazon Cognito (User Pools for identity, Identity Pools for temporary AWS credentials).
  • Core API: Node.js/Express running on EC2 instances behind an Application Load Balancer (ALB).
  • Database (Relational): RDS PostgreSQL (for User profiles, relations, followers).
  • Database (NoSQL): DynamoDB (for the Image Feed metadata - high read throughput).
  • Storage: S3 (Raw images and Processed thumbnails).
  • Async Worker: AWS Lambda (Triggered by S3 uploads).

9.2 Phase 1: The Networking Layer (Terraform)

Before writing application code, lay the foundation.

# network.tf
# 1. Create VPC
resource "aws_vpc" "main" { cidr_block = "10.0.0.0/16" }

# 2. Public Subnets (for ALB & NAT Gateway)
resource "aws_subnet" "public_1" { ... }
resource "aws_subnet" "public_2" { ... }

# 3. Private App Subnets (for EC2 API)
resource "aws_subnet" "private_app_1" { ... }
resource "aws_subnet" "private_app_2" { ... }

# 4. Private Data Subnets (for RDS - No Internet Access)
resource "aws_subnet" "private_db_1" { ... }
resource "aws_subnet" "private_db_2" { ... }

9.3 Phase 2: The Direct Upload Pattern (S3 & Presigned URLs)

Challenge: If a user uploads a 10MB photo through your EC2 server, your server's memory and bandwidth spike, and every byte is handled twice (browser to EC2, then EC2 to S3). This kills scalability.

Solution: Upload directly from the Browser to S3.

  1. React App: User selects `photo.jpg`.
  2. React App: Sends GET request to your EC2 API: /api/get-upload-url?filename=photo.jpg.
  3. EC2 API: Uses AWS SDK to generate a Presigned URL (valid for 60 seconds).
  4. React App: Receives URL. Performs a PUT request directly to S3.
Security Note: Your S3 Bucket CORS configuration must allow PUT requests from your domain (e.g., `https://cloudgram.com`).

9.4 Phase 3: The Event-Driven Image Processor

Once the file hits S3, we need to create a thumbnail. Do not do this on the EC2 server.

The Lambda Workflow

  1. Trigger: Configure S3 Event Notification on the `raw-images` bucket. Event: s3:ObjectCreated:Put.
  2. Lambda Function:
    • Downloads the image to `/tmp`.
    • Resizes it using `sharp` or `Pillow`.
    • Uploads the thumbnail to a `processed-images` bucket.
    • Crucial Step: Writes a metadata entry to the DynamoDB Feed Table containing the URL, Timestamp, and UserID.
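The metadata entry from the crucial step could be shaped like this. The attribute names mirror the Feed table design in Phase 4; `buildFeedItem` is a hypothetical helper, and the actual write would go through the DynamoDB SDK's `PutItem`:

```javascript
// Build the DynamoDB item the thumbnail Lambda writes to the Feed table.
function buildFeedItem(userId, imageUrl, thumbnailUrl, caption, timestamp) {
  return {
    UserID: userId,        // Partition key
    Timestamp: timestamp,  // Sort key: enables "newest first" queries
    ImageURL: imageUrl,
    ThumbnailURL: thumbnailUrl,
    Caption: caption,
    LikesCount: 0,         // new posts start with zero likes
  };
}

const item = buildFeedItem("u1", "s3://raw/x.jpg", "s3://thumb/x.jpg", "hi", 42);
console.log(item.LikesCount); // 0
```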

9.5 Phase 4: The Database Design (Polyglot Persistence)

We use the right database for the right job.

RDS (PostgreSQL) - Users Table

Strict schema. Consistency matters.

Table: Users
- id (PK)
- username
- email
- password_hash
- created_at

DynamoDB - Feed Table

High speed reads. Flexible schema.

Table: Feed
- PK: UserID
- SK: Timestamp
- Attributes: ImageURL, ThumbnailURL, Caption, LikesCount

Why? To load a user's profile feed, we perform a single DynamoDB Query: PK = UserID AND SK > 0. The cost of that query depends only on the items returned for that one partition key, not on how many users or posts exist in the system.
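That query translates to DocumentClient parameters roughly like the following (table and attribute names match the design above; the real call would be `docClient.query(params)`):

```javascript
// Build the parameters for the single-partition feed query described above.
function feedQueryParams(userId, limit = 20) {
  return {
    TableName: "Feed",
    KeyConditionExpression: "UserID = :uid AND #ts > :zero",
    ExpressionAttributeNames: { "#ts": "Timestamp" }, // Timestamp is a DynamoDB reserved word
    ExpressionAttributeValues: { ":uid": userId, ":zero": 0 },
    ScanIndexForward: false, // newest first (descending sort key)
    Limit: limit,            // paginate instead of loading the whole feed
  };
}

console.log(feedQueryParams("user-123").TableName); // "Feed"
```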

9.6 Phase 5: Caching & Content Delivery

To ensure the app is fast globally:

  1. CloudFront: Sit in front of the `processed-images` S3 bucket. Users download images from the Edge, not the Bucket.
  2. ElastiCache (Redis): Sit in front of the RDS database.
    Logic: When requesting user profile data, check Redis first. If miss, hit RDS, then write to Redis.
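That cache-aside logic can be sketched with in-memory Maps standing in for Redis and RDS (a toy model, not the real clients):

```javascript
// Toy cache-aside: Maps stand in for ElastiCache (Redis) and RDS (PostgreSQL).
const cache = new Map(); // stand-in for Redis
const db = new Map([["user:1", { id: 1, username: "ada" }]]); // stand-in for RDS

async function getUserProfile(userId) {
  const key = `user:${userId}`;
  if (cache.has(key)) {
    return { source: "cache", profile: cache.get(key) }; // cache hit: skip the DB
  }
  const profile = db.get(key); // cache miss: hit the (slow) database
  cache.set(key, profile);     // populate the cache (real Redis: SET with a TTL)
  return { source: "db", profile };
}

getUserProfile(1).then(r => console.log(r.source)); // "db" on the first call
```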

9.7 Deployment Checklist

Before you call this project "Done", verify the following:

  • All infrastructure is defined in Terraform or CDK (zero ClickOps).
  • RDS and the EC2 API instances live in private subnets; only the ALB is public.
  • The S3 buckets block public access; images are served through CloudFront only.
  • IAM roles follow least privilege, with no long-term access keys in code or CI.
  • CloudWatch alarms cover 5xx error rates and database CPU.
  • A billing alarm is configured so a mistake cannot surprise you.

Final Certification Challenge

If you build this project and document it on GitHub with a clear README and architecture diagram, you are well prepared for the AWS Certified Solutions Architect - Associate exam and for junior/mid-level Cloud Engineering roles.

Module 10: Observability, Monitoring & Debugging

There is a distinct difference between Monitoring and Observability. Monitoring means watching for failure modes you already know about (dashboards, threshold alerts). Observability means emitting enough data (logs, metrics, traces) that you can ask arbitrary questions about failure modes you did not predict.

In a microservices or serverless architecture, a single user request might touch 10 different services (CloudFront -> ALB -> Fargate -> Lambda -> DynamoDB). If it fails, how do you find the needle in the haystack? We use the Three Pillars of Observability: Logs, Metrics, and Traces.


10.1 CloudWatch Logs: The Source of Truth

CloudWatch Logs is the centralized aggregation service. Resources (Lambda, EC2, RDS, API Gateway) send their `stdout` and `stderr` output here.

Log Groups & Streams

A Log Group is the container for one application or resource (e.g., `/aws/lambda/thumbnail-generator`). Inside it, each Log Stream holds the output of a single source: one Lambda execution environment, one EC2 instance, or one container.

Pro Tip: Structured Logging (JSON)
Stop printing plain text logs like `console.log("User logged in")`.
Start printing JSON: `console.log(JSON.stringify({ event: "LOGIN", userId: 123, ip: "1.2.3.4" }))`.
Why? CloudWatch can parse JSON fields natively, allowing you to query them like a database.
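A minimal structured logger along those lines (the field names here are illustrative):

```javascript
// Every log line is one JSON object, so Logs Insights can filter on its fields.
function logEvent(level, event, fields = {}) {
  const entry = { level, event, time: new Date().toISOString(), ...fields };
  console.log(JSON.stringify(entry)); // one line, machine-parseable
  return entry; // returned for inspection/testing
}

logEvent("INFO", "LOGIN", { userId: 123, ip: "1.2.3.4" });
```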

CloudWatch Logs Insights

Browsing logs line-by-line is impossible at scale. Logs Insights provides a SQL-like syntax to query terabytes of logs in seconds.

Scenario: You want to find the top 5 most frequent error messages in the last hour.

# CloudWatch Logs Insights Query Syntax
fields @timestamp, @message, userId
| filter level = "ERROR"
| stats count(*) as errorCount by @message
| sort errorCount desc
| limit 5

10.2 CloudWatch Metrics

Metrics are numerical data points sent over time. They are lightweight and fast.

The EC2 Memory Trap

By default, EC2 sends CPU, Disk, and Network metrics to CloudWatch. It does NOT send RAM (Memory) usage.

Why? The Hypervisor (AWS hardware) can see the CPU load, but it cannot see inside the OS to know how much RAM is free.

Solution: You must install the CloudWatch Unified Agent on your EC2 instances to push memory, disk, and swap metrics.
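Once installed, the agent reads a JSON config file; a fragment that collects memory and swap usage looks roughly like this (abbreviated from the standard agent config schema):

```json
{
  "metrics": {
    "metrics_collected": {
      "mem":  { "measurement": ["mem_used_percent"] },
      "swap": { "measurement": ["swap_used_percent"] }
    }
  }
}
```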

10.3 CloudWatch Alarms: Actionable Alerts

Dashboards are for looking; Alarms are for waking you up.

Alarm States

An alarm is always in one of three states: OK (metric within threshold), ALARM (threshold breached), and INSUFFICIENT_DATA (not enough data points to decide, which is common right after creation).

Integration: The SNS Fan-Out

CloudWatch Alarms don't send emails directly. They publish to an SNS Topic.

  1. Alarm triggers -> SNS Topic "Production-Alerts".
  2. SNS Topic -> Lambda (Integration with Slack/PagerDuty).
  3. SNS Topic -> Email (Manager).

10.4 Distributed Tracing with AWS X-Ray

This is the ultimate tool for microservices. X-Ray traces a request as it travels through your entire distributed application.

The Concept: Trace Propagation

When a request hits your Load Balancer, AWS adds a unique header: `X-Amzn-Trace-Id`. As this request passes to EC2, then to DynamoDB, then to S3, that ID is preserved.

The Service Map

X-Ray draws a visual node-graph of your architecture.

Benefit: You can instantly see: "The API is slow because the SQL Query to RDS is taking 3 seconds."

Instrumenting Code (Node.js Example)

To get detailed traces, you wrap the AWS SDK in your code.

const AWSXRay = require('aws-xray-sdk');
// Wrap the AWS SDK
const AWS = AWSXRay.captureAWS(require('aws-sdk'));

// Now, every call this S3 client makes is automatically traced
const s3 = new AWS.S3();

exports.handler = async (event) => {
    // Custom sub-segment for your own logic
    const segment = AWSXRay.getSegment();
    const subsegment = segment.addNewSubsegment('ImageProcessing');

    try {
        await processImage(event); // your business logic
    } catch (e) {
        subsegment.addError(e);
        throw e; // re-throw so Lambda records the invocation as failed
    } finally {
        subsegment.close(); // always close, on success or failure
    }
};

10.5 ServiceLens

ServiceLens is the UI that combines CloudWatch Metrics, Logs, and X-Ray Traces into one view.

If you click on a spike in a specific metric graph (e.g., "Latency"), ServiceLens will show you the exact X-Ray traces that contributed to that spike, and the Logs associated with those specific requests.

10.6 EventBridge (formerly CloudWatch Events)

While often used for scheduling (Cron), EventBridge is also an observability tool for Audit & Compliance.

Pattern: GuardRails

Example: an EventBridge rule matches the CloudTrail event `AuthorizeSecurityGroupIngress`. When it fires, a Lambda inspects the change; if someone opened a port to `0.0.0.0/0`, it alerts the security team (or reverts the change automatically).

10.7 Lab: The "500 Killer" Dashboard

For your Capstone project, build a Dashboard with these 3 widgets:

  1. Availability: `(1 - (5xx_Errors / Total_Requests)) * 100`. Goal: 99.9%.
  2. Latency (p95): "95% of my users experience a load time faster than X seconds." (Average latency is a useless metric because outliers skew it).
  3. Resource Saturation: Database CPU and Connection Count.
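Widgets 1 and 2 are just arithmetic over the raw request data. A sketch, using the nearest-rank method as one common choice for percentiles:

```javascript
// Widget 1: availability as the percentage of requests that were not 5xx.
function availability(totalRequests, errors5xx) {
  return (1 - errors5xx / totalRequests) * 100;
}

// Widget 2: p95 latency via the nearest-rank method:
// sort the samples and take the value at rank ceil(0.95 * n).
function p95(latencies) {
  const sorted = [...latencies].sort((a, b) => a - b);
  return sorted[Math.ceil(0.95 * sorted.length) - 1];
}

console.log(availability(1000, 0)); // 100
```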

Course Completion

You have now covered the entire spectrum of AWS development. From IAM security to VPC networking, from Serverless compute to Distributed Observability.

Next Steps: Build the Capstone project. Break it. Fix it using Logs Insights. That is how you become a Senior Engineer.