Welcome to the "AWS for Web Developers" masterclass. This is not just a tutorial on clicking buttons; it is a comprehensive restructuring of how you approach software engineering. Moving from on-premise or simple VPS (like DigitalOcean/Heroku) to AWS requires a fundamental shift in philosophy.
In this module, we will establish the "Cloud Mindset," configure your local development environment, secure your wallet against unexpected bills, and understand the physical layout of the AWS global network.
If you come from a background of managing a single Linux server (VPS), you likely treat that server like a Pet: you name it, you patch it by hand, and you nurse it back to health when it gets sick. In the cloud, servers are treated as Cattle: identical, numbered, and replaceable.
The Goal: By the end of this course, you will build systems where you can delete your web server, and your application will automatically recover within seconds without human intervention. This is called Disposable Infrastructure.
AWS is an advanced platform. To succeed, you must be comfortable with the following web fundamentals:
AWS requires you to build your own network (VPC). You must understand:
- IP Addresses: public vs. private ranges (e.g., 192.168.1.5).
- CIDR notation: /16 is a larger network than /24.

While the AWS Console (GUI) is great for learning, professionals use the Command Line Interface (CLI). You should know how to:
- SSH into a remote machine (e.g., ssh user@host).
- Manage file permissions (e.g., chmod 400 key.pem).

You cannot deploy code if you don't know where it lives. AWS is divided into three physical layers:
A Region is a physical location in the world where AWS clusters data centers. Example: us-east-1 (Northern Virginia), eu-west-2 (London).
Why it matters:
- Latency: if your users are in Japan, deploy to ap-northeast-1 (Tokyo), not Virginia.
- Compliance: some data must legally stay in a specific region (e.g., eu-central-1).

Inside every Region, there are typically 3 to 6 Availability Zones. An AZ is one or more discrete data centers with redundant power, networking, and connectivity.
The Golden Rule of High Availability: If you only run one server in us-east-1a, and that data center floods, your site is down. To be "High Availability," you must run servers in at least two AZs (e.g., us-east-1a AND us-east-1b).
There are 400+ Edge Locations globally. These are not data centers for servers; they are caching endpoints for CloudFront (CDN). They sit very close to users to serve static content (images/video) quickly.
Who is responsible when a hack happens? It depends.
They protect the physical hardware, the concrete walls, the guards, the power, and the host OS virtualization software.
If you launch a server and leave the password as `admin/admin`, or leave Port 22 open to the world, and you get hacked—that is YOUR FAULT. AWS does not patch your OS or secure your code.
The Horror Story: A student leaves a large EC2 instance running, forgets about it, and wakes up to a $2,000 bill. Follow these steps immediately: enable Billing Alerts in your account preferences, create an AWS Budget with an email threshold (e.g., $10/month), and terminate resources the moment you finish an exercise.
We will interact with AWS programmatically. Install the AWS Command Line Interface (CLI).
Mac:
brew install awscli
Windows: Download the MSI installer from AWS.
Linux:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
Once we create credentials in Module 1, you will run:
aws configure
# AWS Access Key ID: [Paste Key]
# AWS Secret Access Key: [Paste Secret]
# Default region name: us-east-1
# Default output format: json
You are now ready to begin. In the next module, we will secure your account using IAM.
Identity and Access Management (IAM) is the backbone of AWS security. It is a global service (not bound to a specific region like EC2). It answers two fundamental questions: Who are you? (Authentication) and What are you allowed to do? (Authorization).
Before we launch servers, we must secure the perimeter. If a hacker gains access to your IAM credentials, they don't need to hack your firewall—they own the firewall.
When you first sign up for AWS with your email and credit card, you are logged in as the Root User. This account has unlimited privileges. It can close your account, change your billing, and delete every server you own. Use it only to create your first IAM admin user, enable MFA on it, and then lock it away.
You must understand the difference between these four objects to architect securely.
Represents a person (e.g., `alice`, `bob`) or a specific legacy application. Users have permanent long-term credentials.
A collection of Users. You should never attach permissions directly to a User. Always attach permissions to a Group, and add Users to that Group.
Example: Create a "Developers" group. Give the group access to S3. Add `alice` to the group. If `alice` leaves the company, simply remove her from the group.
A Role is an identity that you can assume temporarily. It does not have a password or permanent keys.
Use Case: You have a Python script running on an EC2 server that needs to upload files to S3. Instead of hardcoding access keys, you attach an IAM Role to the instance; the script automatically receives temporary, rotating credentials.
Policies are JSON documents that define permissions. They are attached to Users, Groups, or Roles.
You will be writing and reading a lot of JSON. Here is the structure of a policy implementing the Principle of Least Privilege.
{
"Version": "2012-10-17",
"Id": "S3-Restricted-Access",
"Statement": [
{
"Sid": "AllowListBucket",
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::my-company-data"
},
{
"Sid": "AllowUploadFiles",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::my-company-data/*"
}
]
}
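To see how these statements behave, here is a minimal, hypothetical policy evaluator in Python. Real IAM also handles explicit Deny, action wildcards, and Condition blocks — this sketch only checks Allow statements against an action/resource pair:

```python
policy = {
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": "arn:aws:s3:::my-company-data"},
        {"Effect": "Allow", "Action": ["s3:PutObject", "s3:GetObject"],
         "Resource": "arn:aws:s3:::my-company-data/*"},
    ]
}

def is_allowed(action, resource):
    # Default deny: access is granted only by an explicit Allow statement.
    for stmt in policy["Statement"]:
        res = stmt["Resource"]
        # Exact match, or prefix match when the statement ends in "/*"
        match = (resource == res or
                 (res.endswith("/*") and resource.startswith(res[:-1])))
        if stmt["Effect"] == "Allow" and action in stmt["Action"] and match:
            return True
    return False

print(is_allowed("s3:GetObject", "arn:aws:s3:::my-company-data/report.csv"))     # True
print(is_allowed("s3:DeleteObject", "arn:aws:s3:::my-company-data/report.csv"))  # False
```

Note how `s3:DeleteObject` is denied not because any statement forbids it, but because nothing allows it — that is Least Privilege in action.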
ARN format: arn:partition:service:region:account-id:resource-id
Example: arn:aws:ec2:us-east-1:123456789012:instance/i-0a1b2c3d
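Because ARNs are colon-delimited, splitting one apart is straightforward. A small Python sketch (note the resource segment may itself contain `/` or `:`, hence the maxsplit):

```python
# Split an ARN into its six components:
# arn:partition:service:region:account-id:resource
arn = "arn:aws:ec2:us-east-1:123456789012:instance/i-0a1b2c3d"
parts = arn.split(":", 5)  # maxsplit=5 keeps any ':' inside the resource intact
partition, service, region, account_id, resource = parts[1:6]

print(service)   # ec2
print(region)    # us-east-1
print(resource)  # instance/i-0a1b2c3d
```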
This is the golden rule of IAM. A user/service should only have the minimum permissions necessary to do their job, and nothing more.
Scenario: A Junior Dev needs to restart a specific web server.
- Bad: Attach the AdministratorAccess policy.
- Better: Attach the AmazonEC2FullAccess policy.
- Best: Write a custom policy allowing ec2:RebootInstances only on that specific server ARN.

If you commit an `access_key_id` to a public GitHub repo, AWS bots (and hacker bots) scan GitHub continuously. You will often receive an email from AWS within 5 minutes telling you your account is compromised.
Solution: Use `.env` files and add them to `.gitignore`. Better yet, use IAM Roles.
In IAM settings, enforce a strong password policy: Minimum length 12, require special characters, expire passwords every 90 days.
Before moving to Module 2, perform these actions:
1. Create an IAM User named YourName-Admin.
2. Create a Group named SysAdmins.
3. Attach AdministratorAccess to the Group, and add your user to it.

If IAM is the gatekeeper, the VPC (Virtual Private Cloud) is the castle walls. A VPC is a logically isolated section of the AWS Cloud where you can launch resources in a virtual network that you define.
Many developers skip this and use the "Default VPC" AWS provides. This is fine for testing, but in production, you must build a custom network to separate public-facing servers from sensitive databases.
When you create a VPC, you must assign it an IP address range using CIDR (Classless Inter-Domain Routing) notation.
An IPv4 address is 32 bits (e.g., 192.168.0.1). The CIDR suffix (the /16 part) tells AWS how many bits are fixed.
Recommendation: For your main VPC, always use a large range like 10.0.0.0/16. This gives you plenty of room to grow. You cannot easily change the size of a VPC after creation.
Note: AWS reserves 5 IP addresses in every subnet, so in a /24 subnet (256 IPs), you can only use 251.
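Python's standard `ipaddress` module can do the CIDR arithmetic for you — a quick sanity check of the sizes above:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")

print(vpc.num_addresses)         # 65536 addresses in the /16 VPC
print(subnet.num_addresses)      # 256 addresses in the /24 subnet
print(subnet.num_addresses - 5)  # 251 usable once AWS reserves 5 per subnet
```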
You cannot launch servers directly into a VPC. You must launch them into Subnets. A subnet is a sub-section of your VPC range found within a specific Availability Zone.
To ensure High Availability, we create pairs of subnets across two Availability Zones (e.g., us-east-1a and us-east-1b).
A subnet is not inherently "Public" or "Private". The distinction is defined by its Route Table.
This is a virtual modem that connects your VPC to the real internet.
A subnet is Public if its Route Table has a route to the Internet Gateway.
Destination: 0.0.0.0/0
Target: igw-123456
Use Case: Load Balancers, Bastion Hosts, Web Servers (though Web Servers are safer in private subnets behind a Load Balancer).
A subnet is Private if it does NOT have a route to the IGW.
Use Case: App Servers, Databases, Back-end Logic. No one from the outside world can directly ping these servers.
Problem: Your database is in a Private Subnet (secure). But your database needs to download a security patch from `update.mysql.com`. How does it reach the internet if it has no IGW route?
Solution: The NAT Gateway (Network Address Translation).
In the private subnet's route table, add a route: 0.0.0.0/0 -> nat-123456.

AWS has two layers of firewalls. You must understand the difference.
Before moving to Compute, visualize a user loading your website: Internet → Internet Gateway → Public Subnet (Load Balancer) → Private Subnet (App Server) → Private Data Subnet (Database).
Amazon EC2 is the service that started the cloud revolution. It allows you to rent virtual computers (instances) on demand. While modern development often favors "Serverless" (Lambda), EC2 remains the workhorse of the internet. If you are migrating a legacy app, running a long-lived worker process, or hosting a database, you need EC2.
When you see m5.large or c6g.xlarge, it isn't random. It follows a strict syntax:
[Family][Generation][Attribute].[Size]
- Generation: m5 is newer than m4. Always use the latest generation for the best price/performance.
- Attribute: g = AWS Graviton Processor (ARM-based, cheaper and faster than Intel); a = AMD Processor (slightly cheaper than Intel).
- Size: nano, micro, small, medium, large, xlarge... Doubles in capacity (and price) at each step.

An AMI is the "template" for the root volume (C: Drive) of your instance.
User Data is a script responsible for automating the setup of a server immediately after it launches. This is the first step toward "Infrastructure as Code." You should never SSH into a server to manually install software if you can avoid it.
#!/bin/bash
# Update the OS
yum update -y
# Install Apache
yum install -y httpd
# Start the service
systemctl start httpd
systemctl enable httpd
# Create a landing page
echo "<h1>Hello from Server $(hostname -f)</h1>" > /var/www/html/index.html
The exact same server can cost $100 or $10 depending on how you buy it.
EC2 is the CPU/RAM; EBS is the Disk. EBS volumes are network drives attached to your instance.
How do you log into your Linux server?
Option 1 — Traditional SSH: Requires Port 22 to be open (0.0.0.0/0 is a security risk). Requires managing `.pem` private key files. If you lose the key, you lose the server.
Option 2 — SSM Session Manager: Uses the AWS Systems Manager Agent (pre-installed on Amazon Linux/Ubuntu).
Benefits: No inbound ports to open (not even 22), no SSH keys to manage, and every session is auditable through IAM and CloudTrail.
From inside the EC2 instance, the OS can learn about itself by querying a special local IP address.
curl http://169.254.169.254/latest/meta-data/
This returns the Instance ID, Public IP, Private IP, and IAM Role credentials. This is how scripts running on your server automatically get the permissions assigned to the IAM Role. (Note: on instances with IMDSv2 enforced — the modern default — you must first request a session token with an HTTP PUT and send it as a header with each metadata request.)
In the old days, if a user uploaded a profile picture, you saved it to the /var/www/uploads folder on your server. In the cloud, this is an anti-pattern. If your server crashes or if you autoscale to add a new server, that file is gone or missing from the new server.
The Solution: Amazon S3 (Simple Storage Service). It provides infinite storage, 99.999999999% (11 9's) durability, and serves as the backbone of the modern web.
Bucket names are globally unique across all AWS accounts, so you cannot name your bucket test; you need something like my-company-app-assets-2025.

Not all data needs to be instantly accessible. S3 offers "Lifecycle Policies" to move data between tiers automatically to save money.
| Class | Use Case | Retrieval Time |
|---|---|---|
| S3 Standard | Hot data. User profile pics, static website assets. | Milliseconds |
| S3 Intelligent-Tiering | Unknown access patterns. AWS moves data for you. | Milliseconds |
| S3 Standard-IA | Infrequent Access. Backups accessed once a month. Lower storage cost, higher retrieval fee. | Milliseconds |
| S3 Glacier Deep Archive | Regulatory archives (keep for 7 years). Cheapest storage (~$1/TB). | 12 to 48 Hours |
By default, all S3 buckets are Private. The infamous "S3 Data Leaks" in the news are always due to user error (turning off "Block Public Access").
A JSON document attached to the bucket itself to control access.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPublicRead",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-public-website/*"
}
]
}
Note: Only use the policy above for hosting public websites. For private data, use IAM Roles or Presigned URLs.
How do you let a user upload a file directly to a private bucket without giving them AWS credentials?
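The answer is Presigned URLs: your backend signs a short-lived URL and hands it to the browser. In practice you call the SDK's `generate_presigned_url` (which uses AWS SigV4); the toy Python sketch below only illustrates the core idea — an HMAC over the object key and an expiry time, so the URL cannot be tampered with or reused forever:

```python
import hmac, hashlib, time, urllib.parse

def toy_presigned_url(base_url, key, secret, expires_in=300, now=None):
    """Illustrative only — NOT AWS SigV4. Signs (key, expiry) so the URL
    cannot be altered without invalidating the signature."""
    expires = int((now if now is not None else time.time()) + expires_in)
    payload = f"{key}:{expires}".encode()
    sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    qs = urllib.parse.urlencode({"key": key, "expires": expires, "sig": sig})
    return f"{base_url}?{qs}"

# Fixed 'now' for a reproducible example:
url = toy_presigned_url("https://my-bucket.example.com/upload",
                        "photo.jpg", "server-secret", now=0)
print(url)
```

S3 verifies the real signature server-side; if the key, expiry, or method is changed, or the expiry has passed, the request is rejected.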
S3 can act as a web server for HTML/CSS/JS. It supports index documents and error documents. This is the standard way to host React, Vue, and Angular apps.
CloudFront is a Content Delivery Network. It caches your content at 400+ "Edge Locations" around the world.
When User A in Paris requests `logo.png`, CloudFront checks the Paris Edge Location. If it's there (Cache Hit), it returns it instantly. If not (Cache Miss), it fetches it from your S3 bucket in Virginia, returns it to the user, and saves a copy in Paris for the next user.
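The cache-hit/cache-miss flow can be sketched in a few lines of Python, with plain dicts standing in for the origin bucket and the edge cache:

```python
# Edge-cache behavior sketch: first request is a MISS (fetch from origin),
# subsequent requests for the same key are HITs served from the edge.
origin = {"logo.png": b"<png bytes>"}   # stands in for the S3 bucket in Virginia
paris_edge = {}                          # stands in for the Paris Edge Location

def get(key):
    if key in paris_edge:
        return "HIT", paris_edge[key]
    body = origin[key]                   # slow round-trip to the origin
    paris_edge[key] = body               # cache a copy for the next user
    return "MISS", body

print(get("logo.png")[0])  # MISS
print(get("logo.png")[0])  # HIT
```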
To secure your content, use Origin Access Control (OAC): keep the S3 bucket private and add a bucket policy that allows reads only from your CloudFront distribution.
Now, users cannot bypass the CDN to access S3 directly.
If you are a corporate developer, you might see Storage Gateway. This connects on-premise servers to S3.
A "Scalable" system is one that can handle increased load by adding resources. A "High Availability" (HA) system is one that remains operational even if some components fail. On AWS, we achieve both using Load Balancers and Auto Scaling Groups.
Example: Upgrade your EC2 instance from `t2.micro` (1 CPU) to `c5.4xlarge` (16 CPUs).
Example: Add three more `t2.micro` instances.
An ELB is a managed service that distributes incoming traffic across multiple targets (EC2 instances, Containers, IP addresses).
The ASG is the automation engine. It adds or removes EC2 instances from your Target Group based on demand.
Horizontal scaling breaks traditional web apps that store user sessions (login cookies) or uploaded files on the local disk.
Scenario: User John logs in. The ALB sends him to Server A. Server A saves his session to `/tmp/sessions`.
Next request: ALB sends John to Server B. Server B checks `/tmp/sessions`, finds nothing, and kicks John out to the login screen.
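The fix is to externalize session state to a shared store (ElastiCache/Redis or DynamoDB) that every server reads. A minimal Python sketch of the difference, with a dict standing in for the external store:

```python
# Two "servers" sharing one external session store (stands in for Redis/DynamoDB).
shared_sessions = {}

def handle_login(server_local, user, session_id):
    shared_sessions[session_id] = user  # correct: write to the SHARED store
    server_local[session_id] = user     # anti-pattern: local memory/disk

server_a_local, server_b_local = {}, {}
handle_login(server_a_local, "john", "sess-1")

# Next request lands on Server B:
print("sess-1" in server_b_local)   # False -> local state kicks John out
print("sess-1" in shared_sessions)  # True  -> shared store keeps him logged in
```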
Steps to achieve High Availability: move sessions to a shared store (ElastiCache/DynamoDB), move uploads to S3, run instances in at least two AZs behind an ALB, and let an Auto Scaling Group replace failed instances.
In traditional hosting, you install MySQL on the same server as your PHP/Node code. In AWS, we decouple the database. This allows the database to scale independently and survive even if your application servers are wiped out.
RDS is a managed service for SQL databases. It supports: PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and Amazon Aurora.
"Managed" means AWS handles: OS installation and patching, automated backups and point-in-time recovery, and failover (with Multi-AZ). You still design the schema and tune your queries.
You must understand the difference between these two features:
| Feature | Multi-AZ (Availability Zones) | Read Replicas |
|---|---|---|
| Purpose | Disaster Recovery (High Availability) | Performance (Scalability) |
| Mechanism | Synchronous replication to a standby instance in a different AZ. | Asynchronous replication to up to 5 read-only copies. |
| Access | You cannot connect to the standby. It is passive. | You can connect to replicas for SELECT queries. |
| Failover | Automatic. DNS switches to standby. | Manual (must promote replica to standalone DB). |
Aurora is AWS's proprietary database engine. It is compatible with MySQL and PostgreSQL, delivers up to 5x the throughput of standard MySQL, and is typically cheaper at scale.
RDS is great for complex joins. DynamoDB is great for massive scale and simple Key-Value data.
In SQL, you normalize data. In DynamoDB, you facilitate access patterns. You often use Single Table Design, putting Users, Orders, and Products in one table to fetch everything in a single request.
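To make that concrete, here is the shape of single-table items in plain Python (an in-memory list standing in for the DynamoDB table — the key and attribute names are illustrative):

```python
# Single Table Design: Users and their Orders live in ONE table, keyed by PK/SK.
table = [
    {"PK": "USER#42", "SK": "PROFILE",          "name": "alice"},
    {"PK": "USER#42", "SK": "ORDER#2025-01-03", "total": 30},
    {"PK": "USER#42", "SK": "ORDER#2025-02-14", "total": 55},
    {"PK": "USER#99", "SK": "PROFILE",          "name": "bob"},
]

# A DynamoDB Query on PK="USER#42" returns the profile AND all orders
# in a single request — no joins needed.
items = [i for i in table if i["PK"] == "USER#42"]
print(len(items))  # 3
```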
The fastest database query is the one you never make. ElastiCache manages Redis or Memcached.
// Pseudocode for Lazy Loading
function getUser(userId) {
// 1. Check Cache
record = cache.get(userId);
if (record) {
return record; // Cache Hit
}
// 2. Query DB (Cache Miss)
record = db.query("SELECT * FROM users WHERE id = ?", userId);
// 3. Write to Cache with TTL (Time To Live)
cache.set(userId, record, 3600); // Expire in 1 hour
return record;
}
Databases should NEVER be accessible from the public internet. They belong in Private Subnets.
The Security Group Chain: the ALB's Security Group allows 80/443 from the internet; the App server's Security Group allows traffic only from the ALB's Security Group; the Database's Security Group allows its port (3306/5432) only from the App server's Security Group.
This ensures that even if someone has the database password, they cannot connect unless they have hacked your Application Server first.
"Serverless" is a misnomer. There are still servers, but they are abstracted away. You do not provision, patch, or scale them. You simply upload code, and AWS runs it in response to events.
Benefits: No idle costs (pay for value), automatic scaling, reduced operational overhead.
Drawbacks: Cold starts, execution time limits, vendor lock-in.
Lambda is the core of serverless compute. It supports Node.js, Python, Java, Go, Ruby, and .NET.
You are charged based on: the number of requests, and compute duration in GB-seconds (allocated memory × execution time).
When a Lambda function hasn't been used for a while, AWS spins down the container. The next request must wait for AWS to initialize the environment (adding ~100ms to ~1s latency). Solutions include "Provisioned Concurrency" or using lighter runtimes (like Node.js or Python) over heavy ones (like Java).
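You can see the cold/warm split in any runtime: code at module scope runs once per container, while the handler runs on every invocation. A Python sketch of the pattern:

```python
import time

# Module scope: runs ONCE per container (this is the cold-start cost).
start = time.perf_counter()
DB_CONNECTION = {"connected": True}  # stand-in for an expensive client/SDK init
COLD_INIT_MS = (time.perf_counter() - start) * 1000

def handler(event):
    # Handler scope: runs on EVERY invocation, reusing DB_CONNECTION for free.
    return {"db": DB_CONNECTION["connected"], "event": event}

print(handler({"id": 1}))  # {'db': True, 'event': {'id': 1}}
```

This is also why connection pools and SDK clients belong outside the handler — warm invocations skip that setup entirely.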
Lambda is not a daemon process; it does not sit and wait. It must be triggered by an event source such as API Gateway, an S3 upload, an SQS queue, or an EventBridge schedule.
If you want to build a Serverless API, you need a front door. API Gateway handles the HTTP traffic and invokes Lambda.
To build resilient serverless apps, you must decouple components.
A buffer between producers and consumers.
Scenario: 1000 users order a ticket at the same exact second. Instead of 1000 simultaneous database writes, the web tier drops each order into an SQS queue instantly, and workers consume the queue at a rate the database can sustain — nothing is lost and nothing crashes.
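The buffering behavior can be simulated with Python's standard `queue` module — producers enqueue instantly, the consumer drains at its own pace:

```python
import queue

# SQS-style buffer: enqueueing is cheap, even under a burst.
q = queue.Queue()

# 1000 "orders" arrive in the same second.
for i in range(1000):
    q.put({"order_id": i})

# The worker processes messages at whatever rate the database can handle.
processed = 0
while not q.empty():
    msg = q.get()   # in SQS: ReceiveMessage + DeleteMessage
    processed += 1  # (write to DB here)

print(processed)  # 1000
```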
Pub/Sub messaging. One message -> Many subscribers (Fan-out pattern).
Example: "Order Placed" event published to SNS Topic.
-> Subscriber A (Lambda): Updates Inventory.
-> Subscriber B (SQS): Queues email to user.
-> Subscriber C (HTTPS): Sends webhook to Slack.
This is a classic AWS pattern:
// Example: Lambda Handler (Node.js)
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const sharp = require('sharp');
exports.handler = async (event) => {
const bucket = event.Records[0].s3.bucket.name;
const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
// 1. Get Image
const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();
// 2. Resize
const resized = await sharp(original.Body).resize(200).toBuffer();
// 3. Save to new bucket
await s3.putObject({
Bucket: bucket + '-resized',
Key: 'thumb-' + key,
Body: resized
}).promise();
};
You have learned about EC2 (Managing OS + App) and Lambda (Managing Code only). But what if you have a legacy application that takes 30 seconds to start? Lambda will time out. What if you need to install complex system dependencies? EC2 is too much maintenance.
The Solution: Docker Containers on AWS.
Containers package your code and dependencies together. AWS provides the tools to run them at scale.
Containerization on AWS consists of three main parts: ECR (the image registry), ECS (the orchestrator), and the compute that runs the containers (Fargate or EC2).
ECR is a secure, private registry. It is integrated with IAM.
To push an image from your laptop to ECR, you must authenticate the Docker CLI with AWS.
# 1. Login to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
# 2. Build your image
docker build -t my-app .
# 3. Tag it
docker tag my-app:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
# 4. Push
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
Understanding ECS requires learning its specific vocabulary: a Task Definition (the blueprint: image, CPU/RAM, ports), a Task (a running copy of that blueprint), a Service (keeps N tasks running and wires them to a load balancer), and a Cluster (the logical group they run in).
This is the most critical architectural decision you will make.
| Feature | EC2 Launch Type | AWS Fargate (Serverless) |
|---|---|---|
| Management | High. You manage the EC2 instances (patching, scaling the cluster). | Zero. No servers to manage. AWS runs the container on their fleet. |
| Pricing | Pay for the EC2 instance uptime (even if empty). Cheaper at massive scale. | Pay for vCPU/RAM used by the running container. Slightly more expensive per hour. |
| Use Case | Machine Learning, Legacy apps requiring specific kernel flags. | Web APIs, Microservices, Batch Jobs. (Recommended for 90% of users). |
How do users reach your container? Attach the ECS Service to an Application Load Balancer Target Group; the ALB routes incoming traffic to each running task's IP and port.
You will hear about Kubernetes. EKS is AWS's managed Kubernetes service.
Should you use it? Usually only if your organization already runs Kubernetes elsewhere or needs multi-cloud portability; it adds significant operational complexity.
For this course, stick to ECS. It is significantly easier to learn.
To deploy a container without servers: push your image to ECR, write a Task Definition (Fargate launch type, CPU/RAM, container port), create an ECS Service with a desired count of 2+, and put an Application Load Balancer in front of it.
Once deployed, grab the DNS name of the Load Balancer. You now have a scalable, containerized application.
Clicking around the AWS Console (ClickOps) is fine for learning, but it is forbidden in production. It is error-prone, unrepeatable, and impossible to audit. The industry standard is Infrastructure as Code (IaC).
IaC allows you to define your VPCs, EC2s, and Databases in text files. You commit these files to Git. You deploy your entire datacenter with one command.
Terraform works in three stages: Write (define resources in `.tf` files), Plan (`terraform plan` shows what will change), and Apply (`terraform apply` makes the changes).
# main.tf
provider "aws" {
region = "us-east-1"
}
resource "aws_vpc" "main_vpc" {
cidr_block = "10.0.0.0/16"
tags = {
Name = "Production-VPC"
}
}
resource "aws_instance" "web_server" {
ami = "ami-0c55b159cbfafe1f0" # Amazon Linux 2
instance_type = "t2.micro"
# Place the instance in a subnet (defined elsewhere) of the VPC created above
subnet_id = aws_subnet.public_subnet.id
tags = {
Name = "MyWebServer"
}
}
Terraform must remember what it created. It stores this mapping in a JSON file called the State File (terraform.tfstate). In teams, keep state in a remote backend (e.g., S3 with state locking) — never commit it to Git, as it can contain secrets.
For developers who hate writing YAML or HCL configuration, AWS created the CDK. It allows you to define infrastructure using real programming languages (TypeScript, Python, Java).
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as cdk from 'aws-cdk-lib';
export class MyStack extends cdk.Stack {
constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
super(scope, id, props);
// Create a VPC with 2 lines of code
const vpc = new ec2.Vpc(this, 'TheVPC', {
maxAzs: 2
});
// Create a Web Server
const instance = new ec2.Instance(this, 'WebServer', {
vpc,
instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
machineImage: new ec2.AmazonLinuxImage(),
});
}
}
IaC enables the concept of Immutability: instead of patching servers in place, you change the code and replace the infrastructure with a fresh copy.
In a professional environment, developers never deploy from their local machines. Manual deployments are slow, unrepeatable, and prone to human error ("Oops, I deployed the wrong folder").
The Solution: A CI/CD Pipeline.
CI (Continuous Integration): "I push code, the server automatically tests and builds it."
CD (Continuous Deployment): "If the build passes, the server automatically pushes it to Production."
AWS provides a set of tools that mirror the functionality of Jenkins, GitHub, and CircleCI.
A private Git repository hosted by AWS.
Reality Check: Most companies still use GitHub or GitLab. CodeCommit is mostly used in highly regulated industries (banking/government) where code cannot leave the AWS ecosystem.
A fully managed build service. It spins up a temporary Docker container, downloads your code, runs your commands (e.g., `npm install`, `npm run build`, `docker build`), generates artifacts (zip files/images), and then destroys the container.
Pricing: You pay per minute of build time.
Automates the deployment to compute services.
The "Conductor". It visualizes the workflow: Source -> Build -> Staging -> Manual Approval -> Production.
CodeBuild looks for a file named buildspec.yml in the root of your repository to know what to do. This is equivalent to `.github/workflows/main.yml`.
version: 0.2
phases:
install:
runtime-versions:
nodejs: 18
commands:
- echo Installing dependencies...
- npm ci
pre_build:
commands:
- echo Running tests...
- npm test
build:
commands:
- echo Building the React app...
- npm run build
post_build:
commands:
- echo Build completed on `date`
- echo Syncing to S3...
- aws s3 sync build/ s3://my-production-bucket --delete
artifacts:
files:
- '**/*'
base-directory: build
How do you update a running application without crashing it?
| Strategy | Description | Risk |
|---|---|---|
| In-Place | Stop the app on Server A, update code, restart app. | High. Downtime occurs during restart. |
| Rolling | Update Server 1, then Server 2, then Server 3. | Medium. Reduced capacity during update. |
| Blue/Green | Spin up a completely new set of servers (Green) alongside the old ones (Blue). Switch the Load Balancer to Green instantly. | Low. Instant rollback possible. Expensive (double infrastructure for a short time). |
| Canary | Send 10% of traffic to the new version. If no errors after 5 mins, send 100%. | Lowest. Real users test the code. |
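Canary routing is just weighted selection at the load balancer. A tiny Python sketch of the idea (the `route` function and weights are illustrative, not an AWS API):

```python
import random

def route(canary_weight=0.10, rng=random.random):
    """Send roughly canary_weight of requests to the new version."""
    return "v2-canary" if rng() < canary_weight else "v1-stable"

# Deterministic spot-checks with a fixed "random" draw:
print(route(rng=lambda: 0.05))  # v2-canary
print(route(rng=lambda: 0.50))  # v1-stable
```

In a real deployment, a monitoring step watches the canary's error rate and either raises the weight to 100% or rolls it back to 0.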
If you prefer GitHub Actions over AWS CodePipeline, you must secure the connection. Do not create an IAM User with long-term Access Keys and paste them into GitHub Secrets.
OIDC allows GitHub to request a temporary, short-lived IAM Role from AWS only for the duration of the job.
# .github/workflows/deploy.yml
name: Deploy to AWS
on: [push]
permissions:
id-token: write # Required for OIDC
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v2
with:
role-to-assume: arn:aws:iam::123456789012:role/GitHubActionRole
aws-region: us-east-1
- name: Deploy
run: |
aws s3 sync ./build s3://my-bucket
The ultimate goal is GitOps. Your pipeline should not just deploy app code; it should deploy the infrastructure.
The Workflow: a pull request triggers `terraform plan`, the diff is posted for review, and merging to main triggers `terraform apply` — infrastructure changes only through Git.
Congratulations on reaching the end. To graduate from "Tutorial Hell" to "Cloud Architect," you must build something real. We will design and architect an Instagram clone called CloudGram.
The Constraint: You cannot click buttons in the console. You must define this entire infrastructure using Terraform or AWS CDK. This ensures your project is reproducible and professional.
We will use a Hybrid Architecture. This demonstrates mastery of both server-based networking (VPC/EC2) and modern serverless event processing (Lambda).
Before writing application code, lay the foundation.
# network.tf
# 1. Create VPC
resource "aws_vpc" "main" { cidr_block = "10.0.0.0/16" }
# 2. Public Subnets (for ALB & NAT Gateway)
resource "aws_subnet" "public_1" { ... }
resource "aws_subnet" "public_2" { ... }
# 3. Private App Subnets (for EC2 API)
resource "aws_subnet" "private_app_1" { ... }
resource "aws_subnet" "private_app_2" { ... }
# 4. Private Data Subnets (for RDS - No Internet Access)
resource "aws_subnet" "private_db_1" { ... }
resource "aws_subnet" "private_db_2" { ... }
Challenge: If a user uploads a 10MB photo to your EC2 server, your server memory spikes, and you pay for bandwidth ingress and egress to S3. This kills scalability.
Solution: Upload directly from the Browser to S3.
1. The browser asks your API for a presigned URL: /api/get-upload-url?filename=photo.jpg.
2. The browser then sends a PUT request directly to S3 using that URL.

Once the file hits S3, we need to create a thumbnail. Do not do this on the EC2 server.
Instead, configure an S3 event notification on s3:ObjectCreated:Put that triggers a Lambda function to generate the thumbnail.

We use the right database for the right job.
Strict schema. Consistency matters.
Table: Users
- id (PK)
- username
- email
- password_hash
- created_at
High speed reads. Flexible schema.
Table: Feed
- PK: UserID
- SK: Timestamp
- Attributes: ImageURL, ThumbnailURL, Caption, LikesCount
Why? To load a user's profile feed, we perform a single DynamoDB Query: PK = UserID AND SK > 0. The cost depends only on the items returned — it stays constant no matter how many users exist.
To ensure the app is fast globally: serve the frontend and the image buckets through CloudFront, so static assets are cached at Edge Locations near each user.
Before you call this project "Done", verify the following:
If you build this project and document it on GitHub with a clear ReadMe and architecture diagram, you are technically qualified for the AWS Certified Solutions Architect Associate exam and junior/mid-level Cloud Engineering roles.
There is a distinct difference between Monitoring and Observability: Monitoring tells you that something is broken (dashboards, threshold alarms); Observability gives you the data—logs, metrics, and traces—to ask why.
In a microservices or serverless architecture, a single user request might touch 10 different services (CloudFront -> ALB -> Fargate -> Lambda -> DynamoDB). If it fails, how do you find the needle in the haystack? We use the Three Pillars of Observability: Logs, Metrics, and Traces.
CloudWatch Logs is the centralized aggregation service. Resources (Lambda, EC2, RDS, API Gateway) send their `stdout` and `stderr` output here.
Browsing logs line-by-line is impossible at scale. Logs Insights provides a SQL-like syntax to query terabytes of logs in seconds.
Scenario: You want to find the top 5 most frequent error messages in the last hour.
# CloudWatch Logs Insights Query Syntax
fields @timestamp, @message, userId
| filter level = "ERROR"
| stats count(*) as errorCount by @message
| sort errorCount desc
| limit 5
Metrics are numerical data points sent over time. They are lightweight and fast.
By default, EC2 sends CPU, Disk, and Network metrics to CloudWatch. It does NOT send RAM (Memory) usage.
Why? The Hypervisor (AWS hardware) can see the CPU load, but it cannot see inside the OS to know how much RAM is free.
Solution: You must install the CloudWatch Unified Agent on your EC2 instances to push memory/disk swap metrics.
Dashboards are for looking; Alarms are for waking you up.
CloudWatch Alarms don't send emails directly. They publish to an SNS Topic.
This is the ultimate tool for microservices. X-Ray traces a request as it travels through your entire distributed application.
When a request hits your Load Balancer, AWS adds a unique header: `X-Amzn-Trace-Id`. As this request passes to EC2, then to DynamoDB, then to S3, that ID is preserved.
X-Ray draws a visual node-graph of your architecture.
Benefit: You can instantly see: "The API is slow because the SQL Query to RDS is taking 3 seconds."
To get detailed traces, you wrap the AWS SDK in your code.
const AWSXRay = require('aws-xray-sdk');
// Wrap the AWS SDK
const AWS = AWSXRay.captureAWS(require('aws-sdk'));
// Now, every call this S3 client makes is automatically traced
const s3 = new AWS.S3();
exports.handler = async (event) => {
// Custom sub-segment for your own logic
const segment = AWSXRay.getSegment();
const subsegment = segment.addNewSubsegment('ImageProcessing');
try {
await processImage();
subsegment.close();
} catch (e) {
subsegment.addError(e);
subsegment.close();
}
};
ServiceLens is the UI that combines CloudWatch Metrics, Logs, and X-Ray Traces into one view.
If you click on a spike in a specific metric graph (e.g., "Latency"), ServiceLens will show you the exact X-Ray traces that contributed to that spike, and the Logs associated with those specific requests.
While often used for scheduling (Cron), EventBridge is also an observability tool for Audit & Compliance.
Pattern: GuardRails — an EventBridge rule watches for risky API calls (e.g., a Security Group opened to 0.0.0.0/0) and triggers an SNS alert or a remediation Lambda.
For your Capstone project, build a Dashboard with these 3 widgets:
You have now covered the entire spectrum of AWS development. From IAM security to VPC networking, from Serverless compute to Distributed Observability.
Next Steps: Build the Capstone project. Break it. Fix it using Logs Insights. That is how you become a Senior Engineer.