Automation and Infrastructure as Code (IaC)¶
Document Control:
Version: 1.0
Last Updated: January 30, 2026
Owner: Paul Leone
1. Automation and Infrastructure as Code (IaC)¶
Architecture Overview¶
The lab implements a comprehensive automation strategy using infrastructure as code principles, configuration management, and workflow orchestration. This approach provides repeatable deployments, consistent configurations, and automated operations across the entire infrastructure stack.
Security Impact
- Configuration drift eliminated through code‑driven enforcement
- Human error reduced via automated validation
- Credential exposure prevented through integrated secret management
- Unauthorized changes blocked through Git‑based approval workflows
- Disaster recovery enabled through fully reproducible infrastructure
- Audit compliance maintained via immutable version history
Deployment Rationale: Infrastructure as Code eliminates manual configuration drift, reduces human error, and enables rapid disaster recovery. The automation stack transforms a complex 40+ host lab environment into a reproducible, documented system that can be rebuilt from code within hours instead of weeks. This mirrors enterprise DevOps practices and demonstrates proficiency with industry-standard automation tools (Terraform, Ansible, CI/CD pipelines). Enterprise environments leverage IaC to manage hundreds or thousands of servers with consistent security baselines—manual approaches become impossible at scale. This implementation demonstrates understanding of GitOps principles where infrastructure state is declared in version control, changes are peer-reviewed via pull requests, and deployments are automated through CI/CD pipelines.
Architecture Principles Alignment¶
- Defense in Depth: Infrastructure changes peer-reviewed before deployment; Terraform plan reviewed for security impact; Ansible playbooks enforce CIS Benchmarks; automated testing validates security controls before production
- Secure by Design: Least-privilege service accounts for automation tools; secrets stored in Ansible Vault/Vaultwarden; SSH keys rotated via automated playbooks; no hardcoded credentials in version control
- Zero Trust: Every infrastructure change logged to Git with committer identity; Terraform state files encrypted; API tokens short-lived and rotated; automation execution audited via SIEM
Strategic Value¶
- Reduced provisioning time: VM deployment from 30 minutes (manual) to <5 minutes (Terraform automation)
- Configuration consistency: Ansible ensures identical baselines across all hosts (100% CIS Benchmark compliance)
- Audit trail: Git commits provide full history of infrastructure changes (who, what, when, why)
- Disaster recovery: Entire lab can be rebuilt from GitHub repository (<2 hours full recovery)
- Learning platform: Hands-on experience with tools used in production environments (transferable to enterprise roles)
- Compliance automation: Security controls codified and version-controlled (auditable evidence of control implementation)
2. Infrastructure Provisioning with Terraform¶
Architecture Overview¶
Terraform manages the complete lifecycle of Proxmox virtual machines and LXC containers using declarative configuration files. The infrastructure is defined as code, version-controlled, and applied from a dedicated automation controller host (detailed in the Deployment Architecture table below).
Security Impact
- Least‑privilege automation enforced through Proxmox API tokens
- Infrastructure changes tracked and auditable via Git commit history
- Unauthorized modifications prevented through Terraform's stateful resource management
- Disaster recovery enabled through declarative rebuilds directly from code
- Credential exposure eliminated through encrypted Terraform variables
- Change validation performed prior to execution using terraform plan
Deployment Rationale: Manual VM creation is time-consuming, error-prone, and undocumented—scaling to 40+ VMs/containers requires automation. Terraform demonstrates infrastructure-as-code where VM specifications (CPU, RAM, storage, network) are declared in HCL configuration files, version-controlled, and applied idempotently. This mirrors enterprise infrastructure management (AWS CloudFormation, Azure Resource Manager) where infrastructure is treated as software with code review, testing, and CI/CD deployment. The approach enables rapid environment replication (dev/staging/production), consistent resource configurations, and automated disaster recovery.
Architecture Principles Alignment
- Defense in Depth: Terraform state stored remotely with access controls; API tokens scoped to minimal permissions; plan review catches misconfigurations before apply; resource tagging enables compliance auditing
- Secure by Design: Proxmox API tokens instead of passwords; Terraform variables encrypted; no secrets in Git repositories; automated validation via pre-commit hooks
- Zero Trust: Every Terraform execution logged; state file modifications tracked; API calls authenticated via tokens; infrastructure changes require explicit approval
Deployment Architecture¶
| Component | Technology | Purpose |
|---|---|---|
| Terraform Controller | CentOS LXC (192.168.x.x) | Isolated automation host |
| Terraform Version | v1.6.x | Infrastructure provisioning engine |
| Provider | Telmate/proxmox v2.9.x | Proxmox VE API integration |
| State Backend | Local file, Cloudflare R2 | State persistence and locking |
| Secrets Management | terraform.tfvars (gitignored) | API tokens and credentials |
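The Secrets Management row above points to a gitignored terraform.tfvars file. A minimal sketch with placeholder values — every value here is illustrative, and the token ID follows Proxmox's user@realm!tokenid format:

```hcl
# terraform.tfvars — hedged sketch; real values never enter version control
pm_api_token_id     = "terraform@pve!terraform-token"  # user@realm!tokenid format
pm_api_token_secret = "<token-secret-uuid>"            # placeholder
rootpassword        = "<generated-root-password>"      # placeholder
```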
Authentication & Authorization¶
Proxmox Service Account:
- Username: terraform@pve
- Authentication: API token (non-password, scoped)
- Token ID: terraform-token
- Permission Group: TerraformProvisioners
- Granted Privileges:
- VM.Allocate (create VMs)
- VM.Config.* (modify VM settings)
- VM.PowerMgmt (start/stop/shutdown)
- Datastore.AllocateSpace (provision storage)
- SDN.Use (network assignment)
- Principle: Least privilege - cannot modify Proxmox cluster config or other users
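The account, role, and token above can be provisioned on the Proxmox node roughly as follows — a hedged sketch using the pveum CLI, intended to run on the PVE host itself. The explicit privilege list stands in for VM.Config.* (pveum takes individual privileges, not wildcards) and may need adjusting to your PVE version:

```shell
# Hedged sketch — run on the PVE node as root; names mirror the section above
pveum role add TerraformProvisioners \
  -privs "VM.Allocate VM.Config.CPU VM.Config.Memory VM.Config.Disk VM.Config.Network VM.PowerMgmt Datastore.AllocateSpace SDN.Use"
pveum user add terraform@pve --comment "Terraform automation service account"
pveum aclmod / -user terraform@pve -role TerraformProvisioners
# Privilege-separated API token (secret is shown once; store it in terraform.tfvars)
pveum user token add terraform@pve terraform-token --privsep 1
```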
Security Controls¶
- Secrets management: Proxmox API token stored in env variable
- State File Protection: Contains sensitive data, stored with restrictive permissions (0600)
- TLS Verification: pm_tls_insecure = false enforces valid certificate checks
- Separate Workspaces: VM and LXC builds isolated to prevent cross-contamination
- Gitignore Enforcement: terraform.tfvars, *.tfstate excluded from version control
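The Gitignore Enforcement control above amounts to a few rules at the repository root. A minimal sketch — patterns beyond terraform.tfvars and *.tfstate are common additions, not taken from this lab:

```text
# .gitignore — keep state and secrets out of version control
*.tfstate
*.tfstate.*
terraform.tfvars
.terraform/
crash.log
```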
Diagram Placeholder: Terraform Project Structure Screenshots (2 images)
Project Structure:
terraform/
├── vm/
│ ├── main.tf # VM resource definitions
│ ├── variables.tf # Input variable declarations
│ ├── terraform.tfvars # Sensitive values (gitignored)
│ ├── outputs.tf # Return values (IP, VMID)
│ └── .terraform.lock.hcl # Provider version lock
├── lxc/
│ ├── main.tf # LXC resource definitions
│ ├── variables.tf # Input variable declarations
│ ├── terraform.tfvars # Sensitive values (gitignored)
│ └── outputs.tf # Return values
└── modules/
└── common/ # Reusable module components
Terraform Configuration Deep Dive¶
LXC Container Provisioning (main.tf)¶
terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "~> 2.9"
    }
  }
}
variable "pm_api_token_id" {
  description = "Proxmox API token ID"
  type        = string
}

variable "pm_api_token_secret" {
  description = "Proxmox API token secret"
  type        = string
  sensitive   = true
}

variable "rootpassword" {
  description = "Root password for the LXC container"
  type        = string
  sensitive   = true
}

variable "lxc_name" {
  description = "Hostname of the LXC container"
  type        = string
  default     = "debian-lxc"
}

variable "lxc_id" {
  description = "VMID for the LXC container"
  type        = number
}

provider "proxmox" {
  pm_api_url          = "https://pve.home.com:8006/api2/json"
  pm_api_token_id     = var.pm_api_token_id
  pm_api_token_secret = var.pm_api_token_secret
  pm_tls_insecure     = false
}
resource "proxmox_lxc" "testct" {
  hostname     = var.lxc_name
  target_node  = "pve"
  vmid         = var.lxc_id
  ostemplate   = "Media4TBnvme:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst"
  password     = var.rootpassword
  unprivileged = true
  cores        = 2
  memory       = 512
  swap         = 512
  onboot       = true
  start        = true

  rootfs {
    storage = "local-lvm"
    size    = "8G"
  }

  network {
    name   = "eth0"
    bridge = "vmbr0"
    ip     = "dhcp"
  }

  features {
    nesting = true
  }
}
Purpose: Deploy unprivileged Debian 12 containers with security isolation and resource limits suitable for containerized workloads.
Key sections:
- Provider declaration → Uses the Telmate/proxmox provider (v2.9.x)
- Authentication variables → API token ID & secret, plus root password for the container
- Proxmox provider block → Points to the PVE API over HTTPS with certificate validation (pm_tls_insecure = false)
- LXC resource (proxmox_lxc):
- Hostname & VMID variable arguments provided in the terraform plan command
- Template source (ostemplate) from Proxmox storage
- Unprivileged mode for added security isolation
- Resource limits: 2 CPU cores, 512 MB RAM + swap
- Disk allocation: 8 GB on local-lvm storage
- Networking: eth0 bridged to vmbr0 with DHCP addressing
- Features: nesting enabled (allowing Docker/Kubernetes inside the LXC)
- Boot & start flags so it powers up with the node
Resource Allocation Strategy:
- CPU: 2 cores (sufficient for most containerized apps)
- Memory: 512MB + 512MB swap (low footprint for density)
- Disk: 8GB (expandable, allocated on local-lvm for performance)
- Network: DHCP with VLAN tagging support via bridge
Security Features:
- Unprivileged Containers: UID/GID mapping prevents root escalation to host
- Nesting Enabled: Allows Docker-in-LXC for lab flexibility
- AppArmor Profile: Default Proxmox profile applied
- Resource Limits: Prevent resource exhaustion attacks
VM Provisioning (main.tf - QEMU)¶
Purpose: Clone cloud-init ready Ubuntu VMs with optimized storage and network configuration for performance and manageability.
terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "~> 2.9"
    }
  }
}
variable "pm_api_token_id" {
  description = "Proxmox API token ID"
  type        = string
}

variable "pm_api_token_secret" {
  description = "Proxmox API token secret"
  type        = string
  sensitive   = true
}

variable "ciuser" {
  description = "Cloud-init username for this VM"
  type        = string
}

variable "cipassword" {
  description = "Cloud-init password for this VM"
  type        = string
  sensitive   = true
}

variable "vm_name" {
  description = "Name of the VM"
  type        = string
  default     = "ubuntu-vm"
}

variable "vm_id" {
  description = "VMID for the VM"
  type        = number
}

provider "proxmox" {
  pm_api_url          = "https://pve.home.com:8006/api2/json"
  pm_api_token_id     = var.pm_api_token_id
  pm_api_token_secret = var.pm_api_token_secret
  pm_tls_insecure     = false
}
resource "proxmox_vm_qemu" "ubuntu-vm" {
  name        = var.vm_name
  vmid        = var.vm_id
  target_node = "pve"
  clone       = "ubuntu-cloud"
  full_clone  = true
  cores       = 2
  memory      = 2048
  sockets     = 1
  onboot      = true
  agent       = 1
  os_type     = "l26"
  clone_wait  = 0
  ciuser      = var.ciuser
  cipassword  = var.cipassword
  boot        = "order=scsi0;ide2"
  bootdisk    = "scsi0"
  scsihw      = "virtio-scsi-single"

  network {
    model     = "virtio"
    bridge    = "vmbr0"
    firewall  = false
    link_down = false
  }
}
Key sections:
- Provider declaration → Same Telmate/proxmox provider as above
- Authentication variables → API token ID & secret; cloud-init user and password
- Proxmox provider block → HTTPS API endpoint with TLS verification
- VM resource (proxmox_vm_qemu):
- Clones from ubuntu-cloud template (must be cloud-init ready)
- full_clone = true → independent disk image
- OS type set to l26 → Linux kernel 2.6+ (optimized for modern distros)
- Hostname & VMID variable arguments provided in the terraform plan command
- Boot settings:
- Boot from scsi0 (OS disk) before ide2 (cloud-init)
- Force virtio-scsi-single controller for best Linux I/O performance
- CPU/memory: 2 vCPUs, 2 GB RAM, 1 socket
- QEMU guest agent enabled for status reporting & IP detection
- Networking: virtio NIC on vmbr0 with firewall disabled in guest config
- Cloud-init parameters for provisioning user and password
- clone_wait = 0 → Terraform doesn't block on post-clone boot readiness
Cloud-Init Integration:
- User Creation: Provisions non-root user via ciuser parameter
- SSH Keys: Injects public keys for passwordless authentication
- Network Config: Configures DHCP or static IP via ipconfig parameters
- Package Updates: Can specify packages to install on first boot
Performance Optimizations:
- VirtIO-SCSI-Single: Modern SCSI controller with single queue (reduces overhead)
- IOThread: Dedicated I/O thread improves disk throughput
- SSD Flag: Enables TRIM for better SSD performance
- VirtIO NIC: Para-virtualized network driver (near-native performance)
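The optimizations above (VirtIO-SCSI-Single, IOThread, SSD flag) correspond to a disk block like the following in the Telmate provider's 2.9.x syntax — a hedged sketch, since the disk configuration is not shown in the snippet above; the size and storage values are assumptions:

```hcl
# Hedged sketch: disk attached to the virtio-scsi-single controller
disk {
  slot     = 0
  type     = "scsi"
  storage  = "local-lvm"
  size     = "20G"   # assumption — sized per workload
  iothread = 1       # dedicated I/O thread for disk throughput
  ssd      = 1       # expose as SSD so the guest can issue TRIM
  discard  = "on"    # pass discard/TRIM through to the backing storage
}
```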
Terraform Plan Output¶
Diagram Placeholder: Terraform Plan Output Screenshot
Terraform Workflow¶
| Step | Command | Purpose |
|---|---|---|
| Initialize | terraform init | Download provider plugins |
| Validate | terraform validate | Check for syntax errors |
| Plan | terraform plan -var="vm_name=test01" ... | Preview changes |
| Apply | terraform apply -var="vm_name=test01" ... | Execute provisioning |
| Destroy | terraform destroy -var="vm_id=200" | Delete resources |
| Show State | terraform show | View current state |
| Refresh State | terraform refresh | Sync state with reality |
Example Provisioning Command:
terraform apply \
  -var="vm_name=docker-vm-01" \
  -var="vm_id=201" \
  -auto-approve
Output: assigned IP address, VMID, and MAC address
State Management Strategy¶
Current Implementation:
- State File: Local terraform.tfstate in working directory
- Locking: None (single operator environment)
- Backup: Automated weekly backups to Proxmox Backup Server and Synology NAS
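The Cloudflare R2 backend listed in the Deployment Architecture table can be wired up through Terraform's S3-compatible backend. A hedged sketch for Terraform 1.6.x — the bucket name, key, and endpoint are placeholders, and note that R2 offers no DynamoDB-style state locking, consistent with the single-operator setup above:

```hcl
terraform {
  backend "s3" {
    bucket                      = "terraform-state"           # placeholder
    key                         = "lab/vm/terraform.tfstate"  # placeholder
    region                      = "auto"
    skip_credentials_validation = true   # R2 is S3-compatible, not AWS
    skip_region_validation      = true
    skip_requesting_account_id  = true
    skip_s3_checksum            = true   # may be required for R2 compatibility
    use_path_style              = true
    endpoints = {
      s3 = "https://<account-id>.r2.cloudflarestorage.com"    # placeholder
    }
  }
}
```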
3. Configuration Management with Ansible¶
3.1 Architecture Overview¶
Ansible provides agentless configuration management through SSH-based automation, enforcing consistent baselines across all managed hosts (Linux, Windows, Cisco, VMware, FreeBSD), managing secrets securely via Ansible Vault, and orchestrating complex multi-host operations using declarative playbooks and Galaxy roles.
Security Impact
- Configuration drift eliminated through automated enforcement across 40+ hosts
- SSH hardening applied consistently via dedicated playbooks (PermitRootLogin no, key-only auth)
- Credential exposure prevented through Ansible Vault encryption (AES-256) and vault_* variable pattern
- Audit trail established via Git-based version control
- Manual errors eliminated through playbooks and connectivity pre-tasks
- Multi-platform coverage: Linux, Windows, Cisco IOS, VMware ESXi, FreeBSD (pfSense/OPNsense)
Deployment Rationale: In enterprise environments with 40+ mixed-platform hosts, manual configuration becomes error-prone and time-consuming. This approach reduces configuration time from hours to minutes while ensuring 100% consistency across systems. The lab implementation covers Linux (Debian/RHEL families), Windows Server, Cisco IOS devices, VMware ESXi, and FreeBSD firewalls, demonstrating cross-platform automation capabilities.
Architecture Principles Alignment
- Defense in Depth: SSH hardening playbooks disable weak ciphers and enforce key-based auth; firewall rules (UFW/firewalld) deployed uniformly; fail2ban configured consistently; kernel hardening via sysctl parameters; multi-platform audit playbooks provide visibility across the entire infrastructure stack
- Secure by Design: Ansible Vault encrypts sensitive variables (vault_ansible_password, vault_root_password); no plaintext credentials in playbooks; SSH keys distributed securely via cloud-init; vault.yml encrypted with AES-256
- Zero Trust: Every configuration change logged to Git; playbooks verify current configuration before making changes; connectivity pre-tasks skip unreachable hosts gracefully; password rotation workflow requires explicit vault updates
3.2 Setup Overview¶
Control plane: Ansible running in a Proxmox LXC
- Deployed from a Proxmox Debian-based LXC ("ansible"), using a Python virtual environment
- SSH auth is key-based via an "ansible" user
- Handles initial configuration management to achieve a standard template across all hosts
- Standard base packages, DNS, PKI and SSH configurations plus user accounts and permissions
Control Plane Configuration¶
| Component | Details |
|---|---|
| Ansible Controller | Debian 12 LXC (192.168.x.x) |
| Ansible Version | 2.16.x (core) |
| Python Environment | venv isolated (Python 3.11) |
| Authentication | SSH key-based (ed25519) |
| Privilege Escalation | sudo (passwordless for ansible user) |
| Inventory | Dynamic (Proxmox API) + static YAML |
| Vault Encryption | ansible-vault with AES-256 |
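The control-plane table above (venv-isolated Python 3.11, ansible-core 2.16.x) corresponds to a bootstrap roughly like the following — a hedged sketch; the venv path is an assumption:

```shell
# Inside the Debian 12 LXC — isolate Ansible in a Python virtual environment
python3 -m venv ~/ansible-venv
. ~/ansible-venv/bin/activate
pip install --upgrade pip
pip install "ansible-core>=2.16,<2.17"
ansible --version   # confirm a 2.16.x core release is active
```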
3.3 Inventory and Variable Structure¶
Inventory Design Principles
The inventory uses a hierarchical group structure with NO host duplication. Each host appears exactly once in a platform-specific group and is then aggregated via [group:children] declarations for flexible targeting.
- Host Type Groups: [lxc], [vm], [new]
- OS Family Groups: [debian_lxc], [debian_vm], [redhat_lxc], [redhat_vm]
- OS Aggregate Groups: [debian], [redhat] (children of lxc+vm OS groups)
- Platform Groups: [windows], [cisco], [vmware], [freebsd], [fortigate]
- Function Groups: [monitoring], [dns], [proxy], [pki], [docker], [k3s]
- Meta Groups: [linux] (all managed Linux hosts – excludes [new])
Inventory Code Snippets (hosts.ini)
##################################################################
# DEBIAN LXC HOSTS
##################################################################
[debian_lxc]
192.168.1.108 # webserver web.home.com
192.168.1.136 # Plex plex.home.com
192.168.1.250 # Pi-hole pihole.home.com (Docker)
192.168.1.219 # Wazuh manager wazuh.home.com
[debian_lxc:vars]
ansible_user=ansible
ansible_password="{{ vault_ansible_password }}"
is_lxc=true
##################################################################
# WINDOWS HOSTS
##################################################################
[windows]
192.168.1.152 # WinServer 2022 / DC dc01.home.com
192.168.1.142 # WinServer 2025 / DC dc02.home.com
[windows:vars]
ansible_connection=winrm
ansible_winrm_transport=ntlm
ansible_port=5985
ansible_password="{{ vault_ansible_windows_password }}"
##################################################################
# OS AGGREGATE GROUPS – children only
##################################################################
[debian:children]
debian_lxc
debian_vm
[linux:children]
debian
redhat
##################################################################
# FUNCTION GROUPS
##################################################################
[monitoring]
192.168.1.181 # Uptime Kuma
192.168.1.219 # Wazuh
192.168.1.246 # Grafana
Targeting Examples:
ansible-playbook playbook.yml --limit linux # All Linux hosts
ansible-playbook playbook.yml --limit lxc # All LXCs
ansible-playbook playbook.yml --limit debian # All Debian (LXC + VM)
ansible-playbook playbook.yml --limit debian_lxc # Debian LXCs only
ansible-playbook playbook.yml --limit lxc --skip-tags reboot # Skip reboot on LXCs
Variable Structure
Variables follow a strict hierarchy: group_vars/all.yml (global defaults) → group_vars/vault.yml (encrypted secrets) → group-specific vars → host_vars/ (host-specific overrides).
Global Variables (group_vars/all.yml)
# Ansible connection
ansible_python_interpreter: auto_silent
ansible_user: ansible
ansible_password: "{{ vault_ansible_password }}"
ansible_become: true
ansible_become_method: sudo
ansible_become_password: "{{ vault_ansible_password }}"
# SSH public keys
ansible_ssh_pubkey: "ssh-ed25519 AAAAC3..."
officepc_ssh_pubkey: "ssh-ed25519 AAAAC3..."
# Wazuh agent configuration
wazuh_manager_ip: "192.168.1.219"
wazuh_manager_port: 1514
wazuh_version: "4.14.2-1"
# CheckMK agent configuration
checkmk_server: "http://192.168.1.126:5000"
checkmk_site: "cmk"
checkmk_version: "2.4.0p20-1"
Vault Structure (group_vars/vault.yml - Encrypted)
vault_ansible_password: "<32-char-random-token>"
vault_root_password: "<complex-password>"
vault_paul_password: "<user-password>"
vault_ansible_windows_password: "<windows-password>"
vault_user_paul_password: "<cisco-password>"
vault_enable_password: "<cisco-enable-password>"
Galaxy Roles and Collections
Ansible Galaxy provides pre-built roles and collections for common tasks. The lab uses officially maintained roles for Wazuh, CheckMK agent deployment, plus support for Windows, VMware, and Cisco hosts.
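Those dependencies can be pinned in a requirements file — a hedged sketch; the collection list is inferred from the modules and platform groups used in this document, and the versions shown are illustrative:

```yaml
# requirements.yml — hedged sketch; names and versions are assumptions
collections:
  - name: ansible.posix        # authorized_key module
  - name: community.general
  - name: ansible.windows      # WinRM-managed Windows hosts
  - name: cisco.ios
  - name: community.vmware
roles:
  - name: wazuh.wazuh_agent
    version: "4.14.2"
```

Roles and collections are then installed with ansible-galaxy role install -r requirements.yml and ansible-galaxy collection install -r requirements.yml respectively.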
3.4 Core Playbooks - Detailed Overview¶
Playbook 1: Multi-Platform System Audit (sys_audit_n8n.yml)¶
Purpose: Comprehensive infrastructure audit across Linux, Windows, and FreeBSD hosts with JSON output for n8n workflow processing.
Key Features:
- Single-play design for all platforms (Linux/Windows/FreeBSD)
- Connectivity pre-checks skip unreachable hosts gracefully
- Platform-specific data collection with conditional blocks
- Consolidated JSON output to /tmp/audit_report.json
- Integration with n8n for HTML report generation and alerting
Data Collected:
- System: CPU cores, memory, disk usage %, memory usage %, uptime
- Network: Default gateway, nameservers, listening ports count
- Security: SSH keys count, Windows Defender status, running services
- Software: Kernel version, available updates count
Code Snippet - Connectivity Pre-Tasks
pre_tasks:
  - name: Check if host is reachable
    ansible.builtin.wait_for_connection:
      timeout: 5
    ignore_unreachable: true
    ignore_errors: true
    register: connection_check

  - name: Gather facts only for reachable hosts
    ansible.builtin.setup:
    when: connection_check is success

  - name: Skip unreachable host
    ansible.builtin.meta: end_host
    when: connection_check is failed or connection_check is unreachable
Code Snippet - Linux Data Collection
- name: Collect disk usage (Linux)
  shell: df -h / | tail -n 1 | awk '{print $5}' | sed 's/%//'
  register: disk_usage_pct
  changed_when: false

- name: Collect memory usage (Linux)
  shell: free | grep Mem | awk '{printf "%.0f", ($3/$2) * 100}'
  register: mem_usage_pct
  changed_when: false

- name: Collect package updates (Linux)
  shell: |
    if command -v apt >/dev/null 2>&1; then
      apt list --upgradable 2>/dev/null | grep -c upgradable || echo "0"
    elif command -v dnf >/dev/null 2>&1; then
      dnf check-update -q 2>/dev/null | grep -v "^$" | wc -l || echo "0"
    fi
  register: updates_available
  changed_when: false
changed_when: false
Code Snippet - Windows Data Collection
- name: Collect disk usage (Windows)
  win_shell: |
    $disk = Get-PSDrive C | Select-Object Used,Free
    [math]::Round(($disk.Used / ($disk.Used + $disk.Free)) * 100, 0)
  register: win_disk_usage

- name: Collect Windows Defender status
  win_shell: (Get-MpComputerStatus).AntivirusEnabled
  register: win_defender_status
  ignore_errors: true
Code Snippet - JSON Consolidation
- name: Write consolidated JSON report
  copy:
    content: |
      {
        "report_generated": "{{ ansible_date_time.iso8601 }}",
        "report_date": "{{ ansible_date_time.date }}",
        "total_hosts": {{ collected_audit_data | length }},
        "hosts": {{ collected_audit_data | to_nice_json }}
      }
    dest: "/tmp/audit_report.json"
  delegate_to: localhost
  run_once: true
Use Cases:
- Weekly infrastructure audits via n8n automation
- Pre-maintenance compliance checks
- Capacity planning via disk/memory trending
- Security baseline validation (SSH keys, services, updates)
Playbook 2: User Management (user_mgmt.yml)¶
Purpose: Centralized user account and credential management across all Linux hosts with Ansible Vault integration.
Key Features:
- Sets ansible user password from vault_ansible_password
- Sets root password from vault_root_password
- Manages paul user with sudo access (password-based)
- Deploys SSH keys for authorized users
- Verifies ansible key-based auth after password rotation
Code Snippet - Ansible User Password Management
- name: Set ansible user password from vault
  tags: ansible_user
  ansible.builtin.user:
    name: ansible
    password: "{{ vault_ansible_password | password_hash('sha512') }}"
    update_password: always

- name: Verify ansible SSH key is still present
  tags: ansible_user, verify
  ansible.posix.authorized_key:
    user: ansible
    key: "{{ ansible_ssh_pubkey }}"
    state: present
Code Snippet - Paul User with Sudo Configuration
- name: Ensure paul user exists
  ansible.builtin.user:
    name: paul
    shell: /bin/bash
    groups: "{{ 'sudo' if ansible_os_family == 'Debian' else 'wheel' }}"
    append: true
    password: "{{ vault_paul_password | password_hash('sha512') }}"

- name: Deploy sudoers file for paul
  ansible.builtin.copy:
    content: |
      Defaults:paul rootpw
      paul ALL=(ALL) ALL, !/usr/bin/su
    dest: /etc/sudoers.d/paul
    owner: root
    group: root
    mode: '0440'
    validate: '/usr/sbin/visudo -cf %s'
Password Rotation Workflow:
1. Generate a 32-character token: openssl rand -base64 24
2. Update vault_ansible_password in vault.yml: ansible-vault edit group_vars/vault.yml
3. Run the playbook: ansible-playbook user_mgmt.yml --tags ansible_user
4. Test connectivity: ansible linux -m ping
Use Cases:
- Quarterly credential rotation (automated via n8n)
- Emergency password resets
- New host bootstrap user provisioning
- SSH key distribution after key rotation
Playbook 3: Linux Package Updates (update_linux_hosts.yml)¶
Purpose: Update all Linux hosts with dist-upgrade/dnf update and reboot detection.
Key Features:
- Debian: apt dist-upgrade with autoremove/autoclean
- RedHat: dnf update with latest packages
- Reboot detection for kernel updates
- Free strategy for parallel execution
- Update summary with reboot status
Code Snippet - Debian Package Updates & Reboot Detection
- name: Update all packages (Debian)
  tags: packages, update
  apt:
    upgrade: dist
    update_cache: true
    cache_valid_time: 3600
    autoremove: true
    autoclean: true
  when: ansible_os_family == "Debian"
  register: apt_update

- name: Check if reboot required (Debian)
  tags: reboot
  stat:
    path: /var/run/reboot-required
  register: reboot_deb
  when: ansible_os_family == "Debian"
Use Cases:
- Weekly package updates (n8n automation)
- Security patch deployment
- Post-vulnerability scanning remediation
- Compliance maintenance (keep systems current)
Playbook 4: Linux Hardening (linux_hardening.yml)¶
Purpose: Docker-aware SSH and system hardening with automatic Docker detection.
Key Features:
- SSH hardening: disable root login, enforce key-only auth
- TCP/Agent forwarding enabled for Docker/DevOps workflows
- Automatic Docker detection preserves IP forwarding
- Safe kernel parameters (sysctl hardening)
- Login banner deployment
Code Snippet - Docker Detection
- name: Detect if Docker is installed
  stat:
    path: /usr/bin/docker
  register: docker_check

- name: Set Docker presence fact
  set_fact:
    has_docker: "{{ docker_check.stat.exists }}"

- name: Configure sysctl for Docker hosts
  sysctl:
    name: net.ipv4.ip_forward
    value: '1'
    state: present
    sysctl_set: true
  when: has_docker | bool
Code Snippet - SSH Hardening
- name: Configure SSH hardening options
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?{{ item.key }}\s'
    line: '{{ item.key }} {{ item.value }}'
    state: present
  loop:
    - { key: 'PermitRootLogin', value: 'no' }
    - { key: 'PasswordAuthentication', value: 'no' }
    - { key: 'PubkeyAuthentication', value: 'yes' }
    - { key: 'AllowAgentForwarding', value: 'yes' }
    - { key: 'AllowTcpForwarding', value: 'yes' }
  notify: restart sshd
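The notify: restart sshd directive assumes a matching handler elsewhere in the playbook. A minimal sketch — the service is named ssh on Debian-family systems and sshd on RedHat-family systems:

```yaml
handlers:
  - name: restart sshd
    ansible.builtin.service:
      name: "{{ 'ssh' if ansible_os_family == 'Debian' else 'sshd' }}"
      state: restarted
```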
Use Cases:
- New host security baseline
- Docker host specialized hardening
- Compliance enforcement (CIS benchmarks)
- Post-compromise hardening
Playbook 5: New Install Baseline (new_install_baseline_roles.yml)¶
Purpose: Bootstrap new hosts with baseline configuration using Galaxy roles.
Roles Used:
- wazuh.wazuh_agent (version: 4.14.2)
- Custom bootstrap tasks (user creation, SSH, sudo)
Key Features:
- Creates ansible service account with passwordless sudo
- Deploys SSH keys for ansible and paul users
- Configures sudoers with validation
- Installs Wazuh agent and registers with manager
- Installs utility packages (curl, nano, dig, traceroute)
- Enables qemu-guest-agent for Proxmox integration
Code Snippet - User Creation
- name: Create ansible user
  ansible.builtin.user:
    name: ansible
    shell: /bin/bash
    password: "{{ vault_ansible_password | password_hash('sha512') }}"
    comment: "Ansible Automation Service Account"

- name: Ensure .ssh directory exists
  file:
    path: /home/ansible/.ssh
    state: directory
    owner: ansible
    group: ansible
    mode: '0700'

- name: Add SSH key for ansible user
  ansible.posix.authorized_key:
    user: ansible
    key: "{{ ansible_ssh_pubkey }}"
    state: present

- name: Deploy sudoers file for ansible user (passwordless)
  ansible.builtin.copy:
    content: "ansible ALL=(ALL) NOPASSWD:ALL\n"
    dest: /etc/sudoers.d/ansible
    mode: '0440'
    validate: '/usr/sbin/visudo -cf %s'
Use Cases:
- Fresh install bootstrap (first playbook to run)
- Wazuh agent deployment across new hosts
- Monitoring stack integration
- Standardized user/SSH configuration
4. Version Control and GitOps¶
Version Control Strategy¶
All infrastructure‑as‑code assets for this solution — including Ansible playbooks, Terraform configurations, and related modules — are stored in a dedicated GitHub repository; a local Git working copy on the automation host tracks and pushes all changes.
This central repository provides:
- Version history — Every change is committed with a message on what changed, enabling full audit trails for infrastructure modifications
- Consistency across environments — Ansible roles and Terraform modules are version‑locked so the same codebase can be applied to multiple hosts
- Rollback capability — Previous commits can be checked out to restore infrastructure to a known good state quickly in case of issues
Configuration files for hosted and platform services — including YAML, JSON, HTML, CSS, Python, and PowerShell scripts — are stored in a separate repository. This repository is fully integrated with Visual Studio Code, allowing seamless editing both locally and on remote hosts via SSH.
By keeping both provisioning (Terraform) and configuration management (Ansible) in one repository, and separating hosted/platform service configs into another, the architecture ensures that each layer of the stack is versioned, auditable, and maintainable.
5. Workflow Automation with n8n¶
5.1 Platform Overview¶
n8n is a self-hosted, low-code workflow automation platform enabling visual workflow design with conditional logic, error handling, loops, and data transformation. Deployed as a containerized service behind Traefik reverse proxy with Authentik SSO.
Security Impact
- Security operations accelerated through automated SOAR workflows
- Manual triage eliminated via automated alert enrichment
- MTTR reduced through auto-generated remediation playbooks
- Alert fatigue minimized through intelligent deduplication and correlation
- Compliance audit trails automatically documented throughout the incident lifecycle
Deployment Rationale: Security operations generate thousands of events daily; manual triage at that volume is unsustainable. n8n demonstrates Security Orchestration, Automation and Response (SOAR) capabilities where vulnerability scans trigger automated ticket creation, threat intelligence enrichment queries multiple APIs, and incident response playbooks execute without human intervention. This mirrors enterprise SOAR platforms (Splunk SOAR, Palo Alto Cortex XSOAR) where analyst efficiency is multiplied through automation.
Architecture Principles Alignment
- Defense in Depth: Automated vulnerability remediation workflows execute Ansible playbooks; firewall rule changes logged and reviewed; failed automation triggers manual fallback procedures
- Secure by Design: Credentials stored in n8n credential vault (encrypted at rest); webhook endpoints authenticated via Authentik tokens; workflow execution logs forwarded to SIEM
- Zero Trust: Every automation action logged with timestamp/user; no hardcoded credentials; API tokens expire and rotate automatically
n8n Configuration¶
- Version: n8n v2.7.5 (latest stable)
- Execution Mode: Main process (not queue mode for simplicity)
- Webhook URL: https://n8n.home.com
- TLS Certificate: Step-CA issued, auto-renewed
- Authentication: SSO via Authentik (no local passwords)
- Monitoring: Uptime Kuma
- Notifications: Discord webhooks, SMTP relay
Security Controls¶
- Credential Encryption: All API tokens encrypted at rest, Ansible automated (ansible-vault)
- Webhook Security: HMAC signature validation on inbound webhooks
- Audit Trail: All workflow executions logged with timestamp and user
5.2 Workflow 1: Lab Infrastructure Audit¶
Purpose: Automated weekly configuration audit and system updates across lab infrastructure with HTML reporting and dual alerting (Discord + Email).
This workflow runs weekly on Sunday at 2 AM, executes the Ansible sys_audit_n8n.yml playbook, transforms the JSON output into a styled HTML report, deploys it to the Apache webserver, and sends notifications via Discord and email.
Workflow Summary:
- Scheduled Execution: Triggered weekly via n8n's Cron node (Sunday 2 AM)
- Ansible Playbook: Executes sys_audit_n8n.yml via SSH to collect system metrics
- Data Transformation: Parses JSON and generates HTML report with CSS styling
- Apache Upload: Deploys HTML to /var/www/html/ and updates index page
- Discord Alert: Sends formatted notification with report link and summary stats
- Email Alert: Sends HTML email template with audit summary
Workflow Nodes¶
| Node Type | Configuration | Purpose |
|---|---|---|
| Schedule Trigger | Cron: 0 2 * * 0 (Sunday 2 AM) | Weekly execution |
| SSH Node 1 | ansible-playbook sys_audit_n8n.yml | Run audit playbook |
| SSH Node 2 | cat /tmp/audit_report.json | Read JSON output |
| Code Node 1 | Parse JSON from stdout | Extract audit data |
| Code Node 2 | Generate HTML with CSS | Create styled report |
| SSH Node 3 | Write HTML to webserver | Deploy to Apache |
| SSH Node 4 | Update index.html | Add report to index |
| Code Node 3 | Generate Discord markdown | Format alert message |
| Discord Webhook | POST to webhook URL | Send Discord notification |
| Code Node 4 | Generate email HTML | Format email template |
| Email Node | SMTP send | Send email notification |
HTML Report Features:
- Responsive grid layout with host cards
- Platform-specific color coding (Linux/Windows/FreeBSD)
- Visual metrics with progress bars for disk/memory usage
- Warning/critical thresholds highlighted (>75% yellow, >90% red)
- Summary statistics cards (total hosts, high disk/memory usage counts)
- Timestamp and audit metadata in footer
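The threshold highlighting described above reduces to a small mapping from usage percentage to a CSS class. This is a sketch of the Code-node logic, assuming hypothetical class names; the actual report markup may differ:

```javascript
// Maps a usage percentage to the report's highlight level.
// Thresholds mirror the report rules: >75% yellow (warning), >90% red (critical).
function usageLevel(percent) {
  if (percent > 90) return "critical"; // rendered red
  if (percent > 75) return "warning";  // rendered yellow
  return "ok";                         // rendered neutral
}

// Hypothetical host-card progress bar embedding the level as a CSS class
function progressBar(label, percent) {
  return `<div class="metric ${usageLevel(percent)}">` +
         `<span>${label}</span>` +
         `<div class="bar" style="width:${Math.min(percent, 100)}%"></div>` +
         `</div>`;
}

console.log(usageLevel(72)); // "ok"
console.log(usageLevel(80)); // "warning"
console.log(usageLevel(95)); // "critical"
```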
5.3 Workflow 2: Threat Intelligence Aggregation¶
Purpose: Daily ingestion and distribution of curated cybersecurity threat intelligence with AI-powered summarization.
This workflow runs daily at 8 AM to ingest and distribute curated threat intelligence from multiple cybersecurity RSS feeds. It supports situational awareness and IOC enrichment across the lab environment.
Workflow Summary:
- Scheduled Execution: Triggered daily via n8n's Cron node (8 AM)
- RSS Feed Polling: Pulls entries from curated list of cybersecurity sources
- Feed Limiting: Filters each feed to articles added in the last 24 hours to reduce noise
- ChatGPT Integration: NIST feed summarized via OpenAI API
- Discord Notification: Formatted aggregated feed summary to #threat-intel channel
RSS Feeds:
- Darknet Diaries (https://podcast.darknetdiaries.com/)
- NIST (https://www.nist.gov/blogs/cybersecurity-insights/rss.xml)
- Krebs on Security (https://krebsonsecurity.com/feed/)
- Threat Post (https://threatpost.com/feed/)
- BleepingComputer (https://www.bleepingcomputer.com/feed/)
- CIS (https://www.cisecurity.org/feed/advisories)
- NAO SEC (https://nao-sec.org/feed)
Workflow Nodes¶
| Node Type | Configuration | Purpose |
|---|---|---|
| Schedule Trigger | Cron: 0 8 * * * (Daily 8 AM) | Daily execution |
| RSS Feed Reader | URLs: CIS, NIST, Krebs, ThreatPost, BleepingComputer, etc | Ingest threat intel |
| Filter Node | Limit each feed to the last 24 hours | Reduce noise |
| Merge Node | Combine all feeds | Aggregate data |
| OpenAI Node | Summarize NIST feed with ChatGPT | AI-powered summary |
| Format Node | Create Discord embed message | Visual formatting |
| Discord Webhook | POST to #threat-intel channel | Distribute to team |
Workflow Benefits:
- Centralized threat intelligence (7 sources → 1 channel)
- AI-powered summarization reduces information overload
- Daily cadence ensures timely awareness of emerging threats
- Supports incident response and vulnerability management
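The 24-hour feed filter above can be sketched as a few lines of Code-node JavaScript. The `isoDate` field name follows what n8n's RSS Feed Read node typically emits; treat it as an assumption:

```javascript
// Keeps only feed items published within the last 24 hours.
function lastDay(items, now = Date.now()) {
  const cutoff = now - 24 * 60 * 60 * 1000;
  return items.filter(item => {
    const published = Date.parse(item.isoDate);
    // Drop items with unparseable dates rather than letting NaN through
    return !Number.isNaN(published) && published >= cutoff;
  });
}

// Example run pinned to a fixed "now" for reproducibility
const now = Date.parse("2026-01-30T08:00:00Z");
const items = [
  { title: "fresh advisory", isoDate: "2026-01-30T01:00:00Z" },
  { title: "stale post",     isoDate: "2026-01-27T09:00:00Z" },
  { title: "bad date",       isoDate: "not-a-date" },
];
console.log(lastDay(items, now).map(i => i.title)); // [ 'fresh advisory' ]
```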
5.4 Workflow 3: Weekly Package Updates with Alerting¶
Purpose: Automated weekly package manager updates across all Linux hosts with Discord and email alerting.
This workflow runs weekly on Friday at 3 AM, executes the update_linux_hosts.yml Ansible playbook, parses the output to categorize update results, and sends formatted notifications via Discord and email.
Workflow Summary:
- Scheduled Execution: Weekly trigger (Friday 3 AM)
- Ansible Playbook: Executes update_linux_hosts.yml for apt/dnf updates
- Data Parsing: Converts raw Ansible stdout to structured JSON
- Update Categorization: Hosts grouped by status (no updates, updated, reboot required, errors)
- Markdown Generation: Creates formatted summary for Discord
- HTML Generation: Creates formatted report for email
- Discord Alert: Sends markdown summary with status counts
- Email Alert: Sends HTML email with detailed update report
Workflow Nodes¶
| Node Type | Configuration | Purpose |
|---|---|---|
| Schedule Trigger | Cron: 0 3 * * 5 (Friday 3 AM) | Weekly execution |
| SSH Node | ansible-playbook update_linux_hosts.yml | Run package updates |
| Code Node 1 | Parse stdout: extract reachable/unreachable hosts, summaries, recap | Convert to structured JSON |
| Code Node 2 | Generate markdown and HTML summaries with categorization | Format alert content |
| Discord Node | POST markdown summary | Send Discord notification |
| Email Node | SMTP send HTML report | Send email notification |
Data Flow and Parsing Logic
The workflow parses Ansible's stdout output to extract comprehensive update information:
- Reachable Hosts: Extracted from 'ok: [hostname]' lines
- Unreachable Hosts: Extracted from 'fatal: [hostname]' lines
- Update Summaries: Parsed from 'msg' fields containing platform, packages updated, reboot status
- Play Recap: Structured data showing ok/changed/failed/skipped counts per host
Host Categorization
Hosts are automatically categorized based on update results:
- No Updates Required: Hosts with packages_updated = False
- Updates Applied: Hosts with packages_updated = True and reboot_required = False
- Reboot Required: Hosts with reboot_required = True
- Errors Detected: Hosts with empty/null platform, packages_updated, reboot_required
- Unreachable: Hosts that failed connectivity checks
Code Snippet - Stdout to JSON Parsing
// Parse Ansible stdout into structured data
const stdout = $input.first().json.stdout;
const lines = stdout.split('\n');
// Extract reachable and unreachable hosts
let reachable = [];
let unreachable = [];
for (const line of lines) {
const okMatch = line.match(/^ok:\s+\[(.*?)\]/);
if (okMatch) reachable.push(okMatch[1]);
const fatalMatch = line.match(/^fatal:\s+\[(.*?)\]/);
if (fatalMatch) unreachable.push(fatalMatch[1]);
}
// Parse update summaries from msg fields
let summaries = [];
let currentHost = null;
for (const line of lines) {
const hostMatch = line.match(/^ok:\s+\[(.*?)\]\s+=>/);
if (hostMatch) currentHost = hostMatch[1];
if (line.includes('"msg":')) {
const msg = line.replace(/.*"msg":\s+"|",?$/g, "");
const platform = (msg.match(/Platform:\s+(.*)/) || [])[1];
const updated = (msg.match(/Packages updated:\s+(.*)/) || [])[1];
const reboot = (msg.match(/Reboot required:\s+(.*)/) || [])[1];
summaries.push({
host: currentHost,
platform,
packages_updated: updated,
reboot_required: reboot
});
}
}
// Recap parsing (PLAY RECAP section) omitted for brevity
return [{ json: {
total_hosts: reachable.length + unreachable.length,
reachable_hosts: reachable,
unreachable_hosts: unreachable,
update_summaries: summaries
} }];
Code Snippet - Categorization and Formatting
// Categorize hosts by update status
const noUpdates = [];
const updatesApplied = [];
const rebootRequired = [];
const errorsDetected = [];
for (const s of normalized) {
if (s.error) {
errorsDetected.push(s.host);
} else if (s.reboot) {
rebootRequired.push(s.host);
} else if (s.updated) {
updatesApplied.push(s.host);
} else {
noUpdates.push(s.host);
}
}
// Generate markdown for Discord
const md = `
## Linux Update Summary
**Total Hosts:** ${total}
**Reachable:** ${reachable.length}
**Unreachable:** ${unreachable.length}
🟢 No Updates Required (${noUpdates.length})
${noUpdates.map(h => `- ${h}`).join("\n")}
🔵 Updates Applied (${updatesApplied.length})
${updatesApplied.map(h => `- ${h}`).join("\n")}
🟠 Reboot Required (${rebootRequired.length})
${rebootRequired.map(h => `- ${h}`).join("\n")}
🔴 Errors Detected (${errorsDetected.length})
${errorsDetected.map(h => `- ${h}`).join("\n")}
`;
// HTML generation (not shown) follows the same template pattern
return [{ json: { markdown: md, html: html } }];
Workflow Benefits:
- Automated weekly package updates reduce manual maintenance
- Categorized results prioritize attention (errors and reboots first)
- Dual alerting ensures visibility across communication channels
- Structured parsing enables trend analysis over time
- Unreachable host detection prevents silent failures
Use Cases:
- Weekly security patch deployment
- Compliance maintenance (systems up-to-date)
- Post-vulnerability scanning remediation
- Reboot planning based on kernel updates
- Infrastructure health monitoring
5.5 Workflow 4: Monthly Automated Ansible Token Rotation¶
[PLACEHOLDER - Implementation Pending]
Purpose: Automated monthly credential rotation for Ansible vault with random token generation, vault file update, user management playbook execution, and dual alerting.
Planned Workflow Summary:
- Scheduled Execution: Monthly trigger (1st of month at 4 AM)
- Token Generation: Create cryptographically random 32-char token (base64)
- Vault Update: SSH to Ansible controller and update vault_ansible_password in vault.yml
- Playbook Execution: Run user_mgmt.yml --tags ansible_user to propagate new password
- Connectivity Test: Verify Ansible can still connect to all hosts via ping
- Discord Alert: Send success/failure notification with rotation summary
- Email Alert: Send HTML email confirming rotation and test results
Planned Workflow Nodes¶
| Node Type | Configuration | Purpose |
|---|---|---|
| Schedule Trigger | Cron: 0 4 1 * * (1st of month 4 AM) | Monthly execution |
| Code Node 1 | Generate token: crypto.randomBytes(24).toString('base64') | Create new password |
| SSH Node 1 | Backup current vault.yml | Create vault backup |
| SSH Node 2 | ansible-vault edit vault.yml (update vault_ansible_password) | Update vault variable |
| SSH Node 3 | ansible-playbook user_mgmt.yml --tags ansible_user | Propagate new password |
| SSH Node 4 | ansible linux -m ping | Test connectivity |
| Code Node 2 | Parse ping results for success count | Verify all hosts accessible |
| IF Node | Check if ping success == total hosts | Route to success/failure |
| Code Node 3 | Generate Discord success message | Format success alert |
| Discord Webhook | POST rotation success to Discord | Send Discord notification |
| Code Node 4 | Generate HTML email template | Format email report |
| Email Node | SMTP send with rotation details | Send email notification |
| Error Handler | Rollback vault.yml and alert on failure | Handle rotation errors |
Security Considerations for Token Rotation:
- Token generation uses crypto.randomBytes (cryptographically secure)
- Vault backup created before each rotation for rollback capability
- Connectivity test verifies all hosts accessible before confirming rotation
- Error handler automatically reverts to backup vault on failure
- Credentials never logged or displayed in workflow execution history
- n8n credential vault stores vault password with AES-256 encryption
Expected Alert Content:
- Rotation timestamp
- Token generation success
- Vault update status
- User management playbook execution result
- Connectivity test results (hosts reachable/unreachable)
- Rollback status (if applicable)
5.6 Best Practices and Lessons Learned¶
Error Handling and Resilience:
- All SSH nodes include timeout settings (30-60 seconds)
- Workflow execution logs retained for 30 days in n8n database
Credential Management:
- All credentials stored in n8n vault (never hardcoded)
- SSH credentials use key-based auth where possible
- Webhook URLs stored as environment variables
- SMTP credentials encrypted with AES-256 at rest
Notification Design:
- Discord: Concise markdown with key metrics and report links
- Email: Detailed HTML templates with embedded CSS for compatibility
- Critical alerts include @ mentions for immediate attention
- All notifications include timestamp and workflow execution ID
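The notification conventions above can be illustrated with the shape of a Discord webhook payload. Field names follow Discord's embed schema; the role ID and helper function are placeholders, not the lab's actual workflow code:

```javascript
// Illustrative Discord webhook payload: timestamp, workflow execution ID,
// and an @mention for critical alerts. ROLE_ID_PLACEHOLDER is hypothetical.
function buildAlert({ title, summary, critical, executionId }) {
  return {
    // Role @mentions belong in `content`, not inside the embed
    content: critical ? "<@&ROLE_ID_PLACEHOLDER> Critical alert" : "",
    embeds: [{
      title,
      description: summary,
      color: critical ? 0xe74c3c : 0x2ecc71, // red vs. green
      timestamp: new Date().toISOString(),    // ISO 8601, per Discord's schema
      footer: { text: `execution: ${executionId}` },
    }],
  };
}

const payload = buildAlert({
  title: "Linux Update Summary",
  summary: "2 hosts require reboot",
  critical: true,
  executionId: "exec-1234",
});
console.log(payload.embeds[0].footer.text); // "execution: exec-1234"
```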
Integration Patterns:
- Ansible workflows use SSH node for remote execution
- JSON output from playbooks parsed in Code nodes
- HTML and Markdown generation via JavaScript templates
- Apache webserver deployment via SSH file write operations
- Multi-platform support handled through conditional Ansible blocks
6. Scripting for Advanced Automation¶
Script Development Strategy¶
Custom scripts supplement configuration management tools for tasks requiring complex logic, performance optimization, or specialized functionality. All scripts are version-controlled in Git, documented with inline comments, and integrated into broader automation workflows.
Security Impact
- Complex security operations automated where configuration‑management tools lack native functionality
- Performance‑critical tasks (log parsing, threat‑intelligence queries) optimized beyond built‑in tool capabilities
- Security tool APIs integrated through custom automation wrappers for expanded functionality
- Rapid prototyping enables validation of security concepts before production‑grade implementation
Deployment Rationale: While Ansible/Terraform handle declarative configuration, procedural logic (API rate-limiting, stateful workflows, real-time processing) requires traditional scripting. Enterprise environments use scripts for custom integrations, API gateways, and performance-critical operations. This demonstrates ability to select appropriate automation tools—declarative vs. imperative—based on task requirements.
Architecture Principles Alignment
- Defense in Depth: Scripts validate input parameters; error handling prevents cascading failures; execution logs audited for anomalies
- Secure by Design: No hardcoded credentials (environment variables or secret managers); input sanitization prevents injection attacks; least-privilege execution (non-root where possible)
- Zero Trust: Every script execution logged with user/timestamp; API calls authenticated via short-lived tokens; output validation before downstream processing
Script Language Selection Criteria¶
| Language | Use Cases | Advantages |
|---|---|---|
| Bash | Linux system administration; cron jobs | Native; fast; no dependencies |
| PowerShell | Windows management; AD operations | Deep Windows integration; objects |
| Python | API integration; data processing; ML | Rich libraries; cross-platform |
6.1 PowerShell and Bash Scripting¶
Custom automation scripts are developed in both PowerShell (for Windows systems) and Bash (for Linux systems) to handle repetitive administrative tasks, enforce configuration standards, and respond to system events. These scripts automate activities such as user account provisioning, log rotation, backup verification, certificate renewal checks, and security baseline enforcement. PowerShell scripts leverage native Windows management frameworks like Active Directory modules and WMI, while Bash scripts utilize standard Unix utilities and interact with system APIs. Scripts are version-controlled in GitHub alongside infrastructure code, enabling rollback capability and documentation of automation logic.
Bash Script Example: Backup Automation¶
#!/bin/bash
#
# backup_web.sh - Versioned rsync backup with validation
# Usage: backup_web.sh <source> <target>
# Example: backup_web.sh /var/www/lab /backup
#
# Exit Codes:
# 0 - Success
# 1 - Invalid arguments
# 2 - Missing dependency
# 3 - Rsync failed
set -euo pipefail # Exit on error, undefined vars, pipe failures
# Check that the user has entered exactly two arguments.
if [ $# -ne 2 ]
then
/usr/bin/echo "Usage: backup_web.sh <source_directory> <target_directory>"
/usr/bin/echo "Please try again."
exit 1
fi
SOURCE="$1"
TARGET="$2"
# Validate paths (prevent directory traversal)
if [[ ! "$SOURCE" =~ ^/[a-zA-Z0-9/_-]+$ ]]; then
echo "Error: Invalid source path format"
exit 1
fi
if [[ ! "$TARGET" =~ ^/[a-zA-Z0-9/_-]+$ ]]; then
echo "Error: Invalid target path format"
exit 1
fi
#check to see if rsync is installed
if ! command -v rsync > /dev/null 2>&1
then
/usr/bin/echo "This script requires rsync to be installed."
/usr/bin/echo "Please install the package and run the script again."
exit 2
fi
# Verify source exists
if [ ! -d "$SOURCE" ]; then
/usr/bin/echo "Error: Source directory does not exist: $SOURCE"
exit 1
fi
# Create target if needed
mkdir -p "$TARGET"
# Generate timestamp
TIMESTAMP=$(date +%Y-%m-%d_%H-%M-%S)
BACKUP_DATE=$(date +%Y-%m-%d)
LOG_FILE="/var/log/backup_${BACKUP_DATE}.log"
# Rsync options
RSYNC_OPTS=(
-avz # Archive, verbose, compress
--delete # Remove deleted files
--backup # Backup changed files
--backup-dir="$TARGET/versions/$TIMESTAMP" # Versioned backups
--exclude='*.tmp' # Exclude temp files
--exclude='.git' # Exclude version control
--log-file="$LOG_FILE" # Detailed logging
--stats # Show transfer statistics
)
# Log start
/usr/bin/echo "=== Backup started at $(date) ===" | tee -a "$LOG_FILE"
logger -t backup_web "Starting backup: $SOURCE -> $TARGET"
# Execute rsync
if rsync "${RSYNC_OPTS[@]}" "$SOURCE/" "$TARGET/current/"; then
/usr/bin/echo "=== Backup completed successfully at $(date) ===" | tee -a "$LOG_FILE"
logger -t backup_web "SUCCESS: Backup completed"
# Calculate backup size
BACKUP_SIZE=$(du -sh "$TARGET/current" | cut -f1)
/usr/bin/echo "Backup size: $BACKUP_SIZE" | tee -a "$LOG_FILE"
# Retention: remove version directories older than 7 days
find "$TARGET/versions/" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} \;
exit 0
else
RSYNC_EXIT=$?
/usr/bin/echo "=== Backup FAILED at $(date) with exit code $RSYNC_EXIT ===" | tee -a "$LOG_FILE"
logger -t backup_web "FAILED: rsync exited with code $RSYNC_EXIT"
exit 3
fi
This Bash script performs a backup using rsync, with versioned copies of changed files stored in timestamped directories. It validates input paths, checks for dependencies, and logs the operation.
Script: /usr/local/bin/backup_web.sh
Purpose: Incremental rsync backup with versioning and logging
| Line(s) | Purpose | Explanation |
|---|---|---|
| #!/bin/bash | Script interpreter | Ensures the script runs using Bash |
| if [ $# -ne 2 ] | Argument check | Verifies that exactly two arguments are provided |
| echo "Usage:..." | Usage message | Informs the user of correct syntax if arguments are missing |
| exit 1 | Exit on error | Terminates with exit code 1 for incorrect usage |
| command -v rsync | Dependency check | Verifies that rsync is installed and available in $PATH |
| exit 2 | Exit on missing dependency | Terminates with exit code 2 if rsync is not found |
| TIMESTAMP=$(date +%Y-%m-%d_%H-%M-%S) | Timestamp | Captures the current date and time for the versioned backup directory name |
| RSYNC_OPTS=(...) | Backup options | Sets rsync flags: -a archive mode; -v verbose; -z compress; --delete remove obsolete files; --backup with --backup-dir divert changed files into the timestamped version directory; --exclude skip temp and VCS files |
| rsync "${RSYNC_OPTS[@]}" "$SOURCE/" "$TARGET/current/" | Execute rsync | Syncs the source into $TARGET/current, preserving prior versions of changed files |
| --log-file="$LOG_FILE" | Logging | Writes rsync output to a date-stamped log file for auditability |
Linux Upgrade Script Overview¶
#!/bin/bash
set -e
logfile=/var/log/update_script.log
errorlog=/var/log/update_script_errors.log
hostname=$(hostname)
/usr/bin/echo "-------------------START SCRIPT on $hostname-------------------" 1>>$logfile 2>>$errorlog
check_exit_status() {
if [ $? -ne 0 ]
then
/usr/bin/echo "An error occurred, please check the $errorlog file."
fi
}
if [ -d /etc/apt ]; then
# Debian or Ubuntu
/usr/bin/echo "Detected Debian/Ubuntu system"
/usr/bin/sudo apt update 1>>$logfile 2>>$errorlog
check_exit_status
/usr/bin/sudo apt dist-upgrade -y 1>>$logfile 2>>$errorlog
check_exit_status
elif [ -f /etc/redhat-release ]; then
distro=$(cat /etc/redhat-release)
if [[ "$distro" == *"Fedora"* ]]; then
/usr/bin/echo "Detected Fedora system"
/usr/bin/sudo dnf upgrade --refresh -y 1>>$logfile 2>>$errorlog
check_exit_status
elif [[ "$distro" == *"CentOS"* ]] || [[ "$distro" == *"Red Hat"* ]]; then
/usr/bin/echo "Detected CentOS or RHEL system"
/usr/bin/sudo yum update -y 1>>$logfile 2>>$errorlog
check_exit_status
else
/usr/bin/echo "Detected unknown Red Hat-based system"
/usr/bin/sudo yum update -y 1>>$logfile 2>>$errorlog
check_exit_status
fi
else
/usr/bin/echo "Unsupported or unknown Linux distribution"
fi
/usr/bin/echo "The script completed at: $(/usr/bin/date)" 1>>$logfile 2>>$errorlog
/usr/bin/echo "-------------------END SCRIPT on $hostname-------------------" 1>>$logfile 2>>$errorlog
This Bash script performs a distribution-aware system upgrade, logging all output and errors to dedicated log files. It supports Debian/Ubuntu, Fedora, CentOS, and RHEL, and includes error checking after each upgrade step.
| Line(s) | Purpose | Explanation |
|---|---|---|
| #!/bin/bash | Interpreter declaration | Ensures the script runs with Bash |
| logfile=/var/log/update_script.log; errorlog=/var/log/update_script_errors.log | Log file setup | Defines paths for standard output and error logs |
| echo "START SCRIPT" | Start marker | Logs the beginning of the script execution |
| check_exit_status() | Error handler | Function that checks the last command's exit code and prints an error message if non-zero |
| if [ -d /etc/apt ]; then | Distro detection | Checks for Debian/Ubuntu by presence of APT directory |
| apt update; apt dist-upgrade -y | Debian/Ubuntu upgrade | Runs update and full upgrade, logs output and errors |
| elif [ -f /etc/redhat-release ]; then | Red Hat-based detection | Checks for Fedora, CentOS, or RHEL using release file |
| cat /etc/redhat-release | Distro name | Reads the release file to identify the specific variant |
| dnf upgrade --refresh -y | Fedora upgrade | Refreshes metadata and upgrades packages |
| yum update -y | CentOS/RHEL upgrade | Performs system update using yum |
| echo "Unsupported distro" | Fallback | Handles unknown or unsupported systems |
| echo "Script completed at: $(date)" | Completion timestamp | Logs the end time of the script |
| echo "END SCRIPT" | End marker | Logs the conclusion of the script execution |
PowerShell Script Example: Windows Update Automation¶
[CmdletBinding()]
param(
[switch]$SkipReboot,
[string]$WebhookUrl = "windows_update_url"
)
$ErrorActionPreference = "Continue"
$LogPath = "C:\Logs\Windows-Updates"
$LogFile = Join-Path $LogPath "update_$(Get-Date -Format 'yyyy-MM-dd').log"
$MaxEventLogMsgLength = 30000
$FailureCount = 0
if (-not (Test-Path $LogPath)) {
New-Item -ItemType Directory -Path $LogPath -Force | Out-Null
Write-Host "[BOOTSTRAP] Created log directory: $LogPath"
} else {
Write-Host "[BOOTSTRAP] Log directory exists: $LogPath"
}
Start-Transcript -Path $LogFile -Append
Write-Host "[BOOTSTRAP] Transcript started: $LogFile"
Write-Host "[BOOTSTRAP] Checking NuGet provider..."
$nuget = Get-PackageProvider -Name NuGet -ErrorAction SilentlyContinue
if (-not $nuget -or $nuget.Version -lt [Version]"2.8.5.201") {
Write-Host "[BOOTSTRAP] Installing NuGet provider..."
Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force -Scope AllUsers | Out-Null
Write-Host "[BOOTSTRAP] NuGet provider installed."
} else {
Write-Host "[BOOTSTRAP] NuGet provider OK (version $($nuget.Version))."
}
function Write-Log {
param(
[string]$Message,
[string]$Level = "INFO"
)
$Timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
Write-Output "[$Timestamp] [$Level] $Message"
$EventMessage = if ($Message.Length -gt $MaxEventLogMsgLength) {
$Message.Substring(0, $MaxEventLogMsgLength) + "`n[TRUNCATED - see log file]"
} else {
$Message
}
$EventId = switch ($Level) { "ERROR" { 3 } "WARNING" { 2 } default { 1 } }
Write-EventLog -LogName Application -Source "WindowsUpdate" `
-EntryType Information -EventId $EventId -Message $EventMessage `
-ErrorAction SilentlyContinue
}
function Update-WindowsStoreApps {
Write-Log "--- BEGIN: Windows Store App Updates ---"
$Session = $null
try {
Write-Log "Opening CIM session..."
$Session = New-CimSession
Write-Log "Querying MDM AppManagement class..."
$Instance = Get-CimInstance -Namespace "root\cimv2\mdm\dmmap" `
-ClassName "MDM_EnterpriseModernAppManagement_AppManagement01"
Write-Log "Invoking Store update scan..."
$Result = Invoke-CimMethod -CimInstance $Instance -MethodName UpdateScanMethod
Write-Log "Scan result: $($Result | Out-String)"
Write-Log "--- END: Windows Store App Updates [SUCCESS] ---"
return $true
}
catch {
Write-Log "Store update failed: $($_.Exception.Message)" -Level "ERROR"
Write-Log "--- END: Windows Store App Updates [FAILED] ---" -Level "ERROR"
return $false
}
finally {
if ($Session) {
Remove-CimSession $Session
Write-Log "CIM session closed."
}
}
}
function Update-ChocolateyPackages {
Write-Log "--- BEGIN: Chocolatey Package Updates ---"
if (-not (Get-Command choco -ErrorAction SilentlyContinue)) {
Write-Log "Chocolatey not found - skipping." -Level "WARNING"
Write-Log "--- END: Chocolatey Package Updates [SKIPPED] ---" -Level "WARNING"
return $true
}
Write-Log "Chocolatey version: $(choco --version 2>&1)"
Write-Log "Running: choco upgrade all -y"
try {
$OutputStr = (choco upgrade all -y 2>&1) -join "`n"
Write-Log "Chocolatey output:`n$OutputStr"
if ($LASTEXITCODE -eq 0) {
Write-Log "--- END: Chocolatey Package Updates [SUCCESS] ---"
return $true
} else {
Write-Log "Chocolatey exited with code $LASTEXITCODE" -Level "ERROR"
Write-Log "--- END: Chocolatey Package Updates [FAILED] ---" -Level "ERROR"
return $false
}
}
catch {
Write-Log "Chocolatey exception: $($_.Exception.Message)" -Level "ERROR"
Write-Log "--- END: Chocolatey Package Updates [FAILED] ---" -Level "ERROR"
return $false
}
}
function Update-WingetPackages {
Write-Log "--- BEGIN: Winget Package Updates ---"
if (-not (Get-Command winget -ErrorAction SilentlyContinue)) {
Write-Log "Winget not found - skipping." -Level "WARNING"
Write-Log "--- END: Winget Package Updates [SKIPPED] ---" -Level "WARNING"
return $true
}
Write-Log "Winget version: $(winget --version 2>&1)"
Write-Log "Running: winget upgrade --all"
try {
$OutputStr = (winget upgrade --all --accept-source-agreements --accept-package-agreements 2>&1) -join "`n"
Write-Log "Winget output:`n$OutputStr"
if ($LASTEXITCODE -ne 0) {
Write-Log "Winget exited with code $LASTEXITCODE - some packages may have failed." -Level "WARNING"
}
Write-Log "--- END: Winget Package Updates [SUCCESS] ---"
return $true
}
catch {
Write-Log "Winget exception: $($_.Exception.Message)" -Level "ERROR"
Write-Log "--- END: Winget Package Updates [FAILED] ---" -Level "ERROR"
return $false
}
}
function Update-WindowsOS {
Write-Log "--- BEGIN: Windows OS Updates ---"
if (-not (Get-Module -ListAvailable -Name PSWindowsUpdate)) {
Write-Log "PSWindowsUpdate not found - installing..."
Install-Module -Name PSWindowsUpdate -Force -Scope AllUsers -Confirm:$false
Write-Log "PSWindowsUpdate installed."
} else {
$modVer = (Get-Module -ListAvailable -Name PSWindowsUpdate | Select-Object -First 1).Version
Write-Log "PSWindowsUpdate already installed (version $modVer)."
}
try {
Write-Log "Importing PSWindowsUpdate..."
Import-Module PSWindowsUpdate -ErrorAction Stop
Write-Log "Checking for available updates..."
$Updates = Get-WindowsUpdate -MicrosoftUpdate -AcceptAll
Write-Log "Update check complete."
if ($Updates -and $Updates.Count -gt 0) {
Write-Log "Found $($Updates.Count) update(s):"
foreach ($u in $Updates) {
Write-Log " KB$($u.KBArticleIDs) | $($u.Title) | $([math]::Round($u.Size/1MB,2)) MB"
}
Write-Log "Installing (AutoReboot: $(-not $SkipReboot))..."
Install-WindowsUpdate -MicrosoftUpdate -AcceptAll -AutoReboot:(-not $SkipReboot)
Write-Log "--- END: Windows OS Updates [SUCCESS] ---"
} else {
Write-Log "No updates available - system is current."
Write-Log "--- END: Windows OS Updates [SUCCESS - NONE NEEDED] ---"
}
return $true
}
catch {
Write-Log "Windows update exception: $($_.Exception.Message)" -Level "ERROR"
Write-Log "Stack trace: $($_.ScriptStackTrace)" -Level "ERROR"
Write-Log "--- END: Windows OS Updates [FAILED] ---" -Level "ERROR"
return $false
}
}
function Send-DiscordNotification {
param([bool]$Success, [int]$FailureCount)
if (-not $WebhookUrl) {
Write-Log "No webhook configured - skipping Discord notification." -Level "WARNING"
return
}
Write-Log "Building Discord payload..."
$Color = if ($Success) { 3066993 } else { 15158332 }
$Status = if ($Success) { "SUCCESS" } else { "FAILED" }
$Payload = [PSCustomObject]@{
embeds = @(
[PSCustomObject]@{
title = "Windows Update Report: $env:COMPUTERNAME"
description = "Status: $Status`nFailures: $FailureCount`nUser: $env:USERNAME"
color = $Color
timestamp = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ss.fffZ")
fields = @(
[PSCustomObject]@{ name = "Log File"; value = $LogFile; inline = $false },
[PSCustomObject]@{ name = "Reboot Skipped"; value = "$SkipReboot"; inline = $true }
)
}
)
} | ConvertTo-Json -Depth 10 -Compress
try {
Invoke-RestMethod -Uri $WebhookUrl -Method Post -Body $Payload `
-ContentType "application/json; charset=utf-8" | Out-Null
Write-Log "Discord notification sent."
}
catch {
Write-Log "Discord notification failed: $($_.Exception.Message)" -Level "ERROR"
Write-Log "Response: $($_.ErrorDetails.Message)" -Level "ERROR"
}
}
Write-Log "=========================================="
Write-Log "===== Windows Update Script Started ====="
Write-Log "=========================================="
Write-Log "Computer : $env:COMPUTERNAME"
Write-Log "User : $env:USERNAME"
Write-Log "OS : $((Get-CimInstance Win32_OperatingSystem).Caption)"
Write-Log "PS Version: $($PSVersionTable.PSVersion)"
Write-Log "Log File : $LogFile"
Write-Log "SkipReboot: $SkipReboot"
Write-Log "=========================================="
Write-Log "Step 1 of 4: Windows Store Apps"
if (-not (Update-WindowsStoreApps)) {
$FailureCount++
Write-Log "Step 1 FAILED. Failure count: $FailureCount" -Level "ERROR"
}
Write-Log "Step 2 of 4: Chocolatey Packages"
if (-not (Update-ChocolateyPackages)) {
$FailureCount++
Write-Log "Step 2 FAILED. Failure count: $FailureCount" -Level "ERROR"
}
Write-Log "Step 3 of 4: Winget Packages"
if (-not (Update-WingetPackages)) {
$FailureCount++
Write-Log "Step 3 FAILED. Failure count: $FailureCount" -Level "ERROR"
}
Write-Log "Step 4 of 4: Windows OS Updates"
if (-not (Update-WindowsOS)) {
$FailureCount++
Write-Log "Step 4 FAILED. Failure count: $FailureCount" -Level "ERROR"
}
$Success = ($FailureCount -eq 0)
Write-Log "=========================================="
Write-Log "===== Update Script Completed ====="
Write-Log "Total Failures : $FailureCount"
Write-Log "Overall Status : $(if ($Success) { 'SUCCESS' } else { 'PARTIAL/FAILED' })"
Write-Log "=========================================="
Send-DiscordNotification -Success $Success -FailureCount $FailureCount
Stop-Transcript
if ($FailureCount -eq 0) { exit 0 }
elseif ($FailureCount -le 2) { exit 1 }
else { exit 2 }
This script performs a comprehensive update sweep across a Windows system, covering:
- Windows Store apps
- Chocolatey packages
- Winget packages
- Windows OS updates
All output is captured to a transcript file with timestamps for auditability and runtime tracking. The script handles missing tools gracefully (skip rather than fail), works around the EventLog 32,766-character message limit by truncating long entries, silently pre-installs the NuGet provider, and sends a Discord webhook summary on completion.
Script: C:\Scripts\Update-System.ps1
Purpose: Comprehensive Windows update across multiple package managers
PowerShell Update Script — Component Breakdown¶
| Line / Block | Purpose | Explanation |
|---|---|---|
| `$ErrorActionPreference = "Continue"` | Error handling | Script continues if a command fails. Individual functions return false to increment `$FailureCount`. |
| `$LogFile = Join-Path ...` | Log file path | Defines daily-named log file under `C:\Logs\Windows-Updates\`. |
| `Start-Transcript` / `Stop-Transcript` | Full session capture | Records all console output to the log file, including command output and errors. |
| `function Write-Log { ... }` | Timestamped logging | Prefixes every message with a timestamp and level tag. Truncates to 30,000 chars before writing to the EventLog (hard limit: 32,766). |
| `Get-PackageProvider` (NuGet) | NuGet bootstrap | Pre-installs NuGet silently before the PSWindowsUpdate install. Prevents the interactive Y/N prompt that blocks automated runs. |
| `function Update-WindowsStoreApps { }` | Store updates | Uses CIM to call `UpdateScanMethod` on the `MDM_EnterpriseModernAppManagement` class. Triggers a background Store refresh. |
| `function Update-ChocolateyPackages { }` | Chocolatey updates | Checks for `choco` in PATH. If found, runs `choco upgrade all -y` and logs full output. Returns true on exit 0. |
| `function Update-WingetPackages { }` | Winget updates | Checks for `winget` in PATH. Runs `winget upgrade --all` with agreements accepted. A non-zero exit logs a WARNING but does not fail. |
| `function Update-WindowsOS { }` | OS patch management | Installs PSWindowsUpdate if missing. Lists each KB/title/size before installing. Respects the `-SkipReboot` flag. |
| `function Send-DiscordNotification { }` | Webhook alert | Builds the Discord embed payload. Uses `ConvertTo-Json -Depth 10 -Compress` to fix nested hashtable serialization (Discord error 50109). |
| `if ($FailureCount -eq 0) { exit 0 }` | Exit codes | 0 = all passed, 1 = 1–2 failures (partial), 2 = 3+ failures (mostly failed). Used by schedulers to detect failure. |
Code and Output Examples¶
Write-Log - Timestamped Logging with EventLog Truncation
function Write-Log {
    param([string]$Message, [string]$Level = "INFO")
    $Timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    Write-Output "[$Timestamp] [$Level] $Message"
    # Truncate to 30,000 chars - EventLog hard limit is 32,766
    $MaxEventLogMsgLength = 30000
    $EventMsg = if ($Message.Length -gt $MaxEventLogMsgLength) {
        $Message.Substring(0, $MaxEventLogMsgLength) + "`n[TRUNCATED - see log file]"
    } else { $Message }
    $EventId = switch ($Level) { "ERROR" { 3 } "WARNING" { 2 } default { 1 } }
    Write-EventLog -LogName Application -Source "WindowsUpdate" `
        -EntryType Information -EventId $EventId -Message $EventMsg `
        -ErrorAction SilentlyContinue
}
[2026-02-20 07:33:08] [INFO] ==========================================
[2026-02-20 07:33:08] [INFO] ===== Windows Update Script Started =====
[2026-02-20 07:33:08] [INFO] Computer : OFFICEPC2023
[2026-02-20 07:33:08] [INFO] User : pnleo
[2026-02-20 07:33:08] [INFO] OS : Windows 11 Pro
[2026-02-20 07:33:08] [INFO] PS Version: 5.1.26100.7705
function Update-WindowsOS {
Write-Log "--- BEGIN: Windows OS Updates ---"
if (-not (Get-Module -ListAvailable -Name PSWindowsUpdate)) {
Write-Log "PSWindowsUpdate not found - installing..."
Install-Module -Name PSWindowsUpdate -Force -Scope AllUsers -Confirm:$false
}
try {
Import-Module PSWindowsUpdate -ErrorAction Stop
Write-Log "Checking for available updates..."
$Updates = Get-WindowsUpdate -MicrosoftUpdate -AcceptAll
if ($Updates -and $Updates.Count -gt 0) {
Write-Log "Found $($Updates.Count) update(s):"
foreach ($u in $Updates) {
Write-Log " KB$($u.KBArticleIDs) | $($u.Title) | $([math]::Round($u.Size/1MB,2)) MB"
}
Install-WindowsUpdate -MicrosoftUpdate -AcceptAll -AutoReboot:(-not $SkipReboot)
Write-Log "--- END: Windows OS Updates [SUCCESS] ---"
} else {
Write-Log "No updates available - system is current."
}
return $true
}
catch {
Write-Log $_.Exception.Message -Level "ERROR"
return $false
}
}
Sample Output:
[2026-02-20 07:34:56] [INFO] Step 4 of 4: Windows OS Updates
[2026-02-20 07:34:56] [INFO] --- BEGIN: Windows OS Updates ---
[2026-02-20 07:34:56] [INFO] PSWindowsUpdate already installed (version 2.2.1.4).
[2026-02-20 07:34:56] [INFO] Importing PSWindowsUpdate...
[2026-02-20 07:34:58] [INFO] Checking for available updates...
[2026-02-20 07:36:10] [INFO] Found 2 update(s):
[2026-02-20 07:36:10] [INFO] KB5034441 | 2026-02 Cumulative Update for Windows 11 | 312.45 MB
[2026-02-20 07:36:10] [INFO] KB890830 | Windows Malicious Software Removal Tool | 4.12 MB
[2026-02-20 07:36:10] [INFO] Installing (AutoReboot: False)...
[2026-02-20 08:11:32] [INFO] --- END: Windows OS Updates [SUCCESS] ---
6.2 Cron Job Scheduling Strategy¶
Cron is used extensively across Linux systems to schedule automated tasks at defined intervals, ensuring that maintenance activities, monitoring checks, and data collection occur reliably without manual intervention. Typical cron jobs include nightly backup verification scripts, hourly certificate expiration checks, daily vulnerability scan initiation, periodic log archival and rotation, and regular system health assessments. Cron jobs are documented with inline comments explaining purpose, frequency, and dependencies, and critical jobs send success/failure notifications to the centralized Discord alerting system to ensure failures are promptly identified and addressed.
Webserver Crontab
| Minute | Hour | Day | Month | Weekday | Command | Description | Last Run Timestamp |
|---|---|---|---|---|---|---|---|
| 0 | 2 | * | * | 5 | /usr/local/bin/upgrade.sh | Weekly system upgrade | Fri, Oct 31, 2025 02:00 AM |
| 30 | 1 | * | * | 6 | /usr/local/bin/backup_web.sh /var/www/lab /backup | Weekly web backup via rsync | Sat, Nov 1, 2025 01:30 AM |
- The first job runs every Friday at 2:00 AM and executes upgrade.sh with no arguments
- The second job runs every Saturday at 1:30 AM and executes backup_web.sh with two arguments: /var/www/lab (Source directory) and /backup (Target directory)
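Expressed as actual crontab entries, the two jobs above look like this; the inline comments follow the documentation convention described in this section:

```
# Weekly system upgrade - every Friday at 02:00, no arguments
0 2 * * 5 /usr/local/bin/upgrade.sh

# Weekly web backup via rsync - every Saturday at 01:30
# Args: <source dir> <target dir>
30 1 * * 6 /usr/local/bin/backup_web.sh /var/www/lab /backup
```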
6.3 Python Scripting for Advanced Automation¶
Python scripts are deployed where more complex logic, data processing, or API interaction is required. Python's extensive library ecosystem makes it ideal for tasks like parsing and correlating log data, interacting with REST APIs for service configuration, processing vulnerability scan results, generating custom reports from multiple data sources, and implementing custom security tools. Python's cross-platform nature allows scripts to run consistently across Windows and Linux systems, and virtual environments ensure dependency isolation and reproducibility.
Python Script Example: Network Scanner¶
import sys
import socket
from datetime import datetime
import threading
import platform
import subprocess
# Load common ports from external file
def load_common_ports(filename='common_ports.txt'):
"""
Load port mappings from a text file
Format: port:service_name (one per line)
"""
ports = {}
try:
with open(filename, 'r') as f:
for line in f:
line = line.strip()
if line and not line.startswith('#'): # Skip empty lines and comments
try:
port, service = line.split(':', 1)
# Strip whitespace and remove quotes/commas
service = service.strip().strip('"').strip("'").rstrip(',')
ports[int(port)] = service
except ValueError:
print(f'Warning: Skipping malformed line: {line}')
return ports
except FileNotFoundError:
print(f'Warning: {filename} not found. Using empty port dictionary.')
return {}
except Exception as e:
print(f'Error loading port file: {e}')
return {}
# Load the common ports dictionary
COMMON_PORTS = load_common_ports()
# Global verbose flag
verbose = False
def ping_host(target_ip):
"""
Ping the host to check if it's reachable before scanning
"""
param = '-n' if platform.system().lower() == 'windows' else '-c'
command = ['ping', param, '1', target_ip]
try:
result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=5)
return result.returncode == 0
except Exception as e:
print(f'Ping test failed: {e}')
return False
def scan_port(target, port):
"""
Function to scan a single port
"""
try:
if verbose:
print(f'Scanning port {port}...')
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(1)
result = s.connect_ex((target, port))
if result == 0:
service_info = COMMON_PORTS.get(port, "Unknown Service")
print(f"Port {port} is open - {service_info}")
s.close()
except socket.error as e:
print(f'Socket error on port {port}: {e}')
except Exception as e:
print(f'Unexpected error on port {port}: {e}')
def main():
global verbose
# Parse arguments for verbose flag
args = sys.argv[1:]
if len(args) < 1 or len(args) > 2:
print("Invalid number of arguments.")
print("Usage: python network_scanner.py <target> [-v|--verbose]")
sys.exit(1)
target = args[0]
# Check for verbose flag
if len(args) == 2 and args[1] in ['-v', '--verbose']:
verbose = True
print("Verbose mode enabled")
# Resolve the target hostname to an IP address
try:
target_ip = socket.gethostbyname(target)
except socket.gaierror:
print(f'Error: Unable to resolve hostname {target}')
sys.exit(1)
# Ping test before scanning
print("-" * 50)
print(f'Running ping test on {target_ip}...')
if ping_host(target_ip):
print(f'Host {target_ip} is reachable!')
else:
print(f'Warning: Host {target_ip} may be unreachable or blocking ICMP')
response = input('Continue with scan anyway? (y/n): ')
if response.lower() != 'y':
print("Scan cancelled.")
sys.exit(0)
# Add a banner
print("-" * 50)
print(f'Scanning target {target_ip}')
print(f'Time started: {datetime.now()}')
print("-" * 50)
    try:
        # Use multithreading to scan ports concurrently, in batches so the
        # scan does not spawn all 65,535 threads at once and exhaust OS limits
        batch_size = 500
        for start in range(1, 65536, batch_size):
            threads = []
            for port in range(start, min(start + batch_size, 65536)):
                thread = threading.Thread(target=scan_port, args=(target_ip, port))
                threads.append(thread)
                thread.start()
            # Wait for this batch to complete before launching the next
            for thread in threads:
                thread.join()
except KeyboardInterrupt:
print("\nExiting program.")
sys.exit(0)
except socket.error as e:
print(f'Socket error: {e}')
sys.exit(1)
print("\nScan completed!")
print(f'Time finished: {datetime.now()}')
if __name__ == "__main__":
main()
Supporting file common_ports.txt (port-to-service mappings loaded by the scanner):
20:FTP Data
21:FTP Control
22:SSH
53:DNS/Pi-hole/Bind
67:DHCP Server
68:DHCP Client
80:HTTP
443:HTTPS
445:SMB
2375:Docker Daemon (insecure/TCP)
2376:Docker Daemon (Secure/TLS)
4444:Metasploit
5335:Unbound
5432:PostgreSQL
6379:Redis
7655:Pulse
8001:Elastic Agent
8002:Elastic Agent
8006:Proxmox - PVE
8007:Proxmox - PBS
9000:Authentik/PHP/Netdata
9443:Authentik/Portainer
9090:Prometheus
9200:Elasticsearch
9093:Alert Manager
9094:Alert Manager - Discord
12320:Ansible
12321:Ansible
9115:Blackbox
5000:Checkmk
5050:Checkmk
6060:CrowdSec
8220:ELK Fleet
7990:Heimdall
5601:Kibana
5678:n8n
9392:OpenVAS
5055:Overseer
9617:Pi-Hole Exporter
9001:Portainer Agent
9221:Prometheus PVE Exporter
6443:K3s
3001:Uptime Kuma
1514:Wazuh
1515:Wazuh
A custom Python-based network scanner has been developed to provide tailored reconnaissance capabilities specific to the lab environment. The scanner performs several functions:
- Host Reachability Testing — Pings target hosts before attempting port scans to verify they are online, reducing wasted scan time and providing quick network inventory validation
- Port Scanning — Uses multi-threaded socket connections to rapidly identify open TCP ports across the full port range (1-65535) or targeted subsets based on scan objectives
- Service Identification — Labels discovered open ports using a custom service mapping file (common_ports.txt) that includes both standard services (SSH, HTTP, HTTPS) and lab-specific applications (Proxmox, Traefik, Authentik). This makes scan results immediately actionable by identifying what application is likely running on each open port
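The parsing rules used by `load_common_ports` (skip blank lines and `#` comments, strip quotes and trailing commas, warn on malformed entries) can be exercised in isolation. This sketch mirrors that logic against an in-memory sample rather than the real common_ports.txt; the helper name `parse_port_lines` is introduced here for illustration only:

```python
def parse_port_lines(lines):
    """Mirror of load_common_ports parsing: 'port:service' per line,
    '#' comments and blank lines skipped, quotes/trailing commas stripped."""
    ports = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        try:
            port, service = line.split(':', 1)
            service = service.strip().strip('"').strip("'").rstrip(',')
            ports[int(port)] = service
        except ValueError:
            print(f'Warning: Skipping malformed line: {line}')
    return ports

sample = ["# lab services", "22:SSH", "", "8006:Proxmox - PVE", "badline"]
print(parse_port_lines(sample))  # {22: 'SSH', 8006: 'Proxmox - PVE'}
```

Because the function accepts any iterable of strings, the same logic works against an open file handle, which is how the scanner itself consumes it.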
6.4 Script Integration and Orchestration¶
Scripts are integrated into broader workflows through several mechanisms. Some scripts are triggered directly by cron jobs for time-based execution. Others are called by monitoring systems in response to alerts or threshold violations (for example, a Grafana alert triggering a remediation script). Ansible playbooks orchestrate multi-step automation by calling shell scripts and Python utilities in sequence, passing parameters and handling error conditions. This layered approach combines the flexibility of custom scripts with the orchestration capabilities of configuration management tools, enabling sophisticated automation scenarios while maintaining maintainability and reusability.
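A minimal sketch of the playbook pattern described above, chaining a shell script and a Python utility in sequence; the host group, script paths, and task names are hypothetical, not the lab's actual playbook:

```yaml
# Hypothetical orchestration playbook: run a shell script, then feed its
# result to a Python utility only if the script succeeded.
- name: Nightly maintenance chain
  hosts: webservers
  become: true
  tasks:
    - name: Run backup verification script
      ansible.builtin.command: /usr/local/bin/verify_backups.sh
      register: verify_result

    - name: Generate report from verification output
      ansible.builtin.command: >
        python3 /opt/scripts/report.py --input "{{ verify_result.stdout }}"
      when: verify_result.rc == 0
```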
7. Automation Security Controls¶
Control Framework¶
| Control Domain | Implementation | Coverage |
|---|---|---|
| Secrets Management | Ansible Vault + Vaultwarden | All credentials encrypted |
| Access Control | SSH keys only; sudo with NOPASSWD | Ansible service account |
| Code Integrity | Git version control with commit signing | All IaC assets |
| Privilege Escalation | Least privilege Proxmox service account | Terraform API access |
| Input Validation | Regex validation in scripts | Prevents injection attacks |
| Audit Logging | Syslog + Git commits + n8n execution logs | Full audit trail |
| Change Management | Git | Controlled deployments |
| Backup & Recovery | GitHub + NAS + PBS | Multiple restore points |
Authentication & Authorization¶
| Technology | Authentication Method | Authorization Scope |
|---|---|---|
| Terraform | Proxmox API token (non-password) | VM/LXC provisioning only |
| Ansible | SSH key (ed25519; passphrase) | sudo NOPASSWD on managed hosts |
| n8n | Authentik SSO (OAuth2) | Workflow admin access |
| GitHub | Personal access token (PAT) | Repository push/pull |
| Scripts | User context (ansible/root) | File system and service control |
Secrets Protection¶
| Secret Type | Storage Location | Encryption Method |
|---|---|---|
| Terraform API tokens | terraform.tfvars (gitignored) | File permissions 0600 |
| Ansible passwords | ansible-vault encrypted files | AES-256 with master key |
| n8n credentials | Vaultwarden database | Application-level encryption |
| SSH private keys | ~/.ssh/ with 0600 perms | Passphrase-protected |
| Script API tokens | Environment variables | Not persisted to disk |
Audit Trail Components¶
| Event Type | Logging Mechanism | Retention |
|---|---|---|
| Git Commits | GitHub repository | Indefinite |
| Terraform Apply | Local state file + stdout log | 90 days |
| Ansible Playbook Runs | Ansible log + syslog | 90 days |
| n8n Workflow Executions | PostgreSQL + execution history | 30 days |
| Script Executions | Syslog + individual log files | 30 days |
| Cron Job Runs | /var/log/cron + job-specific logs | 30 days |
9. Practical Use Cases and Workflows¶
Scenario 1: New VM Provisioning¶
Objective: Deploy new Docker host in <5 minutes
Workflow:
- Developer updates Terraform variables:
- hostname: docker-vm-03
- vmid: 203
- Run Terraform:
- cd terraform/vm
- terraform apply -var="hostname=docker-vm-03" -var="vmid=203" -auto-approve
- Terraform clones ubuntu-cloud template, provisions VM
- Output displays IP address: 192.168.100.203
- Run Ansible bootstrap:
- ansible-playbook -i hosts.yml new_install.yaml --limit docker-vm-03.home.com
- Ansible configures SSH, creates ansible user, installs packages
- VM ready for application deployment
Result: Consistent, reproducible VM provisioning with full audit trail
Scenario 2: Weekly Security Update Cycle¶
Objective: Keep all systems patched without manual intervention
Workflow:
- Saturday 2 AM: n8n workflow triggers
- n8n executes SSH command on Ansible controller
- Ansible playbook runs across all Linux hosts:
- Debian/Ubuntu: apt update && apt upgrade -y
- RHEL/Fedora: dnf update -y
- Playbook output converted to JSON
- JSON uploaded to GitHub (audit trail)
- Discord notification sent with summary:
- 23/25 hosts updated successfully
- 2 hosts require reboot
- Admin reviews notification and schedules reboots if needed
Result: Automated patch management with centralized reporting and alerting
Scenario 3: Rapid Disaster Recovery¶
Objective: Rebuild compromised VM from code
Workflow:
- Incident detected: VM compromised via vulnerability
- Admin destroys compromised VM:
- terraform destroy -var="vmid=205"
- Review Git history to find last known-good configuration:
- git log terraform/vm/main.tf
- Checkout previous commit if needed:
- git checkout abc123def terraform/vm/main.tf
- Rebuild VM with Terraform:
- terraform apply -var="hostname=web-vm-01" -var="vmid=205"
- Re-run Ansible playbooks to restore configuration:
- ansible-playbook -i hosts.yml site.yml --limit web-vm-01.home.com
- Restore application data from NAS backup
- VM back online with clean configuration
Result: Complete rebuild in <30 minutes vs. hours of manual work
Scenario 4: Threat Intelligence Distribution¶
Objective: Daily security awareness for lab operations
Workflow:
- Daily 8 AM: n8n workflow executes
- RSS feeds polled from multiple cybersecurity sources
- Each feed limited to 5 most recent entries
- NIST feed sent to ChatGPT for AI summarization
- Aggregated digest posted to Discord #threat-intel channel
Result: Consolidated threat intelligence reduces information overload
Scenario 5: Configuration Drift Detection¶
Objective: Ensure compliance with security baseline
Workflow:
- Weekly: n8n triggers Ansible audit playbook
- Playbook gathers configuration from all hosts:
- SSH config settings
- Firewall rules
- Installed packages
- User accounts
- DNS configuration
- Output compared against baseline in Git
- Drift detected on 2 hosts (unauthorized package installed)
- Discord alert sent with details
Result: Proactive detection of configuration drift and unauthorized changes
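The comparison step in this workflow amounts to a set difference between observed host state and the baseline held in Git. A minimal sketch of that check for the installed-packages case; the package names and `detect_drift` helper are illustrative, not the lab's actual baseline:

```python
def detect_drift(baseline, observed):
    """Return (unauthorized, missing): packages on the host but not in the
    baseline, and baseline packages absent from the host."""
    unauthorized = sorted(set(observed) - set(baseline))
    missing = sorted(set(baseline) - set(observed))
    return unauthorized, missing

# Illustrative data - in the lab, 'observed' comes from the Ansible audit
# playbook and 'baseline' from the version-controlled baseline file
baseline = {"openssh-server", "ufw", "fail2ban"}
observed = {"openssh-server", "ufw", "fail2ban", "netcat"}

unauthorized, missing = detect_drift(baseline, observed)
if unauthorized or missing:
    print(f"DRIFT: unauthorized={unauthorized} missing={missing}")
# prints: DRIFT: unauthorized=['netcat'] missing=[]
```

The same shape of comparison applies to the other audited items (SSH settings, firewall rules, user accounts), with the alert payload forwarded to Discord.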
11. Security Homelab Section Links¶
- Executive Summary and Security Posture
- Infrastructure Platform, Virtualization Stack and Hardware
- Network Security, Privacy and Remote Access
- Identity, Access, Secrets and Trust Management
- Automation and IaC
- Applications and Services
- Observability and Response, Part 1
- Observability and Response, Part 2
- Cloud IaaS Integration – AWS, Azure and GCP