#gdpr - freeCodeCamp.org

GDPR Article 32 for Software Engineers: Technical Controls, Implementations, and Auditor Questions

Ayobami Adejumo — Thu, 28 May 2026 16:20:25 +0000

When I first read GDPR Article 32, I made a mistake. I thought it was a legal document.

But it's not. It's an infrastructure specification.

The regulation says you need "appropriate technical measures" to protect personal data. That phrase is terrifying because it's vague. What does "appropriate" mean? What counts as a "technical measure"? Who decides whether you've done enough?

The compliance consultant will give you a 50-page policy document. The auditor will ignore it and ask for your database schema.

This guide is the middle ground. I've implemented Article 32 controls for 12 SaaS companies. The same nine controls appear every time. The same handful of auditor questions appear every time.

This is a complete guide to the 9 technical controls you must implement, the exact code and commands for each, and the questions your GDPR auditor will ask.

What You'll Learn
Prerequisites
Part 1: Understanding Article 32
Part 2: Article 32(1)(a) — Pseudonymisation and Encryption
Part 3: Article 32(1)(b) — Confidentiality and Integrity
Part 4: Article 32(1)(c) — Availability and Resilience
Part 5: Article 32(1)(d) — Regular Testing
Part 6: Penetration Testing
Best Practices Summary
What's Next
Resources

What You'll Learn

The 9 technical controls required by GDPR Article 32(1)(a) through (d)
Exact PostgreSQL commands for pseudonymisation and field-level encryption
How to implement automatic logoff and unique user identification
Application-level audit logging that goes beyond CloudTrail
Integrity controls that prove data has not been altered
mTLS and TLS 1.3 for transmission security
The auditor questions you must answer with evidence

Let's dive in.

Prerequisites

Before following along, you should have:

Knowledge:

Familiarity with PostgreSQL and basic SQL
Basic understanding of AWS services (KMS, RDS, CloudTrail)
Comfort reading Python and JavaScript/Node.js code
A working knowledge of what GDPR is — if you are starting from scratch, read the ICO's GDPR overview first

Tools and access:

PostgreSQL 14 or later
An AWS account with IAM administrator access
Python 3.8 or later with cryptography library (pip install cryptography)
Node.js 16 or later
A compliance automation tool — Vanta or OneTrust — is optional but recommended for evidence collection

Estimated time: The controls in this guide take 2–4 weeks to implement fully, depending on your existing infrastructure. Individual controls range from 30 minutes (KMS key setup) to 5 days (full application-layer encryption rollout).

Part 1: Understanding Article 32 — The Technical Requirements

1.1. What Article 32 Actually Requires

Article 32 of the GDPR is titled "Security of processing." It requires controllers and processors to implement "appropriate technical and organisational measures" to ensure a level of security appropriate to the risk.

Here is the important distinction most teams miss: Article 32 is not a checklist of policies. A policy says "we encrypt personal data." Evidence says "here is the KMS key with automatic rotation, here is the application-layer encryption code, and here are the CloudTrail logs showing every decryption attempt." The auditor wants evidence, not documentation.

The four main requirements:

Section	Requirement	What It Means for Engineers
32(1)(a)	Pseudonymisation and encryption	Personal data must be stored so it cannot be attributed to a specific data subject without additional information held separately
32(1)(b)	Confidentiality, integrity, availability, and resilience	Systems must protect data from unauthorised access, alteration, loss, and be able to recover from incidents
32(1)(c)	Restoring availability and access	You must be able to restore data and regain system access after a physical or technical incident
32(1)(d)	Regular testing and risk assessment	You must have a process for regularly testing and evaluating your security measures

1.2. The Scope Question: What Data Is Covered?

Before implementing any controls, you must know what data falls under Article 32. The regulation applies to personal data — any information that can identify a living individual directly or indirectly.

Data types and their protection levels:

Category	Examples	Protection Level
Personal data	Name, email, phone, IP address	Standard
Special category data (Article 9)	Health data, biometric data, political opinions, religious beliefs	Enhanced
Pseudonymised data	Data where direct identifiers are replaced with a code	Standard
Anonymised data	Data that cannot be re-identified under any reasonable circumstances	Out of scope

The data mapping question your auditor will ask:

"Can you provide a data flow diagram showing where personal data enters your system, where it is stored, where it is processed, and how it is deleted?"

Before the auditor asks, run this command to list every RDS instance in the current region with its encryption status. Treat it as a starting inventory: you still need to confirm which of these databases hold personal data, and repeat the command for each region you use.

# List all RDS instances with their encryption status
# Any StorageEncrypted: false is a finding
aws rds describe-db-instances \
  --query 'DBInstances[*].{
    ID:DBInstanceIdentifier,
    Engine:Engine,
    StorageEncrypted:StorageEncrypted,
    AZ:AvailabilityZone
  }' \
  --output table

Any instance showing StorageEncrypted: false must be addressed before your Article 32 audit.
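
The same check can be scripted so it runs on a schedule rather than by eye. The sketch below is one possible approach, not part of the AWS CLI: it assumes you have captured the output of aws rds describe-db-instances --output json as a string, and the function name unencrypted_instances is made up for this example.

```python
import json
from typing import List


def unencrypted_instances(describe_output: str) -> List[str]:
    """Return the identifiers of RDS instances whose storage is not encrypted."""
    data = json.loads(describe_output)
    return [
        db["DBInstanceIdentifier"]
        for db in data.get("DBInstances", [])
        if not db.get("StorageEncrypted", False)
    ]


# Trimmed-down sample response; real responses carry many more fields.
# The instance names are illustrative.
sample = json.dumps({
    "DBInstances": [
        {"DBInstanceIdentifier": "production-db", "StorageEncrypted": True},
        {"DBInstanceIdentifier": "legacy-reporting", "StorageEncrypted": False},
    ]
})

findings = unencrypted_instances(sample)
print(findings)
```

An empty list is what you want to show the auditor. Anything else is a finding to fix first.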

Part 2: Article 32(1)(a) — Pseudonymisation and Encryption

2.1. How to Implement Pseudonymisation at the Database Layer

Pseudonymisation replaces direct identifiers — names, email addresses, passport numbers — with a pseudonym or code. The goal is that the main working dataset cannot identify a data subject without access to a separately stored, separately protected lookup table.

Here is the incorrect approach — direct identifiers in plaintext:

-- Bad: Direct identifiers stored in the main working table
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    full_name VARCHAR(255),       -- Direct identifier — should not be here
    email VARCHAR(255),           -- Direct identifier — should not be here
    passport_number VARCHAR(50)   -- Direct identifier — should not be here
);

This approach means any engineer, analyst, or attacker with SELECT access to the users table can immediately read and identify individuals. There is no separation between working data and identifying data.

Here is the correct implementation with a separate identifiers table:

-- Good: Pseudonymised main table with a separate, restricted lookup table

-- Step 1: Main working table uses only the pseudonym
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    pseudonym UUID UNIQUE NOT NULL DEFAULT gen_random_uuid(),  -- Non-guessable pseudonym; UNIQUE is required for the foreign key below
    created_at TIMESTAMP DEFAULT NOW(),
    account_status VARCHAR(50)
    -- No direct identifiers here
);

-- Step 2: Identifier lookup table — kept separate, access restricted
CREATE TABLE user_identifiers (
    pseudonym UUID PRIMARY KEY,
    full_name VARCHAR(255),
    email VARCHAR(255),
    passport_number VARCHAR(50),
    FOREIGN KEY (pseudonym) REFERENCES users(pseudonym)
);

-- Step 3: Grant minimal, role-based access
GRANT SELECT ON users TO app_role;                              -- Application uses pseudonym only
GRANT SELECT, INSERT, UPDATE ON user_identifiers TO identity_service_role;  -- Only the identity service sees names

What each part does:

gen_random_uuid() creates a version-4 UUID pseudonym for each user — unpredictable and not reversible without the lookup table
The main users table is safe for analytics, reporting, and general application use without exposing any identifying information
Only the identity_service_role can join the two tables — this role is assigned only to the specific service that handles identity operations
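
On the application side, the same separation can be applied before anything is written. The following is a minimal sketch, assuming a signup record arrives as a dictionary; the function name split_record and the sample values are hypothetical, and uuid.uuid4() plays the role that gen_random_uuid() plays in the SQL above.

```python
import uuid
from typing import Dict, Tuple

# Columns that belong in user_identifiers, never in the working table
DIRECT_IDENTIFIERS = ("full_name", "email", "passport_number")


def split_record(record: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split one record into a working row and an identifier row."""
    pseudonym = str(uuid.uuid4())  # random version-4 UUID

    # Working row: everything except the direct identifiers
    working_row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    working_row["pseudonym"] = pseudonym

    # Identifier row: only the direct identifiers, keyed by the same pseudonym
    identifier_row = {k: record[k] for k in DIRECT_IDENTIFIERS if k in record}
    identifier_row["pseudonym"] = pseudonym
    return working_row, identifier_row


working, identifiers = split_record({
    "full_name": "Ada Example",
    "email": "ada@example.com",
    "passport_number": "X1234567",
    "account_status": "active",
})
print(working)
```

The working row is what the general application database role writes. The identifier row should be written only by the service that holds identity_service_role.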

The auditor question you will receive:

"How do you ensure that pseudonymised data cannot be re-identified by an unauthorised party?"

Your evidence:

-- Show that only the identity service role has access to the identifiers table
SELECT grantee, privilege_type, table_name
FROM information_schema.role_table_grants
WHERE table_name = 'user_identifiers';

-- Expected output: identity_service_role plus the table owner; any other grantee is a finding

2.2. How to Implement Encryption at Rest with Customer-Managed Keys

Storage-layer encryption protects data if someone physically steals the disk. But it does not protect against a privileged AWS employee, a compromised cloud administrator, or an authorised user with direct database access. Article 32 auditors know this distinction — and they will ask about it.

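Here is the incorrect approach: relying on the default AWS managed key:

# Bad: no --kms-key-id, so RDS falls back to the AWS managed key (alias aws/rds)
# You cannot edit that key's policy or change its rotation schedule
aws rds create-db-instance \
  --db-instance-identifier production-db \
  --engine postgres \
  --db-instance-class db.t3.medium \
  --allocated-storage 100 \
  --master-username dbadmin \
  --manage-master-user-password \
  --storage-encrypted

The problem: the policy and rotation of an AWS managed key are controlled by AWS. When the auditor asks you to show who is allowed to use the key and how that access is restricted, you have no key policy of your own to point to.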

Here is the correct implementation — customer-managed key with automatic rotation:

# Step 1: Create a customer-managed KMS key
KEY_ID=$(aws kms create-key \
  --origin AWS_KMS \
  --description "Customer-managed key for production PII — Article 32 compliant" \
  --tags TagKey=Purpose,TagValue=GDPR TagKey=Environment,TagValue=production \
  --query 'KeyMetadata.KeyId' \
  --output text)

echo "Created KMS key: $KEY_ID"

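# Step 2: Enable automatic rotation every 90 days
# (without --rotation-period-in-days the default period is 365 days)
aws kms enable-key-rotation --key-id "$KEY_ID" --rotation-period-in-days 90

# Step 3: Move the production RDS instance onto the new key
# The KMS key of an existing instance cannot be changed in place, so snapshot it,
# copy the snapshot under the new key, and restore from the copy.
# Snapshot and instance names below are placeholders.
aws rds create-db-snapshot \
  --db-instance-identifier production-db \
  --db-snapshot-identifier production-db-pre-cmk
aws rds wait db-snapshot-available --db-snapshot-identifier production-db-pre-cmk

aws rds copy-db-snapshot \
  --source-db-snapshot-identifier production-db-pre-cmk \
  --target-db-snapshot-identifier production-db-cmk \
  --kms-key-id "$KEY_ID"
aws rds wait db-snapshot-available --db-snapshot-identifier production-db-cmk

aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier production-db-cmk \
  --db-snapshot-identifier production-db-cmk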

The auditor question:

"Show me that your encryption keys are rotated automatically and that you can prove who has accessed them."

Your evidence:

# Verify rotation is enabled — expected output: true
aws kms get-key-rotation-status --key-id $KEY_ID \
  --query 'KeyRotationEnabled'

# Show the CloudTrail audit trail of every key usage event
aws logs filter-log-events \
  --log-group-name cloudtrail-logs \
  --filter-pattern '{ $.eventSource = "kms.amazonaws.com" }' \
  --query 'events[*].{Time:timestamp,Event:message}' \
  --output table

2.3. How to Implement Application-Layer Encryption for Sensitive Fields

Storage encryption is the floor. Application-layer encryption is the ceiling that Article 32 auditors are increasingly expecting for health data, financial records, and other sensitive personal data.

Here is the difference: with storage encryption only, a database administrator who runs SELECT email FROM users sees the plaintext email address. With application-layer encryption, they see gAAAAABm... — an encrypted byte string that only the application (with access to the Vault key) can decrypt.

# application_encryption.py
from typing import Optional

from cryptography.fernet import Fernet

class FieldEncryption:
    """
    Encrypts sensitive personal data fields before they are stored in the database.
    The encryption key is stored in HashiCorp Vault or AWS Secrets Manager, never in code.
    A database administrator with direct SQL access sees only encrypted bytes.
    """

    def __init__(self, key: str):
        # key must be a URL-safe base64-encoded 32-byte Fernet key
        # (the format produced by Fernet.generate_key()); retrieve it from Vault
        self.cipher = Fernet(key.encode())

    def encrypt_field(self, plaintext: Optional[str]) -> Optional[str]:
        """Encrypt a sensitive field before writing to the database."""
        if not plaintext:
            return None  # Empty or missing values are stored as NULL
        encrypted_bytes = self.cipher.encrypt(plaintext.encode())
        return encrypted_bytes.decode()

    def decrypt_field(self, ciphertext: Optional[str]) -> Optional[str]:
        """
        Decrypt a field when legitimately needed by the application.
        This method requires the Vault key, which database admins do not hold.
        """
        if not ciphertext:
            return None
        decrypted_bytes = self.cipher.decrypt(ciphertext.encode())
        return decrypted_bytes.decode()


# Usage in your application:
from vault_client import get_secret  # Your Vault or Secrets Manager client

# Retrieve the encryption key at application startup — never hardcode it
encryption_key = get_secret("gdpr/field-encryption-key")
encryptor = FieldEncryption(encryption_key)

# Before storing a user's health record
user.health_data_encrypted = encryptor.encrypt_field(user.health_data_plaintext)

# Before reading for a legitimate purpose (subject access request, etc.)
health_data = encryptor.decrypt_field(user.health_data_encrypted)

The auditor question:

"If a database administrator queries the users table directly, can they read customer health data in plaintext?"

Your evidence: Run a direct database query and show the auditor the encrypted output. Then demonstrate that the decryption key is not accessible to database administrators — it is retrieved only by the application through Vault.
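
To back up that demonstration, you can spot-check that stored values at least have the shape of a Fernet token. The sketch below uses only the standard library and relies on the published Fernet token layout (a version byte 0x80, an 8-byte timestamp, a 16-byte IV, ciphertext in 16-byte blocks, and a 32-byte HMAC). The function name looks_like_fernet_token is hypothetical, and this is a shape check only: it does not prove the value was encrypted with your key.

```python
import base64
import binascii

FERNET_VERSION = 0x80
# version (1) + timestamp (8) + IV (16) + HMAC (32) bytes
FERNET_OVERHEAD = 57
AES_BLOCK = 16


def looks_like_fernet_token(value: str) -> bool:
    """Return True if value decodes to bytes laid out like a Fernet token."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except (binascii.Error, ValueError):
        return False  # Not valid base64, or contains non-ASCII characters
    if len(raw) < FERNET_OVERHEAD + AES_BLOCK or raw[0] != FERNET_VERSION:
        return False
    # The ciphertext portion must be a whole number of AES blocks
    return (len(raw) - FERNET_OVERHEAD) % AES_BLOCK == 0


# A structurally valid dummy token (all-zero payload, for illustration only)
dummy_token = base64.urlsafe_b64encode(
    bytes([FERNET_VERSION]) + bytes(8 + 16 + 16 + 32)
).decode()

print(looks_like_fernet_token(dummy_token))
print(looks_like_fernet_token("alice@example.com"))
```

Running this over a sample of rows from each encrypted column gives you a quick signal that no plaintext has slipped in through a code path that bypasses the encryptor.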

Part 3: Article 32(1)(b) — Confidentiality and Integrity

3.1. How to Implement Automatic Logoff

Article 32(1)(b) requires the ongoing confidentiality of processing systems, and Article 32(2) names "unauthorised disclosure of, or access to personal data" as a risk you must account for. A session that never expires, or expires only after 24 hours, is an access control gap. A user who logs in on a shared machine and walks away has left an open door.

Here is the incorrect approach — a 24-hour JWT session:

// Bad: 24-hour access token with no inactivity check
const token = jwt.sign(
  { userId: user.id, role: user.role },
  process.env.JWT_SECRET,
  { expiresIn: '24h' }  // Too long — violates Article 32 intent
);

The problem: if a user logs in on a shared computer and closes the laptop without logging out, the session remains valid for up to 24 hours. Anyone who opens that laptop can access personal data.

Here is the correct implementation — a 15-minute access token with a rolling refresh:

// Good: Short-lived access token with rolling refresh via HTTP-only cookie

// Access token — valid for 15 minutes of activity
const accessToken = jwt.sign(
  { userId: user.id, role: user.role, type: 'access' },
  process.env.JWT_ACCESS_SECRET,
  { expiresIn: '15m' }
);

// Refresh token — valid for 8 hours total session duration
const refreshToken = jwt.sign(
  { userId: user.id, type: 'refresh' },
  process.env.JWT_REFRESH_SECRET,
  { expiresIn: '8h' }
);

// Set refresh token as HTTP-only cookie — not accessible to JavaScript
res.cookie('refreshToken', refreshToken, {
  httpOnly: true,    // Prevents XSS access
  secure: true,      // HTTPS only
  sameSite: 'strict', // Prevents CSRF
  maxAge: 8 * 60 * 60 * 1000  // 8 hours in milliseconds
});

// Session middleware that enforces absolute timeout
const MAX_TOTAL_SESSION_MS = 8 * 60 * 60 * 1000; // 8 hours

app.use((req, res, next) => {
  if (!req.session?.createdAt) return next();

  const sessionAge = Date.now() - req.session.createdAt;
  if (sessionAge > MAX_TOTAL_SESSION_MS) {
    req.session.destroy();
    return res.status(401).json({
      error: 'Session expired after 8 hours. Please log in again.'
    });
  }
  next();
});

The auditor question:

"Show me that your application terminates inactive sessions after a reasonable period."

Your evidence: A browser developer tools screenshot showing the cookie expiration time, plus a test recording showing that after 15 minutes of inactivity the user is presented with a re-authentication prompt.
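
The two limits in this section, 15 minutes of inactivity and 8 hours in total, are easier to test when the decision is isolated in one small function. Here is a minimal, framework-independent sketch of that decision. The function name session_status and the returned labels are assumptions for illustration, and it presumes the server records a last-activity timestamp for each session.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(minutes=15)     # Inactivity timeout
ABSOLUTE_LIMIT = timedelta(hours=8)    # Maximum total session length


def session_status(created_at: datetime, last_activity: datetime, now: datetime) -> str:
    """Classify a session as active, idle-expired, or past its absolute limit."""
    if now - created_at > ABSOLUTE_LIMIT:
        return "expired_absolute"
    if now - last_activity > IDLE_LIMIT:
        return "expired_idle"
    return "active"


login = datetime(2025, 4, 1, 9, 0)
print(session_status(login, datetime(2025, 4, 1, 9, 50), datetime(2025, 4, 1, 10, 0)))
print(session_status(login, datetime(2025, 4, 1, 9, 40), datetime(2025, 4, 1, 10, 0)))
print(session_status(login, datetime(2025, 4, 1, 17, 5), datetime(2025, 4, 1, 17, 10)))
```

A unit test built on a function like this, with its results stored in CI, is stronger evidence than a screenshot because it is repeatable.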

3.2. How to Implement Unique User Identification with IRSA

Article 32(1)(b) requires that you can identify who accessed personal data. Shared service accounts make this impossible — the audit log shows data-export-service but you cannot tell which engineer triggered the export.

Here is the incorrect approach — a shared service account:

# Bad: One shared Kubernetes service account used by multiple engineers and pipelines
apiVersion: v1
kind: ServiceAccount
metadata:
  name: data-export           # Three engineers and two pipelines share this identity
  namespace: production

When an audit log shows data-export performed a bulk user export at 03:17 UTC, you cannot answer the auditor's question: "who authorised this?"

Here is the correct implementation — IAM Roles for Service Accounts (IRSA):

# Step 1: Create a separate IAM role for each service identity
# This command creates a role that can only be assumed by the 'payment-service'
# Kubernetes service account in the 'production' namespace

aws iam create-role \
  --role-name eks-payment-service-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/YOUR_OIDC_ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/YOUR_OIDC_ID:sub":
            "system:serviceaccount:production:payment-service"
        }
      }
    }]
  }'

# Step 2: Annotate the Kubernetes service account with its unique IAM role
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service          # One service account, one service, one role
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/eks-payment-service-role

Every AWS API call from payment-service now appears in CloudTrail as eks-payment-service-role — a unique, traceable identity. No shared accounts. No ambiguous audit logs.

The auditor question:

"How do you ensure that every action on personal data can be attributed to a specific individual or service?"

Your evidence:

# Verify no shared service accounts exist — every account should have a unique role annotation
kubectl get serviceaccounts --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {.metadata.annotations.eks\.amazonaws\.com/role-arn}{"\n"}{end}'
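
The output of that command still has to be read for two problems: service accounts with no role annotation, and roles shared by more than one account. The sketch below is one way to automate the reading. It assumes the exact "namespace/name: role-arn" line format produced by the jsonpath expression above, and the function name audit_role_annotations and the sample account names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def audit_role_annotations(kubectl_output: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (accounts with no role, roles used by more than one account)."""
    accounts_by_role: Dict[str, List[str]] = defaultdict(list)
    missing: List[str] = []
    for line in kubectl_output.splitlines():
        if not line.strip():
            continue
        # Service account names cannot contain ':', so split on the first one
        account, _, role_arn = line.partition(":")
        account, role_arn = account.strip(), role_arn.strip()
        if role_arn:
            accounts_by_role[role_arn].append(account)
        else:
            missing.append(account)
    shared = {role: accts for role, accts in accounts_by_role.items() if len(accts) > 1}
    return missing, shared


sample_output = (
    "production/payment-service: arn:aws:iam::123456789012:role/eks-payment-service-role\n"
    "production/data-export: arn:aws:iam::123456789012:role/eks-shared-role\n"
    "production/report-runner: arn:aws:iam::123456789012:role/eks-shared-role\n"
    "production/default: \n"
)

missing, shared = audit_role_annotations(sample_output)
print("No role annotation:", missing)
print("Shared roles:", shared)
```

Accounts such as the built-in default account normally have no annotation, so review the missing list rather than treating every entry as a finding. Any entry in the shared result deserves an explanation.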

Part 4: Article 32(1)(c) — Availability and Resilience

4.1. How to Implement Multi-AZ and Backup Requirements

Article 32(1)(c) requires "the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident." This is a legal requirement, not a suggestion. Article 32 is risk-based, so no single architecture is mandated, but if your database sits in one Availability Zone and that AZ has a networking event, you will struggle to show that you can restore access in a timely manner.

Here is the incorrect approach — single-AZ RDS with no automated backups:

# Bad: Single-AZ RDS — one networking event makes personal data unavailable
resource "aws_db_instance" "production" {
  identifier              = "production-database"
  multi_az                = false   # No automatic failover
  backup_retention_period = 0       # No automated backups — Article 32 violation
}

If the Availability Zone has a networking issue, the database is unreachable. If the instance is corrupted, there are no backups to restore. In both scenarios you cannot demonstrate the restore capability that Article 32(1)(c) asks for.

Here is the correct implementation — Multi-AZ with tested automated backups:

# Good: Multi-AZ RDS with 30-day backup retention
resource "aws_db_instance" "production" {
  identifier = "production-database"

  # Multi-AZ creates a synchronous standby replica in a different AZ
  # Automatic failover completes in 60-120 seconds with no data loss
  multi_az = true

  # 30-day backup retention — gives you recovery point flexibility
  backup_retention_period = 30
  backup_window           = "03:00-04:00"  # Low-traffic window for backup

  # Copy all tags to snapshots for compliance tracking
  copy_tags_to_snapshot = true

  # Performance Insights for monitoring query health
  performance_insights_enabled          = true
  performance_insights_retention_period = 7

  tags = {
    Environment       = "production"
    DataClassification = "personal-data"
    GDPRScope         = "article32"
  }
}

How to test your restore time (RTO) monthly. The age of the snapshot you restore from gives you an upper bound on your RPO:

# Step 1: Find your most recent automated snapshot
SNAPSHOT_ID=$(aws rds describe-db-snapshots \
  --db-instance-identifier production-database \
  --snapshot-type automated \
  --query 'sort_by(DBSnapshots, &SnapshotCreateTime)[-1].DBSnapshotIdentifier' \
  --output text)

echo "Testing restore of snapshot: $SNAPSHOT_ID"

# Step 2: Start the restore — measure the time
START_TIME=$(date +%s)

# Note: date -d is GNU date syntax; on macOS use: date -v+1d +%Y-%m-%d
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier gdpr-restore-test \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  --db-instance-class db.t3.medium \
  --no-publicly-accessible \
  --tags Key=Purpose,Value=gdpr-rto-test Key=DeleteAfter,Value="$(date -d '+1 day' +%Y-%m-%d)"

# Step 3: Wait for restore to complete
aws rds wait db-instance-available \
  --db-instance-identifier gdpr-restore-test

END_TIME=$(date +%s)
RTO_SECONDS=$((END_TIME - START_TIME))
echo "Restore completed in $((RTO_SECONDS / 60)) minutes"

# Step 4: Verify data integrity with a spot check
# Connect to the restored instance and verify record counts match production
# psql -h RESTORED_ENDPOINT -U admin -d production \
#   -c "SELECT COUNT(*) FROM users; SELECT MAX(created_at) FROM orders;"

# Step 5: Delete the test instance
aws rds delete-db-instance \
  --db-instance-identifier gdpr-restore-test \
  --skip-final-snapshot

The auditor question:

"What is your Recovery Time Objective and Recovery Point Objective for personal data? When did you last test it?"

Your evidence: A documented monthly DR test log showing: snapshot used, restore start time, restore completion time, data verification query results, and the engineer who conducted the test.
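
The numbers in that log can be derived from three timestamps. The following sketch shows one way to compute them. The function name dr_test_metrics is hypothetical, and the RPO figure is a worst case for this particular test: it assumes only the snapshot was available, whereas point-in-time recovery can give a smaller window.

```python
from datetime import datetime, timezone
from typing import Dict


def dr_test_metrics(
    snapshot_time: datetime,
    restore_started: datetime,
    restore_finished: datetime,
) -> Dict[str, float]:
    """Compute measured RTO and snapshot-based worst-case RPO, in minutes."""
    rto = restore_finished - restore_started   # How long the restore took
    rpo = restore_started - snapshot_time      # How old the restored data was
    return {
        "rto_minutes": rto.total_seconds() / 60,
        "rpo_minutes": rpo.total_seconds() / 60,
    }


# Illustrative timestamps, not real test results
metrics = dr_test_metrics(
    snapshot_time=datetime(2025, 4, 1, 3, 10, tzinfo=timezone.utc),
    restore_started=datetime(2025, 4, 1, 9, 0, tzinfo=timezone.utc),
    restore_finished=datetime(2025, 4, 1, 9, 42, tzinfo=timezone.utc),
)
print(metrics)
```

Recording these two figures every month lets you show the auditor a trend against your stated objectives, not just a single result.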

Part 5: Article 32(1)(d) — Regular Testing

5.1. How to Implement Automated Vulnerability Scanning

Article 32(1)(d) requires "a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures." This includes automated vulnerability scanning of every container image before it reaches production.

Here is the incorrect approach — no scanning in the deployment pipeline:

# Bad: No vulnerability scanning — a critical CVE in the base image deploys undetected
name: Deploy
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: docker build -t myapp .
      - run: docker push myapp  # Deploys without any security check

If a critical CVE is present in the base image (such as a remote code execution vulnerability in OpenSSL), it goes straight to production. Under Article 32(1)(d), this is a finding.

Here is the correct implementation — Trivy scanning with pipeline enforcement:

# Good: Trivy scans every image — CRITICAL/HIGH CVEs block the deployment
name: Security Scan and Deploy
on: [push, pull_request]

jobs:
  trivy-scan:
    name: Container Vulnerability Scan
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # Required for the SARIF upload step below
    steps:
      - uses: actions/checkout@v4

      - name: Build container image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Scan for vulnerabilities with Trivy
        uses: aquasecurity/trivy-action@master   # Pin to a release tag or commit SHA in production
        with:
          image-ref: 'myapp:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'         # Fail the pipeline: image cannot deploy with CRITICAL/HIGH CVEs

      - name: Upload scan results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()             # Upload results even if scan failed, for review
        with:
          sarif_file: 'trivy-results.sarif'

Trivy scans for:

CVEs in the base image OS packages (for example, a critical OpenSSL vulnerability in your Ubuntu base)
Vulnerable versions of application dependencies (a known exploit in an npm or pip package your application uses)
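Misconfigurations in the Dockerfile (running as root, using latest tag instead of a pinned SHA), provided you also run a trivy config scan against the repository. By default, the image scan above does not read the Dockerfile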

Results appear in the GitHub Security tab, creating a timestamped, searchable history of every scan. That history is your Article 32(1)(d) evidence.

How to pull a weekly list of Amazon Inspector findings for running workloads:

# List all active CRITICAL findings across your AWS account
aws inspector2 list-findings \
  --filter-criteria '{
    "severity": [{"comparison": "EQUALS", "value": "CRITICAL"}],
    "findingStatus": [{"comparison": "EQUALS", "value": "ACTIVE"}]
  }' \
  --query 'findings[*].{
    Title:title,
    Resource:resources[0].id,
    Severity:severity,
    CVE:packageVulnerabilityDetails.vulnerabilityId
  }' \
  --output table

The auditor question:

"Show me your vulnerability management programme, including how you prioritise and remediate findings."

Your evidence: A weekly vulnerability report — generated automatically from the above command — showing active findings, severity, the GitHub issue created for each finding, and the closure date once remediated.

Part 6: Article 32(1)(d) — Penetration Testing

6.1. Why Automated Scanning Is Not Enough

Article 32(1)(d) requires evaluating the effectiveness of security measures. Automated vulnerability scanners find known CVEs in libraries and OS packages. They cannot find:

Business logic vulnerabilities (an API endpoint that returns another user's data when given a specific parameter)
Authentication bypasses (a JWT implementation that accepts unsigned tokens)
Privilege escalation paths (an attacker can move from a low-privilege role to admin through a sequence of legitimate API calls)
Insecure direct object references (accessing /api/users/124 instead of /api/users/123 returns data for a different customer)

The ICO (UK Information Commissioner's Office) and the CNIL (France's data protection authority) both discuss penetration testing in their security guidance. In practice, auditors expect a manual penetration test at least annually from organisations processing significant volumes of personal data.

What an acceptable pen test scope looks like:

# Annual Penetration Test Scope — Article 32 Compliance

## Testing Period
Start: 2025-04-01  
End: 2025-04-14  
Testing firm: [Accredited firm — CREST or CHECK certified]

## In Scope
- Production web application: https://app.yourcompany.com
- Production API: https://api.yourcompany.com/v1/*
- Authentication flows: OAuth2, JWT, session management
- Data stores: PostgreSQL (via application access only, not direct DB access)
- AWS account: External reconnaissance of public-facing services only

## Testing Types
- External infrastructure testing (all public IP ranges)
- Web application testing (OWASP Top 10 2021)
- API security testing (all authenticated and unauthenticated endpoints)
- Authentication and session management testing
- GDPR-specific test cases (data subject rights endpoints, consent flows)

## Remediation SLAs
- CRITICAL: 24 hours from report delivery
- HIGH: 7 calendar days
- MEDIUM: 30 calendar days
- LOW: 90 calendar days

How to track and evidence remediation:

# Create GitHub issues for each finding on receipt of the pen test report
# This creates a traceable record of every finding and its resolution
# Assumes pentest-report-findings.txt holds one "finding_id severity" pair per line
# and that security-lead is the GitHub login of the assignee

while read -r finding_id severity; do
  gh issue create \
    --title "Pen test finding: $finding_id" \
    --body "See pentest-report-2025-04.pdf, section $finding_id. Severity: $severity." \
    --label "security,pentest" \
    --assignee "security-lead"
done < pentest-report-findings.txt

The auditor question:

"When was your last penetration test? Show me the report and your remediation evidence."

Your evidence:

The penetration test report from a CREST or CHECK certified firm, dated within the last 12 months
A remediation tracker (GitHub issues or Jira) showing every CRITICAL and HIGH finding with a closure date
Evidence that all CRITICAL findings were closed within 24 hours (the git commit or deployment log)
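
Checking closure dates against the remediation SLAs is simple arithmetic, so it can be automated. The sketch below encodes the four SLA windows from the scope document above. The function name within_sla and the sample dates are hypothetical, and in practice the timestamps would come from your issue tracker.

```python
from datetime import datetime, timedelta

# Remediation windows from the pen test scope, measured from report delivery
REMEDIATION_SLA = {
    "CRITICAL": timedelta(hours=24),
    "HIGH": timedelta(days=7),
    "MEDIUM": timedelta(days=30),
    "LOW": timedelta(days=90),
}


def within_sla(severity: str, report_delivered: datetime, closed_at: datetime) -> bool:
    """Return True if the finding was closed inside its SLA window."""
    return closed_at - report_delivered <= REMEDIATION_SLA[severity]


delivered = datetime(2025, 4, 15, 9, 0)
print(within_sla("CRITICAL", delivered, datetime(2025, 4, 16, 8, 30)))
print(within_sla("HIGH", delivered, datetime(2025, 4, 25, 9, 0)))
```

Any finding where this returns False needs a documented reason and a compensating control before the audit.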

Best Practices Summary

Here are the key takeaways from this guide:

✅ Do: Implement application-layer encryption for sensitive fields. Storage encryption alone is not enough — a DBA with direct database access can still read plaintext.

✅ Do: Use customer-managed KMS keys with automatic rotation. You need to prove control over the key material.

✅ Do: Store pseudonymised data separately from identifiers, with restricted role-based access to the lookup table.

✅ Do: Enforce automatic logoff after 15 minutes of inactivity with an 8-hour absolute session limit.

✅ Do: Use unique service accounts with IRSA. Every action on personal data must be attributable to a specific identity.

✅ Do: Test your backups monthly. Document RTO and RPO with actual restore test results.

✅ Do: Run Trivy in CI to block CRITICAL and HIGH CVEs before deployment.

✅ Do: Conduct an annual manual penetration test from a CREST or CHECK certified firm.

❌ Don't: Use 24-hour JWT sessions or sessions with no inactivity timeout.

❌ Don't: Store secrets in environment variables, .env files, or hardcoded in source code.

❌ Don't: Skip the annual penetration test. An auditor from the ICO or CNIL will not accept "we run automated scans" as a substitute.

❌ Don't: Use AWS-managed KMS keys if you need to prove key material control to your auditor.

Resources

ICO Guide to GDPR Article 32 — The UK Information Commissioner's Office official guidance on Article 32 security obligations
ENISA Guidelines on Article 32 — The EU Agency for Cybersecurity's SME guidelines on personal data security
Trivy by Aqua Security — Open-source container vulnerability scanner used in Part 5
OWASP Top 10 2021 — The standard reference for web application security risks, used in pen test scoping
AWS KMS Key Rotation Documentation — Official AWS documentation for automatic key rotation
PostgreSQL Row Security Policies — How to implement row-level security for granular access control on pseudonymised data
EKS IAM Roles for Service Accounts (IRSA) — Official AWS documentation for unique service account identity on EKS
CREST Certified Testing Firms — Directory of CREST-certified penetration testing firms for your annual Article 32 assessment

Ayobami Adejumo is a senior platform engineer and compliance infrastructure specialist. He writes about GDPR engineering controls, SOC2 implementation, and FinOps (cloud cost optimization).

How to Stay GDPR Compliant with Access Logs

freeCodeCamp — Fri, 08 Jan 2021 16:24:38 +0000

By Yuli Stremovsky

Privacy is a complicated topic. Saving application logs, a long-established practice, has become tricky under the new privacy regulations. In fact, these regulations treat an IP address as a personal identifier. Like other user identifiers, it should be handled with caution.

In this article, I will cover a few methods to make your logging privacy-friendly.

First, I will explain two basic GDPR terms: PII and the user's forget-me right. After that, we will cover methods to make web or application server logs GDPR-ready.

Then I will talk about an open-source product I am developing called Databunker and how it helps. Databunker is a Swiss army knife tool for storing personal records.

What is Personally Identifiable Information?

The GDPR itself uses the term "personal data", which most engineers know as PII, or Personally Identifiable Information. This can be any information that helps to identify a person.

For example, it can be a user name, address, telephone number, email address, or SSN. It can also be a weak identity, like browser information, an IP address, or a session cookie.

Like in triangulation, a combination of weak identities can lead us to a user. Strong and weak user identities are all considered PII.

The GDPR introduces the right for individuals to have their personal data erased. Your user or customer can send you an email asking you to remove their records. You have one month to respond to this request.

What does a forget-me request mean for log files?

Deleting user data from the database is easy. You have SQL for that. Deleting user PII from the log file is the tricky part.

You might have different servers generating logs and you might feed logs to different cloud services. This might complicate how you perform record deletion.

In this article I will cover smarter methods to make your logging privacy-compliant.

Introduction to Databunker

But first, let me give you a bit more information about what Databunker is and how it works since we'll be discussing it in some of these methods below.

Databunker is a GDPR compliant user store service for Web and mobile apps. It works as a backend application service. This product is a combination of several software concepts merged together. It provides secure PII storage and privacy by design out of the box:

A Personal Identifiable Information (PII) storage and vault
Secure session storage for web applications
Privacy portal for customers
Application backend server
DPO management tool
Tokenization service
Secret sauce

Project website: https://databunker.org/

Full working Node.js example with Passport.js is available here: https://github.com/securitybunker/databunker-nodejs-example

Method 1: Use an automatic log retention period

You have one month to respond to a user forget-me request. In practice, this means you have one month to scrub all user-related records from your log files, for example by filtering out user IP addresses.

Or you can limit the log retention period just to one month. All older log entries will get removed. This way you do not need to do anything besides a one-time configuration of the log retention period.
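If your logging stack has no built-in retention setting, the same effect can be approximated with a small scheduled job. The following is a minimal Node.js sketch, assuming logs are plain files in a single directory and that a file's last-modified time is a good enough proxy for the age of its entries. The function name purgeOldLogs and the example path are made up for illustration.

```javascript
const fs = require("fs");
const path = require("path");

const DAY_MS = 24 * 60 * 60 * 1000;

// Delete every regular file in logDir whose last-modified time is older than
// retentionDays. Returns the names of the files that were deleted.
function purgeOldLogs(logDir, retentionDays, now = Date.now()) {
  const cutoff = now - retentionDays * DAY_MS;
  const removed = [];
  for (const name of fs.readdirSync(logDir)) {
    const file = path.join(logDir, name);
    const stat = fs.statSync(file);
    if (stat.isFile() && stat.mtimeMs < cutoff) {
      fs.unlinkSync(file);
      removed.push(name);
    }
  }
  return removed;
}

// Example (hypothetical path), run once a day from cron or a scheduler:
// purgeOldLogs("/var/log/myapp", 30);
```

Logs shipped to a cloud service need the equivalent retention setting on that service as well, since this job only touches local files.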

Method 2: Use pseudonymization to resolve any log compliance issues

GDPR defines the concept of pseudonymization, and this method builds on it. From GDPR Article 4(5):

‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person...

You can keep personal data in a separate database, for example in Databunker. When you receive a user's forget-me request, you will delete the user's personal data from Databunker, leaving the log files unchanged.
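To make the idea concrete, here is a minimal sketch of the pattern, not of Databunker's actual API. The PiiVault class is a hypothetical in-memory stand-in for whatever separate store you use. The point it illustrates is that the log only ever sees a random token, so deleting the vault record is enough to cut the link to the person.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory stand-in for a separate PII store.
class PiiVault {
  constructor() {
    this.records = new Map();
  }

  // Store the profile and return a random token that is safe to log.
  createUser(profile) {
    const token = crypto.randomUUID();
    this.records.set(token, profile);
    return token;
  }

  // Resolve a token back to the profile, or null if the user was forgotten.
  lookup(token) {
    return this.records.get(token) ?? null;
  }

  // Handle a forget-me request: drop the profile, leave the logs alone.
  forget(token) {
    return this.records.delete(token);
  }
}

const vault = new PiiVault();
const userToken = vault.createUser({ email: "jane@example.com" });
const logLine = `GET /user/me 200 "${userToken}"`; // no direct identifier in the log
vault.forget(userToken);
// vault.lookup(userToken) now returns null, while logLine is unchanged.
```

This only works if the log line itself carries no other identifier, so fields such as the client IP still need separate handling.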

To make our life even easier, we can print the session token and the user token in each log line.

You can take a look at this example for reference:

::ffff:141.226.198.55 - - [02/Jan/2021:18:42:54 +0000] "GET /user/me HTTP/1.1" 304 - "http://my-dev-site/user/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36" "b994fdbf-694e-4289-b8db-04d8049da2e8" "1f587eb7-eaaa-1629-c108-b707d99798da"

This differs from a regular web server log line only by the two custom variables added at the end.

"b994fdbf-694e-4289-b8db-04d8049da2e8" is the session token generated by the Databunker session library.

"1f587eb7-eaaa-1629-c108-b707d99798da" is a user token of the logged-in user. It is the user token generated upon user creation in Databunker.

Method 3: Solution for high-security environments

This method includes partial encryption of the log events. PII found in the log events will be grouped together and encrypted. The initial setup includes a one-time generation of a log-entry password for each user. This password can, for example, be saved in the user profile stored in Databunker.

As we need to know who the record owner is (to decrypt the record), we need to save the user id together with encrypted PII. So, another level of encryption will be used with a generic password.

For user identified log events, PII will be encrypted twice. The first time the data will be encrypted using the user's log-entry password. The second time, it'll be encrypted with the default password to hide the identified user id.

For identified users:

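// Encrypt(password, plaintext) stands for any symmetric encryption routine.
const piiPayload = JSON.stringify({ClientIP, BrowserUserAgent, SessionID});
// Inner layer: only the user's log-entry password can open the PII.
const piiEncrypted = Encrypt(UserPassword, piiPayload);
const linePayload = JSON.stringify({UserToken, data: btoa(piiEncrypted)});
// Outer layer: the generic password hides which user the entry belongs to.
const encrypted = Encrypt(GenericPassword, linePayload);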

If the user is unknown, only one level of encryption can be used:

const piiPayload = JSON.stringify({ClientIP, BrowserUserAgent, SessionID});
const encrypted = Encrypt(GenericPassword, piiPayload);

When you get a user's forget-me request, you can remove the user's log-entry password and their profile stored in Databunker. This makes the user's log entries unrecoverable, which satisfies the GDPR requirement, so no extra action is needed to remove anything from the log files.
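The snippets above leave Encrypt undefined. As one possible way to fill it in, the sketch below uses AES-256-GCM from Node's built-in crypto module, with keys derived from the passwords via scrypt. The cipher choice, the fixed demo salt, and the helper names (deriveKey, encryptLogPii, decryptLogPii, lookupUserPassword) are my assumptions for illustration and are not taken from Databunker.

```javascript
const crypto = require("crypto");

// Derive a 32-byte AES key from a password. The fixed salt keeps the sketch
// short; a real deployment would use a random salt stored next to the key.
function deriveKey(password) {
  return crypto.scryptSync(password, "log-pii-demo-salt", 32);
}

// AES-256-GCM encrypt. Output is base64 of: iv (12 bytes) + tag (16) + ciphertext.
function encrypt(password, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", deriveKey(password), iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

// Reverse of encrypt. Throws if the password is wrong or the data was altered.
function decrypt(password, encoded) {
  const raw = Buffer.from(encoded, "base64");
  const decipher = crypto.createDecipheriv(
    "aes-256-gcm", deriveKey(password), raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()])
    .toString("utf8");
}

// Two layers for identified users: PII under the user's log-entry password,
// then the user token plus that ciphertext under the generic password.
function encryptLogPii(genericPassword, userToken, userPassword, pii) {
  const inner = encrypt(userPassword, JSON.stringify(pii));
  return encrypt(genericPassword, JSON.stringify({ userToken, data: inner }));
}

// lookupUserPassword is a hypothetical callback that returns the user's
// log-entry password, or nothing once the user has been forgotten.
function decryptLogPii(genericPassword, encoded, lookupUserPassword) {
  const { userToken, data } = JSON.parse(decrypt(genericPassword, encoded));
  const userPassword = lookupUserPassword(userToken);
  if (!userPassword) return { userToken, pii: null }; // key deleted: unrecoverable
  return { userToken, pii: JSON.parse(decrypt(userPassword, data)) };
}
```

Once the user's log-entry password is deleted, decryptLogPii can still open the outer layer and see the user token, but the inner PII stays sealed. This is the property the forget-me handling relies on.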

Summary

With the right architecture, you can make your logging privacy compliant. It is not complicated. You can use Databunker or roll your own solution.

Whatever you choose is much better than completely ignoring this issue and manually removing user records from log files.

Free takeaway

I run a privacy training for startup founders and architects. It is available completely for FREE here.

About the author

Yuli Stremovsky is a world-class software and security architect and the founder of the PrivacyBunker.io and DataBunker.org privacy products. He is a former Checkpoint and RSA Security employee and an expert in marrying technological solutions with privacy.

GDPR terminology in plain English

freeCodeCamp — Wed, 23 May 2018 07:57:35 +0000

By Alex Ewerlöf

My team builds the technologies for some of the highest traffic newsrooms in Sweden and Norway. Part of the revenue comes from selling ads. Ads sell best when personalised, and for personalization you need data. Internet’s default business model is based on ads. GDPR has big implications for online businesses like newsrooms.

But here’s the interesting part — the General Data Protection Regulation (GDPR) puts restrictions on what data can be gathered, how it can be used, and for how long it can be stored.

This post is about demystifying the core GDPR terms so everyone can understand this interesting topic. If you are European or have European users, you need to understand GDPR.

TL;DR; this is a huge shift in how personal data is gathered from “by default” to “opt-in”. Plus some other perks.


Before we start, a quick disclaimer: I don’t represent my current/previous employers on my personal blog. The information provided here is purely based on my own research, and doesn’t necessarily reflect my company’s policies, strategy or implementation of GDPR.

A bit of background

GDPR came into effect on May 25, 2018. Despite making developers’ and marketers’ lives harder, it’s actually a very sweet deal for end users: GDPR prevents companies from gathering information they don’t strictly need.

Despite starting with the word ‘General’, GDPR is actually a European Union (EU) law that applies to:

Companies that are based in the EU
Companies that gather personal data from European citizens.

Maybe that ‘General’ is good, because a huge part of the internet is European!

Global internet usage during 24 hours (Wikipedia: https://en.wikipedia.org/wiki/Global_Internet_usage)

The word ‘Regulation’ in GDPR means that it must be applied in its entirety across the EU.

In the long run, this leads to privacy by design. This is a principle that calls for the inclusion of data protection from the start of designing the systems, rather than as an afterthought.

Common terminology

Here’s a list of the most common GDPR terms:

A Data Subject is a person (such as you and me) whose personal data is processed by a data controller (such as a company or service we use).
A Data Controller is an organisation that collects data from EU residents. It determines the purposes, conditions and means of processing the personal data.
The entity that does the actual data processing is called a Data Processor — an example might be a cloud service provider.
Processing involves any operation performed on personal data, whether or not by automated means. This includes collection, use, recording, feeding it to machine learning algorithms (read how ML is affected by GDPR), and so on.

Your personal data is any information that can be used to directly or indirectly identify you. For example: your name, home address, photo, email address, bank details, posts on social networking websites, medical information, or a computer or mobile IP address.

This data is usually used for profiling, in which automated processes evaluate, analyse, or predict your behaviour. As an example, knowing your age means you’ll be exposed to ads that are targeted to your age group. This is also true about data that you’re not explicitly giving to a company, like your IP address, which will be used to guess your location.

Now that GDPR is in effect, companies have limitations on what personal data they can gather and how long they can store it. They should justify why they need it.

The data controller (company) cannot just go and gather user data. They have to first ask for your permission or consent.

The consent must be explicit for data collected and for the purposes the data is used. The consent is freely given (if you say ‘no’, the company should still serve you as well as possible without your data). The consent should not be regarded as freely given if the data subject has no genuine or free choice or is unable to refuse or withdraw consent without detriment. The consent should be specific and explicit about what data is gathered and how it is processed. The user has the right to withdraw his or her consent at any time, and, more importantly, withdrawing consent must be as easy as giving it.

Companies can no longer force you to tick a checkbox that says “I accept all terms and conditions and privacy policies”. That is why you were getting those emails from many websites informing you about their policies before the May 25th deadline.

The area of GDPR consent has a number of implications for businesses who record calls as a matter of practice. The typical “calls are recorded for training and security purposes” warnings will no longer be sufficient to gain assumed consent to record calls.

There must be a reasonable legal basis for gathering an exact piece of data. According to the GDPR’s site, these can be when:

Processing is necessary for the fulfillment of a contract to which the data subject is party or to take steps at the request of the data subject prior to entering into a contract.
Processing is necessary for compliance with a legal obligation to which the controller is subject.
Processing is necessary to protect the vital interests of the data subject or of another natural person.
Processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.
Processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party unless such interests are overridden by the interests or fundamental rights and freedoms of the data subject, which require protection of personal data, in particular if the data subject is a child.

The most important benefit of GDPR is that it gives controls to the users to:

Erase their data whenever they like (also known as the Right to be Forgotten). Data Erasure requests don’t stop at the data controller. If third party data processors are involved, they too have to stop processing the data and erase it. I’m guessing there’ll be a de facto standard API for that, but so far it’s more ad-hoc and depends on how services talk to each other. I’m sure in the future there’ll be services where you give them your personal info and they’ll check thousands of online services to give you an aggregated report of which sites have your information. The companies should provide a way to query if they have data for a particular user (without requiring registration). Trivia: this is essentially in contradiction with how Blockchain works! Read more about the implications of GDPR for Blockchain here.
Own their data! The data subjects (users) can download and see their data and how it is processed. Furthermore, the data controller has to inform the data subject on details about the processing, such as the purposes of the processing, with whom the data is shared, and how it acquired the data. This is called right of access or subject access right. Personal data cannot be transferred to countries outside the European Union unless they guarantee the same level of data protection.
Move their data to competitors. This is good for competition and eventually the users win. The data must be provided by the controller in a structured and commonly used standard electronic format. No more lock-in! This is known as data Portability. This will probably open up a whole new business segment for converting data formats from one controller to another controller.
Update/correct their data. The data subjects have the right to ask the data controllers to immediately correct (public or private) data that is invalid.

I personally find the data breach announcement amazing.

The data controller is under a legal obligation to notify the relevant supervisory authority of any data breach without undue delay, unless the breach is unlikely to result in a risk to the rights and freedoms of the individuals affected.

Individuals have to be notified if an adverse impact is determined. There is a maximum of 72 hours after becoming aware of the data breach to make the report. In addition, the data processor will have to notify the data controller without undue delay after becoming aware of a personal data breach.

Do you remember when Yahoo kept its breach secret for two years? Well, not anymore!

Since GDPR is quite a big thing, governments are involved to protect their citizens and enforce the regulations. There are two terms to understand:

National Data Protection Authorities (DPA) are appointed by each EU country to implement and enforce data protection law, and to offer guidance. Supervisory Authority (SA) is another name for DPA. DPAs have significant enforcement powers, including the ability to issue substantial fines. They are also the place to go in case of a violation of data protection legislation (in the scope of the GDPR for EU citizens) and for advice, specific questions, or assistance from the perspective of organisations.
A Data Protection Officer (DPO) is an employee of the data controller (company) who is formally tasked with ensuring that an organisation is aware of, and complies with, its data protection responsibilities. More about this in the next section.

DPA & DPO

Each EU member has a main establishment where key decisions about data processing are made.

The upper fine limit for contravening GDPR is pretty expensive: up to €20 million, or up to 4% of the annual worldwide turnover of the preceding financial year… whichever is higher!

Companies that gather data have a responsibility and the liability to implement and demonstrate that they comply with GDPR. This is called compliance.

The companies are supposed to keep a log of who accessed what information for when the authorities ask for an audit. Records of processing activities must be maintained, that include purposes of the processing, categories involved and envisaged time limits.

The records must be made available to the supervisory authority on request. The interesting part is that even if the actual processing happens by another company (a data processor on behalf of the data controller), it is still the company that gathers the data that bears the main responsibility.

This whole new range of requirements is complicated enough to create a new job title: data protection officer (DPO)! This is an enterprise security leadership role responsible for overseeing data protection strategy and implementation to ensure compliance.

They also:

Educate the company and employees on important compliance requirements
Are the point of contact between the company and supervisory authorities
Monitor and provide advice on data protection efforts across the company
Keep tabs on all data processing activities at the company, including the purpose of all processing activities, which must be made public on request
Answer inquiries from users regarding how their data is being used, data erasure right and queries regarding what measures the company has put in place to protect their personal information
Identify and reduce the privacy risks of entities by analysing the personal data that are processed and the policies in place to protect the data, which is called Data Privacy Impact Assessment. The GDPR mandates a DPIA be conducted where data processing is likely to result in a high risk to the rights and freedoms of natural persons.

The DPO must have a support team and is responsible for their own continuing professional development. They must also be independent of the organization that employs them, effectively acting as a “mini-regulator.”

If a business has multiple establishments in the EU, it will have a single supervisory authority as its lead authority, based on where the main data processing activities take place.

Since GDPR enforces privacy by design, it affects software architecture and its implementation. For example, we can no longer keep logs of sensitive information (as mentioned before, IP addresses are considered personal information). This makes tracing bugs a bit harder.

Privacy settings must therefore be set at a high level by default. So we have to make sure checkboxes that expose personal data are not ticked by default.

If the Cloud is used for data storage, only the data owner, not the cloud service, should hold the decryption keys.

We cannot store data for longer than necessary. Database columns should have a data retention deadline which specifies when the data should be deleted.

Personally identifiable information should be pseudonymised in a way that it can no longer be linked (or ‘attributed’) to a single data subject without the use of additional data.
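One common way to do this, shown here as a sketch rather than a prescribed method, is to replace the identifier with a keyed hash (HMAC). The secret key plays the role of the "additional data": anyone holding it can recompute the mapping, so it has to be stored separately from the pseudonymised records. The function name pseudonymise and the sample values are made up for illustration.

```javascript
const crypto = require("crypto");

// Replace an identifier with a keyed SHA-256 hash. The same identifier and key
// always give the same pseudonym, so records can still be grouped per user.
function pseudonymise(identifier, secretKey) {
  return crypto.createHmac("sha256", secretKey).update(identifier).digest("hex");
}

// Example: log the pseudonym instead of the raw IP address.
const ipPseudonym = pseudonymise("203.0.113.7", "key-kept-outside-the-log-store");
```

A plain unkeyed hash would be weaker for small value spaces such as IPv4 addresses, because every candidate value can simply be hashed and compared.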

What good is a law if it is not meant to be broken? Don’t get too excited about your rights because the following cases are not covered by the regulation:

Lawful interception, national security, the army, the police, justice
Statistical and scientific analysis for research
Deceased persons are subject to national legislation
There is a dedicated law on employer-employee relationships. The GDPR was developed with a focus on social networks and cloud providers, but did not consider enough requirements for handling employee data.
Processing of personal data by a natural person in the course of a purely personal or household activity

Acknowledgement

Thanks to my colleague Ioana Norgen for proof-reading this post before publishing. Any possible errors are still mine.

Sources

Interesting reading

ePrivacy, a set of related regulations that are also enforced at the same time as GDPR. It targets any business that provides any form of online communication service, uses online tracking technologies, or engages in electronic direct marketing (eg. telecom operators and online communication services like Skype and WhatsApp). Its most important aspect is protection against spam SMS/email and marketing calls.
An excellent guide to GDPR for developers and some nice slides
Belitsoft has made a great checklist for businesses about GDPR although not all items in the checklist are a requirement by GDPR and some like 2 factor authentication are more of a best practice.
How GDPR affects cookies used for tracking
The data protection reform package also includes a separate Data Protection Directive for the police and criminal justice sector that provides rules on personal data exchanges at national, European, and international levels.
Facebook and Google hit with $8.8 billion in lawsuits on day one of GDPR
Privacy by design

The bottom line is: GDPR is an obvious right. Europe pioneered its establishment but this should be a global right. Talk about it with your friends, colleagues and law makers if you want to enjoy the same protection and choice as Europeans.

If you liked this, you may enjoy: programming is the best job ever and how do I keep up with technology.
