Cost Tracking Verification Guide

This guide provides operational procedures to verify that cost tracking is working correctly and storing accurate data for the Blueberry IDP.

Overview

The cost tracking system automatically calculates and stores cost data at multiple points in the environment lifecycle:
- Creation: Initial cost estimation based on resource configuration
- Updates: Cost recalculation when the environment configuration changes
- Termination: Final cost calculation before cleanup
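At each of these points, the stored total is, in essence, the hourly rate multiplied by elapsed runtime. A minimal sketch of that arithmetic (the function below is illustrative; the actual logic lives in blueberry.services.cost_tracker):

```python
from datetime import datetime, timezone
from typing import Optional

def estimate_total_cost(cost_per_hour: float, created_at: datetime,
                        now: Optional[datetime] = None) -> float:
    """Accrued cost = hourly rate x elapsed runtime, rounded to cents."""
    now = now or datetime.now(timezone.utc)
    hours = (now - created_at).total_seconds() / 3600
    return round(cost_per_hour * hours, 2)

created = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
later = created.replace(day=16)  # 24 hours later
print(estimate_total_cost(0.52, created, later))  # 0.52/h over 24h -> 12.48
```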

Data Storage Architecture

1. Environment Model Fields

Cost data is stored in each environment document:

{
  "id": "env-12345",
  "name": "pr-123",
  "total_cost": 12.45,
  "resource_usage": {
    "cost_breakdown": {
      "gke_autopilot": {
        "cpu": {"cores": 0.5, "cost": 5.34},
        "memory": {"gb": 1.0, "cost": 1.18},
        "storage": {"gb": 15.0, "cost": 0.23},
        "total": 6.75
      },
      "services": {
        "breakdown": {
          "firestore": 2.10,
          "artifact_registry": 0.50,
          "gcs": 0.30,
          "load_balancer": 2.50,
          "secret_manager": 0.30
        },
        "total": 5.70
      },
      "networking": 1.01
    },
    "cost_per_hour": 0.52,
    "estimated_at": "2024-01-15T10:30:00Z"
  }
}
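The sub-totals in a stored document can be sanity-checked mechanically: each group's component costs should sum to its stated total. A quick sketch (the helper name is illustrative, not part of the codebase), using the sample breakdown above:

```python
def breakdown_is_consistent(cb: dict, tol: float = 0.01) -> bool:
    """Check that component costs sum to their stated sub-totals."""
    gke = cb["gke_autopilot"]
    gke_sum = gke["cpu"]["cost"] + gke["memory"]["cost"] + gke["storage"]["cost"]
    svc = cb["services"]
    svc_sum = sum(svc["breakdown"].values())
    return abs(gke_sum - gke["total"]) <= tol and abs(svc_sum - svc["total"]) <= tol

# The cost_breakdown from the sample document above
sample = {
    "gke_autopilot": {
        "cpu": {"cores": 0.5, "cost": 5.34},
        "memory": {"gb": 1.0, "cost": 1.18},
        "storage": {"gb": 15.0, "cost": 0.23},
        "total": 6.75,
    },
    "services": {
        "breakdown": {"firestore": 2.10, "artifact_registry": 0.50, "gcs": 0.30,
                      "load_balancer": 2.50, "secret_manager": 0.30},
        "total": 5.70,
    },
    "networking": 1.01,
}
print(breakdown_is_consistent(sample))  # True
```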

2. Telemetry Metrics

Cloud Monitoring metrics track:
- custom.googleapis.com/environment/estimated_cost_usd
- Tags: cpu_cores, memory_gb

3. Structured Logs

Cost events are logged with metadata:
- Updated environment cost
- Updated PR environment cost
- Updated final cost for environment before termination
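These events can be extracted programmatically from exported JSON logs. A small sketch that mirrors the jq filter used in the monitoring commands below (it assumes the message lives under a "msg" key, matching the .msg field those commands filter on):

```python
import json

COST_EVENTS = ("Updated environment cost",
               "Updated PR environment cost",
               "Updated final cost")

def cost_events(log_lines):
    """Yield parsed JSON log records whose message mentions a cost update."""
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        msg = record.get("msg", "")
        if any(evt in msg for evt in COST_EVENTS):
            yield record

lines = [
    '{"msg": "Updated environment cost", "env_id": "env-12345", "total_cost": 12.45}',
    '{"msg": "Healthcheck OK"}',
    'not json',
]
print([r["env_id"] for r in cost_events(lines)])  # ['env-12345']
```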

Verification Procedures

1. Real-time Log Monitoring

Monitor cost calculation events in real-time:

# Watch application logs for cost updates
kubectl logs -f deployment/blueberry -n blueberry | grep -E "(cost|Updated environment cost|total_cost)"

# Check cleanup job logs
kubectl logs -l app.kubernetes.io/component=cleanup -n blueberry --tail=100

# Filter for cost-specific events
kubectl logs deployment/blueberry -n blueberry --since=1h | jq 'select(.msg | contains("cost"))'

2. API Verification

Use the cost tracking APIs to verify data:

# Get overall cost summary
curl -H "Authorization: Bearer $TOKEN" \
  https://blueberry.florenciacomuzzi.com/api/observability/costs/summary?days=30

# Check specific environment cost
ENV_ID="pr-123"
curl -H "Authorization: Bearer $TOKEN" \
  https://blueberry.florenciacomuzzi.com/api/observability/costs/environment/$ENV_ID

# Force cost recalculation
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://blueberry.florenciacomuzzi.com/api/observability/costs/environment/$ENV_ID/update
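The same endpoints can be driven from Python, e.g. with requests. The helper below only assembles the (method, URL, headers) tuple for the three calls shown above; the helper name itself is hypothetical:

```python
from typing import Dict, Optional, Tuple

BASE = "https://blueberry.florenciacomuzzi.com/api/observability/costs"

def cost_request(token: str, env_id: Optional[str] = None,
                 days: int = 30, update: bool = False) -> Tuple[str, str, Dict[str, str]]:
    """Build (method, url, headers) for the cost endpoints above."""
    headers = {"Authorization": f"Bearer {token}"}
    if env_id is None:
        return "GET", f"{BASE}/summary?days={days}", headers
    if update:
        return "POST", f"{BASE}/environment/{env_id}/update", headers
    return "GET", f"{BASE}/environment/{env_id}", headers

method, url, headers = cost_request("my-token", "pr-123", update=True)
print(method, url)
```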

3. Database Verification

Via Firebase Console

  1. Navigate to Firebase Console → Firestore
  2. Select the environments collection
  3. For each environment document, verify:
     • total_cost field exists and is > 0
     • resource_usage contains a complete breakdown
     • resource_usage.estimated_at is recent

Via gcloud CLI

# Query environments missing cost data
gcloud firestore documents list \
  --collection-path=environments \
  --filter="total_cost=null" \
  --project=development-454916

4. Telemetry Verification

Query Cloud Monitoring for cost metrics:

# Get recent cost metrics
gcloud monitoring time-series list \
  --filter='metric.type="custom.googleapis.com/environment/estimated_cost_usd"' \
  --interval-end=$(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --interval-start=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
  --project=development-454916

# Get average cost per environment
gcloud monitoring time-series list \
  --filter='metric.type="custom.googleapis.com/environment/estimated_cost_usd"' \
  --interval-end=$(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --interval-start=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
  --aggregation='{"alignmentPeriod":"3600s","perSeriesAligner":"ALIGN_MEAN"}' \
  --project=development-454916

5. Automated Validation Script

Create and run this validation script:

#!/usr/bin/env python3
# scripts/validate-cost-tracking.py

import asyncio
import sys
from datetime import datetime, timezone, timedelta
sys.path.append('.')

from blueberry.stores.environment import EnvironmentStore
from blueberry.services.cost_tracker import cost_tracker

async def validate_cost_tracking():
    """Validate cost tracking data integrity."""
    store = EnvironmentStore()

    print("šŸ” Cost Tracking Validation Report")
    print("=" * 50)

    # Get all active environments
    active_envs = await store.list_active_environments()
    print(f"\nāœ“ Found {len(active_envs)} active environments")

    # Check for missing cost data
    missing_costs = []
    zero_costs = []
    stale_costs = []

    for env in active_envs:
        # Check if cost data exists
        if env.total_cost is None:
            missing_costs.append(env)
        elif env.total_cost == 0:
            zero_costs.append(env)

        # Check if cost data is stale (>24 hours old)
        if env.resource_usage and 'estimated_at' in env.resource_usage:
            estimated_at = datetime.fromisoformat(env.resource_usage['estimated_at'].replace('Z', '+00:00'))
            age = datetime.now(timezone.utc) - estimated_at
            if age > timedelta(hours=24):
                stale_costs.append((env, age))

    # Report findings
    print(f"\nšŸ“Š Cost Data Status:")
    print(f"  - Missing costs: {len(missing_costs)} environments")
    print(f"  - Zero costs: {len(zero_costs)} environments")
    print(f"  - Stale costs (>24h): {len(stale_costs)} environments")

    # List problematic environments
    if missing_costs:
        print(f"\nāŒ Environments missing cost data:")
        for env in missing_costs[:5]:  # Show first 5
            print(f"  - {env.id} ({env.name})")

    if stale_costs:
        print(f"\nāš ļø  Environments with stale cost data:")
        for env, age in stale_costs[:5]:  # Show first 5
            print(f"  - {env.id}: {age.days}d {age.seconds//3600}h old")

    # Verify cost calculation
    if active_envs:
        print(f"\n🧮 Testing cost calculation on {active_envs[0].id}...")
        try:
            cost_data = cost_tracker.estimate_environment_cost(active_envs[0])
            print(f"  āœ“ Cost calculation successful: ${cost_data['total_estimated_cost']:.2f}")
        except Exception as e:
            print(f"  āŒ Cost calculation failed: {e}")

    # Summary
    total_cost = sum(env.total_cost or 0 for env in active_envs)
    avg_cost = total_cost / len(active_envs) if active_envs else 0

    print(f"\nšŸ’° Cost Summary:")
    print(f"  - Total estimated cost: ${total_cost:.2f}")
    print(f"  - Average per environment: ${avg_cost:.2f}")

    # Return status
    issues = len(missing_costs) + len(zero_costs) + len(stale_costs)
    if issues == 0:
        print(f"\nāœ… All cost tracking data is valid!")
        return 0
    else:
        print(f"\nāš ļø  Found {issues} issues that need attention")
        return 1

if __name__ == "__main__":
    exit_code = asyncio.run(validate_cost_tracking())
    sys.exit(exit_code)

Run the validation:

python scripts/validate-cost-tracking.py

6. CronJob Monitoring

Verify the cleanup job is running and updating costs:

# Check CronJob status
kubectl get cronjobs -n blueberry

# View last execution
kubectl get jobs -n blueberry | grep cleanup

# Check for successful cost updates in cleanup logs
kubectl logs -l app.kubernetes.io/component=cleanup -n blueberry | grep "Updated final cost"

7. Dashboard Health Check

  1. Access the cost dashboard: https://blueberry.florenciacomuzzi.com/api/observability/costs/dashboard
  2. Verify:
     • Total cost is non-zero
     • Active environments count matches reality
     • Cost trends show data for recent days
     • Service breakdown pie chart has data
     • Environment list shows cost values
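These checks can also be scripted against the dashboard's JSON payload. The field names below (total_cost, active_environments, environments) are assumptions about the response shape, not a confirmed schema:

```python
def dashboard_issues(payload: dict) -> list:
    """Return a list of failed dashboard checks (field names are assumed)."""
    problems = []
    if payload.get("total_cost", 0) <= 0:
        problems.append("total cost is zero")
    if payload.get("active_environments", 0) < 1:
        problems.append("no active environments reported")
    for env in payload.get("environments", []):
        if not env.get("total_cost"):
            problems.append(f"environment {env.get('id', '?')} missing cost value")
    return problems

payload = {"total_cost": 42.0, "active_environments": 3,
           "environments": [{"id": "pr-123", "total_cost": 12.45}]}
print(dashboard_issues(payload))  # []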

8. Firestore Index Performance

Check if cost queries are using indexes efficiently:

# Deploy indexes if not already done
firebase deploy --only firestore:indexes --project blueberry-e6167

# Monitor slow queries in logs
gcloud logging read 'resource.type="datastore_index" severity>=WARNING' \
  --project=development-454916 \
  --limit=50

Troubleshooting

Missing Cost Data

If environments are missing cost data:

  1. Check if the environment was created before cost tracking was implemented
  2. Manually trigger cost calculation:

     curl -X POST -H "Authorization: Bearer $TOKEN" \
       https://blueberry.florenciacomuzzi.com/api/observability/costs/environment/{env_id}/update

  3. Check logs for calculation errors

Zero Cost Values

Zero costs typically indicate:
- Environment just created (race condition)
- Calculation error (check logs)
- Missing resource configuration

Stale Cost Data

For environments with outdated cost estimates:
1. The cleanup CronJob should update costs every 6 hours
2. Manually update if needed via API
3. Check if CronJob is running properly

Performance Issues

If cost queries are slow:
1. Ensure Firestore indexes are deployed
2. Check for missing indexes in logs
3. Consider adding pagination to queries

Monitoring Alerts

Consider setting up alerts for:

  1. Missing Cost Data
     alert: EnvironmentMissingCostData
     expr: count(environments{total_cost="null"}) > 5
     for: 30m

  2. Stale Cost Data
     alert: EnvironmentStaleCostData
     expr: time() - environment_cost_updated_timestamp > 86400
     for: 1h

  3. Cleanup Job Failures
     alert: CleanupJobFailed
     expr: kube_job_status_failed{job_name=~".*cleanup.*"} > 0
     for: 10m

Regular Maintenance

Daily

  • Check dashboard for anomalies
  • Review any cost-related alerts

Weekly

  • Run validation script
  • Review cost trends for outliers
  • Check for environments with unusually high costs

Monthly

  • Analyze cost optimization recommendations
  • Review and update pricing model if needed
  • Audit long-running environments

Useful Queries

Find Most Expensive Environments

SELECT id, name, total_cost, created_at
FROM environments
WHERE status IN ('READY', 'PROVISIONING')
ORDER BY total_cost DESC
LIMIT 10

Calculate Daily Spend Rate

SELECT
  DATE(created_at) as day,
  SUM(total_cost) as daily_cost,
  COUNT(*) as env_count
FROM environments
WHERE created_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day
ORDER BY day DESC
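
If environment documents are exported as a list of dicts, the same daily rollup can be computed in Python (field names mirror the query above):

```python
from collections import defaultdict
from datetime import datetime

def daily_spend(envs):
    """Group environment costs by creation day, most recent day first."""
    days = defaultdict(lambda: {"daily_cost": 0.0, "env_count": 0})
    for env in envs:
        day = datetime.fromisoformat(env["created_at"]).date().isoformat()
        days[day]["daily_cost"] += env.get("total_cost") or 0.0
        days[day]["env_count"] += 1
    return dict(sorted(days.items(), reverse=True))

envs = [
    {"created_at": "2024-01-15T10:30:00", "total_cost": 12.5},
    {"created_at": "2024-01-15T14:00:00", "total_cost": 3.5},
    {"created_at": "2024-01-14T09:00:00", "total_cost": 5.0},
]
print(daily_spend(envs))
```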

Identify Cost Optimization Candidates

SELECT id, name, total_cost, ttl_hours
FROM environments
WHERE status = 'READY'
  AND ttl_hours > 72
  AND total_cost > 50
ORDER BY total_cost DESC
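
The same filter is easy to apply client-side once environments are loaded into memory, a sketch using the thresholds from the query above (72h TTL, $50 cost):

```python
def optimization_candidates(envs, max_ttl_hours=72, cost_threshold=50.0):
    """Return READY environments exceeding both thresholds, costliest first."""
    hits = [e for e in envs
            if e.get("status") == "READY"
            and e.get("ttl_hours", 0) > max_ttl_hours
            and (e.get("total_cost") or 0) > cost_threshold]
    return sorted(hits, key=lambda e: e["total_cost"], reverse=True)

envs = [
    {"id": "env-1", "name": "pr-101", "status": "READY", "ttl_hours": 168, "total_cost": 80.0},
    {"id": "env-2", "name": "pr-102", "status": "READY", "ttl_hours": 24, "total_cost": 90.0},
    {"id": "env-3", "name": "pr-103", "status": "TERMINATED", "ttl_hours": 168, "total_cost": 75.0},
]
print([e["id"] for e in optimization_candidates(envs)])  # ['env-1']
```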

References

Document ID: guides/monitoring/cost-tracking-verification