Table of Contents

- Overview
- Data Storage Architecture
- Verification Procedures
- Troubleshooting
- Monitoring Alerts
- Regular Maintenance
- Useful Queries
# Cost Tracking Verification Guide
This guide provides operational procedures to verify that cost tracking is working correctly and storing accurate data for the Blueberry IDP.
## Overview
The cost tracking system automatically calculates and stores cost data at multiple points in the environment lifecycle:
- Creation: Initial cost estimation based on resource configuration
- Updates: Cost recalculation when environment changes
- Termination: Final cost calculation before cleanup
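
As a rough sketch, each of these points reduces to the same write path: estimate the cost, then persist it on the environment document. `cost_tracker.estimate_environment_cost` and the stored field names come from the validation script later in this guide; the hook function and the store's `save` method are assumptions for illustration.

```python
# Hypothetical lifecycle hook: estimate cost and persist it on the
# environment document. cost_tracker and the field names come from this
# guide; store.save() is an assumed persistence method.
from datetime import datetime, timezone

from blueberry.services.cost_tracker import cost_tracker
from blueberry.stores.environment import EnvironmentStore


async def record_environment_cost(store: EnvironmentStore, env) -> None:
    cost_data = cost_tracker.estimate_environment_cost(env)
    env.total_cost = cost_data["total_estimated_cost"]
    usage = dict(env.resource_usage or {})
    usage["estimated_at"] = datetime.now(timezone.utc).isoformat()
    env.resource_usage = usage
    await store.save(env)  # hypothetical persistence call
```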
## Data Storage Architecture

### 1. Environment Model Fields

Cost data is stored in each environment document:
```json
{
  "id": "env-12345",
  "name": "pr-123",
  "total_cost": 12.45,
  "resource_usage": {
    "cost_breakdown": {
      "gke_autopilot": {
        "cpu": {"cores": 0.5, "cost": 5.34},
        "memory": {"gb": 1.0, "cost": 1.18},
        "storage": {"gb": 15.0, "cost": 0.23},
        "total": 6.75
      },
      "services": {
        "breakdown": {
          "firestore": 2.10,
          "artifact_registry": 0.50,
          "gcs": 0.30,
          "load_balancer": 2.50,
          "secret_manager": 0.30
        },
        "total": 5.70
      },
      "networking": 1.01
    },
    "cost_per_hour": 0.52,
    "estimated_at": "2024-01-15T10:30:00Z"
  }
}
```
### 2. Telemetry Metrics

Cloud Monitoring metrics track:

- `custom.googleapis.com/environment/estimated_cost_usd`
  - Tags: `cpu_cores`, `memory_gb`
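
For reference, a cost point can be written to this metric with the `google-cloud-monitoring` client roughly as follows; the metric type and label names come from above, while the project ID, monitored-resource type, and values are placeholders.

```python
# Minimal sketch: publish one estimated-cost data point to Cloud Monitoring.
# Metric type and labels are from this guide; project and values are
# placeholders.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/environment/estimated_cost_usd"
series.metric.labels["cpu_cores"] = "0.5"
series.metric.labels["memory_gb"] = "1.0"
series.resource.type = "global"  # assumed monitored-resource type

point = monitoring_v3.Point(
    {"interval": {"end_time": {"seconds": int(time.time())}},
     "value": {"double_value": 12.45}}
)
series.points = [point]

client.create_time_series(name="projects/development-454916", time_series=[series])
```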
### 3. Structured Logs

Cost events are logged with metadata:

- `Updated environment cost`
- `Updated PR environment cost`
- `Updated final cost for environment before termination`
## Verification Procedures

### 1. Real-time Log Monitoring

Monitor cost calculation events in real time:

```bash
# Watch application logs for cost updates
kubectl logs -f deployment/blueberry -n blueberry | grep -E "(cost|Updated environment cost|total_cost)"

# Check cleanup job logs
kubectl logs -l app.kubernetes.io/component=cleanup -n blueberry --tail=100

# Filter for cost-specific events
kubectl logs deployment/blueberry -n blueberry --since=1h | jq 'select(.msg | contains("cost"))'
```
### 2. API Verification

Use the cost tracking APIs to verify data (URLs are quoted so the shell does not interpret `?` in query strings):

```bash
# Get overall cost summary
curl -H "Authorization: Bearer $TOKEN" \
  "https://blueberry.florenciacomuzzi.com/api/observability/costs/summary?days=30"

# Check specific environment cost
ENV_ID="pr-123"
curl -H "Authorization: Bearer $TOKEN" \
  "https://blueberry.florenciacomuzzi.com/api/observability/costs/environment/$ENV_ID"

# Force cost recalculation
curl -X POST -H "Authorization: Bearer $TOKEN" \
  "https://blueberry.florenciacomuzzi.com/api/observability/costs/environment/$ENV_ID/update"
```
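
The same checks can be scripted. Here is a minimal sketch with `requests`, assuming the environment endpoint returns the document shape shown in section 1:

```python
# Spot-check one environment's cost via the API. Assumes the response body
# matches the environment document shape shown earlier in this guide.
import os

import requests

BASE = "https://blueberry.florenciacomuzzi.com/api/observability/costs"
headers = {"Authorization": f"Bearer {os.environ['TOKEN']}"}

env = requests.get(f"{BASE}/environment/pr-123", headers=headers, timeout=10).json()
assert env.get("total_cost", 0) > 0, "total_cost is missing or zero"
assert "estimated_at" in env.get("resource_usage", {}), "no estimation timestamp"
print(f"pr-123 total cost: ${env['total_cost']:.2f}")
```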
### 3. Database Verification

#### Via Firebase Console

1. Navigate to Firebase Console → Firestore
2. Select the `environments` collection
3. For each environment document, verify:
   - `total_cost` field exists and is > 0
   - `resource_usage` contains a complete breakdown
   - `resource_usage.estimated_at` is recent
#### Via gcloud CLI

```bash
# Query environments missing cost data
gcloud firestore documents list \
  --collection-path=environments \
  --filter="total_cost=null" \
  --project=development-454916
```
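
The same check can be done with the `google-cloud-firestore` client. Note that an equality filter on `null` only matches fields explicitly set to null, so this sketch scans client-side to also catch documents where the field is missing entirely:

```python
# Find environments with no usable cost data by scanning the collection.
from google.cloud import firestore

db = firestore.Client(project="development-454916")
missing = [
    snap.id
    for snap in db.collection("environments").stream()
    if snap.to_dict().get("total_cost") is None
]
print(f"{len(missing)} environments missing total_cost: {missing[:5]}")
```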
### 4. Telemetry Verification

Query Cloud Monitoring for cost metrics:

```bash
# Get recent cost metrics
gcloud monitoring time-series list \
  --filter='metric.type="custom.googleapis.com/environment/estimated_cost_usd"' \
  --interval-end=$(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --interval-start=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
  --project=development-454916

# Get average cost per environment
gcloud monitoring time-series list \
  --filter='metric.type="custom.googleapis.com/environment/estimated_cost_usd"' \
  --interval-end=now \
  --interval-start=-P1D \
  --aggregation='{"alignmentPeriod":"3600s","perSeriesAligner":"ALIGN_MEAN"}' \
  --project=development-454916
```
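
The equivalent read with the Python client, again assuming the `development-454916` project used above:

```python
# Read the last 24 hours of cost points for the custom metric.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 86400}}
)

results = client.list_time_series(
    request={
        "name": "projects/development-454916",
        "filter": 'metric.type="custom.googleapis.com/environment/estimated_cost_usd"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    # Points are returned newest-first
    print(dict(series.metric.labels), series.points[0].value.double_value)
```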
### 5. Automated Validation Script

Create and run this validation script:

```python
#!/usr/bin/env python3
# scripts/validate-cost-tracking.py
import asyncio
import sys
from datetime import datetime, timezone, timedelta

sys.path.append('.')

from blueberry.stores.environment import EnvironmentStore
from blueberry.services.cost_tracker import cost_tracker


async def validate_cost_tracking():
    """Validate cost tracking data integrity."""
    store = EnvironmentStore()

    print("🔍 Cost Tracking Validation Report")
    print("=" * 50)

    # Get all active environments
    active_envs = await store.list_active_environments()
    print(f"\n✅ Found {len(active_envs)} active environments")

    # Check for missing, zero, and stale cost data
    missing_costs = []
    zero_costs = []
    stale_costs = []

    for env in active_envs:
        # Check if cost data exists
        if env.total_cost is None:
            missing_costs.append(env)
        elif env.total_cost == 0:
            zero_costs.append(env)

        # Check if cost data is stale (>24 hours old)
        if env.resource_usage and 'estimated_at' in env.resource_usage:
            estimated_at = datetime.fromisoformat(
                env.resource_usage['estimated_at'].replace('Z', '+00:00')
            )
            age = datetime.now(timezone.utc) - estimated_at
            if age > timedelta(hours=24):
                stale_costs.append((env, age))

    # Report findings
    print("\n📊 Cost Data Status:")
    print(f"  - Missing costs: {len(missing_costs)} environments")
    print(f"  - Zero costs: {len(zero_costs)} environments")
    print(f"  - Stale costs (>24h): {len(stale_costs)} environments")

    # List problematic environments
    if missing_costs:
        print("\n❌ Environments missing cost data:")
        for env in missing_costs[:5]:  # Show first 5
            print(f"  - {env.id} ({env.name})")

    if stale_costs:
        print("\n⚠️ Environments with stale cost data:")
        for env, age in stale_costs[:5]:  # Show first 5
            print(f"  - {env.id}: {age.days}d {age.seconds // 3600}h old")

    # Verify cost calculation on one environment
    if active_envs:
        print(f"\n🧮 Testing cost calculation on {active_envs[0].id}...")
        try:
            cost_data = cost_tracker.estimate_environment_cost(active_envs[0])
            print(f"  ✅ Cost calculation successful: ${cost_data['total_estimated_cost']:.2f}")
        except Exception as e:
            print(f"  ❌ Cost calculation failed: {e}")

    # Summary
    total_cost = sum(env.total_cost or 0 for env in active_envs)
    avg_cost = total_cost / len(active_envs) if active_envs else 0
    print("\n💰 Cost Summary:")
    print(f"  - Total estimated cost: ${total_cost:.2f}")
    print(f"  - Average per environment: ${avg_cost:.2f}")

    # Return a non-zero exit code if any issues were found
    issues = len(missing_costs) + len(zero_costs) + len(stale_costs)
    if issues == 0:
        print("\n✅ All cost tracking data is valid!")
        return 0
    print(f"\n⚠️ Found {issues} issues that need attention")
    return 1


if __name__ == "__main__":
    exit_code = asyncio.run(validate_cost_tracking())
    sys.exit(exit_code)
```

Run the validation:

```bash
python scripts/validate-cost-tracking.py
```
### 6. CronJob Monitoring

Verify the cleanup job is running and updating costs:

```bash
# Check CronJob status
kubectl get cronjobs -n blueberry

# View last execution
kubectl get jobs -n blueberry | grep cleanup

# Check for successful cost updates in cleanup logs
kubectl logs -l app.kubernetes.io/component=cleanup -n blueberry | grep "Updated final cost"
```
### 7. Dashboard Health Check

1. Access the cost dashboard: https://blueberry.florenciacomuzzi.com/api/observability/costs/dashboard
2. Verify:
   - Total cost is non-zero
   - Active environments count matches reality
   - Cost trends show data for recent days
   - Service breakdown pie chart has data
   - Environment list shows cost values
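
A scripted smoke test can cover the basic availability check. Only the URL below comes from this guide; the assertions are assumptions about the response:

```python
# Hypothetical dashboard smoke test: the endpoint should respond 200 with a
# non-empty body. Deeper field checks depend on the dashboard's schema.
import os

import requests

resp = requests.get(
    "https://blueberry.florenciacomuzzi.com/api/observability/costs/dashboard",
    headers={"Authorization": f"Bearer {os.environ['TOKEN']}"},
    timeout=10,
)
resp.raise_for_status()
assert resp.content, "dashboard returned an empty body"
print(f"Dashboard OK: {resp.status_code}, {len(resp.content)} bytes")
```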
### 8. Firestore Index Performance

Check whether cost queries are using indexes efficiently:

```bash
# Deploy indexes if not already done
firebase deploy --only firestore:indexes --project blueberry-e6167

# Monitor slow queries in logs
gcloud logging read 'resource.type="datastore_index" severity>=WARNING' \
  --project=development-454916 \
  --limit=50
```
## Troubleshooting

### Missing Cost Data

If environments are missing cost data:

1. Check whether the environment was created before cost tracking was implemented
2. Manually trigger cost calculation (or batch-backfill as sketched below):

   ```bash
   curl -X POST -H "Authorization: Bearer $TOKEN" \
     https://blueberry.florenciacomuzzi.com/api/observability/costs/environment/{env_id}/update
   ```

3. Check logs for calculation errors
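
For more than a handful of environments, the recalculation endpoint can be driven in a loop. A sketch, reusing the `missing` ID list from the Firestore scan in section 3:

```python
# Backfill sketch: trigger the documented recalculation endpoint for each
# environment still missing cost data.
import os

import requests

BASE = "https://blueberry.florenciacomuzzi.com/api/observability/costs"
headers = {"Authorization": f"Bearer {os.environ['TOKEN']}"}

for env_id in missing:  # IDs from the Firestore scan in section 3
    resp = requests.post(f"{BASE}/environment/{env_id}/update",
                         headers=headers, timeout=30)
    print(env_id, resp.status_code)
```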
### Zero Cost Values

Zero costs typically indicate:

- Environment was just created (race condition)
- A calculation error (check logs)
- Missing resource configuration
### Stale Cost Data

For environments with outdated cost estimates:

1. The cleanup CronJob should update costs every 6 hours
2. Manually update via the API if needed
3. Check that the CronJob is running properly
### Performance Issues

If cost queries are slow:

1. Ensure Firestore indexes are deployed
2. Check for missing indexes in the logs
3. Consider adding pagination to queries (see the sketch below)
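
One way to paginate with the Firestore client is cursor-based paging; the collection and field names below come from this guide, and the page size is arbitrary:

```python
# Cursor-based pagination over the environments collection, 100 docs per page.
from google.cloud import firestore

db = firestore.Client(project="development-454916")
query = db.collection("environments").order_by("created_at").limit(100)

docs = list(query.stream())
while docs:
    for snap in docs:
        ...  # process each environment document
    docs = list(query.start_after(docs[-1]).stream())
```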
## Monitoring Alerts

Consider setting up alerts for:

- **Missing Cost Data**

  ```yaml
  alert: EnvironmentMissingCostData
  expr: count(environments{total_cost="null"}) > 5
  for: 30m
  ```

- **Stale Cost Data**

  ```yaml
  alert: EnvironmentStaleCostData
  expr: time() - environment_cost_updated_timestamp > 86400
  for: 1h
  ```

- **Cleanup Job Failures**

  ```yaml
  alert: CleanupJobFailed
  expr: kube_job_status_failed{job_name=~".*cleanup.*"} > 0
  for: 10m
  ```
## Regular Maintenance

### Daily
- Check dashboard for anomalies
- Review any cost-related alerts
### Weekly
- Run validation script
- Review cost trends for outliers
- Check for environments with unusually high costs
### Monthly
- Analyze cost optimization recommendations
- Review and update pricing model if needed
- Audit long-running environments
## Useful Queries

### Find Most Expensive Environments

```sql
SELECT id, name, total_cost, created_at
FROM environments
WHERE status IN ('READY', 'PROVISIONING')
ORDER BY total_cost DESC
LIMIT 10
```
### Calculate Daily Spend Rate

```sql
SELECT
  DATE(created_at) AS day,
  SUM(total_cost) AS daily_cost,
  COUNT(*) AS env_count
FROM environments
WHERE created_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day
ORDER BY day DESC
```
### Identify Cost Optimization Candidates

```sql
SELECT id, name, total_cost, ttl_hours
FROM environments
WHERE status = 'READY'
  AND ttl_hours > 72
  AND total_cost > 50
ORDER BY total_cost DESC
```