
Debugging Production Issues

TL;DR

Check the logs first, reproduce the issue, isolate the change, fix forward or roll back, then write a postmortem.

Production Broke. Now What?

Stay calm. Follow a systematic approach.

Step 1: Acknowledge and Communicate

"We're aware of [issue]. Investigating now."

Don't speculate. Update when you know more.

Step 2: Check the Logs

# Follow new errors as they arrive
tail -f /var/log/app/error.log

# Search all app logs for errors and show the last 50 matches
grep -i "error" /var/log/app/*.log | tail -n 50

# Laravel logs
tail -f storage/logs/laravel.log
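Raw log tails get noisy fast, so it can help to summarize which errors are actually firing. The function below is a minimal sketch, not a standard tool: `top_error_messages` is a hypothetical helper name, and it assumes the default Laravel line format (`[YYYY-MM-DD HH:MM:SS] env.LEVEL: message`). Adjust the pattern if your log format differs.

```shell
# Hypothetical helper: count the most frequent error messages in a log file.
# Assumes Laravel-style lines: "[2024-01-15 14:30:01] production.ERROR: message"
# Usage: top_error_messages storage/logs/laravel.log 10
top_error_messages() {
  local logfile="$1" limit="${2:-10}"
  awk '
    /^\[[^]]*\] [A-Za-z_]+\.(ERROR|CRITICAL|ALERT|EMERGENCY):/ {
      sub(/^\[[^]]*\] /, "")   # drop the timestamp so identical messages group together
      count[$0]++
    }
    END { for (msg in count) printf "%d\t%s\n", count[msg], msg }
  ' "$logfile" | sort -t "$(printf '\t')" -k1,1nr -k2 | head -n "$limit"
}
```

Each output line is a count, a tab, then the message, with the most frequent message first.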

Step 3: Reproduce the Issue

Get exact steps from user/alert
Check specific user, URL, or data
Look for patterns (time, user type, feature)
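To spot a time pattern, one option is to bucket errors by minute and see when they started. This is a rough sketch under the same assumption as the Laravel log above (lines beginning with `[YYYY-MM-DD HH:MM:SS] env.LEVEL:`); `errors_per_minute` is a hypothetical helper name.

```shell
# Hypothetical helper: count error-level log lines per minute.
# Assumes Laravel-style lines: "[2024-01-15 14:30:01] production.ERROR: message"
# Usage: errors_per_minute storage/logs/laravel.log
errors_per_minute() {
  awk '
    /^\[[0-9-]+ [0-9:]+\] [A-Za-z_]+\.(ERROR|CRITICAL|ALERT|EMERGENCY):/ {
      minute = substr($0, 2, 16)   # "YYYY-MM-DD HH:MM"
      count[minute]++
    }
    END { for (m in count) print m, count[m] }
  ' "$1" | sort
}
```

A sudden jump from zero to many errors in one minute usually points at a deploy or config change around that time, which feeds directly into the next step.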

Step 4: Identify the Change

# What deployed recently?
git log --oneline -20

# Diff between versions
git diff v1.2.3..v1.2.4 --stat
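When the diff between releases is large, narrowing it to the paths you suspect can save time. A small sketch, assuming releases are tagged as in the example above; `commits_touching` is a hypothetical wrapper around `git log`, and the `config/` path in the usage line is only an illustration.

```shell
# Hypothetical helper: list commits between two refs that touched given paths.
# Usage: commits_touching v1.2.3 v1.2.4 config/
commits_touching() {
  local from="$1" to="$2"
  shift 2
  # Everything after "--" is treated as a path filter by git
  git log --oneline "$from..$to" -- "$@"
}
```

Config directories are a good first guess, since config changes often skip the review that code changes get.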

Step 5: Fix Forward or Roll Back

Roll back if:

Issue is severe
Fix isn't obvious
More investigation needed

# Undo the most recent commit with a new commit (history is preserved)
git revert HEAD
# or redeploy the previous release

Fix forward if:

Issue is minor
Fix is quick and obvious
Rollback has risks
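If you go the "deploy previous release" route, you first need to know which release that is. A minimal sketch, assuming every release is tagged in git (as with `v1.2.3` and `v1.2.4` earlier) and that the currently deployed commit is itself tagged; `previous_release_tag` is a hypothetical helper, and the actual redeploy step depends on your own tooling.

```shell
# Hypothetical helper: print the nearest tag before the given ref (default HEAD).
# Assumes each release commit carries a tag.
# Usage: previous_release_tag          # relative to HEAD
#        previous_release_tag v1.2.4   # relative to a specific release
previous_release_tag() {
  local ref="${1:-HEAD}"
  # "^" starts the search at the parent commit, so the ref's own tag is skipped
  git describe --tags --abbrev=0 "${ref}^"
}
```

Hand the resulting tag to whatever deploys your releases, and confirm the issue is gone before digging into the root cause.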

Step 6: Postmortem

Document while it's fresh:

## Incident: Login failures - Jan 15, 2024

### Timeline
- 14:30 - Alert triggered
- 14:35 - Issue confirmed
- 14:50 - Root cause identified
- 15:00 - Fix deployed

### Root Cause
Cache TTL was set to 0, causing every request to hit the database

### Resolution
Restored cache TTL to 3600 seconds

### Prevention
- Add monitoring for cache hit rate
- Review config changes before deploy

Signals worth monitoring:

Error rate spikes
Response time increases
Database connection counts
Queue backlogs
Memory/CPU usage
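As a starting point for the first item, the error rate can be estimated straight from a web server access log. This is a rough sketch, assuming the common "combined" log format used by nginx and Apache, where the HTTP status is the ninth whitespace-separated field; `error_rate_percent` is a hypothetical helper, and a real setup would normally rely on a proper monitoring tool.

```shell
# Hypothetical helper: percentage of requests that returned a 5xx status.
# Assumes combined log format, where field 9 is the HTTP status code.
# Usage: error_rate_percent /path/to/access.log
error_rate_percent() {
  awk '
    $9 ~ /^[0-9][0-9][0-9]$/ {
      total++
      if ($9 >= 500) errors++
    }
    END {
      if (total == 0) print "0.0"
      else printf "%.1f\n", 100 * errors / total
    }
  ' "$1"
}
```

Running this on a schedule and alerting when the value crosses a threshold you choose gives a crude but useful early warning.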

About the Author

Matthew Gros

I build and ship automation-driven products using Laravel and modern frontend stacks (Vue/React), with a focus on scalability, measurable outcomes, and tight user experience. I’m based in Toronto, have 13+ years in PHP, and I also hold a pilot’s license. I enjoy working on new tech projects and generally exploring new technology.