At Indivd, we monitor our systems at multiple levels using a mix of proprietary and third-party tools. The goal is reliable and secure operation, supported by comprehensive monitoring and immediate alerts.
We tailor our monitoring to the needs of each service: some are monitored around the clock, while others are checked only during business hours. When an issue occurs, real-time alerts and notifications are sent to system administrators via Slack and email so that they can act quickly and limit disruption.
The sections below describe what we monitor and the tools we use. All of our log data is securely stored within the EU, and we maintain stringent communication protocols between systems to safeguard data integrity. Access to the monitoring systems is reserved exclusively for system administrators.
We are committed to continually enhancing our monitoring routines and automatic recovery processes to adapt to evolving needs and technological advancements.
Monitoring Scope and Communication
- Operational Hours: Certain services are monitored continuously (24/7), while others are monitored only during the operating hours of the relevant location.
- Alerts and Notifications: System administrators receive real-time alerts via Slack and email, ensuring immediate attention to potential issues.
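A minimal sketch of how one alert could be prepared for both channels, using only the Python standard library. It assumes a Slack incoming-webhook style payload (a JSON object with a `text` field) and a plain-text email. The function names, the `[ALERT]` prefix, and the example addresses are hypothetical, and the sketch does not describe the production implementation.

```python
import json
from email.message import EmailMessage


def build_slack_payload(service: str, summary: str) -> bytes:
    """Encode an alert as a JSON body with a single `text` field."""
    return json.dumps({"text": f"[ALERT] {service}: {summary}"}).encode("utf-8")


def build_email(service: str, summary: str, sender: str,
                recipients: list[str]) -> EmailMessage:
    """Build a plain-text email carrying the same alert."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {service}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(summary)
    return msg
```

Sending is deliberately left out: the payload would be posted to a webhook URL and the message handed to an SMTP client, both of which depend on configuration not covered here.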
Data Security and Access
- Data Residency: All log data is securely stored within the EU.
- System Communication: Communication between systems is safeguarded using secure protocols and stringent rules.
- Access Control: Only system administrators have access to the monitoring tools.
Monitoring Tools and Practices
- Continuous Improvement: Our monitoring routines and automatic recovery processes are continually improved to strengthen system resilience.
Key Monitoring Components
VPN Monitoring
- Connectivity Checks: We monitor VPN connectivity to ensure tunnels are operational and servers at customer sites are accessible.
- Automatic Recovery: Any disruption in VPN connectivity triggers an automatic restart, and administrators are promptly notified.
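The check, restart, and notify sequence described above can be sketched as follows. The three callables are hypothetical placeholders for whatever actually probes the tunnel, restarts it, and alerts administrators. The sketch only illustrates the control flow, under the assumption that connectivity is re-checked once after the restart.

```python
from typing import Callable


def check_vpn(is_tunnel_up: Callable[[], bool],
              restart_tunnel: Callable[[], None],
              notify: Callable[[str], None]) -> str:
    """Return 'ok', 'recovered', or 'failed' for one monitoring pass."""
    if is_tunnel_up():
        return "ok"
    # Disruption detected: restart automatically, then re-check once.
    restart_tunnel()
    if is_tunnel_up():
        notify("VPN tunnel was down and has been restarted")
        return "recovered"
    notify("VPN tunnel is down and the restart did not restore connectivity")
    return "failed"
```

Administrators are notified in both failure branches, so a successful automatic recovery is still visible to them.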
Server Metrics
- Tools Used: We use New Relic, along with tools provided by our data center. DigitalOcean offers additional monitoring for front-end and back-end systems.
- Metrics Monitored:
  - Disk space
  - Load
  - Storage capacity
  - Network performance
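As a small illustration of one metric from the list above, disk usage can be read with Python's standard library and compared against a limit. The 90% default threshold and the function names are assumptions made for this sketch. In practice, these metrics are collected by the tools named above rather than by a script like this.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def exceeds_threshold(value: float, threshold: float = 90.0) -> bool:
    """True when a metric value has reached the (assumed) alert threshold."""
    return value >= threshold
```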
Camera Uptime
- Frequency: Cameras are checked every 10 minutes during operational hours to confirm they are active and transmitting data.
- Alert System: Alerts are issued for cameras that go offline, with manual follow-up for unresolved issues.
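A minimal sketch of the uptime rule above: a camera counts as offline if it has not transmitted data within the 10-minute check interval, and checks run only during operating hours. The data shape (a mapping from camera name to last-seen timestamp), the function names, and the opening times in the usage example are hypothetical.

```python
from datetime import datetime, time, timedelta

CHECK_INTERVAL = timedelta(minutes=10)  # interval stated in the text above


def within_operating_hours(now: datetime, opens: time, closes: time) -> bool:
    """True if `now` falls inside a same-day operating window."""
    return opens <= now.time() < closes


def offline_cameras(last_seen: dict[str, datetime], now: datetime,
                    max_silence: timedelta = CHECK_INTERVAL) -> list[str]:
    """Names of cameras whose last data arrived longer ago than `max_silence`."""
    return sorted(cam for cam, seen in last_seen.items()
                  if now - seen > max_silence)
```

For example, with a location open from 09:00 to 18:00, a scheduler would call `offline_cameras` only when `within_operating_hours(now, time(9, 0), time(18, 0))` is true.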
Traffic and Data Flow
- Quality Assurance: Camera streams are analyzed for quality and data consistency using a dedicated service.
- Stream Monitoring: Automatic restarts and alerts are initiated if camera streams fail to transmit data.
Container Management
- Docker Deployment: Our backend and insight systems operate within Docker containers.
- Alert Triggers: Alerts are generated if any container ceases to function.
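The container rule above can be sketched as a comparison between the containers that are expected to run and their reported states. How the states are collected is outside this sketch. The function name and the container names in the usage example are hypothetical, and `running` is assumed to be the only healthy state.

```python
RUNNING = "running"


def stopped_containers(states: dict[str, str], expected: set[str]) -> list[str]:
    """Expected containers that are missing entirely or not in the running state."""
    return sorted(name for name in expected if states.get(name) != RUNNING)
```

For example, `stopped_containers({"backend": "running", "insight": "exited"}, {"backend", "insight"})` reports only `insight`, which would then generate an alert.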
Log Management
- Monitoring Scope: Service logs, system logs, and audit logs are all monitored.
- Alert Triggers: Alerts are issued for log anomalies or suspicious activities, with automatic restarts for critical failures.
- Log Processing: We utilize New Relic's EU service for centralized log processing and escalation management.
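What counts as a log anomaly is not specified here. As one hedged illustration, the sketch below flags services whose logs contain at least a given number of error-level lines. The line format (`<service> <LEVEL> <message>`), the threshold of 3, and the function name are assumptions made for this example. The sketch does not represent New Relic's processing.

```python
from collections import Counter


def services_over_threshold(lines: list[str], threshold: int = 3,
                            level: str = "ERROR") -> list[str]:
    """Services with at least `threshold` lines at the given log level."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        # Skip lines that do not match the assumed "<service> <LEVEL> ..." shape.
        if len(parts) >= 2 and parts[1] == level:
            counts[parts[0]] += 1
    return sorted(service for service, n in counts.items() if n >= threshold)
```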
Critical Services
- Operational Integrity: All critical services are monitored to ensure they are active and functioning correctly.
Snapshot Alerts
- Daily Comparisons: Each night, the latest snapshots are compared with those from the previous day.
- Alert Criteria: Significant changes in camera angles or content trigger alerts for system administrator review.
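One simple way to quantify a "significant change" between two snapshots is the mean absolute difference of their pixel values. The sketch below assumes each snapshot has already been reduced to a flat list of grayscale values (0 to 255) of equal length. The threshold of 25 and the function names are hypothetical, and the comparison method actually used may differ.

```python
def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Mean absolute per-pixel difference between two equally sized snapshots."""
    if not a or len(a) != len(b):
        raise ValueError("snapshots must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def snapshot_changed(today: list[int], yesterday: list[int],
                     threshold: float = 25.0) -> bool:
    """True when the difference is large enough to warrant administrator review."""
    return mean_abs_diff(today, yesterday) > threshold
```

A global average like this is sensitive to lighting changes as well as to a moved camera, which is one reason the alerts go to an administrator for review rather than triggering an automatic action.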
Frontend and Backend Health
- Hosting and Monitoring: The frontend and backend are hosted on DigitalOcean, and their health is monitored through built-in services.
- Database Management: Our database configurations are separately managed and monitored to ensure integrity and performance.
Backups
- Monitoring and Validation: The creation and validation of backups are closely monitored to ensure data integrity and availability.
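Backup validation can take many forms. A common minimal check, sketched here, is that the backup file exists, is not empty, and matches a checksum recorded when it was created. The use of SHA-256, the function names, and the idea of a separately stored expected checksum are assumptions made for this sketch.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_valid(path: Path, expected_sha256: str) -> bool:
    """True if the backup exists, is non-empty, and matches the recorded checksum."""
    return (path.is_file()
            and path.stat().st_size > 0
            and sha256_of(path) == expected_sha256)
```

A checksum match confirms that the file is intact, not that it can be restored. A periodic test restore is the stronger validation.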