Job Description
Role & responsibilities
Monitoring & Incident Management
- Monitor data center infrastructure, applications, servers, and services using enterprise monitoring tools.
- Respond to production incidents, alerts, and application failures within defined SLAs.
- Perform initial triage and root cause identification, and provide timely resolution or escalation.
- Maintain incident logs and related documentation.
Operations & Maintenance
- Perform daily health checks on servers, services, batch jobs, and scheduled tasks.
- Support maintenance activities, including patching, upgrades, application patches, and database patches.
- Support applications hosted on multiple web servers (e.g., IIS, NGINX).
- Execute SOPs and runbooks for routine and critical operations.
- Ensure adherence to data center security, compliance, and operational guidelines.
Troubleshooting & Technical Support
- Identify issues across network, system, application, and database layers.
- Collaborate with network, system administration, database, and application teams to resolve issues.
- Analyze logs, alerts, and error patterns to proactively prevent incidents.
Change & Release Management
- Support deployment activities and validate post-deployment stability.
- Coordinate with change management teams to ensure smooth release cycles.
- Participate in disaster recovery (DR) drills and business continuity exercises.
Documentation & Reporting
- Maintain runbooks, knowledge base articles, incident reports, and SOP documentation.
- Provide periodic performance, health, and incident trend reports to management.