Main information
Senior Operations Engineer (m/w/d) Kubernetes/ArgoCD, Remote and Berlin
Position: Not specified
Start: 18 Aug. 2025
End: 31 Oct. 2025
Location:
Berlin, Germany
Collaboration type: Project only
Hourly rate: Not specified
Last updated: 4 Jul. 2025
Project description and requirements
Tasks:
- Validation of deployment artifacts from an operations perspective.
- Defining and enforcing quality assurance measures (e.g., required documentation of standard operating procedures,
successful test reports, …) to ensure the high quality of delivered products and services.
- Ensuring that rollback strategies and operational monitoring (observability) are in place for production deployments.
- Monitoring system health, performance metrics, and service availability across multi-tenant environments.
- Identifying, analyzing, and resolving incidents, minimizing service disruption.
- Triggering root cause analysis and the implementation of corrective and preventive actions.
- Addressing recurring operational issues by automating standard remediation procedures.
- Validating all automated procedures in line with the established software development lifecycle, including staging,
testing, and validation reviews.
- Implementing monitoring and logging strategies to support audit and compliance requirements.
- Performing routine security scans and remediating identified vulnerabilities.
Requirements:
- At least 5 years of operational experience with self-managed Kubernetes clusters, self-managed services providing
Kubernetes clusters, and production applications or systems in on-premises environments.
- Deep understanding of networking concepts, including protocols, load balancing, and security.
- Profound knowledge of, and implementation experience with, CI/CD processes, tooling (e.g., GitLab, Jenkins, Tekton,
Argo Workflows, and Argo CD), concepts, and the associated quality and security assurance for software delivery.
- Fundamental understanding of core operations processes (incident management, change management, problem
management, IT Service Management) as well as SRE concepts.
- Experience in gathering operational insights from monitoring or observability including SLI/SLA/SLO management
and tracking.
- Hands-on experience documenting procedures properly and enforcing clear runbooks or playbooks.
- Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog).
Must-have language skills:
- Proficiency in spoken and written English (at least C1).
Preferred experience:
- Project experience in software engineering (in Go, C/C++, or Python), with significant experience building
RESTful services in distributed environments.