Observability Engineer
Neuberger Berman Group
Neuberger Berman’s Technology team is seeking an Observability Engineer to lead and evolve our observability strategy across cloud and on-premises environments. You will help build and operate a server monitoring platform that continuously validates service health (24/7) across business-critical systems, including external websites and key infrastructure components (e.g., firewalls, OpenShift). You will design and implement end-to-end monitoring solutions spanning logs, metrics, traces, Service Level Objectives (SLOs), synthetic monitoring, and RUM (Real User Monitoring) to improve reliability, accelerate incident response, and deliver clear visibility into service performance.
This is an individual contributor role with strong engineering/scripting expectations (not a pure administrator role, though admin experience is helpful). You will partner closely with application, SRE/DevOps, infrastructure, and security teams and act as a champion/evangelist for observability tooling and standards. The environment includes a current OpenView footprint with a migration to Datadog, with workflows integrating into ServiceNow for incident/ticket routing and escalation.
What you'll do:
Partner closely with application, DevOps engineering, SRE/operations, infrastructure, and security teams to understand reliability goals and translate them into scalable monitoring/observability solutions across cloud and on-prem environments (Windows and Unix).
Design, build, and maintain scalable observability architectures and platforms, with ownership of monitoring capabilities for key applications and services (application ownership).
Develop automated processes to continuously scan and validate uptime/health (24/7) for business-critical services, including external-facing websites and supporting infrastructure.
Implement and optimize telemetry collection, alerting, dashboards, and service views; drive adoption of OpenTelemetry (OTel) and consistent logging/metrics/tracing standards (core logging and platform telemetry alignment).
Define and operationalize SLOs and implement actionable alerting strategies that reduce noise and improve MTTR through correlation, enrichment, and threshold tuning.
Implement and evolve APM capabilities and user experience monitoring, including RUM (Real User Monitoring) and synthetic monitoring approaches.
Integrate observability tooling with incident/problem management processes and ITSM workflows (e.g., Datadog → ServiceNow); support ticket routing/escalation and produce runbooks, post-incident reviews, and executive/operational reporting.
Automate onboarding and configuration for telemetry, dashboards, monitors, and alerts using scripting and infrastructure-as-code; ensure consistency and repeatability across Windows Server and Unix (Linux/Solaris).
Collaborate on platform evolution and cost/scale optimization, continually improving coverage, data quality, developer experience, and overall reliability outcomes.
Champion and evangelize observability practices and tooling adoption across technology teams, helping incorporate new applications/tools into the monitoring platform.
Required Skills and Experience:
BS/BA in Computer Science, Information Systems, Engineering, or equivalent experience.
5+ years in Observability/APM/SRE/Platform Engineering with a track record of delivering production-grade telemetry and reliability outcomes.
Proficiency operating in both Windows Server and Unix (Linux/Solaris) environments, including service instrumentation, agent/collector deployment, and OS-specific performance analysis.
Strong experience designing and operating distributed tracing, metrics and logging standards, SLOs/error budgets, and actionable alerting using modern observability practices.
Hands-on experience with cloud monitoring across Azure and AWS, integrating platform telemetry into centralized observability solutions.
Hands-on experience with Observability/APM suites (OpenView, AppDynamics, Datadog) and network management tools (Network Node Manager, Network Automation, NetProfiler).
Scripting and automation expertise (e.g., Python, PowerShell, Bash) and familiarity with APIs/SDKs; experience using infrastructure-as-code to manage observability configurations (e.g., Terraform) and configuration formats (e.g., YAML).
Demonstrated ability to reduce alert noise and MTTR through correlation, enrichment, and threshold tuning; experience producing service maps, dependency views, and clear dashboards.
Excellent communication and stakeholder management skills, with the ability to explain technical concepts to non-technical audiences.
Ability to work independently and collaboratively in a fast-paced environment; strong documentation habits and attention to detail.
Nice to Have:
Experience with .NET development (C#), including instrumentation patterns for observability in .NET applications.
Experience in financial services or other regulated industries.
Familiarity with ITSM integrations and CMDB alignment for incident, problem, and change processes.
Exposure to APM and monitoring suites and event correlation approaches; knowledge of network monitoring concepts.
Experience with CI/CD integration, synthetic testing strategies, and performance/capacity analysis for latency-sensitive systems.
Relevant certifications in observability, cloud monitoring, or related platforms.
This is a hybrid position. Currently, the hybrid work schedule for this position is 2-3 days in the office. Please understand that the hybrid schedule may be modified or eliminated at any time at Neuberger Berman’s discretion.
Neuberger Berman is unable to offer visa sponsorship for this position. Applicants must be authorized to work in the United States without the need for current or future sponsorship.
Engineer II
Compensation Details
The salary range for this role is $110,000-$130,000. This is the lowest to highest salary we in good faith believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the posted range, and the range may be modified in the future. This range is only applicable for jobs to be performed in the job posting location. An employee’s pay position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, seniority, geographic location, business sector, performance, shift, travel requirements, sales or revenue-based metrics, market benchmarking data, any collective bargaining agreements, and business or organizational needs. This job is also eligible for a discretionary bonus, which, along with base salary and retirement contributions, is part of our total comprehensive package. We offer a comprehensive package of benefits including paid time off, medical/dental/vision insurance, retirement, life insurance and other benefits to eligible employees. Note: No amount of pay is considered to be wages or compensation until such amount is earned, vested, and determinable. The amount and availability of any bonus, commission, production, or any other form of compensation that is allocable to a particular employee remains in the Company’s sole discretion unless and until paid and may be modified at the Company’s sole discretion, consistent with the law.
Neuberger Berman is an equal opportunity employer. The Firm and its affiliates do not discriminate in employment because of race, creed, national origin, religion, age, color, sex, marital status, sexual orientation, gender identity, disability, citizenship status or protected veteran status, or any other characteristic protected by local, state, or federal laws, rules, or regulations. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact onlineaccommodations@nb.com.
Learn about the Applicant Privacy Notice.