Software Engineer 3
Job Summary
As a Cloud Infrastructure / Site Reliability Engineer, you will operate at the intersection of development and operations. You will engage in and enhance all aspects of the cloud services lifecycle, from design through deployment, operation, and refinement. You will be responsible for maintaining these services by measuring and monitoring their availability, latency, and overall system health, and by building automation for efficient cloud operations management.
You will play a crucial role in sustainably scaling systems through automation and driving changes that improve reliability and velocity. As part of your responsibilities, you will administer cloud-based environments that support our SaaS/IaaS offerings implemented on a microservices, container-based architecture (Kubernetes). In addition, you will oversee a portfolio of customer-centric cloud services (SaaS/IaaS), ensuring their overall availability, performance, and security. You will work closely with NetApp and cloud service provider teams (including Azure) in Research Triangle Park (RTP), D.C., Pittsburgh, and other locations.
Due to the critical nature of the services we support, this position involves participation in a rotation-based on-call schedule as part of our global team. This role offers the opportunity to work in a dynamic, global environment, ensuring the smooth operation of vital cloud services. To be successful in this role, you should be a motivated self-starter and self-learner, possess strong problem-solving skills, and be someone who embraces challenges.
Responsibilities
- Automation and Efficiency: Identify tasks and areas where automation can be applied to achieve time efficiencies and risk reduction. Develop software for deployment automation, packaging, and monitoring visibility.
- Team Collaboration and Influence: Work in tandem with other Cloud Infrastructure Engineers and developers to ensure maximum performance, reliability, and automation of our deployments and infrastructure. Consult and influence developers on new feature development and software architecture to ensure scalability.
- Debugging, Troubleshooting, and Advanced Support: Undertake debugging and troubleshooting of service bottlenecks throughout the entire software stack. Additionally, provide advanced tier 2 and 3 support for NetApp's Cloud Data Service solutions.
- Analysis and Infrastructure Maintenance: Continuously monitor, analyze, and measure system health, availability, and latency using tools like Prometheus, Stackdriver, Elasticsearch, Grafana, and SolarWinds. Develop strategies to enhance system and application performance, availability, and reliability. In addition, maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure.
- Incident Response and Troubleshooting: Address and perform Root Cause Analysis (RCA) of complex live production incidents and cross-platform issues involving OS, Networking, and Database in cloud-based SaaS/IaaS environments. Implement SRE best practices for effective resolution.
- Document system knowledge as you acquire it, create runbooks, and ensure critical system information is readily accessible.
- Security Management: Stay updated with security protocols and proactively identify, diagnose, and resolve complex security issues.
- Issue Tracking and Resolution: Use Atlassian’s toolchain along with first-party cloud service management tools to track and resolve issues based on their priority.
- Directly influence the decisions and outcomes related to solution implementation: measure and monitor availability, latency, and overall system health.
Job Requirements
- 5+ years of experience in scripting and infrastructure automation using tools such as PowerShell, Python, Go, or Ruby.
- Deep working knowledge of Containers, Kubernetes, Serverless computing implementation, and distributed systems design patterns. Knowledge of DevOps/SRE development methodologies.
- Proficiency in Linux/Unix and CoreOS. Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Ability to lead a Scrum team, influence stakeholders to effectively maintain a product backlog, and manage sprints.
- This position includes on-call rotations and may require working odd hours.
Education
- A Bachelor of Science degree in Computer Science, a master’s degree, or equivalent experience is required.
All internal movements within the Product Group via requisition will be lateral, offering valuable growth opportunities to extend your skills in a new area. Opportunities for a promotion will be reviewed in the normal course of business, aligned with our promotion process.
Equal Opportunity Employer:
NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state and local laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability or genetic information, pregnancy, protected veteran status, and any other protected classification.
Did you know...
Statistics show women apply to jobs only when they're 100% qualified. But no one is 100% qualified. We encourage you to shift the trend and apply anyway! We look forward to hearing from you.
Why You'll Thrive at NetApp
At NetApp, you won't wait for the perfect moment—you'll make it. The early planning, the extra thought, the bold idea that turns good into great: That's how our people operate and how we continue to push the boundaries of data infrastructure.
NetApp is the trusted partner for organizations transforming data into opportunity. As the only enterprise-grade storage service natively embedded in Google Cloud, AWS, and Microsoft Azure, we empower customers to run everything from traditional workloads to enterprise AI with unmatched performance, resilience, and security.
Our culture
We celebrate mold breakers, bold thinkers, and problem solvers. We reward initiative, impact, and ownership. We provide flexibility so you can balance professional ambition with your personal life. Here, differences are not just welcomed—they drive everything we do.
If you're ready to innovate, rise to the challenge, and own every moment - make your next move your best one. Apply now.
We pledge to take every reasonable step to ensure that our applicants and employees are respected, treated fairly, and with dignity. See the EEO poster. NetApp makes reasonable accommodations, consistent with applicable laws, for religious purposes and for the known physical or mental limitations of an otherwise qualified applicant or employee with a disability, who can perform the essential job functions unless undue hardship would result.
Reasonable accommodation
If you are an applicant with a physical or mental disability that requires reasonable accommodation for any part of our application process, please email accessibility@netapp.com. Each request for reasonable accommodation will be considered on a case-by-case basis, consistent with applicable laws and regulations. Please note, this email address is only for accommodation requests; we do not accept unsolicited resumes.
Data privacy
We care about your privacy and therefore ask that you read our Applicant Privacy Policy before you submit any personal information to us.
Note to agencies
We’re sorry, but we cannot accept unsolicited resumes that are sent to NetApp employees or contractors. We will not compensate for a referral without a current contract on file with our Talent Acquisition team. If you’re interested in helping us with a particular role, please call your partner in Talent Acquisition to discuss.