Yardi's SRE team is responsible for evaluating, implementing, and supporting systems for monitoring, alerting, log analysis, and configuration management to meet internal and external customer needs. We are seeking a Reliability Engineer to help build and deploy a diverse toolset that supports our large-scale, global infrastructure. You will work in a close-knit, cross-functional team to actively monitor and improve end-to-end system performance and to identify and remediate bottlenecks and potential failures throughout our infrastructure.
- Responsible for the design, implementation, and support of a large-scale monitoring infrastructure across multiple data centers.
- Work closely with architects and with engineering and operations teams to deliver highly available, scalable services.
- Responsible for business system resiliency, including problem management, monitoring, alert management, and scalability design.
- Adapt readily to constant business change, such as new requirements, evolving goals and strategies, and emerging technologies.
- Improve stability across the entire stack: hardware, software, applications, and network.
- Working knowledge of multi-tier applications and their dependencies, including load balancing, TCP/IP networking, web services, LDAP, and DNS.
- Knowledge of deployment automation, orchestration, and configuration management, including experience with Puppet, SCCM, and TeamCity.
- Linux system administration experience.
- Windows Server 2008/2012 and IIS administration experience.
- Strong scripting skills in at least one of Python, Bash, or PowerShell.
- Solid background in systems engineering and operations.
- Windows Server (2008 R2, 2012 R2)
- Linux (CentOS)
- IIS, Tomcat, Apache
- SQL Server, MySQL, Elasticsearch, Redis
- F5 BIG-IP
- Logstash, Kibana, Graphite, Grafana, Icinga, Python/Django
- Puppet, TeamCity, Jira, SCCM
- Sendmail, MailScanner, MailWatch