Systems Engineer/Admin

Zeva Technology

This is a Full-time position in Redmond, WA posted April 30, 2021.

We are looking for candidates for the following position. If you are interested, please forward your updated resume and rate requirement. Please also answer the following qualifying questions when you reply. If you are not interested but know someone who might be, please feel free to forward this email.

Job Title: Systems Monitoring Engineer/Admin
Location: Redmond, WA; remote work
Duration: 5+ months, contract to hire

This will be a long-term assignment, starting with 4-6 months, with the possibility of converting to a permanent role if the resource is a good fit.

Skill Requirements
– Systems Engineer: monitoring (NOC tools) and Atlassian products admin
– NOC monitoring tools administration: Zenoss, Graphite, Grafana
– Atlassian products administration: Confluence, Jira, Crowd and Bitbucket

Qualifying Questions
– Maintenance and administration experience with monitoring tools (Zenoss, Graphite, Grafana).
– Maintenance and administration experience with Atlassian products (Confluence, Jira, Crowd and Bitbucket).
– Experience with a variety of Amazon Web Services required (including but not limited to EC2, IAM, DynamoDB, SNS, SQS, S3, and EKS).
– 5+ years of experience in systems administration with UNIX/Linux-based operating systems.
– 5+ years of experience managing distributed systems (deployment, monitoring, scaling, debugging) is desirable.
– Understanding of system administration principles (monitoring, network, storage, scripting).
– Must have experience in server consolidation and virtualization.
– Must have experience in managing container-based architecture such as Docker and Kubernetes.
– Experience with infrastructure automation tools or coding/scripting (e.g., Ansible, Terraform, Puppet, Chef, Python, Go, PowerShell).
– Operational experience in High Availability and Disaster Recovery environments, such as load balancing, clusters, data replication, etc.

Description Of Services
This position is responsible for owning, maintaining and administering our various monitoring toolsets (including but not limited to Zenoss, Graphite, Grafana) and Atlassian tools (Confluence, Jira, Crowd and Bitbucket), working with support teams to help define and build out appropriate monitoring strategies that support our corporate and global service offerings.

Deliverables
– Build, maintain, and improve our monitoring tools infrastructure platform used across the company.
– Administer Atlassian applications in a large, multi-methodology organization (Server and/or Data Center).
– Administer monitoring applications and toolsets and take ownership of performance, availability and capacity planning.
– Perform Jira, Confluence and Bitbucket installations, upgrades, migrations, and add-on installation.
– Determine ways to optimize/improve Jira process workflows, as well as identify where functionality can/cannot meet user requests.
– Collaborate on the backend support of the Atlassian environment, and coordinate server/application/database maintenance and upgrades.
– Analyze data collected to learn about the performance of infrastructure, applications and services.
– Automate infrastructure that enables efficient builds, testing, deployments, and monitoring.
– Respond to outages with prompt and efficient resolution and communicate issues via the proper support channels.
– Gather requirements and functional specifications from support teams in order to build monitors and dashboards that provide insight into the health and performance of systems.
– Tune existing monitors by identifying trends, oddities and potential bottlenecks and triggering thresholds or actionable alerts.
– Collaborate with other engineering teams around application integration with monitoring tools through the build-out of analytics (reporting), visualization (dashboards), and alerting (correlation).
– Learn new technologies and how to monitor them.
– Advocate to the business on monitoring concepts and capabilities and how to use these systems.
– Provide on-call support of our toolsets.
– Research, evaluate and recommend new technologies and provide a roadmap for future monitoring capabilities.
– Assist with planning and executing disaster recovery solutions and business continuity planning for our monitoring and collaboration tools.
– Ensure support documentation is current for system configuration.
– Provide technical guidance and training to the NOC staff around monitoring solutions and review monitoring tasks completed.

Requirements
– 5+ years of experience in systems administration with UNIX/Linux-based operating systems.
– 5+ years of experience managing distributed systems (deployment, monitoring, scaling, debugging) is desirable.
– 5+ years of experience with mainstream, centralized, enterprise-class monitoring systems such as Datadog, AppDynamics, or Dynatrace.
– Understanding of system administration principles (monitoring, network, storage, scripting).
– Experience with a variety of Amazon Web Services required (including but not limited to EC2, IAM, DynamoDB, SNS, SQS, S3, and EKS).
– Must have experience in server consolidation and virtualization.
– Must have experience in managing container-based architecture such as Docker and Kubernetes.
– Operational experience in High Availability and Disaster Recovery environments, such as load balancing, clusters, data replication, etc.
– Experience with infrastructure automation tools or coding/scripting (e.g., Ansible, Terraform, Puppet, Chef, Python, Go, PowerShell).
– Familiar with LDAP, Active Directory, and Single Sign-On implementations.
– Knowledge of ITIL best practices.
– Experience working with ServiceNow Incident, Change and Knowledge management tools.
– Ability to work autonomously with minimal supervision and manage multiple issues at once.
– Familiar with PKI and web application tools like Apache Tomcat, HTTPD and Nginx.
– Familiarity with Splunk query language, SQL, and/or metrics tools (Statsd, Prometheus, CollectD, InfluxDB, Graphite, Grafana and CloudWatch) is a plus.

Regards,
Gavin