Senior Site Reliability Engineer
Sight Machine
Team Culture
Great things happen when people can bring their authentic selves to work. We empower all of our employees to share their perspectives, passions and experiences, because collectively we make a better, stronger team. Our team members collaborate closely with peers and cross-functional stakeholders throughout the business and with clients at the forefront of digital transformation, working at the cutting edge of digital manufacturing thought leadership.
We take pride in our self-starter culture where employees are enabled and encouraged to achieve their professional goals through leadership guidance, learning and development. Our philosophy is that careers are continuous journeys, and we dedicate time and offer resources so that employees can reach their full potential.
Benefits + Perks
We value you both at and outside of work, and we know your loved ones are important. Our benefits are designed to support your health and your family’s health through life’s expected and unexpected events.
Our Benefits Include:
Competitive Salary + Stock Options
Health Care Coverage + Life Insurance + Health Savings Account + Flexible Spending Account (includes spouse + children)
Flexible Vacation Policy
Adaptable Working Schedule and Environment
Our Perks Include:
Casual Dress Attire
Hybrid work flexibility
Catered Lunches, Snacks and Beverages
Commuter Savings Program
Company Outings
Designated Volunteering Hours + Group Volunteer Events
Sight Machine is proud to be an equal opportunity employer and considers candidates regardless of age, color, ancestry, religion, sex, national origin, sexual orientation, citizenship, marital status, disability, gender identity, or veteran status. Sight Machine also considers qualified applicants regardless of criminal history, consistent with legal requirements.
About Sight Machine, Inc.
Sight Machine strengthens manufacturers by providing the industry’s only standard data model and system-level visualization capabilities. By integrating all crucial data into a single innovative platform, everyone involved in the fabrication process can visualize, contextualize and examine data in one intuitive interface.
Sight Machine is a mission-driven company committed to improving lives, strengthening communities and making the world cleaner by continuously re-envisioning manufacturing processes to make them more efficient, sustainable and absolute. Founded in Michigan in 2011 and expanded to San Francisco in 2012, Sight Machine blends the spirit of technology innovation with the down-to-earth style of Detroit manufacturing. Our team includes early leadership from Yahoo, Tesla Motors and Oracle. Together, we share broad industry knowledge and a commitment to advancing manufacturing toward a more sustainable future.
At Sight Machine, you will work with manufacturing leaders in the automotive, medical device, apparel, construction, and pharmaceutical industries. You will have access to, and work with, massive amounts of factory-floor data to help uncover insights into how customers make products and to develop solutions to pressing business problems. The platform addresses problems such as Extract, Transform, Load (ETL), information retrieval, data aggregation and analytics, factory automation, distributed computing, and security.
We place great value on professional, technical, and personal growth in an inclusive, collaborative environment. The ideal candidate will have a passion for technology and a strong can-do attitude.
In this role you will join our Site Reliability and Infrastructure Team in deploying, managing, optimizing and upgrading the systems that run Sight Machine software. You must love learning new technology, problem solving, and building automation in the Infrastructure as Code paradigm.
Success will take a blend of technical expertise, experience with deployment technology frameworks, customer-centric focus, and a team-spirited approach to solve architectural challenges supporting your peers in Application Engineering.
- Employing DevOps principles, provide technical operational support for comprehensive cloud infrastructure operations for all customers, internal and external.
- Troubleshoot and resolve complex systems problems that cross multiple layers of the systems stack from networking, to operating systems, to cloud resources, to databases.
- Instrument and respond to monitoring and alerting infrastructure for critical services
- Participate in our on-call support schedule
- Proactively pursue opportunities for operational innovation to improve the stability, reliability, and availability of all platform components, optimize efficiency, and propagate a security-first culture
- Create, revise, and test operational runbooks and automation for maintaining Sight Machine infrastructure
The Role
In this role you will join the Cloud Infrastructure Team and take on tasks focused on automation, tools, deployment, monitoring, and managing and optimizing the systems that run Sight Machine software. You must love learning new technology, have excellent problem-solving skills, and embrace the Infrastructure as Code paradigm.
Success will take a blend of technical expertise, experience with deployment technology frameworks, customer-centric focus, and a team-spirited approach to solve architectural challenges supporting your peers in Development Engineering.
Responsibilities
Employing DevOps principles, provide technical operational expertise for comprehensive cloud infrastructure operations for all customers, internal and external
Troubleshoot and resolve complex systems problems across multiple layers of the systems stack, including CI/CD, container-based systems, networking, operating systems, cloud resources, and databases
Instrument Monitoring and Alerting infrastructure for critical services
Create, revise, and test operational runbooks and automation for maintaining Sight Machine infrastructure
Design and code appropriate tools to support our internal platforms and systems
Proactively pursue opportunities for operational innovation to improve the stability, reliability, and availability of Sight Machine’s services
Participate in our on-call schedule
Requirements
Embody a Quality-first & Security-first culture in all that you do
5+ years of experience with Kubernetes / Docker in at least one of the top-tier cloud providers (Azure, GCP, AWS, etc.)
5+ years of experience coding in languages such as Python, Java, Go, and Terraform
5+ years of experience using IaC and CI/CD tools such as FluxCD (or similar), Jenkins, Terraform, GitHub, etc.
Strong experience with the Linux OS
Strong working knowledge of networking (TCP/IP and application layers)
A willingness to author technical documentation for design, workflows, processes, best practices, etc.
Willing to mentor other team members and engineers
Strong bias for action over endless planning: you’re hands-on, have made mistakes, learned from them, and can balance risk against impact to customers
You value clear communication and you're empathetic and respectful of others
Operational experience with monitoring/alerting systems such as Sentry, Opsgenie, and Prometheus
Deep understanding of cloud performance, including how to diagnose and resolve bottlenecks and keep performance at optimal levels
US Citizen or US Permanent Resident/Green Card Holder
Nice to haves
- Experience with elements of our current tech stack is a plus: Kubernetes, FluxCD, Terraform, Helm Charts, Prometheus, Elasticsearch, Python, Java, Kafka, Postgres, and Jenkins
- Previous experience or a keen interest in industrial IoT, analytics, or manufacturing is a plus
- Coding experience in any of Python, Bash, Java, Go