Systems Engineer - Irving, TX

5 days ago

Systems Engineer

Tristar

In-Office Position

POSITION SUMMARY:

The Senior Systems Engineer is a hands-on senior individual contributor responsible for designing, building, and operating TRISTAR’s core infrastructure platform with a strong emphasis on Linux systems, Kubernetes, and automation. This role will own the Kubernetes platform end-to-end—cluster build, lifecycle management, operational standards, reliability, and day-2 operations—while partnering closely with development teams as TRISTAR transitions toward a DevOps operating model. Success in this role requires deep technical ownership, strong troubleshooting skills across distributed systems, and the ability to improve reliability through thoughtful design, observability, and repeatable automation.

ESSENTIAL DUTIES AND RESPONSIBILITIES:

Kubernetes Platform Engineering & Lifecycle:

• Design, build, and operate Kubernetes clusters in production, including upgrades, patching, scaling, and reliability improvements.

• Establish platform standards and operating practices as the environment matures (cluster configuration, access patterns, resource governance, and runbooks).

• Serve as the senior escalation point for Kubernetes platform issues and drive resolution through root-cause analysis and prevention.

Kubernetes Storage, Backup/Restore & Disaster Recovery:

• Design and implement Kubernetes storage patterns (StorageClasses, PV/PVC lifecycle, capacity planning) and support stateful workloads.

• Implement, test, and maintain Kubernetes-native backup/restore and recovery procedures.

• Integrate Kubernetes persistence needs with enterprise storage platforms, including Dell ObjectScale and existing virtualization/storage systems.

Ingress, Load Balancing & Kubernetes Networking:

• Own Kubernetes traffic entry, including ingress controllers, load balancers, routing patterns, and TLS/certificate handling.

• Define repeatable patterns for exposing services and troubleshooting connectivity across platform components.

Linux Systems Engineering:

• Administer and harden Linux systems that support the platform, including patching, performance tuning, service reliability, logging, and baseline configuration.

• Troubleshoot system and platform issues across compute, storage, and network dependencies.

Automation, Scripting & API Integrations:

• Build automation to reduce manual work and increase consistency across infrastructure operations using Python/PowerShell/Bash and API-driven workflows.

• Evaluate, recommend, and help implement an automation / configuration management approach (tooling, patterns, and standards) to support repeatable tasks such as provisioning, configuration enforcement, patching, drift detection, and validation.

• Develop reusable automation assets (modules/playbooks/templates/scripts) and establish version-controlled workflows (Git), documentation, and operational handoff practices.

• Leverage RESTful APIs to integrate systems and create operational workflows (health checks, reporting, event-driven automations, and change validation).

Monitoring, Alert Response & Operational Reporting:

• Monitor alert sources and observability tooling (including SolarWinds on-prem), investigate events, and drive issues to completion.

• Document incidents, actions taken, and final resolutions, and contribute to improved alerting quality and operational visibility.

Data Center Support (Occasional):

• Provide occasional on-site support as needed in the data center for infrastructure prep and troubleshooting (racking equipment, cabling, and physical connectivity verification).

• Maintain working familiarity with server hardware and data center best practices to support rare hands-on needs.

Cloud Readiness & Future-State Hosting:

• Partner with development and infrastructure teams to plan and progress TRISTAR’s long-term transition toward cloud-hosted deployments of the application stack.

• Contribute to cloud design discussions with a practical understanding of core cloud concepts (networking, identity/access, security, reliability, scalability, and cost considerations) across major providers (AWS/Azure/GCP).

• Translate application and platform requirements into cloud-ready operational patterns (container orchestration in cloud, managed services vs self-managed tradeoffs, environment isolation per client, and deployment repeatability).

• Support early-stage cloud initiatives such as proofs of concept, reference architectures, and migration planning, including identifying skill/tooling gaps and recommending realistic next steps.

• Apply Infrastructure-as-Code and automation principles to cloud readiness efforts to ensure future deployments are repeatable, supportable, and auditable.

Documentation & Technical Standards:

• Create and maintain IT documentation, including platform runbooks, operational procedures, and architecture/standards documentation.

Collaboration, Service Desk Support & Cross-Team Execution:

• Work with the Manager, Network Services and general IT staff to analyze and resolve technical issues affecting infrastructure and applications.

• Partner closely with development teams as part of TRISTAR’s DevOps transition to improve operability, deployment reliability, and platform usability.

• Work alongside the service desk to remedy end-user workstation issues; backfill and answer service desk calls when required.

Schedule Flexibility & Travel:

• Perform night/day/weekend work as required to meet project objectives and support maintenance windows.

• Travel to remote sites is rare but may be required as needed.

QUALIFICATIONS REQUIRED:

Education/Experience: Bachelor’s degree in a related field (preferred); minimum of 7 years of related experience; or equivalent combination of education and experience.

Knowledge, Skills, and Abilities:

• 7+ years of progressively responsible experience in systems/infrastructure engineering with strong production experience in Linux administration.

• Hands-on production experience with Kubernetes, including cluster build and lifecycle management (architecture, upgrades, patching, scaling, troubleshooting).

• Strong understanding of Kubernetes storage and stateful workload operations, including troubleshooting PV/PVC and storage provisioning patterns.

• Experience implementing Kubernetes-native backup/restore practices and validating recovery procedures.

• Demonstrated automation experience using scripting (Python/PowerShell/Bash) and leveraging RESTful APIs for systems integration and automation.

• Experience with monitoring/observability platforms and operational alerting; SolarWinds experience strongly preferred.

• Strong troubleshooting skills across distributed systems, networking fundamentals, and infrastructure dependencies.

• Strong written and verbal communication skills, including documentation/runbooks/standards.

EQUIPMENT OPERATED/USED: Computer, 10-key, printer, copier, fax machine, and other office equipment.

SPECIAL EQUIPMENT OR CLOTHING: Appropriate office attire.
