SRE
This role is perfect for someone who thrives on supporting customer-facing platforms, enjoys working with cutting-edge tools, and is passionate about delivering seamless digital experiences. You'll play a pivotal part in ensuring the stability and performance of a large-scale loyalty platform as it undergoes significant transformation aligned with strategic priorities. The organisation offers a people-focused culture that values positive actions, flexibility, and continuous learning.
What you'll do:
As a Site Reliability Engineer – Splunk Specialist, you will be at the heart of maintaining operational excellence for a sophisticated customer loyalty platform. Your day-to-day responsibilities will involve providing expert support across multiple technology domains—ranging from identity management to payment processing—while collaborating closely with both internal teams and external vendors. You will be instrumental in upholding robust IT Service Operations standards through meticulous documentation and process adherence. Your ability to monitor system health using advanced tools like Splunk will enable you to detect issues early, respond swiftly to incidents, and ensure optimal platform performance. By participating in on-call rotations, you will provide essential incident support when it matters most. Additionally, your mentoring skills will help elevate your colleagues’ understanding of SRE methodologies. Through your efforts in leading technical investigations and driving automation initiatives, you will directly contribute to improving both operational efficiency and customer satisfaction.
- Provide comprehensive operational support as part of the Customer Loyalty Platform Operations team, ensuring stable day-to-day functioning of critical systems including identity management, payments, consent, data privacy, and loyalty applications.
- Adhere to IT Service Operations processes such as Incident, Problem, and Change Management by documenting and updating Standard Operating Procedures and technical documentation to maintain compliance and best practice.
- Engage proactively with stakeholders including vendor partners and DevOps teams to build strong relationships that facilitate effective functional support across the platform ecosystem.
- Participate in a 24x7 rotating on-call schedule to deliver Severity 1 incident support, ensuring rapid response and resolution for critical issues affecting business operations.
- Deliver first and second level technical support during business hours to internal teams, assisting with incident resolution and troubleshooting complex technical challenges.
- Monitor incidents, demand, and capacity across the platform using advanced tools like Splunk and New Relic to ensure service level objectives are consistently met or exceeded.
- Coach and mentor team members in Site Reliability Engineering (SRE) principles and ITIL practices to foster skill development and knowledge transfer within the team.
- Lead technical spikes and contribute to delivery planning activities that drive improvements in system reliability, scalability, and performance.
- Champion initiatives focused on enhancing the digital experience for customers by identifying opportunities for automation and continuous improvement throughout the platform lifecycle.
What you bring:
To excel as a Site Reliability Engineer – Splunk Specialist in this role, you will bring substantial experience supporting complex customer-facing platforms within an operations or DevOps context. Your background should include extensive use of monitoring tools—particularly Splunk—for detailed analysis and proactive issue detection. You are adept at navigating IT Service Operations frameworks such as Incident Management or Change Management while maintaining clear documentation standards. Your technical toolkit includes proficiency with cloud infrastructure (especially AWS), container orchestration (Kubernetes), CI/CD pipelines (Bamboo/GitHub), database management (PostgreSQL), API gateways (Apigee), authentication solutions (Okta/Auth0), as well as scripting languages relevant for automation tasks. Beyond your technical acumen, your interpersonal skills enable you to collaborate effectively across teams—building trust with stakeholders while coaching others on best practices. A passion for continuous improvement drives you to seek out new efficiencies that enhance both system reliability and user experience.
- A bachelor’s degree in computer science, engineering, or a related field is desirable but not mandatory if equivalent experience is demonstrated through previous roles.
- Extensive hands-on experience in DevOps or Site Reliability Engineering roles supporting customer-facing platforms with a focus on operational stability.
- Proven track record of at least three years using Splunk, including creating complex alerts, building dashboards, troubleshooting logs with regex and SQL queries, and performing ad-hoc data extraction.
- Solid experience with New Relic (minimum two years), including configuring synthetic monitoring, alert policies and conditions, dashboards, and SLOs, and troubleshooting application performance issues.
- Familiarity with Spring Boot/Java for creating cron jobs, and with API testing tools such as Postman or Insomnia (at least two years’ practical use).
- Competence with version control and CI/CD tooling (GitHub, Bamboo), SQL databases (PostgreSQL), change management processes, containerised architectures (Kubernetes), and Docker/Rancher pipelines.
- Understanding of authentication/authorisation technologies such as Okta or Auth0; knowledge of API gateways (e.g., Apigee) is advantageous.
- Experience working with cloud technologies (AWS preferred) and implementing best-practice recommendations in continuous improvement environments.
- Ability to coach peers on SRE/ITIL practices while fostering an inclusive culture of automation within the team environment.
- Desirable: Experience supporting microservices architectures or loyalty applications; prior exposure to API integration projects.
What sets this company apart:
This organisation stands out through its unwavering commitment to making a positive impact on customers’ lives by fostering a safer world through technology-driven solutions. The workplace culture is built around shared values that emphasise readiness for anything—a mindset that encourages adaptability while putting people first. Employees benefit from flexible working arrangements designed to promote work-life balance alongside generous training opportunities that nurture professional growth at every stage of your career journey. Collaboration is at the core of daily operations; you’ll find yourself supported by knowledgeable colleagues who value open communication and teamwork above all else. The company’s dedication to continuous improvement ensures that every team member has access to modern tools and resources needed for success—making it an ideal environment for those who thrive on learning new skills while contributing meaningfully to innovative projects that shape the future of customer engagement.
What's next:
If you are ready to take your expertise in Splunk and site reliability engineering to the next level within an inclusive team environment focused on positive outcomes, this is your chance!
Apply today by clicking on the link below to start your journey towards making a real difference.
Aboriginal and Torres Strait Islander Peoples are encouraged to apply.
To apply, please click Apply or call Chane Prasongdee on 02 8289 3118 for a confidential discussion.
About the job

Contract Type: Perm
Specialism: Technology & Digital
Focus: Infrastructure, Cloud & DevOps
Industry: IT
Salary: Attractive salary package
Workplace Type: Hybrid
Experience Level: Associate
Location: Sydney CBD
Employment Type: Full-time
Job Reference: CWBOUW-95E5B7A7
Date posted: 30 July 2025
Consultant: Chane Prasongdee
Agency: Robert Walters (https://www.robertwalters.com.au)