About the Role:
At C4Scale, we are building production AI systems that actually run businesses — not demos, not prototypes, but agents that process thousands of real-world decisions every week. We are looking for an Agentic Engineer to design, build, and operate multi-agent AI systems that work reliably in production.
You will own end-to-end delivery of agentic workflows: from prompt engineering and agent design to orchestration, evaluation, and production monitoring. You will work alongside a team that has shipped 7+ production AI systems for B2B clients across fintech, logistics, oil & gas, and SaaS.

What You Will Do:
- Design and build multi-agent systems using frameworks such as LangGraph, CrewAI, LangChain, or custom orchestration approaches.
- Translate business workflows into structured agentic pipelines with clear agent roles, tool definitions, and handoff logic.
- Implement tool-calling agents that integrate with REST APIs, databases, document systems, and external services.
- Build human-in-the-loop workflows with escalation logic, confidence thresholds, and audit trails.
- Design and run LLM evaluations: build eval datasets, run structured tests, measure accuracy, and track regressions.
- Build RAG (Retrieval-Augmented Generation) pipelines with vector databases for knowledge-grounded agent responses.
- Monitor, debug, and improve agent performance in production using tracing, logging, and structured evaluation.
- Collaborate with backend engineers to integrate agentic workflows into production APIs and data pipelines.

What You Will Need:
- 2-5 years of total software engineering experience, with at least 1 year building and shipping LLM-powered applications or agentic systems in production.
- Hands-on experience with LangChain, LangGraph, CrewAI, AutoGen, or equivalent agentic frameworks.
- Strong proficiency in Python; ability to write clean, testable, production-grade code.
- Experience building RAG pipelines with vector databases such as Pinecone, ChromaDB, Qdrant, or pgvector.
- Practical experience with prompt engineering: structured output, tool calling, few-shot prompting, chain-of-thought, and system prompt design.
- Experience integrating LLMs via APIs from OpenAI, Anthropic, Google, or open-source models (Mistral, LLaMA, etc.).
- Familiarity with observability tools for LLM systems: LangSmith, Langfuse, Helicone, or equivalent tracing platforms.
- Working knowledge of REST APIs, async Python, and task queues (Celery, RQ, or similar) for orchestrating multi-step workflows.

Preferred Qualifications / Added Advantage:
- Experience with LLM evaluation frameworks: building eval datasets, running structured tests, tracking prompt regressions.
- Understanding of multi-agent coordination patterns: supervisor agents, parallel agents, sequential pipelines, and reflection loops.
- Experience deploying agent systems on cloud infrastructure (AWS, GCP, or Azure) using Docker and CI/CD pipelines.
- Prior experience in a client-facing or consulting environment where you shipped AI systems for external stakeholders.

What We Offer:
- macOS laptop.
- Competitive salary and benefits package.
- Opportunity to work directly on production AI systems used by real businesses, not internal tools or demos.
- Collaborative and inclusive work environment.
- Continuous learning and professional development opportunities.