About the Role:

At C4Scale, we are building production AI systems that actually run businesses — not demos, not prototypes, but agents that process thousands of real-world decisions every week. We are looking for an Agentic Engineer to design, build, and operate multi-agent AI systems that work reliably in production.

  • You will own end-to-end delivery of agentic workflows: from prompt engineering and agent design to orchestration, evaluation, and production monitoring.
  • You will work alongside a team that has shipped 7+ production AI systems for B2B clients across fintech, logistics, oil & gas, and SaaS.

What You Will Do:

  • Design and build multi-agent systems using frameworks such as LangGraph, CrewAI, LangChain, or custom orchestration approaches.
  • Translate business workflows into structured agentic pipelines with clear agent roles, tool definitions, and handoff logic.
  • Implement tool-calling agents that integrate with REST APIs, databases, document systems, and external services.
  • Build human-in-the-loop workflows with escalation logic, confidence thresholds, and audit trails.
  • Design and run LLM evaluations: build eval datasets, run structured tests, measure accuracy, and track regressions.
  • Build RAG (Retrieval-Augmented Generation) pipelines with vector databases for knowledge-grounded agent responses.
  • Monitor, debug, and improve agent performance in production using tracing, logging, and structured evaluation.
  • Collaborate with backend engineers to integrate agentic workflows into production APIs and data pipelines.

What You Will Need:

  • 2-5 years of total software engineering experience, with at least 1 year building and shipping LLM-powered applications or agentic systems in production.
  • Hands-on experience with LangChain, LangGraph, CrewAI, AutoGen, or equivalent agentic frameworks.
  • Strong proficiency in Python; ability to write clean, testable, production-grade code.
  • Experience building RAG pipelines with vector databases such as Pinecone, ChromaDB, Qdrant, or pgvector.
  • Practical experience with prompt engineering: structured output, tool calling, few-shot prompting, chain-of-thought, and system prompt design.
  • Experience integrating LLMs via APIs from OpenAI, Anthropic, Google, or open-source models (Mistral, LLaMA, etc.).
  • Familiarity with observability tools for LLM systems: LangSmith, Langfuse, Helicone, or equivalent tracing platforms.
  • Working knowledge of REST APIs, async Python, and task queues (Celery, RQ, or similar) for orchestrating multi-step workflows.

Preferred Qualifications / Added Advantage:

  • Experience with LLM evaluation frameworks: building eval datasets, running structured tests, tracking prompt regressions.
  • Understanding of multi-agent coordination patterns: supervisor agents, parallel agents, sequential pipelines, and reflection loops.
  • Experience deploying agent systems on cloud infrastructure (AWS, GCP, or Azure) using Docker and CI/CD pipelines.
  • Prior experience in a client-facing or consulting environment where you shipped AI systems for external stakeholders.

What We Offer:

  • macOS laptop.
  • Competitive salary and benefits package.
  • Opportunity to work directly on production AI systems used by real businesses, not internal tools or demos.
  • Collaborative and inclusive work environment.
  • Continuous learning and professional development opportunities.