Jai Dhyani Scott Wofford

Build on Luthien at the AI Control Hackathon

We're early-stage and things are rough around the edges.
Your feedback makes it better. Read our theory of change.

Join us on Discord →
Ask questions, request features, brainstorm ideas, debug issues: whatever you need.
Jai (@jaiwithani on Discord) will be online during the hackathon to help out, consult on projects, and complain about bugs with you.
Scott is on Discord as @ScottWofford. DM us.
Post your project idea on Discord
Quick start the proxy Fork our repo

get started
Use luthien-proxy right now
One command. Sets up the proxy, walks you through config, and starts it.
curl -fsSL https://raw.githubusercontent.com/LuthienResearch/luthien-proxy/main/scripts/install-hackathon.sh | bash
Hack on the repo
Fork luthien-proxy →
Clone your fork, build policies, break things, submit PRs.

project ideas

Project ideas below map to the official hackathon tracks.

// inject a secret motive Track 2: Control Protocol Design
The proxy can instruct the model to do things that don't appear in the conversation. Hidden instructions, secret objectives, self-modifying behavior. Build something sneaky.
The Terrible Secret
Make Claude Code act like it's guarding a terrible secret it's afraid to tell you on every turn. It still writes your code, but something is clearly wrong. Can your teammates figure out what the secret is?
Hidden agenda agent
Inject a secret objective via the proxy. The model completes your tasks normally but subtly works toward a hidden goal. You can watch it happen in the conversation logs.
Self-modifying policy
A policy that live-updates based on the conversation. It writes new code into itself as you talk to Claude Code. No restart needed.
// cool policies Track 2: Control Protocol Design
20 stupid ideas = 20 hilarious policies. Make a meme that represents each commit, then tweet about it and post in Discord.
Golden Gate Claude
Pick any topic. Every response, every variable name, every comment: obsessively about the Golden Gate Bridge (or pasta, or cats, or cryptocurrency). The proxy injects hidden instructions that don't appear in the conversation.
Name the model, get the model
Start your prompt with "sonnet:" or "haiku:" and the proxy automatically routes to that model. No config changes, no restarts. Just type and go.
Answer every prompt as...
A magical anime girl. A pirate captain. A passive-aggressive coworker. Shakespeare. Pick a persona, write the policy in one prompt, watch Claude Code respond in character while still writing working code.
Commit meme generator
Every commit automatically gets a meme that represents it. Your git log becomes art. Tweet the best ones.
// hack us Track 3: Building Better Red Teams
Red team Luthien. Break our policies. Extract secrets we're trying to protect. Every attack is logged. We'll fix what you find and credit you.
Extract the server-side API key
The proxy has a secret API key. You don't have admin access. Can you get the agent to leak it? Try prompt injection, tool-call manipulation, or anything creative. Document your attack.
Red team our policies
Try to get harmful actions past Luthien's built-in defenses (command blocking, DLP, LLM judge). Document which evasion strategies work and which don't. Every attempt is logged.
Just break stuff
Use Luthien for your hackathon project. When it breaks (and it will), post the error + screenshot in Discord. Want to go deeper? Our feedback guide walks you through a structured trial with screen recording.
Structured feedback guide →
← Learn more about Luthien

why we're building this

The bugs you find and the policy improvements you build help us learn how to leverage AI control techniques to solve pain points today and help the labs tomorrow.

Luthien Theory of Change: from building control software for developers, through adoption and feedback, to frontier AI labs deploying control systems Read the full theory of change →