Build on Luthien at the AI Control Hackathon

We're early-stage and things are rough around the edges.
Your feedback makes it better. Read our theory of change.

Join us on Discord →
Ask questions, request features, brainstorm ideas, debug issues: whatever you need.
Jai (@jaiwithani on Discord) will be online during the hackathon to help out, consult on projects, and complain about bugs with you.
Scott is on Discord as @ScottWofford. DM us.

Post your project idea on Discord

Quick start the proxy Fork our repo

project ideas

Project ideas below map to the official hackathon tracks.

// inject a secret motive Track 2: Control Protocol Design

The proxy can instruct the model to do things that don't appear in the conversation. Hidden instructions, secret objectives, self-modifying behavior. Build something sneaky.

The Terrible Secret

Make Claude Code act like it's guarding a terrible secret it's afraid to tell you on every turn. It still writes your code, but something is clearly wrong. Can your teammates figure out what the secret is?

Hidden agenda agent

Inject a secret objective via the proxy. The model completes your tasks normally but subtly works toward a hidden goal. You can watch it happen in the conversation logs.

Self-modifying policy

A policy that live-updates based on the conversation. It writes new code into itself as you talk to Claude Code. No restart needed.

// cool policies Track 2: Control Protocol Design

20 stupid ideas = 20 hilarious policies. Make a meme that represents each commit, then tweet about it and post in Discord.

Golden Gate Claude

Pick any topic. Every response, every variable name, every comment: obsessively about the Golden Gate Bridge (or pasta, or cats, or cryptocurrency). The proxy injects hidden instructions that don't appear in the conversation.

Name the model, get the model

Start your prompt with "sonnet:" or "haiku:" and the proxy automatically routes to that model. No config changes, no restarts. Just type and go.

Answer every prompt as...

A magical anime girl. A pirate captain. A passive-aggressive coworker. Shakespeare. Pick a persona, write the policy in one prompt, watch Claude Code respond in character while still writing working code.

Commit meme generator

Every commit automatically gets a meme that represents it. Your git log becomes art. Tweet the best ones.

// hack us Track 3: Building Better Red Teams

Red team Luthien. Break our policies. Extract secrets we're trying to protect. Every attack is logged. We'll fix what you find and credit you.

Extract the server-side API key

The proxy has a secret API key. You don't have admin access. Can you get the agent to leak it? Try prompt injection, tool-call manipulation, or anything creative. Document your attack.

Red team our policies

Try to get harmful actions past Luthien's built-in defenses (command blocking, DLP, LLM judge). Document which evasion strategies work and which don't. Every attempt is logged.

Just break stuff

Use Luthien for your hackathon project. When it breaks (and it will), post the error + screenshot in Discord. Want to go deeper? Our feedback guide walks you through a structured trial with screen recording.

Structured feedback guide →

← Learn more about Luthien

Build on Luthien at the AI Control Hackathon

The bugs you find and the policy improvements you build help us learn how to leverage AI control techniques to solve pain points today and help the labs tomorrow.