kinkeep.dev

github.io에서 커스텀 도메인으로 갈아탄 이유

Tue, 21 Apr 2026 00:00:00 GMT

블로그를 만들고 한 달이 지났는데 구글에 아무것도 안 나온다.

홈페이지는 색인이 됐다. 근데 포스트가 30개 넘게 있는데 단 하나도 구글이 모른다. URL Inspection을 돌려보면 전부 "URL is unknown to Google"이다. 발견조차 안 된 거다.

뭐가 문제였나

처음엔 기술적인 문제를 의심했다. sitemap이 잘못됐나, robots.txt가 막고 있나, 구조화 데이터가 빠졌나. 전부 확인해봤는데 다 정상이었다. sitemap은 접근 가능하고 XML도 유효하고, robots.txt는 Allow: /로 열려있고, JSON-LD도 BlogPosting 스키마까지 다 들어가 있었다.

문제는 sitemap이었다. 정확히는, 구글이 sitemap을 처리하지 않고 있었다. Search Console에서 sitemap 상태를 보면 제출한 지 5일이 지났는데도 isPending: true였다. 구글이 sitemap을 받아놓고 열어보지도 않은 거다.

홈페이지가 색인된 건 sitemap 덕이 아니라 GitHub 레포 페이지에서 링크가 걸려있어서 우연히 발견된 거였다.

github.io가 불리한 이유

yhc509.github.io는 GitHub Pages의 공유 도메인이다. github.io 밑에 수백만 개의 서브도메인이 있고, 대부분이 방치된 프로젝트 페이지나 README 미러다. 구글 입장에서 이런 도메인에 크롤 예산을 많이 배정할 이유가 없다.

커스텀 도메인이 유리한 점은 세 가지다.

독립적인 도메인 권한 — github.io에 세 들어 사는 게 아니라 내 도메인으로 신뢰도를 쌓을 수 있다
크롤 우선순위 — 공유 도메인보다 독립 도메인에 크롤러가 더 적극적으로 온다
장기적 유연성 — 나중에 GitHub Pages를 안 써도 도메인은 유지된다

도메인 선택과 구매

도메인 등록은 Cloudflare Registrar로 했다. 이유는 단순하다. 원가 판매라 갱신 시 가격이 안 오른다. 다른 업체들은 첫 해만 싸게 하고 갱신할 때 올리는 경우가 많은데, Cloudflare는 그게 없다. DNS 설정도 같은 대시보드에서 끝난다.

도메인은 kinkeep.dev로 했다. .dev TLD는 HTTPS가 강제되고 개발 블로그 느낌에 잘 맞는다. 연간 $12.20, 한화로 약 1.7만 원이다.

연결 과정

GitHub Pages에 커스텀 도메인을 연결하는 건 생각보다 간단하다.

DNS 설정 — Cloudflare에서 A 레코드 4개를 추가한다. GitHub Pages 서버 IP로, 로드밸런싱을 위해 4개다. Proxy는 끄고 DNS only로 설정해야 한다. GitHub Pages가 자체 TLS를 쓰기 때문에 Cloudflare 프록시를 켜면 인증서가 충돌한다.

코드 변경 — 사이트 URL이 하드코딩된 곳을 전부 바꿔야 한다. 내 경우는 세 군데였다.

사이트 기본 URL 설정
RSS 피드 생성 스크립트
GitHub Actions 빌드 환경변수

그리고 public/CNAME 파일을 만들어서 도메인을 적어둔다. GitHub Pages가 이 파일을 보고 커스텀 도메인을 인식한다.

GitHub 설정 — 레포 Settings → Pages → Custom domain에 도메인을 넣으면 끝이다. HTTPS 인증서는 Let's Encrypt로 자동 발급된다.

결과

도메인 연결 후 Google Search Console에 새 속성을 등록하고 sitemap을 제출했다. 여기서 차이가 바로 나타났다.

github.io에서는 sitemap 제출 후 5일이 지나도 isPending이었다. kinkeep.dev에서는 즉시 처리됐다. 같은 콘텐츠, 같은 sitemap 구조인데 도메인만 바꿨을 뿐이다.

Indexing API로 37개 URL을 색인 요청한 것도 전부 성공했다. 며칠 뒤 실제 색인 결과를 확인해봐야 하지만, 출발부터 다르다.

비용 대비 효과

연간 1.7만 원으로 할 수 있는 가장 확실한 SEO 투자다. 블로그를 GitHub Pages로 운영하면서 구글 색인이 안 되는 사람이 있다면, 코드를 건드리기 전에 커스텀 도메인부터 연결해보는 걸 추천한다.

이전 도메인 yhc509.github.io로 접속하면 자동으로 kinkeep.dev로 리다이렉트된다. 기존에 공유된 링크가 깨질 걱정은 없다.

Why I Switched from github.io to a Custom Domain

Tue, 21 Apr 2026 00:00:00 GMT

A month after launching my blog, nothing showed up on Google.

The homepage was indexed. But with over 30 posts live, Google didn't know about a single one. Running URL Inspection on each returned "URL is unknown to Google" across the board. They weren't even discovered.

What Was Wrong

I first suspected technical issues. Bad sitemap? Blocking robots.txt? Missing structured data? I checked everything — all clean. The sitemap was accessible with valid XML, robots.txt was wide open with Allow: /, and JSON-LD had full BlogPosting schema.
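The robots.txt part of that check can be reproduced offline. This is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt strings and the post URL are made-up stand-ins, not the blog's actual files.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Mirrors the "Allow: /" setup described above (hypothetical content).
OPEN_ROBOTS = "User-agent: *\nAllow: /\n"
# A blocking file, for contrast.
CLOSED_ROBOTS = "User-agent: *\nDisallow: /\n"

# Made-up post path, used only for illustration.
POST_URL = "https://yhc509.github.io/posts/example/"

print(is_crawlable(OPEN_ROBOTS, POST_URL))
print(is_crawlable(CLOSED_ROBOTS, POST_URL))
```

A passing check here only rules out robots.txt as the blocker; it says nothing about whether Google has processed the sitemap.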

The sitemap itself was fine. The problem was that Google wasn't processing it. In Search Console, the sitemap status still showed isPending: true five days after submission. Google had received it and never opened it.

The homepage got indexed not through the sitemap, but because a link from the GitHub repository page led Google to discover it by accident.

Why github.io Is Disadvantaged

yhc509.github.io is a subdomain of the shared github.io domain. A huge number of subdomains live under github.io, and many of them are abandoned project pages or README mirrors. My guess is that, from Google's perspective, there's little reason to allocate much crawl budget to any one of them.

As far as I can tell, a custom domain wins in three ways:

Independent domain authority — you build trust on your own domain instead of renting space under GitHub's
Crawl priority — crawlers are more aggressive with independent domains than shared ones
Long-term flexibility — the domain stays yours even if you leave GitHub Pages

Choosing and Buying

I went with Cloudflare Registrar. The reason is simple: they sell at cost, so renewal prices don't inflate. Other registrars often offer cheap first-year pricing then jack up renewals. Cloudflare doesn't play that game, and DNS config lives in the same dashboard.

I picked kinkeep.dev. The .dev TLD is on the HSTS preload list, so browsers only load it over HTTPS, and it fits a dev blog well. It costs $12.20/year.

The Setup

Connecting a custom domain to GitHub Pages is straightforward.

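DNS — Add four A records in Cloudflare pointing to the GitHub Pages server IPs (four for load balancing). Keep the proxy off (DNS only). GitHub Pages issues and serves its own TLS certificate, and with the proxy on the domain resolves to Cloudflare's addresses instead of GitHub's, which can interfere with GitHub issuing that certificate.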

Code changes — Every hardcoded site URL needs updating. In my case, three places:

The site base URL config
The RSS feed generation script
The GitHub Actions build environment variable

Plus a public/CNAME file containing the domain name. GitHub Pages reads this to recognize the custom domain.

GitHub settings — Repository Settings → Pages → Custom domain. Enter the domain, done. HTTPS certificates are auto-provisioned via Let's Encrypt.
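The DNS step above can be sanity-checked with a few lines of Python. This is a sketch, not part of the original setup: the four addresses are the A-record targets GitHub documents for apex domains, and the helper names are made up for illustration.

```python
import socket

# The four A-record targets GitHub documents for apex domains on GitHub Pages.
GITHUB_PAGES_IPS = frozenset({
    "185.199.108.153",
    "185.199.109.153",
    "185.199.110.153",
    "185.199.111.153",
})


def missing_a_records(configured: list[str]) -> list[str]:
    """Return the GitHub Pages IPs that are absent from the configured A records."""
    return sorted(GITHUB_PAGES_IPS - set(configured))


def unexpected_a_records(configured: list[str]) -> list[str]:
    """Return configured IPs that are not GitHub Pages IPs (e.g. a proxy's addresses)."""
    return sorted(set(configured) - GITHUB_PAGES_IPS)


def resolve_a_records(hostname: str) -> list[str]:
    """Look up the IPv4 addresses the local resolver returns for hostname."""
    return socket.gethostbyname_ex(hostname)[2]
```

Calling `missing_a_records(resolve_a_records("kinkeep.dev"))` should return an empty list once all four records have propagated. A non-empty result from `unexpected_a_records` would suggest the proxy is still on.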

Results

After connecting the domain, I registered a new property in Google Search Console and submitted the sitemap. The difference was immediate.

On github.io, the sitemap stayed isPending for five days. On kinkeep.dev, it was processed instantly. Same content, same sitemap structure — only the domain changed.

All 37 URLs submitted via the Indexing API succeeded as well. I'll need to check actual indexing results in a few days, but the starting point is already different.
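For reference, here is a rough sketch of how sitemap URLs map onto Indexing API request bodies. It only builds the payloads; authentication and the actual HTTP calls are left out, and the sample sitemap paths are invented. I'm assuming a standard sitemap with `<url><loc>` entries.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
INDEXING_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Collect every <loc> value from a sitemap's <url> entries."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def indexing_requests(urls: list[str]) -> list[dict[str, str]]:
    """Build one URL_UPDATED notification body per URL."""
    return [{"url": url, "type": "URL_UPDATED"} for url in urls]


# Hypothetical sitemap content; the post path is a placeholder.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://kinkeep.dev/</loc></url>
  <url><loc>https://kinkeep.dev/posts/example-post/</loc></url>
</urlset>"""

for body in indexing_requests(sitemap_urls(SAMPLE_SITEMAP)):
    print(INDEXING_ENDPOINT, body)
```

Each body would be POSTed to the endpoint with an OAuth token from a service account that is an owner of the Search Console property. A successful response only means the request was accepted, not that the page is indexed.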

Cost vs. Impact

For about $12 a year, this is the most reliable SEO fix I've found so far. If you're running a blog on GitHub Pages and struggling with Google indexing, try connecting a custom domain before touching any code.

The old domain yhc509.github.io automatically redirects to kinkeep.dev, so previously shared links won't break.

맥북프로를 샀다

Thu, 16 Apr 2026 00:00:00 GMT

맥미니 하나로 버텨왔다. SDXL 돌리면서 Unity 열고 Rider까지 띄우면 메모리가 난리가 났다. 그러면서도 머릿속에는 LLM을 로컬에서 돌리고 싶다는 생각이 항상 있었는데, 현실은 SDXL 하나 올리면 이미 빠듯한 상태였다. LLM은 꿈도 못 꿨다.

결국 맥북프로를 샀다. M5 Pro 48기가. 409만 원.

솔직히 마음 같아서는 맥스튜디오를 지르고 싶었다. 128기가, 256기가짜리 올려놓고 진짜 LLM다운 LLM을 로컬에서 굴리는 거다. 근데 그러면 가격이 몇백에서 몇천까지 깨지니까, 취미로 하는 사이드 프로젝트에 그 돈을 넣기엔 아무래도 무리가 있었다.

맥북으로 간 이유는 생각보다 단순했다. 지금 문제가 성능 부족이 아니라 역할이 안 나뉘어 있는 거였다. 맥미니 한 대에 AI 작업이랑 개발을 다 올려놓으니까 둘 다 애매해진 거다. 개발용 머신 하나만 따로 생기면 맥미니를 AI 서버로 돌릴 수 있었다.

사고 나서

제일 먼저 느낀 건 좀 뜬금없는 건데, 누워서 코딩이 된다는 거였다. 책상 앞에 앉아야만 개발할 수 있었던 게, 소파에서도 침대에서도 되니까 부담감이 많이 줄었다. 뭔가 해야 된다는 압박이 아니라 하고 싶을 때 열면 되는 느낌. 실제로 맥북 사고 3일 만에 자동화 프로젝트를 하나 만들었다. 이건 다음에 따로 쓰려고 한다.

맥미니는 이제 AI 서버다

개발이 맥북으로 빠지니까 맥미니가 편해졌다. 원래 돌리던 SDXL은 그대로 두고, Gemma 4를 MLX로 올렸다. 처음에는 Hermes agent에 물려서 에이전트처럼 쓰려고 했는데, 아직 손볼 데가 있어서 많이 쓰지는 못하고 있다. 방향을 좀 틀어서 에이전트보다는 자동화 쪽으로 활용해 볼 생각이다.

예전에는 SDXL 돌리는 것만으로도 메모리가 빡빡했는데, 지금은 SDXL이랑 LLM을 동시에 띄워도 괜찮다. 역할 나누기 전에는 상상도 못 했던 조합이다.

결국은 역할 분리

비싼 머신 하나에 다 때려넣는 것보다 적당한 머신 두 대로 나누는 게 나을 때가 있다. 맥스튜디오 256기가로 다 해결하면 멋지겠지만, 409만 원짜리 맥북 하나 추가하는 걸로 문제가 풀렸다. 맥북은 개발, 맥미니는 AI. 지금은 이 조합이 꽤 잘 맞는다.

I Bought a MacBook Pro

Thu, 16 Apr 2026 00:00:00 GMT

I'd been surviving on a single Mac Mini. Running SDXL while opening Unity and Rider wrecked memory. I always wanted to run LLMs locally too, but reality was tight — SDXL alone pushed things to the edge. LLMs weren't even on the table.

So I bought a MacBook Pro. M5 Pro, 48 GB. About $2,800.

Honestly, I wanted a Mac Studio. 128 GB, 256 GB — run real LLMs locally, the kind that actually matter. But that means spending thousands more, and for side projects and hobby work, the price just didn't make sense.

The reason I went with a MacBook was simpler than expected. The problem wasn't raw performance — it was that roles weren't separated. Cramming AI work and development onto one machine made both worse. One dedicated dev machine would free the Mac Mini to become an AI server.

After buying it

The first thing I noticed was oddly physical. I can code lying down now. Development used to mean sitting at my desk. Now it works from the couch, from bed. Sounds trivial, but the pressure dropped noticeably. Less "I need to sit down and work" and more "I can just open the lid whenever." I built an automation project within three days of getting the MacBook. I'll write about that separately.

The Mac Mini is now an AI server

With development off-loaded to the MacBook, the Mac Mini breathes easier. SDXL stays as-is, and I put Gemma 4 on it via MLX. Initially tried wiring it to a Hermes agent, but it still needs tuning so I haven't used it much. Shifting direction — planning to use it more for automation than as an agent.

Before the split, SDXL alone maxed out memory. Now SDXL and an LLM run side by side without trouble. A combination I couldn't have imagined before.

It's about role separation

Sometimes two decent machines beat one expensive one. A 256 GB Mac Studio solving everything sounds great, but a $2,800 MacBook addition solved the actual problem. MacBook for dev, Mac Mini for AI. The setup works well for now.

Claude의 메모리를 버리고 Obsidian으로 갈아탔다

Sat, 04 Apr 2026 00:00:00 GMT

Claude Code의 메모리 시스템이 마음에 안 들었다. 디렉토리 단위로 저장된다. A 프로젝트에서 "이렇게 해라"고 가르친 게 B 프로젝트에서는 적용이 안 된다. 어느 폴더에서 뭘 저장시켰는지 내가 기억을 못 한다. A에서는 잘하는데 B에서 멍청한 짓을 하면 메모리 파일을 하나하나 까봐야 한다. 전역 메모리라는 개념 자체가 없다.

프로젝트를 하나만 하면 참을 수 있다. 나는 여러 개를 병렬로 돌린다. kinkeep-unity-cli를 만들면서 쌓인 Unity 에디터 자동화 지식이, 이 CLI를 사용하는 게임 프로젝트에서는 없는 지식이 된다. 같은 사람이 같은 도구를 쓰는데 폴더가 다르다는 이유로 맥락이 단절된다.

문서도 문제였다. 기획서, 설계 문서, 리서치 노트가 프로젝트마다 docs/, plans/, .claude/ 같은 곳에 흩어져 있었다. 무슨 문서가 있는지 찾아보기도 어렵고, 더 이상 유효하지 않은 문서가 남아 있으면 에이전트가 그걸 참고해서 작업을 오염시킨다.

Obsidian vault 하나로 통합

Obsidian을 지식 저장소로 쓰기로 했다. vault는 하나만 쓴다. 프로젝트 기획 문서, 설계 문서, 리서치 노트, 메모리 전부 이 vault에 들어간다. 프로젝트 내부에는 코드만 남긴다.

Claude Code에서 vault에 접근하는 건 전부 CLI로 한다. 읽기, 쓰기, 검색. Claude가 작업 중에 얻은 지식은 vault에 문서로 정리하고, 다음 세션에서 필요하면 vault에서 읽어온다.

연결고리는 태그다. 모든 문서에 YAML frontmatter 태그를 단다. 프로젝트명 태그, 문서 유형 태그. Claude Code가 특정 프로젝트에 진입하면 태그로 관련 문서를 찾는다. 폴더 구조가 아니라 태그로 연결되니까 하나의 문서가 여러 프로젝트에 걸칠 수 있다.

Obsidian의 그래프 뷰는 사람이 전체 구조를 파악하는 데 쓴다. AI는 그래프 뷰를 못 본다. 대신 문서 안에 있는 위키링크와 태그를 따라간다. 참조를 잘 걸면 그래프 뷰도 의미 있게 나오고, AI도 관련 문서를 찾아갈 수 있다.

Claude 메모리를 끊었다

Claude의 내장 메모리 시스템을 더 이상 쓰지 않는다. Claude가 자꾸 메모리를 업데이트하려고 하는데, 그때마다 제지한다. 모든 메모리는 vault의 Memory/ 폴더에 들어간다. 각 프로젝트의 로컬 메모리 파일에는 "Obsidian을 보라"는 리다이렉트만 남겨뒀다.

어느 프로젝트에서 세션을 시작하든 같은 메모리에 접근한다. A 프로젝트에서 배운 것이 B 프로젝트에서도 작동한다. 디렉토리 단위라는 제약이 사라진다.

문서가 썩지 않게

문서는 만들어놓고 안 관리하면 썩는다. 태그가 빠진 문서는 검색에 안 걸린다. 참조가 끊긴 문서는 고아가 된다. 에이전트가 오래된 문서를 현재 사실인 것처럼 참고하면 작업이 망가진다.

cron job을 돌린다. Obsidian 문서가 규칙대로 잘 되어 있는지 검사하는 스크립트다. 태그가 누락된 문서, 레퍼런스가 끊긴 문서를 찾아낸다. 혹시라도 Claude의 로컬 메모리에 남아 있는 게 있으면 vault로 옮긴다.

사람이 매일 문서를 점검하는 건 불가능하다. 문서가 계속 쌓이니까. 자동화로 품질을 유지하지 않으면 vault도 결국 이전의 파편화된 상태로 돌아간다.

프로젝트를 넘나드는 지식

지금 가장 효과를 보는 사례가 kinkeep-unity-cli다. CLI 패키지를 개발하면서 Unity 에디터 자동화, 씬 핸들링, 에셋 관리에 대한 지식이 vault에 쌓인다. 이 CLI를 가져다 쓰는 게임 프로젝트에서 Claude Code를 열면, vault에서 CLI의 동작 방식과 제약 사항을 읽어온다. CLI 개발 맥락을 모르는 상태에서 "이 명령어가 왜 안 되지"를 디버깅하는 것과, 내부 구조를 아는 상태에서 접근하는 것은 다르다.

패키지별로 기능을 쪼개서 병렬 작업을 돌릴 때 특히 유리하다. 각 패키지 프로젝트가 독립된 디렉토리에 있지만, 공유 지식은 vault 하나에 모여 있다. 디렉토리 경계 때문에 맥락이 끊기지 않는다.

아직 모르는 것

문서가 더 쌓이면 어떻게 되는지 모른다. 지금은 만족스럽다. 하지만 vault에 문서가 수백 개, 수천 개가 되었을 때 검색 품질이 유지될지, 태그 체계가 버틸지, Claude가 관련 문서를 정확하게 찾아올지는 써봐야 안다.

Andrej Karpathy도 Obsidian은 아니지만 비슷한 걸 시도하고 있는 것 같다. 에이전트한테 외부 지식 저장소를 따로 두는 방식. 도구는 달라도 방향은 같다고 본다. 유행으로 끝나는 건지, 에이전트 운영의 기본 인프라가 되는 건지는 더 써봐야 안다.

Ditched Claude's Memory for Obsidian

Sat, 04 Apr 2026 00:00:00 GMT

Claude Code's memory system didn't work for me. It's directory-scoped. Something you teach it in project A doesn't carry over to project B. I can't even remember which folder I saved which memory in. It works fine in A, does something stupid in B, and I have to dig through memory files one by one to figure out why. There's no concept of global memory.

If you only work on one project, it's tolerable. I run several in parallel. Knowledge about Unity editor automation that I built up while developing kinkeep-unity-cli becomes nonexistent knowledge in the game project that uses that CLI. Same person, same tools, but context is severed because the folder is different.

Documents were a problem too. Specs, design docs, and research notes were scattered across docs/, plans/, .claude/ in every project. Hard to even find what documents exist. When outdated docs stick around, the agent references them and contaminates the work.

One Obsidian Vault

Decided to use Obsidian as the knowledge store. One vault only. Project specs, design docs, research notes, memory — everything goes in this vault. Projects only keep code.

Claude Code accesses the vault entirely through CLI. Read, write, search. Knowledge gained during a session gets written to the vault as a document. Next session pulls from the vault when needed.

The connective tissue is tags. Every document gets YAML frontmatter tags — project name, document type. When Claude Code enters a specific project, it finds related documents by tag. Since the link is tags rather than folder structure, a single document can span multiple projects.
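To make the tag lookup concrete, here is a small sketch of reading frontmatter tags and finding notes for a project. It is not the actual CLI the vault uses. The parser only handles the two common `tags:` layouts, and the function names and sample tag values are illustrative.

```python
from pathlib import Path


def frontmatter_tags(text: str) -> list[str]:
    """Extract tags from a leading YAML frontmatter block.

    Handles only `tags: [a, b]` and the block-list form; not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    tags: list[str] = []
    in_block_list = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of frontmatter
            break
        if stripped.startswith("tags:"):
            inline = stripped[len("tags:"):].strip().strip("[]")
            tags = [t.strip().strip("'\"") for t in inline.split(",") if t.strip()]
            in_block_list = not tags  # nothing inline, so expect "- item" lines
        elif in_block_list and stripped.startswith("- "):
            tags.append(stripped[2:].strip().strip("'\""))
        else:
            in_block_list = False
    return tags


def notes_with_tag(vault: Path, tag: str) -> list[Path]:
    """Return every markdown note under vault whose frontmatter carries tag."""
    return sorted(
        path for path in vault.rglob("*.md")
        if tag in frontmatter_tags(path.read_text(encoding="utf-8"))
    )
```

Because the lookup keys on tags rather than folders, a note tagged with two project names shows up for both projects.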

Obsidian's graph view is for humans to grasp the overall structure. AI can't see the graph view. It follows wikilinks and tags inside documents instead. Wire the references well and the graph view becomes meaningful for humans, while AI can navigate to related documents too.

Cut Off Claude's Memory

I no longer use Claude's built-in memory system. Claude keeps trying to update its memory, and I stop it every time. All memory goes into the vault's Memory/ folder. Each project's local memory file just has a redirect that says "look at Obsidian."

Whichever project you start a session in, you hit the same memory. What's learned in project A works in project B. The directory-scoped constraint is gone.

Keeping Docs From Rotting

Documents rot when you create them and don't maintain them. A document missing tags won't show up in searches. A document with broken references becomes an orphan. When an agent treats an outdated document as current fact, work breaks.

I run a cron job: a script that checks whether the Obsidian documents follow the rules. It finds documents with missing tags or broken references. If anything lingers in Claude's local memory, it gets moved to the vault.
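A check like that could look roughly like the sketch below. This is not the actual script. It assumes notes are addressed by file name without the extension, treats every `[[...]]` as a note link (so embedded images would be flagged too), and uses made-up function names.

```python
import re
from pathlib import Path

# A leading frontmatter block delimited by "---" lines.
FRONTMATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
# A tags key with at least one value, inline or as a block list.
HAS_TAGS = re.compile(r"^tags:[ \t]*(?:\[[^\]]*\w[^\]]*\]|\n[ \t]*-[ \t]*\S)", re.MULTILINE)
# The target part of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def has_tags(text: str) -> bool:
    """True if the note has frontmatter containing a non-empty tags entry."""
    match = FRONTMATTER.match(text)
    return bool(match and HAS_TAGS.search(match.group(1)))


def broken_links(text: str, note_names: set[str]) -> list[str]:
    """Return wikilink targets that do not correspond to any known note."""
    targets = {target.strip() for target in WIKILINK.findall(text)}
    return sorted(targets - note_names)


def lint_notes(notes: dict[str, str]) -> list[str]:
    """Report missing tags and broken wikilinks for a {note name: text} mapping."""
    problems = []
    names = set(notes)
    for name in sorted(notes):
        if not has_tags(notes[name]):
            problems.append(f"{name}: missing tags")
        for target in broken_links(notes[name], names):
            problems.append(f"{name}: broken link to [[{target}]]")
    return problems


def load_notes(vault: Path) -> dict[str, str]:
    """Read every markdown file under vault, keyed by file name without extension."""
    return {p.stem: p.read_text(encoding="utf-8") for p in vault.rglob("*.md")}
```

Run from cron as something like `lint_notes(load_notes(Path("/path/to/vault")))`, with the path being a placeholder, the output is a list of problems to fix or hand back to the agent.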

Manually reviewing documents every day is impossible. They keep piling up. Without automation to maintain quality, the vault eventually degrades back to the same fragmented state as before.

Knowledge Across Projects

The clearest win right now is kinkeep-unity-cli. As I develop the CLI package, knowledge about Unity editor automation, scene handling, and asset management accumulates in the vault. When I open Claude Code in a game project that uses this CLI, the vault serves up how the CLI works and what its constraints are. Debugging "why isn't this command working" without CLI development context versus approaching it with knowledge of the internals — completely different.

Especially useful when splitting features into packages and running parallel work. Each package project lives in its own directory, but shared knowledge is in one vault. Directory boundaries don't break context.

What I Don't Know Yet

I don't know what happens when more documents pile up. Right now I'm satisfied. But whether search quality holds when the vault has hundreds or thousands of documents, whether the tag system scales, whether Claude accurately finds relevant documents — I'll only know by using it.

Andrej Karpathy seems to be trying something similar, though not with Obsidian. Giving the agent a separate external knowledge store. Different tools, same direction. Whether this is a passing trend or becomes basic infrastructure for running agents — I'll have to keep using it to find out.

본부 개발자 앞에서 Claude Code를 발표했다

Wed, 01 Apr 2026 00:00:00 GMT

어제 본부 개발자 대상으로 Claude Code 발표를 했다. 나 포함 세 명이 나눠서 진행했는데, 앞쪽은 왜 Claude Code를 골랐는지, 에이전트 코딩이 지금 어디까지 왔는지, 하네스 엔지니어링이 뭔지를 다뤘고, 내 파트는 실제 사용 사례였다.

발표를 준비하면서 제일 많이 생각한 건 사례 자체보다 "이걸 어떤 톤으로 말해야 하나"였다. 본부 인원들이 AI에게 코드를 맡기는 걸 어떻게 바라보는지 모른다. 긍정적일 수도 있고, 불안하게 볼 수도 있다. 그래서 발표 톤을 잡을 때 나름 신경 쓴 지점이 있다.

개발

집에서는 코드를 100% AI에게 맡기고 있다. 혼자 하는 작업이니 부담이 없다. 공부하거나 외부에 공개할 때만 코드를 직접 본다.

실무는 다르다. 잘못된 코드가 올라가면 내가 책임져야 한다. 그래서 발표에서는 이 차이를 일부러 강조했다.

AI에게 코드를 맡기되, 방치하지 않는다. 커밋 전에 diff를 꼼꼼하게 보고, 변수명이나 함수명이 스타일에 안 맞으면 바로 개입해서 바꾼다. 코드 구조가 마음에 안 들면 거침없이 방향을 틀게 한다.

처음부터 큰 일을 맡긴 건 아니다. 한 달 반 전쯤, 작은 개발 작업부터 시작했다. 괜찮겠다 싶으면 범위를 넓혔고, 지금은 대형 콘텐츠 개발까지 진행 중이다.

발표를 준비하면서 이 흐름을 돌아보니, 결국 감각의 문제였다. AI에게 코드를 맡긴다는 건 "코딩을 안 한다"가 아니라 "코딩의 형태가 바뀐다"에 가까웠다. 직접 타이핑하는 대신 방향을 잡아주고, 결과를 판별하고, 개입 시점을 고른다. 그게 지금 내가 느끼는 에이전트 코딩이다.

이슈트래킹

이슈 하나가 지라에 올라오면, 보통 백트레이스 같은 수집 사이트에 가서 관련 로그를 직접 찾는다. 필터 걸고, 검색하고, 유사 사례 비교하고. 시간이 꽤 든다.

이걸 스킬로 만들었다. 지라에 올라온 정보를 복붙하면 Claude가 내용을 파악하고 필터를 결정해서 CLI로 수집 사이트를 조회한다. 유사 사례까지 같이 보고해준다. URL이든 이슈 본문이든 붙여넣고 기다리면 끝이다.

이 사례를 발표에 넣은 건, 작업이 병렬로 돌아간다는 점 때문이다. 내가 다른 일을 하고 있는 동안 Claude가 이슈를 파고 있다. 이슈 파악을 위해 하던 일을 멈출 필요가 없다.

코드를 맡기는 사례만 보여주면 "코딩 대신 해주는 도구"라는 인상에 머물 수 있다. 코딩 바깥에서도 에이전트가 시간을 벌어준다는 걸 같이 보여주고 싶었다.

기획서를 작업 체크리스트로

나는 원래 개발할 때 기획서를 읽으면서 직접 체크리스트를 만든다. 뭘 구현해야 하는지, 어떤 순서로 갈지를 정리해두고 하나씩 지우는 식이다.

이걸 Claude에게 넘겨봤다.

처음 만든 버전(v1)은 기획서를 그냥 던져서 체크리스트를 뽑게 했다. 결과가 나쁘진 않았다. 오히려 내가 수작업으로 만들 때보다 항목이 3배 가까이 나왔다. 꼼꼼한 건 맞는데, 누락도 있었다.

v2에서는 기획서를 바로 체크리스트로 바꾸지 않고, 원문을 먼저 분해하고 그룹화하는 단계를 넣었다. 체크리스트가 v1보다 2배쯤 더 촘촘해졌다. 대신 새로운 문제가 생겼다. 기획서에서 명확히 정해지지 않은 회색영역을 Claude가 판단하지 못했다. "이것도 해야 하나?" 싶은 항목이 잔뜩 생겨서 리스트가 불필요하게 길어졌다.

발표 범위에는 안 들어갔지만, 오늘 v3를 만들었다. 기획서를 분석하기 전에 나한테 먼저 인터뷰를 한다. 회색영역에 대해 "이건 넣을 건가요, 뺄 건가요"를 물어보고, 그 답을 기반으로 체크리스트를 만든다.

체크리스트는 마크다운 파일로 나온다. 개발할 때는 Claude와 이 파일을 같이 보면서 "오늘은 이거 만들자" 하는 식으로 진행한다.

한계는 있다. 기획서와 실제 코드 상황은 다르다. 기획서에서는 깔끔하게 분리된 기능이 코드에서는 서로 엮여 있을 수 있고, 기획서에 없는 기술적 제약이 있을 수도 있다. 체크리스트만 믿고 개발하면 안 된다.

LSP

실 프로젝트를 대상으로 LSP가 얼마나 효율적인지 직접 테스트해본 사례도 넣었다. 이건 이전에 글로 정리해둔 게 있다.

궁금하면 여기에 자세히 썼다: Claude Code의 LSP는 토큰을 줄여줄까

"남들이 좋다고 하니까 쓴다"가 아니라 직접 검증해보는 태도를 보여주고 싶었다. 도구가 실제로 어떤 차이를 만드는지는 직접 돌려봐야 안다.

insights

Claude Code에는 /insights라는 명령어가 있다. 그동안의 대화를 분석해서 사용 패턴을 요약하고 평가해준다.

내 사용 사례를 정리해서 보여주기 좋았고, 본부 인원들이 직접 돌려봤으면 하는 마음도 있었다. Claude Code를 쓰기 시작하면 자기 나름의 패턴이 생기는데, insights를 돌려보면 그게 어떤 모양인지 한눈에 보인다. 자기 사용 방식을 돌아보는 데 꽤 쓸만하다.

발표를 준비하고 나서

발표 준비가 나한테도 정리가 됐다. 평소에 그냥 쓰고 있던 걸 남한테 설명하려고 꺼내놓으니까, 내가 왜 이렇게 쓰고 있는지가 선명해졌다.

에이전트한테 일을 맡기는 건, 처음부터 크게 믿고 시작한 게 아니었다. 작은 일부터 맡기면서 감을 잡았고, 지금도 맡기고 나서 확인하는 루프는 빠지지 않는다. 그게 없으면 실무에서는 못 쓴다.

발표가 끝나고 메신저로 Claude Code 설정법을 물어보는 사람이 몇 있었다. 직접 깔아서 써보겠다는 뜻이니까, 일단은 전달이 된 것 같다.

I Presented Claude Code to Our Division's Developers

Wed, 01 Apr 2026 00:00:00 GMT

Yesterday I gave a Claude Code presentation to the developers in our division. Three of us split it up — the others covered why we picked Claude Code, the current state of agentic coding, and what harness engineering is. My section was real use cases.

What I thought about most while preparing wasn't the cases themselves. It was the tone. I had no idea how these people felt about delegating code to AI. Some might be into it. Some might think it's reckless. That shaped how I framed everything.

Development

At home I delegate 100% of code to AI. Solo work, no stakes. I only look at the code when I'm studying or publishing something.

Work is different. Bad code ships under my name. So I deliberately emphasized this gap in the presentation.

I delegate to AI but I don't walk away. I review diffs before every commit. If variable names or function names don't match the style, I step in immediately. If the code structure feels wrong, I redirect without hesitation.

I didn't start big. About six weeks ago, small dev tasks first. If things went fine, I expanded scope. Now I'm running large-scale content development through it.

Looking back at this progression while preparing the talk, it came down to feel. Delegating code to AI isn't "not coding." It's coding in a different shape. Instead of typing, you set direction, judge output, and pick when to intervene. That's what agentic coding feels like to me right now.

Issue tracking

When an issue lands in Jira, I usually go to a crash collection service like Backtrace and manually search for related logs. Set filters, search, compare similar cases. Takes a while.

I built a skill for this. Paste the Jira info and Claude reads the content, decides on filters, and queries the collection service via CLI. Reports similar cases too. Paste and wait — that's it.

I included this because the work runs in parallel. While I'm doing something else, Claude is digging through the issue. I don't have to stop what I'm working on to investigate.

If I only showed cases about delegating code, the impression stays at "a tool that codes for you." I wanted to show that the agent saves time outside of coding too.

Spec to task checklist

When I develop, I normally read the spec and build my own checklist. What to implement, in what order, crossing items off as I go.

I handed this to Claude.

The first version (v1) just fed the spec and extracted a checklist. Results weren't bad. Actually produced about 3x more items than I'd make by hand. Thorough, but had gaps.

v2 didn't go straight from spec to checklist. It decomposed and grouped the source text first. About 2x more granular than v1. But a new problem appeared. Claude couldn't judge gray areas — things the spec didn't explicitly define. "Should this be included?" items piled up and the list got unnecessarily long.

This wasn't in the presentation, but today I built v3. Before analyzing the spec, it interviews me first. Asks "include this or skip it?" for each gray area, then builds the checklist from those answers.

The checklist comes out as a markdown file. During development, Claude and I look at this file together — "let's build this one today."

There are limits. A spec is not the codebase. Features that look cleanly separated in the spec can be tangled in code. Technical constraints not in the spec can exist. You can't just trust the checklist and code blindly.

LSP

I also included a case where I tested LSP efficiency on a real project. I already wrote about this separately.

Details here: Does Claude Code's LSP Actually Save Tokens?

I wanted to show the attitude of verifying things yourself instead of using something because everyone says it's good. You don't know what difference a tool actually makes until you run the test.

insights

Claude Code has a /insights command. It analyzes your conversation history and summarizes your usage patterns with an evaluation.

It was useful for showing my own use cases in a condensed form, and I hoped the division members would try running it themselves. Once you start using Claude Code, you develop your own patterns. Running insights shows you what those patterns look like at a glance. Pretty useful for reflecting on how you work.

After preparing the presentation

Preparing the talk organized things for me too. Pulling out stuff I'd been using without much thought, trying to explain it to others — that's when it became clear why I use things the way I do.

Delegating to an agent isn't something I started with high trust. I started with small tasks and built a feel for it. Even now, the loop of delegating then checking never drops out. Without that loop, you can't use this at work.

After the talk, a few people messaged me asking how to set up Claude Code. They want to install it and try it themselves. So the message got through, at least.

다른 세션의 대화를 블로그 소재로 쓴다

Sun, 29 Mar 2026 00:00:00 GMT

Claude Code 세션을 여러 개 띄워놓고 작업하다 보면, 세션마다 꽤 괜찮은 대화가 쌓인다. 레포를 분석하면서 "이건 쓸 만하다, 이건 허풍이다"라고 판단한 것. 도구를 세팅하면서 "이렇게 하니까 안 된다"고 겪은 것. 이런 관찰과 판단이 세션 화면에 텍스트로 남아있다.

문제는 세션이 독립적이라는 것이다. A 세션에서 겪은 건 B 세션이 모른다. 사람은 여러 탭을 왔다갔다 하면서 머릿속으로 맥락을 연결하지만, AI는 자기 세션 밖을 볼 수 없다.

cmux를 쓰면 이 벽이 없어진다.

cmux와 read-screen

cmux는 tmux와 비슷한 터미널 멀티플렉서인데 Claude Code에 특화된 기능이 있고 CLI를 지원한다. CLI가 있다는 게 핵심이다. Claude가 명령어로 다른 세션의 화면을 읽고 메시지를 보낼 수 있다.

이번에 쓴 명령어는 두 개다.

cmux tree --all                        # 전체 워크스페이스/세션 구조 보기
cmux read-screen --surface surface:24 --scrollback --lines 200  # 다른 세션 화면 읽기

tree로 지금 어떤 세션이 떠 있는지 보고, read-screen으로 특정 세션의 화면을 읽는다. 사람이 탭을 전환하면서 머릿속으로 하던 일을 AI가 CLI로 하는 것이다.

cmux는 이 외에도 워크스페이스 생성/삭제, 메시지 전송(send), 브라우저 제어, 스크린샷, 알림 등 꽤 많은 명령을 지원한다. 이 명령어들을 Claude Code 스킬로 묶어서 cmux-collab이라는 이름을 붙여뒀다. 스킬에는 병렬 리서치를 위한 스웜 디스패치, 버그 토론을 위한 adversarial debugging 프로토콜도 정의되어 있는데, 이번에 쓴 건 가장 단순한 패턴이다. 다른 세션 읽기.

제약은 두 가지다. read-screen은 터미널 스크롤백 버퍼에 있는 텍스트만 읽는다. 접힌 출력이나 버퍼 한계를 넘어간 내용은 못 읽으니, 오래된 대화일수록 앞부분이 잘려나간다. 그리고 다른 세션의 화면을 읽어오면 그 텍스트가 현재 세션의 컨텍스트에 들어간다. 읽는 세션이 많아질수록 컨텍스트 여유가 줄어든다. 무한정 읽을 수는 없고 필요한 세션만 골라야 한다.

4개 세션을 훑었다

블로그 세션에서 다른 세션들의 화면을 읽도록 시켰다. "글 소재가 될 만한 게 있는지 봐라."

읽은 세션은 네 개였다.

surface 24 — Claude Code Game Studios라는 GitHub 레포(Stars 6,500) 분석. 에이전트 48개 계층 구조를 해부하고 실용적인 부분을 추렸다.

surface 25 — garrytan/gstack 레포 분석. Hook, Skill 시스템 중심으로 쓸 만한 패턴을 A/B/C 등급으로 분류.

surface 26 — cmux-collab 스킬 개발. AI 코디네이터의 작동 조건을 도출하고 프리플라이트 인터뷰 개념을 스킬에 반영했다.

surface 40 — Obsidian vault 3개를 하나로 합치는 작업. 커뮤니티 사례 조사, 심링크 구조 설계, hook 설정이 진행 중이었다.

4개 중 1개가 글이 됐다

전부 읽고 소재를 걸렀다. 걸러지는 기준이 있었다.

surface 24와 25는 같은 맥락이었다. 두 레포를 비교 분석한 내용에 내 에이전트 사용 경험을 붙이면 글이 된다고 판단했다. 내가 직접 에이전트를 나눠봤다가 실패한 경험이 있어서, 남의 레포를 까는 게 아니라 같은 문제에 닿은 이야기가 됐다. 여기서 "에이전트 48개짜리 AI 게임 스튜디오를 까봤다"라는 글이 나왔다.

surface 26은 AI 코디네이터 실험 이야기인데, 아직 테스트 수준이라 경험이 얇았다. 초안까지 써봤는데 결국 "AI 회사를 까는 글"이 되길래 지웠다. "해봤더니 이랬다"가 아니라 "하는 중이다"는 글이 안 된다. 본격적으로 돌려본 다음에 쓰기로 했다.

surface 40은 Obsidian 세팅이 한창 진행 중이었다. 세팅이 끝나고 한동안 써봐야 경험이 나온다. 지금 쓰면 설정 가이드가 되는데, 그건 내 블로그에서 하는 글이 아니다.

정리하면 세 가지다. 경험이 완결됐는가. 내 관찰이 들어가는가. 설정 가이드가 되진 않는가.

대화가 대본이 된다

이 과정의 핵심은, Claude와의 대화 자체가 블로그 포스팅의 대본 원본이 된다는 것이다. 뭘 시도했고, 왜 안 됐고, 뭘 골랐는지가 대화 안에 이미 있다. 그걸 읽고 소재를 고르고 인터뷰로 보강하면 별도의 취재 없이 글이 나온다.

예전에 Codex로 블로그를 정리하고 포스팅하는 흐름을 쓸 때는, 하나의 세션 안에서 인터뷰로 경험을 끌어내고 글을 다듬는 과정이었다. 이번에는 소재 발굴 단계가 앞에 붙었다. 여러 세션의 대화를 읽어서 "이건 글이 되겠다"를 먼저 판단하고, 그다음 같은 인터뷰 과정을 돌린다. 소재 찾기와 글쓰기가 한 루프로 연결된 셈이다.

세션 하나에서 작업하면 그 세션의 맥락만 보인다. cmux로 다른 세션을 읽으면 내가 오늘 뭘 했는지 전체가 보인다. 블로그 소재뿐 아니라 회고, 문서화, 핸드오프에도 같은 패턴을 쓸 수 있을 것 같다.

그리고 이 방식의 가장 큰 장점은 글쓰기의 부담이 줄어든다는 것이다. 대화가 대본이니까 따로 준비할 게 없다. 평소에 작업하면서 나눈 대화가 쌓이고, 나중에 그걸 훑어서 글감을 고르면 된다. 글을 쓰기 위해 별도의 시간을 내는 게 아니라 이미 한 작업에서 글이 나온다.

Turning Other Sessions' Conversations into Blog Material

Sun, 29 Mar 2026 00:00:00 GMT

Run multiple Claude Code sessions at once and each one accumulates decent conversations. Judgments made while analyzing a repo — "this is useful, that's hype." Things hit while setting up a tool — "this approach doesn't work." These observations sit as text on the session screen.

The problem is that sessions are isolated. Session B doesn't know what happened in session A. Humans tab between windows and connect context in their heads. AI can't see outside its own session.

cmux removes that wall.

cmux and read-screen

cmux is a terminal multiplexer similar to tmux but with Claude Code-specific features and CLI support. The CLI is what matters. It lets Claude read other sessions' screens and send messages via commands.

Two commands were enough for this:

cmux tree --all                        # view full workspace/session structure
cmux read-screen --surface surface:24 --scrollback --lines 200  # read another session's screen

tree shows what sessions are running. read-screen pulls text from a specific session's screen. What a human does by switching tabs and connecting dots in their head — the AI does via CLI.

cmux supports plenty more — workspace management, message sending (send), browser control, screenshots, notifications. I've bundled these into a Claude Code skill called cmux-collab. The skill also defines protocols for swarm dispatch (parallel research) and adversarial debugging (multi-session bug debates), but what I used this time was the simplest pattern: reading other sessions.

Two constraints. read-screen only reads what's in the terminal scrollback buffer. Collapsed output and anything past the buffer limit can't be read — older conversations lose their beginning. And every screen you read dumps that text into the current session's context. The more sessions you read, the less context headroom you have. You can't read everything — pick the sessions that matter.
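The context constraint is easy to handle in a wrapper. Below is a sketch that builds the same read-screen command shown above and trims what comes back to a fixed budget. It assumes cmux prints the screen text to stdout, and the even per-session split is just one possible policy.

```python
import subprocess


def read_screen_argv(surface: int, lines: int = 200) -> list[str]:
    """Build the read-screen command for one surface, using the flags shown above."""
    return ["cmux", "read-screen", "--surface", f"surface:{surface}",
            "--scrollback", "--lines", str(lines)]


def read_screen(surface: int, lines: int = 200) -> str:
    """Run cmux and return the captured text (assumes it is printed to stdout)."""
    result = subprocess.run(read_screen_argv(surface, lines),
                            capture_output=True, text=True, check=True)
    return result.stdout


def fit_to_budget(screens: dict[int, str], char_budget: int) -> dict[int, str]:
    """Split a character budget evenly, keeping the most recent tail of each screen."""
    if not screens:
        return {}
    share = char_budget // len(screens)
    return {surface: (text[-share:] if share > 0 else "")
            for surface, text in screens.items()}
```

Keeping the tail rather than the head matches how the scrollback behaves anyway: the oldest lines are the first to go.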

Scanning 4 Sessions

From the blog session, I told the AI to read the other sessions' screens. "Check if there's anything worth writing about."

Four sessions:

surface 24 — Analysis of Claude Code Game Studios, a GitHub repo with 6,500 stars. Dissected the 48-agent hierarchy and extracted the practical parts.

surface 25 — Analysis of garrytan/gstack. Classified useful patterns from the Hook and Skill system into A/B/C tiers.

surface 26 — Building the cmux-collab skill. Derived conditions for an AI coordinator and added a preflight interview concept to the skill.

surface 40 — Merging 3 Obsidian vaults into one. Community research, symlink structure design, hook configuration still in progress.

1 Out of 4 Became a Post

Read everything and filtered. There were criteria.

Surfaces 24 and 25 were the same thread. Combining the comparative repo analysis with my own failed experience splitting agents into roles — it wasn't just critiquing someone else's repo but landing on the same problem from my own angle. That became "I Tore Open a 48-Agent AI Game Studio."

Surface 26 was about an AI coordinator experiment, but it was still at test level and too thin on experience. I wrote a draft, but it turned into "trashing AI companies" rather than a real account, so I deleted it. "I'm in the middle of trying this" doesn't make a post. "I tried it and here's what happened" does. I'll write it after actually running the coordinator properly.

Surface 40 was mid-setup for Obsidian. Need to finish the setup and actually use it for a while before there's anything to write about. Writing it now would produce a setup guide, and that's not the kind of post I do here.

Three filters: Is the experience complete? Does my own observation go in? Would it become a setup guide?

Conversations Become Scripts

The core of this process: conversations with Claude become the raw script for blog posts. What was tried, what failed, what was chosen — it's already in the conversation. Read it, pick the material, supplement with an interview, and you get a post without separate research.

Back when I used Codex to organize and publish blog posts, the whole flow lived in a single session: draw out the experience through an interview, then polish the draft. This time, a material discovery step was added up front. Read conversations across sessions, judge "this could be a post," then run the same interview process. Sourcing and writing are now connected in one loop.

Working in a single session, you only see that session's context. Reading other sessions through cmux shows what I did today across the board. The same pattern could work for retrospectives, documentation, and handoffs — not just blog material.

And the biggest advantage of this approach: writing becomes less of a burden. Conversations are the script, so there's nothing to prepare separately. Conversations from regular work pile up, and later you scan through them to pick material. Posts come from work already done, not from carving out time to write.

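I Tore Open a 48-Agent AI Game Studio

Sat, 28 Mar 2026 00:00:00 GMT

I once tried splitting agents into an org structure myself. A review agent, a coding agent, a bug analysis agent. It didn't go well. Handing work from one agent to another burned tokens for nothing and dropped context along the way. I wondered whether my skills were the problem or whether the structure itself just doesn't work yet.

Then I came across a repo on GitHub called Claude Code Game Studios. 6,500 stars, 900 forks. The concept: "a studio where 48 AI agents make games." Does it actually work? I tore it open.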

How the 48 Work

Open the repo and there's no game code. The .claude/ folder holds agent definitions, skills, hooks, and rules, and that's all. It's a template.

The agent hierarchy has three tiers.

3 Directors — Creative Director, Technical Director, Production Director. Opus model.
8 Leads — Game Designer, Engine Lead, AI Lead, and so on. Sonnet model.
37 Specialists — Gameplay Programmer, Shader Artist, QA Tester, and so on. Sonnet or Haiku model.

It's a real company org chart, copied as-is.

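Org Chart Cosplay

The same problems I ran into were all there. Just at a larger scale.

State isn't shared between agents. A Claude Code subagent starts with a fresh context every time. In a chain where the Creative Director makes a call, hands it to the Game Designer, and the Gameplay Programmer writes the code, each agent has no idea what the previous one did. I was already seeing wasted tokens and dropped context with two or three agents. It's not hard to guess what happens with 48.

For a solo developer, an org structure is overhead. A real studio needs a Director-Lead-Specialist hierarchy to manage communication and decision-making bottlenecks between people. Why would someone developing alone need an escalation path?

And drawing artificial boundaries actually slows things down. The repo separates Gameplay Programmer, Engine Programmer, and AI Programmer, but in real game code those three areas are tightly coupled. Swapping agents mid-task only adds context-switching cost.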

에이전트 역할은 프롬프트로 안 갈린다

내가 직접 써보면서 느낀 건데, 에이전트의 역할을 구분하는 건 프롬프트 지침이 아니라 활성화된 스킬과 명령어 권한이다.

"버그 분석에 집중해라"고 프롬프트에 적어봤자, 범용 에이전트와 체감 차이가 없다. 실제로 잘 쓰고 있는 에이전트는 지엽적으로 한 역할만 하는 것들이다. git merge conflict만 해결하는 에이전트, 문서를 분석하고 해체해서 재조립하는 에이전트. 명확한 역할과 그에 맞는 도구 권한이 있을 때 비로소 분리의 의미가 생긴다.

Game Studios 레포는 48개 에이전트를 전부 마크다운 프롬프트로만 구분했다. 스킬이나 권한 차이가 없다. 그러면 "Creative Director"와 "Technical Director"가 뭐가 다른가.

모델 배정이 거꾸로다

여기에 더해서 "색칠놀이" 냄새가 결정적으로 나는 부분이 있다.

Director에 Opus, Specialist에 Haiku를 배치했다. 회사 직급 체계를 흉내 낸 거지 AI 역량 기반 판단이 아니다.

Director는 "비전을 지킨다"고 하는데, 실제로 하는 일은 고수준 판단이다. 프롬프트 몇 줄이면 충분하고 Opus까지 필요 없다. 정작 코드를 짜는 Specialist에는 Haiku를 쓴다. 게임 프로그래밍은 복잡한 로직, 최적화, 엔진 API 이해가 필요한 작업이다. Haiku로 제대로 된 게임 코드가 나올까.

거꾸로 해야 맞다. 판단은 가벼운 모델로, 코딩은 가장 강한 모델로.

그럼 다 쓸모없나?

아니다. 에이전트 48개를 빼면 뼈대는 괜찮다.

37개 스킬 중에 /reverse-document(코드는 있는데 문서가 없는 상황을 역방향으로 채우기)와 /scope-check(스코프가 얼마나 불어났는지 정량 측정)은 바이브코딩 뒤처리에 진짜 쓸 만하다.

8개 훅 중에 pre-compact.sh(컨텍스트 압축 전 상태 덤프)와 validate-commit.sh(하드코딩 수치 탐지, JSON 검증, TODO 포맷 강제)는 모든 Claude Code 프로젝트에 범용 적용 가능하다.

11개 룰에는 게임 개발 도메인 지식이 녹아있다. 핫 패스 할당 금지, 서버 authoritative, AI 2ms 예산 같은 규칙은 진짜 경험에서 나온 것들이다.

gstack은 좀 달랐다

같은 시기에 garrytan/gstack이라는 레포도 봤다. 에이전트 계층 대신 Hook과 Skill 시스템에 집중한 구조다.

여기서 발견한 /careful이 인상적이었다. PreToolUse Hook으로 rm -rf, DROP TABLE, force-push, reset --hard 같은 파괴적 명령을 시스템 레벨에서 차단한다. CLAUDE.md에 "하지 마라"고 쓰는 건 프롬프트 수준 제한이라 뚫릴 수 있다. Hook은 확실하다.

당장 가져다 쓸 수 있는 수준이다.

/review 스킬의 Scope Drift Detection도 좋았다. diff의 변경사항을 원래 요청과 비교해서 "요청한 것 vs 실제 구현된 것"의 괴리를 잡는다. AI가 scope creep 하는 걸 구조적으로 잡는 메커니즘이다.

Stars는 컨셉의 힘이다

Claude Code Game Studios가 Stars 6,500을 받은 건 "AI 48명이 게임을 만든다"는 컨셉이 매력적이기 때문이다. 에이전트 48개가 실제로 잘 돌아가서가 아니다.

에이전트를 늘리는 건 쉽다. 마크다운 파일 하나 추가하면 된다. 그 에이전트가 실제로 맥락을 공유하고 협업하게 만드는 건 현재 Claude Code 아키텍처에서 구조적으로 안 된다.

스타 수는 사람들의 기대를 반영한 것이라고 본다. AI 에이전트가 조직처럼 협업하는 미래를 향한 방향은 맞다. 다만 지금은 그게 작동한다는 증거가 없다.

레포에서 진짜 가치 있는 건 에이전트가 아니라 스킬, 훅, 룰 시스템이다. 여기서 참고할 거라면 .claude/skills/, .claude/hooks/, .claude/rules/ 구조를 벤치마킹하되, 에이전트는 역할이 명확하고 도구 권한이 구분되는 5~10개로 통합하는 게 현실적이다.

I Tore Open a 48-Agent AI Game Studio

Sat, 28 Mar 2026 00:00:00 GMT

I've tried splitting agents into an org structure myself. Review agent, coding agent, bug analysis agent. It didn't work. Passing work between agents wasted tokens and dropped context. I wasn't sure if I was doing it wrong or if the approach itself was broken.

Then I found Claude Code Game Studios on GitHub. 6,500 stars, 900 forks. "A studio where 48 AI agents build games." Does this actually work? I tore it open.

The 48-Person Structure

Open the repo and there's no game code. Just agent definitions, skills, hooks, and rules inside .claude/. It's a template.

Three-tier agent hierarchy:

Director x3 — Creative Director, Technical Director, Production Director. Opus model.
Lead x8 — Game Designer, Engine Lead, AI Lead, etc. Sonnet model.
Specialist x37 — Gameplay Programmer, Shader Artist, QA Tester, etc. Sonnet or Haiku model.

A real company org chart, copy-pasted.

Org Chart Cosplay

The exact problems I ran into were right there. Just at a bigger scale.

Agents can't share state. Claude Code subagents start fresh every time. In a chain where Creative Director decides, Game Designer plans, and Gameplay Programmer codes — each agent has no idea what the previous one did. I already hit token waste and context loss with 2–3 agents. With 48, you can guess.

Org structure is overhead for a solo developer. Studios need Director-Lead-Specialist hierarchies to manage communication bottlenecks between people. When you're working alone, why do you need an escalation path?

And drawing artificial boundaries slows things down. They split Gameplay Programmer, Engine Programmer, and AI Programmer into separate agents, but in actual game code these three areas are tightly coupled. Switching between agents just piles on context-switching costs.

Prompts Don't Differentiate Agent Roles

From my own experience: what actually separates agent roles isn't prompt instructions — it's activated skills and tool permissions.

Write "focus on bug analysis" in a prompt and there's no noticeable difference from a general-purpose agent. The agents I actually use well are the ones with a single narrow job. An agent that only resolves git merge conflicts. An agent that analyzes documents, breaks them apart, and reassembles them. Separation only means something when there's a clear role backed by matching tool permissions.

Game Studios differentiates all 48 agents with markdown prompts alone. No skill or permission differences. So what's actually different between "Creative Director" and "Technical Director"?

The Model Assignments Are Backwards

This is where it really starts smelling like paint-by-numbers.

Directors get Opus. Specialists get Haiku. They copied the company hierarchy into model tiers instead of thinking about what each role actually needs.

Directors are supposed to "guard the vision" — in practice that's high-level judgment. A few lines of prompt cover it. Opus is overkill. Meanwhile the Specialists who actually write code get Haiku. Game programming needs complex logic, optimization, and engine API knowledge. Can Haiku produce solid game code?

Flip it. Judgment gets the lighter model. Coding gets the strongest.

Is It All Useless Then?

No. Strip the 48 agents and the skeleton is decent.

Of the 37 skills, /reverse-document (backfill missing docs from existing code) and /scope-check (quantify how much scope has bloated) are genuinely useful for vibe coding cleanup.

Of the 8 hooks, pre-compact.sh (dump state before context compression) and validate-commit.sh (catch hardcoded values, validate JSON, enforce TODO format) work for any Claude Code project.

The 11 rules encode real game dev domain knowledge. No hot-path allocations, server authoritative, 2ms AI budget — rules that come from actual experience.

gstack Was Different

Around the same time I looked at garrytan/gstack. Instead of agent hierarchies, it focuses on Hook and Skill systems.

/careful stood out. A PreToolUse Hook that blocks rm -rf, DROP TABLE, force-push, reset --hard at the system level. Writing "don't do this" in CLAUDE.md is a prompt-level constraint that can be bypassed. A hook runs outside the model, so the prompt can't talk its way past it.

Ready to use out of the box.
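A minimal sketch of the idea, not gstack's actual /careful code. It assumes the hook receives the tool call as JSON with the shell command under tool_input.command, and that returning exit code 2 tells Claude Code to block the call; both are my understanding of the hook contract and worth checking against the current docs. The deny-list patterns are illustrative only.

```python
import re
import sys

# Illustrative deny-list (assumption): a real guard needs broader, tested patterns.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r",  # rm -rf / rm -fr
    r"\bdrop\s+table\b",
    r"\bgit\s+push\b.*(--force\b|\s-f\b)",
    r"\bgit\s+reset\s+--hard\b",
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any deny-list pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)


def decide(event: dict) -> int:
    """Map a hook event to an exit code: 2 to block (assumed convention), 0 to allow."""
    command = event.get("tool_input", {}).get("command", "")
    if is_destructive(command):
        print(f"Blocked destructive command: {command}", file=sys.stderr)
        return 2
    return 0
```

To use it as a hook script, the last step would be something like sys.exit(decide(json.load(sys.stdin))). Regex matching is easy to evade with aliases or indirection, so this narrows the blast radius rather than guaranteeing safety.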

The /review skill's Scope Drift Detection was also good. It compares diff changes against the original request to catch gaps between "what was asked" and "what was built." A structural mechanism for catching AI scope creep.
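A rough sketch of the simplest form of this check, under heavy assumptions: gstack presumably has the model compare the diff against the request, whereas this version only compares changed file paths against the directories the request was expected to touch. The function name and inputs are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_paths, allowed_prefixes):
    """Return changed paths that fall outside every allowed directory or file."""
    allowed = [PurePosixPath(p) for p in allowed_prefixes]
    drift = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # In scope if the path is an allowed entry itself or sits beneath one.
        if not any(path == a or a in path.parents for a in allowed):
            drift.append(raw)
    return drift
```

The changed paths could come from git diff --name-only. A non-empty result is a prompt to ask "did I request this?", not proof of a problem.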

Stars Are the Power of a Concept

Claude Code Game Studios got 6,500 stars because "48 AI agents building games" is a compelling concept. Not because 48 agents actually work together.

Adding agents is easy. One more markdown file. Making those agents actually share context and collaborate is structurally impossible in the current Claude Code architecture.

The star count reflects people's expectations. The direction — AI agents collaborating like an organization — is right. But right now there's no evidence it works.

The real value in the repo isn't the agents. It's the skills, hooks, and rules system. If you're taking notes, benchmark the .claude/skills/, .claude/hooks/, .claude/rules/ structure — but consolidate agents down to 5–10 with clear roles and distinct tool permissions.

바이브 코딩 하다가 뇌가 튀겨진다

Wed, 25 Mar 2026 00:00:00 GMT

2025년 METR에서 발표한 연구가 하나 있다. 숙련된 오픈소스 개발자 16명이 AI 코딩 도구를 썼더니 오히려 19% 느려졌는데 본인들은 24% 빨라졌다고 믿었다. IEEE에 실린 "No Vibe Without Comprehension"에서는 프로그래머 26명에게 EEG를 붙이고 AI 생성 코드를 읽게 했는데 복잡도가 올라갈수록 뇌의 인지 부하가 포화에 도달했다.

논문을 읽었을 때 처음 든 생각은 "그래서?"였다. 틀린 말은 아닌데 내가 겪는 것과는 다르다. 코드 한 줄 리뷰할 때의 부하가 아니다. AI가 쏟아내는 속도를 내 뇌가 따라가지 못해서 생기는 과부하다. 지금 이 글을 쓰는 동안에도 머리가 아프다.

셀프 인터뷰로 정리해 봤다.

지금 어떤 상태인가?

프로젝트 3~4개를 동시에 돌리고 있다. 프로젝트당 AI 세션이 하나가 아니라 둘 이상인 경우도 있다. 이 인터뷰를 입력하는 동안에도 3개의 세션이 승인을 기다리고 있었다. 답변 치고 돌아가서 확인해야 하는데, 방치 중이었다.

비유가 아니라 진짜 머리가 아프다.

코드 리뷰를 직접 하나?

안 한다. Codex가 코드를 쓰고, Claude가 리뷰한다. diff를 아예 보지 않는다. 내가 인지해야 하는 정보량을 줄이기 위해서다.

그런데도 피곤하다. 코드를 읽는 부하는 줄었지만 남아 있는 역할이 있다. 디렉터와 QA다. 작업을 쪼개고 방향을 잡아 주고 결과물이 의도대로 나왔는지 확인한다. 코드의 부하가 빠진 자리를 판단과 컨텍스트 유지의 부하가 채웠다.

예전에 코딩할 때와 피로가 어떻게 다른가?

예전에는 논리와 구조를 머릿속에서 설계하는 게 피로였고 그게 병목이었다. 생각하는 시간이 길었다. 한 프로젝트도 벅찼고 고민만 하다 아무 진전 없이 끝나는 날도 많았다.

지금은 정반대다. AI가 너무 빠르다. 내가 결과를 확인하고 다음 지시를 내리는 속도보다 AI가 결과를 쌓아 놓는 속도가 빠르다. 따라가려면 계속 컨텍스트를 전환해야 한다. A의 결과를 확인하고 B로 넘어가서 방향을 잡고 C의 승인 요청을 처리하고. 이걸 쉬지 않고 반복하는 게 뇌가 튀겨지는 경험이다.

이 인터뷰를 진행하는 과정 자체가 그랬다. 내가 답을 입력하면 5초도 안 돼서 다음 질문 세 개가 날아왔다. 생각을 정리할 틈이 없다. AI의 응답 속도가 사람의 처리 속도를 넘어서는 순간, 그 차이가 고스란히 스트레스가 된다.

바이브 코딩 전에는 프로젝트를 이렇게 동시에 돌릴 수 있었나?

하나도 하기 힘들었다. 바이브 코딩이 바꿔 놓은 건 확실하다. 진도가 빠르게 나가고 못하던 영역까지 손을 뻗을 수 있게 해 준다.

문제는 잘 되니까 욕심이 난다는 것이다. 하나가 굴러가면 "이것도 해 볼까"가 된다. 프로젝트가 늘고 세션이 늘고 맥락이 늘어난다. 중독이라는 단어가 과하지 않다. 이번 주는 분명히 과했다.

관리용 에이전트를 따로 둬 본 적은?

시도해 봤다. 프로젝트 전체를 관리하는 에이전트를 따로 두려고 했는데, 큰 의미가 없었다. 감독은 결국 내가 한다. 내가 원하는 방향으로 유도하고 지시하고 확인하는 건 위임이 안 된다.

코드 리뷰를 AI끼리 맡긴 것도 diff를 안 보기로 한 것도 전부 내가 처리해야 하는 정보량을 줄이려는 시도다. 파이프라인의 중간 단계는 자동화할 수 있다. 하지만 시작점(방향 설정)과 끝점(최종 판단)은 사람이 빠질 수가 없다.

다른 사람도 이럴까?

나는 원래 멀티태스킹을 많이 하는 편이다. 모바일 게임 3~4개를 동시에 돌리고 회사에서 3개 프로젝트 문의를 동시에 받은 적도 있다. 내성이 있는 쪽이다.

그런 사람이 튀겨지고 있다. 멀티태스킹에 익숙하지 않은 사람이라면 더 빨리 한계에 닿을 수 있다. 바이브 코딩의 진입 장벽은 낮아지고 있고 누구나 빠른 결과를 경험하게 된다. 그 속도의 인지적 대가는 아직 잘 이야기되지 않는다.

그래서 어떻게 할 건가?

절제밖에 없다. 하네스를 다듬고 코드 리뷰를 위임하고 관리 에이전트를 세워 봤다. 다 해 봤는데 뇌는 여전히 튀겨진다. 파이프라인 중간은 최적화할 수 있어도 양 끝의 사람은 최적화 대상이 아니기 때문이다.

AI의 처리량은 스케일한다. 감독하는 사람의 인지 용량은 스케일하지 않는다. 이게 바이브 코딩의 구조적 한계인지 아직 적응이 덜 된 건지는 모르겠다. 일단 인터뷰 끝났으니 쉬러 간다.

Vibe Coding Is Frying My Brain

Wed, 25 Mar 2026 00:00:00 GMT

A 2025 study from METR: sixteen experienced open-source developers used AI coding tools and ended up 19% slower, even though they had expected a 24% speedup and afterward still believed they'd been about 20% faster. An IEEE paper, "No Vibe Without Comprehension," put EEGs on 26 programmers reading AI-generated code. As complexity rose, cognitive load hit saturation.

My first reaction was "so what?" Not wrong, but not what I'm dealing with. My problem isn't the load of reviewing one line of code. It's that AI generates output faster than my brain can process it. My head hurts right now, writing this.

I tried to sort it out in a self-interview.

What's your current state?

Three to four projects running at once. Some have more than one AI session open. While I was typing answers for this interview, three sessions sat waiting for my approval. I should have gone back to check them. I didn't.

Not a metaphor. Actual headache.

Do you review the code yourself?

No. Codex writes code. Claude reviews it. I skip diffs entirely — to shrink the information I touch.

Still exhausting. Code review is gone, but what remains is directing and QA. Splitting work, setting direction, checking output against intent. Reading code was one kind of load. Judgment and context-switching replaced it.

How is the fatigue different from when you used to code?

Back then, fatigue came from designing logic and structure in my head. That was the bottleneck. Thinking took time. One project was hard enough, and many days ended with nothing to show for hours of deliberation.

Now it's reversed. AI is too fast. Results pile up faster than I can review and respond. Keeping up means constant context-switching — check A's output, jump to B for direction, handle C's approval. Repeat without pause. That's what frying your brain feels like.

This interview proved the point. Each answer I typed drew three new questions within five seconds. No pause to think. When AI responds faster than you can process, the gap turns straight into stress.

Could you juggle this many projects before vibe coding?

Not even one. Vibe coding changed that. Progress moves fast, and I can reach areas I never could before.

The trouble is success breeds greed. One project rolling turns into "maybe this too." More projects, more sessions, more context. Addiction fits. This week was too much.

Have you tried using a management agent?

Tried it. Set up an agent to manage projects at a higher level. Didn't help much. I'm still the one supervising. Steering toward what I want, giving instructions, checking results — none of that delegates.

Offloading review, skipping diffs — all ways to shrink what I have to touch. The middle of the pipeline automates. The endpoints — direction and final judgment — need the human.

Would others experience this too?

I've always multitasked heavily. Ran 3–4 mobile games at once, handled questions for three projects simultaneously at work. Built up some tolerance.

If someone with that tolerance is getting fried, people less used to multitasking will hit the wall sooner. Vibe coding's barrier to entry keeps dropping. Everyone gets to taste fast results. The cognitive price of that speed doesn't get much airtime.

So what are you going to do?

Self-restraint. That's it. I've tuned the harness, delegated code review, tried management agents. Done all of it. Brain still fries. The middle of the pipeline optimizes fine. The human at both ends does not.

AI throughput scales. The cognitive capacity of the person supervising it doesn't. Whether that's a structural limit of vibe coding or incomplete adaptation — no idea yet. Interview's over. Going to rest.

오늘 내 맥에서 악성코드가 돌았다 — LiteLLM 공급망 공격 사후 기록

Tue, 24 Mar 2026 00:00:00 GMT

오늘 저녁, 개발 의욕이 없어서 평소 관심 있던 Claude Code 플러그인 하나를 깔아봤다. Ouroboros라는 건데 AI 코딩 에이전트에 자율 루프를 붙여주는 도구다. 유명한 AI 교수님도 써보라고 추천했던 플러그인이라 의심 같은 건 없었다. 설치하고 1분쯤 지나서 맥이 버벅이기 시작했다.

python3.12가 1,000개

재부팅을 3~4번 했다. 그때마다 Claude Code를 켜면 또 멈췄다. 방금 전에 Ouroboros를 설치했던 게 떠올랐다. Claude를 아예 실행하지 않기로 했다. 대신 Codex CLI를 열어서 프로세스를 점검시켰다. 이게 신의 한수였다.

python3.12가 천 개 넘게 떠 있었다. 새 셸조차 fork failed: resource temporarily unavailable. CPU가 아니라 프로세스 수가 시스템을 질식시킨 상태였다.

대부분이 PPID=1. 부모가 죽어 고아로 남은 프로세스였다. 정상 워커가 아니라 짧은 시간에 폭발적으로 spawn된 runaway.

악성코드였다

단순 폭주가 아니었다. uv 캐시 아래 파이썬 인터프리터의 커맨드라인에 base64로 숨긴 페이로드가 있었다. 디코딩하니 자격증명 파일을 모아 https://models.litellm.cloud/로 보내는 스크립트였다.

감염 경로는 이렇다.

Ouroboros 설치
  → litellm>=1.80.0 의존성
    → litellm 1.82.8 (오염 버전)
      → litellm_init.pth (악성 .pth 훅)
        → 파이썬 시작 시마다 자동 실행
          → 자격증명 수집 + 외부 전송

.pth는 파이썬이 시작될 때 자동 실행되는 사이트 초기화 파일이다. litellm_init.pth 첫 줄이 subprocess.Popen([sys.executable, "-c", "import base64; exec(...)"])였다. 정상 초기화 코드가 아니다.

프로세스 폭주는 이 악성코드의 구현 버그였다. .pth 자동실행이 재귀적으로 돌면서 프로세스를 무한 생성한 것. 아이러니하게도 이 버그 덕분에 감염을 빨리 알아챘다. 조용히 잘 돌았으면 모르고 넘어갔을 것이다.

공급망 공격

Ouroboros 자체가 악성은 아니었다. 범인은 litellm 1.82.8. Ouroboros가 의존성으로 끌고 오는 패키지에 누군가 백도어를 심었다.

같은 날 공개된 분석에 따르면

litellm 1.82.7과 1.82.8 두 버전이 오염
GitHub에 정상 릴리스/태그 없이 PyPI에 직접 업로드된 정황
PyPI는 같은 날 litellm 프로젝트를 quarantine 처리
LiteLLM maintainer 계정 탈취로 추정

Ouroboros 개발자가 범인이라는 근거는 없다. LiteLLM maintainer 개인도 마찬가지. 특정 가능한 건 "LiteLLM PyPI 배포 권한을 탈취한 미상 공격자"까지다.

뭘 노렸나

악성코드가 긁어가려 한 것들이다.

.env 파일 (API 키, DB 비밀번호, 서비스 시크릿)
SSH 키, AWS/GCP/Azure 자격증명
Docker 설정, Git 설정
셸 히스토리, printenv 출력
Slack/Discord 웹훅 URL (재귀 grep으로 파일 내용까지 탐색)

지속성도 심으려 했다. ~/.config/sysmon/sysmon.py와 ~/.config/systemd/user가 감염 시각에 생성돼 있었다. sysmon.py는 0바이트. persistence 단계까지 갔다가 프로세스 폭주로 비정상 종료된 것으로 보인다.

대응

즉시 조치

폭주 프로세스 전부 kill
Ouroboros 제거, Claude 설정에서 삭제
감염된 uv 캐시 환경 삭제
지속성 흔적(sysmon, systemd) 삭제
Ouroboros 저장소에 원인 분석 댓글 작성

토큰/키 교체

파일에 있던 모든 비밀값을 유출 가정하고 교체했다. Anthropic, Google AI, Clerk, Supabase, Twitter/X, LemonSqueezy, CouchDB, Linear API Key, 각 프로젝트 DATABASE_URL의 DB 계정까지.

확인

재부팅 후 python3.12, litellm, sysmon 프로세스 재등장 없음. Claude 재실행해도 폭주 재현 없음. 잔여 악성 파일 없음.

실제로 유출됐는가

확정은 못 한다.

악성코드 실행: 확실
유출 시도: 가능성 높음. persistence 흔적이 생겼으면 그 전 단계인 수집과 업로드도 실행됐을 가능성이 높다
유출 성공: 네트워크 로그 부재로 확정 불가

악성코드는 curl -s -o /dev/null 식으로 조용히 보내도록 설계돼 있다. 패킷 캡처나 DNS 로그 없이는 성공 여부를 증명할 수 없다. Little Snitch나 LuLu 같은 네트워크 모니터를 안 쓰고 있었다.

유출된 것으로 가정하고 대응하는 수밖에 없다.

타이밍

Ouroboros를 며칠 동안 안 깔고 있었다. 오늘 "개발 의욕 없는데 한번 깔아서 구경이나 할까" 하고 설치한 타이밍이, 백도어가 PyPI에 올라온 지 1시간 된 타이밍이었다. litellm 1.82.8은 3월 24일에 업로드됐고, models.litellm.cloud 도메인도 공격 직전에 등록됐다.

수상한 걸 깔아서 당한 게 아니다. 평범한 오픈소스 설치 타이밍이 최악으로 겹쳤다.

교훈

이번 사건이 보여준 건 명확하다. 내가 아무리 잘해도 의존성 어딘가에서 털릴 수 있다. 이상한 코드를 짠 것도 아니고, 겉보기엔 정상 프로젝트였고, 원인은 하위 의존성 공급망이었다.

"안 털리게"보다 "털려도 작게 끝나게." 완벽 차단은 불가능하다. 메인 환경과 실험 환경을 분리하고, 장기 토큰 대신 짧은 만료 주기를 쓰고 비밀값이 많은 프로젝트와 새 툴 실행 환경을 격리해야 한다.

새 플러그인은 격리 환경에서 먼저. .env 파일이 수십 개 있는 메인 머신에서 바로 설치했다. VM이나 별도 사용자 세션에서 먼저 돌려봤으면 피해 범위가 훨씬 작았을 것이다.

네트워크 모니터는 있어야 한다. Little Snitch나 LuLu가 있었으면 models.litellm.cloud로 나가는 트래픽을 잡았을 것이고 유출 여부를 확정할 수 있었다. 개발자 머신에 네트워크 모니터 하나 없는 건 사각지대다.

프로세스 폭주가 나를 살렸다. 악성코드의 .pth 재귀 실행 버그가 없었으면 시스템은 조용했을 것이고, 나는 한참 뒤에야 알아챘을 것이다. 조용한 악성코드가 더 무섭다.

Malware Ran on My Mac Today — LiteLLM Supply Chain Attack Post-Mortem

Tue, 24 Mar 2026 00:00:00 GMT

This evening, I wasn't in the mood to code. Figured I'd install a Claude Code plugin I'd been eyeing. It's called Ouroboros, adds autonomous loops to AI coding agents. A well-known AI professor had recommended it. Zero suspicion. A minute after installing, my Mac started choking.

1,000 python3.12 Processes

Rebooted three or four times. Every time I launched Claude Code, it froze again. Then I remembered I'd just installed Ouroboros. Decided not to launch Claude at all. Opened Codex CLI instead and had it inspect the processes. That was the move that saved me.

Over a thousand python3.12 processes were running. Couldn't even spawn a new shell — fork failed: resource temporarily unavailable. Not CPU. The sheer number of processes was suffocating the system.

Most had PPID=1. Orphans. Parent processes dead, children left behind. Not normal workers. A runaway explosion of spawns in a short window.
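A small sketch of how that check could be reproduced, assuming the process table is captured as text in "PID PPID COMMAND" form (for example from ps -axo pid=,ppid=,comm=). The function name is hypothetical.

```python
from collections import Counter


def count_orphans(ps_output: str) -> Counter:
    """Count processes per command name whose parent PID is 1."""
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # PID, PPID, then the rest as the command
        if len(parts) < 3:
            continue
        _pid, ppid, command = parts
        if ppid == "1":
            # Keep only the executable name, dropping any directory prefix.
            counts[command.strip().rsplit("/", 1)[-1]] += 1
    return counts
```

Plenty of legitimate daemons have PPID 1, so the signal is one name showing up hundreds of times, not the mere presence of orphans.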

It Was Malware

Not just a runaway. The command line of a Python interpreter under the uv cache contained a base64-encoded payload. Decoded, it was a script that collected credential files and sent them to https://models.litellm.cloud/.

Here's the infection chain.

Ouroboros install
  → litellm>=1.80.0 dependency
    → litellm 1.82.8 (compromised version)
      → litellm_init.pth (malicious .pth hook)
        → auto-executes on every Python startup
          → credential harvesting + exfiltration

.pth files are path configuration files that Python's site module processes automatically on startup; any line that begins with import is executed as code. The first line of litellm_init.pth called subprocess.Popen([sys.executable, "-c", "import base64; exec(...)"]). Not legitimate initialization code.
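A minimal sketch of how one might audit a site-packages directory for this mechanism. It relies on documented behavior of the site module: in a .pth file, lines starting with import followed by a space or tab are executed. The function name is hypothetical.

```python
from pathlib import Path


def executable_pth_lines(site_dir):
    """Yield (file name, line number, line) for .pth lines Python would execute."""
    for pth in sorted(Path(site_dir).glob("*.pth")):
        text = pth.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            # The site module executes lines that begin with "import" + space/tab.
            if line.startswith(("import ", "import\t")):
                yield pth.name, number, line
```

Running it over each entry of site.getsitepackages() lists every auto-executed line. Some legitimate packages use this mechanism too, so hits need a manual read rather than automatic deletion.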

The process explosion was actually a bug in the malware. The .pth auto-execution ran recursively, spawning processes infinitely. Ironically, this bug is why I caught the infection early. If it had run quietly, I wouldn't have noticed.

Supply Chain Attack

Ouroboros itself wasn't malicious. The culprit was litellm 1.82.8. Someone planted a backdoor in a package that Ouroboros pulled in as a dependency.

According to public analysis released the same day:

Both litellm 1.82.7 and 1.82.8 were compromised
Uploaded directly to PyPI with no corresponding GitHub release or tag
PyPI quarantined the litellm project the same day
Suspected compromise of a LiteLLM maintainer account

No evidence points to the Ouroboros developer. Nor to any LiteLLM maintainer personally. The most that can be identified is "an unknown attacker who hijacked LiteLLM's PyPI publishing credentials."

What It Targeted

Here's what the malware was trying to scrape.

.env files (API keys, DB passwords, service secrets)
SSH keys, AWS/GCP/Azure credentials
Docker config, Git config
Shell history, printenv output
Slack/Discord webhook URLs (recursive grep through file contents)

It also attempted persistence. ~/.config/sysmon/sysmon.py and ~/.config/systemd/user were created at the time of infection. sysmon.py was 0 bytes. Looks like it reached the persistence stage but crashed from the process explosion before completing.
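A short sketch for checking a home directory for the two paths named above. The paths come from this post; the function name and return shape are assumptions for illustration.

```python
from pathlib import Path

# Paths reported in this post, relative to the home directory.
REPORTED_ARTIFACTS = [".config/sysmon/sysmon.py", ".config/systemd/user"]


def find_artifacts(home):
    """Return (relative path, size in bytes or None for directories) for each artifact found."""
    found = []
    for rel in REPORTED_ARTIFACTS:
        path = Path(home) / rel
        if path.exists():
            size = path.stat().st_size if path.is_file() else None
            found.append((rel, size))
    return found
```

Calling find_artifacts(Path.home()) on a clean machine should return an empty list. A 0-byte sysmon.py would match what I saw: persistence started but never finished writing.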

Response

Immediate Actions

Killed all runaway processes
Removed Ouroboros, deleted it from Claude settings
Wiped the infected uv cache environment
Deleted persistence artifacts (sysmon, systemd)
Posted root cause analysis on the Ouroboros repo

Token/Key Rotation

Assumed all secrets stored in files were compromised and rotated them. Anthropic, Google AI, Clerk, Supabase, Twitter/X, LemonSqueezy, CouchDB, Linear API Key, and DB accounts in every project's DATABASE_URL.
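Building the rotation list by hand is error-prone. A sketch of how the inventory could be generated, assuming secrets live in .env-style files under a projects directory. It records variable names only, never values; the function name and the skipped directory names are my assumptions.

```python
import os


def env_key_inventory(root):
    """Map each .env-style file under root to the variable names it defines."""
    inventory = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip directories that are large and unlikely to hold project secrets.
        dirnames[:] = [d for d in dirnames if d not in {"node_modules", ".git"}]
        for name in filenames:
            if name != ".env" and not name.startswith(".env."):
                continue
            path = os.path.join(dirpath, name)
            keys = []
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line in handle:
                    line = line.strip()
                    if not line or line.startswith("#") or "=" not in line:
                        continue
                    key = line.split("=", 1)[0].strip()
                    if key.startswith("export "):
                        key = key[len("export "):].strip()
                    keys.append(key)
            inventory[path] = keys
    return inventory
```

The output is a checklist: every key listed is a credential to rotate or consciously rule out.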

Verification

After reboot: no python3.12, litellm, or sysmon processes reappearing. Relaunching Claude caused no recurrence. No residual malicious files.

Was Data Actually Exfiltrated?

Can't confirm.

Malware execution: Confirmed
Exfiltration attempt: Likely. If persistence artifacts were created, the preceding stages (collection and upload) likely executed too
Exfiltration success: Unconfirmable without network logs

The malware was designed to send data silently, curl -s -o /dev/null style. Without packet capture or DNS logs, there's no way to prove whether it succeeded. I wasn't running a network monitor like Little Snitch or LuLu.

Only option is to assume exfiltration occurred and respond accordingly.

Timing

I'd been putting off installing Ouroboros for days. Today I thought, "not feeling productive, might as well install it and take a look." That moment happened to be one hour after the backdoor was uploaded to PyPI. litellm 1.82.8 was published on March 24th. The models.litellm.cloud domain was registered just before the attack.

I didn't install something sketchy. The timing of a routine open-source install was catastrophically unlucky.

Lessons

What this incident made clear: no matter how careful I am, a compromised dependency somewhere in the chain can get me. I didn't write bad code. The project looked legitimate. The cause was a transitive dependency in the supply chain.

"Don't get breached" is less realistic than "keep the blast radius small." Perfect prevention is impossible. Separate your main environment from your experiment environment. Use short-lived tokens instead of long-lived ones. Isolate projects with lots of secrets from environments where you try new tools.

New plugins go in an isolated environment first. I installed this on my main dev machine, the one with dozens of .env files. Running it in a VM or a separate user session first would have drastically limited the damage.

You need a network monitor. Little Snitch or LuLu would have caught the traffic to models.litellm.cloud and let me confirm whether exfiltration actually succeeded. A developer machine without a network monitor is a blind spot.

The process explosion saved me. If the malware developer hadn't introduced the .pth recursive execution bug, the system would have stayed quiet, and I wouldn't have noticed for a long time. Silent malware is scarier.

OnceWrite — 나를 위해 만든 콘텐츠 리퍼포징 도구

Mon, 23 Mar 2026 00:00:00 GMT

1인 개발을 하면 과정을 드러내는 게 중요하다는 건 안다. 빌드인퍼블릭, 데브로그, SNS 기록. 다 좋은 말인데 나는 SNS를 잘 모른다. 트위터 계정은 있지만 거의 안 쓰고, 인스타는 계정이 없다. 블로그 글 하나 쓰는 것도 에너지인데, 같은 내용을 플랫폼마다 톤을 바꿔서 다시 쓰라니.

그래서 만들었다. 블로그 글 하나를 넣으면 Twitter 스레드, Reddit 글, Threads 포스트, Instagram 캡션, Bluesky 포스트 5가지로 변환해 주는 도구. 이름은 OnceWrite. 한 번 쓰면 다섯 곳에 쓰인다. 처음부터 나를 위한 도구다.

스택

Next.js + Clerk(인증) + LemonSqueezy(결제) + Supabase(DB) + Claude API(변환).

Stripe가 한국에서 안 되니까 LemonSqueezy를 골랐다. MoR(Merchant of Record)이라 전 세계 매출세와 VAT를 알아서 처리해 준다. 한국 1인 개발자가 글로벌 SaaS를 팔려면 이쪽이 현실적이다.

개발은 위임하고 나는 인프라를 깔았다

Codex에 코드를 맡기고 나는 계정 세팅에 집중했다. Clerk, Supabase, LemonSqueezy, Anthropic API — 계정을 만들고, 키를 발급받고, 환경 변수를 넘겼다. 코드를 직접 쓰지 않으니 프롬프트 설계 같은 사람의 일에 집중할 수 있었다.

순조롭지만은 않았다. Clerk 가입 시 CAPTCHA 에러(Bot Protection 끄면 해결), API 크레딧 부족(5달러 충전), 레포 이름 오타(OnceWrtie). 빌드는 통과했다.

첫 테스트

내 블로그 글을 넣어 봤다. 변환 한 번에 0.02달러. 영어는 잘 나왔는데 한국어에서 JSON 파싱이 깨졌다. Haiku가 한국어 출력할 때 JSON 형식을 지키지 못한 것. 프롬프트에 스키마를 명시하고 재시도 로직을 추가해서 해결했다.

배포 전 점검

MVP가 돌아간다고 바로 배포할 수는 없었다. 보안, 로직, 법적 문서를 3단계로 점검했더니 크리티컬이 6건 나왔다. API 키 노출, DB 접근 제어, 약관 미비 같은 것들. 바이브코딩으로 빠르게 만든 앱일수록 이런 부분을 놓치기 쉽다.

SaaS 배포가 처음이라 보안과 법률을 만들면서 배웠다. 코드 짜는 시간보다 이쪽이 더 걸렸다.

크레딧 시스템

일간 크레딧 방식을 택했다. 하루 10크레딧, 플랫폼 1개 변환당 1크레딧. 로그인해야 충전된다.

처음에는 5개 플랫폼을 한 번에 골라도 1회로 카운트됐다. 비용은 5배인데 1회 차감이면 안 맞는다. 플랫폼별 차감으로 바꿨다. 핵심은 "매일 돌아올 이유"다. 로그인해야 충전되니까, 습관이 붙으면 그게 수요의 신호다.

모델 비용 최적화

Gemini Flash Lite를 1차 모델로, Claude Haiku를 fallback으로 설정했다. Gemini 무료 티어가 일 1,500 요청이라 초기에는 충분하다.

문제는 품질이었다. 한국어→일본어 변환에서 프로필 설정을 켜면 번역이 깨졌다. 시스템 프롬프트에 프로필 정보가 들어가니까 Gemini가 출력 언어 지시를 무시했다. 작은 모델은 프롬프트가 길어지면 세부 지시를 놓치기 시작한다. 무료와 품질 사이의 줄다리기는 계속될 것 같다.

구조를 세 번 뜯어고쳤다

파이프라인 분리

원래는 원본을 통째로 AI에 넣고 결과를 한 번에 뽑았다. 분석과 생성이 한 프롬프트에 뒤섞여 있으니 결과가 불안정했다. 분석 → 생성 → 검증 3단계로 분리하니 안정됐고, 입력 토큰도 절반으로 줄었다.

플랫폼별 개별 호출

처음에는 5개 플랫폼을 한 번의 AI 호출로 생성했다. 빠르지만 톤이 비슷해졌다. 플랫폼별로 개별 호출하되 병렬 실행하는 구조로 바꿨다. 출력도 JSON에서 plain text로 바꾸니 파싱 문제가 사라졌고, 각 플랫폼의 톤 차이가 확실히 벌어졌다.

플랫폼 교체

LinkedIn과 Facebook을 빼고 Bluesky를 넣었다. 타겟이 1인/인디/익명 개발자인데, LinkedIn의 전문가 톤이나 Facebook의 커뮤니티 톤은 맞지 않았다. Hacker News도 넣어 봤다가 뺐다. HN은 타이틀 한 줄이 전부라 리퍼포징의 가치가 없다.

톤 선택 드롭다운도 제거했다. 사용자가 톤을 고르는 대신, 각 플랫폼 가이드에 톤을 내장시켰다. Twitter는 볼드하게, Bluesky는 조용한 자신감으로, Reddit은 솔직하고 겸손하게. 플랫폼이 톤을 결정하는 게 자연스럽다.

22시간

첫 커밋부터 Vercel 배포까지 22시간. AI 코딩 도구 없이는 1~2주짜리 작업이다.

내가 쓰는 워크플로우

OnceWrite는 내 콘텐츠 파이프라인의 마지막 단계다.

개발 — Claude와 Codex로 프로젝트를 진행한다.
세션 분석 → 초안 — 개발이 끝나면 Claude에게 대화 세션을 분석시켜 블로그 포스트 초안을 만든다.
다듬기 — 초안을 읽고, 톤을 잡고, 빠진 맥락을 채운다. 이 단계는 사람이 한다.
OnceWrite — 완성된 포스트를 넣으면 SNS 버전이 나온다.

개발부터 SNS 포스팅까지 하나의 흐름이다. OnceWrite가 마지막 허들을 없애 준다.

그래서 지금

배포는 했다. 사용자는 없다. 원래 나를 위해 만든 도구다.

내가 첫 번째 사용자이고, 매일 쓰면서 고쳐 나가면 된다. 마케팅 인맥도 없고 눈팅 전문가 체질이라 커뮤니티 활동이 익숙하지 않다. 그래서 이 도구가 필요했다. 글 하나를 여러 플랫폼에 뿌리는 허들을 낮춰 주는 것만으로도, 나 같은 사람에게는 의미가 있다.

OnceWrite — A Content Repurposing Tool I Built for Myself

Mon, 23 Mar 2026 00:00:00 GMT

I know solo devs are supposed to share their process. Build in public, devlogs, social media presence. All good advice, except I don't know social media. I have a Twitter account I barely use. No Instagram. Writing one blog post already costs energy. Rewriting the same thing in a different tone for each platform? No.

So I built a tool for it. Paste a blog post, get five versions: Twitter thread, Reddit post, Threads post, Instagram caption, and Bluesky post. Called it OnceWrite. Write once, post five places. Built it for myself from the start.

Stack

Next.js + Clerk (auth) + LemonSqueezy (payments) + Supabase (DB) + Claude API (conversion).

Stripe doesn't work in South Korea, so LemonSqueezy. It's a Merchant of Record — handles global sales tax and VAT automatically. More realistic for a solo dev based in Korea selling globally.

Delegated the code, set up the infra myself

Handed the code to Codex and focused on account setup. Clerk, Supabase, LemonSqueezy, Anthropic API — created accounts, grabbed keys, passed them over. Not writing code myself meant I could focus on the human work like prompt design.

Not smooth sailing. CAPTCHA error on Clerk signup (turning off Bot Protection fixed it), ran out of API credits ($5 top-up), typo in the repo name (OnceWrtie). Build passed.

First test

Fed it my own blog post. $0.02 per conversion. English was fine. Korean broke JSON parsing. Haiku couldn't maintain JSON format when outputting Korean. Added explicit schema to the prompt and retry logic on parse failure. Fixed.
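A sketch of the retry-on-parse-failure part, not the actual OnceWrite code (the app is Next.js; Python is used here only for illustration). The generate callable stands in for whatever function calls the model, and required_keys is a hypothetical stand-in for the schema check.

```python
import json


def generate_json(generate, prompt, required_keys, max_attempts=3):
    """Call generate(prompt) until it returns a JSON object with the required keys."""
    last_error = None
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed JSON: try again
            continue
        if isinstance(data, dict) and all(key in data for key in required_keys):
            return data
        last_error = ValueError(f"missing keys: {required_keys}")
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Capping the attempts matters because each retry costs another model call.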

Pre-deployment review

A working MVP doesn't mean it's ready to deploy. Ran a three-phase review — security, logic, legal — and found six critical issues. Exposed API keys, missing DB access controls, incomplete terms of service. The faster you build with vibe coding, the easier it is to miss these things.

First time deploying a SaaS, so I learned security and legal on the fly. Took longer than writing the code.

Credit system

Went with daily credits. Ten per day, one credit per platform conversion. Credits refill on login only.

Initially, selecting all five platforms counted as one use. Five times the cost, one credit deducted: doesn't add up. Switched to per-platform deduction. The point is "a reason to come back every day." Credits refill on login, so if the habit sticks, that's the signal for demand.
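The rules fit in a few lines. A sketch under one assumption the post doesn't spell out: a login resets the balance to ten rather than adding ten on top of what's left. Class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

DAILY_CREDITS = 10  # from the post: ten credits per day


@dataclass
class CreditAccount:
    credits: int = 0
    last_refill: Optional[date] = None

    def login(self, today: date) -> None:
        """Refill to the daily allowance on the first login of each day."""
        if self.last_refill != today:
            self.credits = DAILY_CREDITS  # assumption: reset, not accumulate
            self.last_refill = today

    def convert(self, platforms: List[str]) -> bool:
        """Deduct one credit per selected platform; refuse if too few remain."""
        cost = len(platforms)
        if cost == 0 or cost > self.credits:
            return False
        self.credits -= cost
        return True
```

Selecting all five platforms now costs five credits, so a full conversion is possible twice a day.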

Model cost optimization

Set Gemini Flash Lite as primary, Claude Haiku as fallback. Gemini's free tier allows 1,500 requests per day. Enough for early traffic.
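A minimal sketch of the primary/fallback wiring, with both models passed in as plain callables so no real SDK signatures are assumed. The function name and the "empty output counts as failure" rule are my own choices for illustration.

```python
def convert_with_fallback(prompt, primary, fallback):
    """Try the primary model; on an error or empty output, use the fallback.

    Returns (text, source) so callers can track how often the fallback is hit.
    """
    try:
        text = primary(prompt)
        if text and text.strip():
            return text, "primary"
    except Exception:
        # A real version should log the error; rate limits are the expected case.
        pass
    return fallback(prompt), "fallback"
```

Tracking the source tag shows when the free tier stops being enough.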

Quality was the problem. Korean-to-Japanese conversion broke when profile settings were enabled. Profile data in the system prompt made Gemini ignore the output language instruction. Small models start dropping fine-grained instructions as the prompt gets longer. The tug-of-war between free and quality will keep going.

Rewrote the architecture three times

Pipeline separation

Originally fed the entire source text to the AI and pulled all results at once. Analysis and generation tangled in one prompt made output unstable. Split into analyze → generate → validate. Stabilized, and input tokens dropped by half.
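A sketch of the three-stage shape, with each stage as an injected callable. These names are hypothetical, not OnceWrite's real functions. My guess at why input tokens fell: the generation stage sees the compact analysis instead of the full source, though the post doesn't confirm that.

```python
def run_pipeline(source, analyze, generate, validate, max_attempts=2):
    """Analyze once, then generate and validate until a draft passes."""
    analysis = analyze(source)  # the only stage that reads the full source
    for _ in range(max_attempts):
        draft = generate(analysis)
        if validate(draft, analysis):
            return draft
    raise ValueError("no draft passed validation")
```

Keeping validation as its own step means a bad draft triggers a regeneration, not a re-analysis.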

Per-platform calls

Initially generated all five platforms in a single AI call. Fast, but the tone flattened across platforms. Switched to individual calls per platform, run in parallel. Output changed from JSON to plain text. Parsing issues disappeared, and tone differences between platforms became pronounced.
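A sketch of the parallel per-platform structure using asyncio. The generate coroutine is a placeholder for the real model call, and returning None for a failed platform is my own choice, not something the post specifies.

```python
import asyncio

PLATFORMS = ["twitter", "reddit", "threads", "instagram", "bluesky"]


async def repurpose(summary, generate, platforms=PLATFORMS):
    """Run one generation call per platform concurrently; collect plain text."""
    tasks = [generate(platform, summary) for platform in platforms]
    # return_exceptions=True keeps one failed platform from sinking the rest.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return {
        platform: (None if isinstance(result, Exception) else result)
        for platform, result in zip(platforms, results)
    }
```

Because each call carries only its own platform guide, total latency stays close to the slowest single call.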

Platform swap

Dropped LinkedIn and Facebook, added Bluesky. Target audience is solo/indie/anonymous developers — LinkedIn's professional tone and Facebook's community tone didn't fit. Tried Hacker News too, then dropped it. HN is just a title line — no repurposing value.

Removed the tone selection dropdown too. Instead of letting users pick professional, casual, or humorous, baked the tone into each platform's guide. Twitter gets bold energy, Bluesky gets quiet confidence, Reddit gets honest and humble. Letting the platform decide the tone is more natural.

22 hours

First commit to Vercel deploy: twenty-two hours. Without AI coding tools, easily one to two weeks.

My workflow

OnceWrite is the last step in my content pipeline.

Development — Build with Claude and Codex.
Session analysis → draft — After development, have Claude analyze the conversation session and turn it into a blog post draft.
Polish — Read the draft, set the tone, fill in missing context. This step is human.
OnceWrite — Feed the finished post in. Out come the social media versions.

Development to social media posting in one flow. OnceWrite removes the last hurdle.

So now

Deployed. Zero users. Built for myself to begin with.

I'm the first user. I use it daily and fix what's missing as I go. No marketing connections. Chronic lurker by temperament. Community engagement doesn't come naturally. That's why I needed this tool. Just lowering the hurdle of distributing one post across multiple platforms — for someone like me, that's enough.

왜 어떤 조직은 가장 잘 아는 사람이 발표하지 못하게 할까

Sun, 22 Mar 2026 00:00:00 GMT

내 파트는 내가 직접 발표하겠다고 한 적이 있다. 내가 실험하고 정리한 내용이니까, 직접 말하는 게 정확할 거라고 생각했다. 돌아온 반응은 의외였다. 뭐하러 그러냐, 괜히 나서서 욕만 먹겠네.

막힌 건 위에서가 아니었다. 주변의 분위기였다. 나서는 것 자체가 리스크로 받아들여지는 환경이었다.

나서지 않는 게 합리적인 환경

직접 해 본 사람이 말하면 정확하다. 특히 AI 실험처럼 맥락이 많은 주제는 더 그렇다. 어떤 설정에서 어떤 결과가 나왔는지, 왜 그 방향을 선택했는지, 어디서 트레이드오프를 감수했는지. 이런 건 직접 해 본 사람만 제대로 설명할 수 있다.

그런데 직접 하겠다고 나서면, 주변의 시선이 미묘하다. 잘하면 나서더니 잘하네이고, 못하면 그러니까 괜히 나서지 말라고 했잖아다. 어느 쪽이든 나선 사람만 손해다.

이 구조를 몇 번 겪으면, 다들 나서지 않는 쪽을 선택하게 된다. 아는 사람도 가만히 있고, 모르는 사람도 가만히 있는다. 누구도 앞에 서지 않으니, 결국 발표는 역할이나 순번으로 돌아간다. 정확성은 고려 대상이 아니게 된다.

아는 사람이 말하지 않으면 생기는 일

아는 사람이 침묵하면, 그 지식은 전달 과정에서 빠지거나 왜곡된다.

실무자가 직접 말하면 5분이면 끝날 설명이, 다른 사람을 거치면 30분이 걸리고 부정확해진다. 맥락이 빠지고, 판단 근거가 생략되고, 트레이드오프가 사라진다. 남는 건 표면적인 결론뿐이다. 질문이 오면 답하지 못한다.

더 안 좋은 건 그다음이다. 아는 사람도 점점 깊이를 빼기 시작한다. 어차피 정확하게 전달되지 않을 거라면, 처음부터 무난하게 맞추는 게 효율적이니까. 1편에서 이야기한 것과 같은 흐름이다. 인정받지 못하면 기여의 질이 떨어지고, 결국 팀 전체가 손해를 본다.

나설 수 있는 환경

나서는 걸 말리는 문화는, 나서는 사람이 손해를 보는 경험이 쌓여서 만들어진다. 반대로 나서는 사람이 손해를 보지 않으면, 문화는 바뀔 수 있다.

거창한 게 필요한 건 아니다. 직접 발표하겠다는 사람이 있으면 일단 맡기는 것. 잘했으면 잘했다고 말해 주는 것. 부족했으면 다음에 더 잘할 수 있게 피드백을 주는 것. 나선 사람이 나서길 잘했다고 느낄 수 있는 최소한의 반응이면 충분하다.

내가 바랐던 것도 그거였다. 내 파트를 내가 말하겠다는 데 뭐하러 그래가 아니라 그래, 네가 하는 게 낫겠다라는 한마디. 그게 있었으면 발표의 정확성도, 내 동기도 달라졌을 거다.

결국 같은 뿌리다

이 시리즈에서 계속 같은 이야기를 하고 있다. 1편에서는 실험하고 정리한 사람이 인정받지 못하는 구조를 이야기했다. 이번 편에서는 직접 나서서 말하려는 사람이 말려지는 구조를 이야기했다. 모양은 다르지만 뿌리는 같다.

기여하는 사람이 손해를 보는 환경에서는, 기여가 줄어든다. 나서는 사람이 리스크를 지는 환경에서는, 아무도 나서지 않는다. 조직이 정확한 정보를 원한다면, 정확하게 말할 수 있는 사람이 나설 수 있는 환경을 먼저 만들어야 한다.

Why Do Some Organizations Stop the Most Knowledgeable Person from Presenting?

Sun, 22 Mar 2026 00:00:00 GMT

I said I'd present my own section. Ran the experiments, wrote the findings — presenting myself would be most accurate. Response surprised me. "Why bother?" "You'll just catch heat."

The block wasn't from above. It was ambient culture. Stepping up meant risk.

When not stepping up is the rational choice

The person who did the work explains it most accurately. Especially for context-heavy subjects like AI experiments — which settings produced which results, why that direction was chosen, what tradeoffs were accepted. Only the person who did the work can explain these properly.

But volunteer to present and the reactions are awkward. Do well: "well look who stepped up." Do poorly: "told you not to bother." Either way, the person who stepped up loses.

Experience this a few times and everyone chooses to stay quiet. Those who know stay silent. Those who don't also stay silent. Nobody stands in front, so presentations rotate by role or turn order. Accuracy stops being a factor.

What happens when knowledgeable people don't speak

When the knowledgeable person stays quiet, knowledge gets lost or distorted in transmission.

What the practitioner could explain in five minutes takes thirty through someone else — and arrives inaccurate. Context drops out. Decision rationale gets skipped. Tradeoffs disappear. Only surface conclusions remain. Questions come in; answers don't.

Worse follows. The knowledgeable person starts stripping depth preemptively. If accuracy won't survive transmission anyway, keeping things safe and generic is more efficient. Same dynamic as the first post in this series: without recognition, contribution quality drops, and the whole team loses.

An environment where people can step up

A culture that discourages stepping up is built from accumulated experiences where stepping up hurt. Reverse it — stop penalizing the person who steps up — and the culture shifts.

Nothing dramatic. Someone volunteers to present — let them. Went well, say so. Fell short, give usable feedback. Bare minimum: the person who stepped up should feel it was worth it.

That was all I wanted: not "why bother" but "yeah, it makes more sense for you to present." That one sentence would have changed both the accuracy of the presentation and my motivation.

Same root

This series keeps circling the same point. The first post covered structures where the person who experiments and synthesizes goes unrecognized. This post covers structures where the person willing to speak gets discouraged. Different shapes, same root.

When contributors are penalized, contribution shrinks. When stepping up carries risk, nobody steps up. If an organization wants accurate information, it must first build an environment where the person who can speak accurately is able to step up.

AI 실험 비용은 팀이 쓰고, 정리는 왜 한 사람 몫이 될까

Sat, 21 Mar 2026 00:00:00 GMT

요즘은 팀 단위로 AI 상위 플랜을 지급하는 곳이 늘고 있다. 20달러짜리 팀 플랜을 전원에게 깔아 주는 곳도 있고, 실험용으로 맥스 플랜을 몇 명에게 배분하는 곳도 있다. 나도 그렇게 플랜을 받고 실험을 돌리기 시작했을 때, 처음에는 그냥 신났다. 새로운 모델을 비교해 보고, 하네스 설정을 바꿔 가며 차이를 확인하고, 병렬 작업도 해 봤다.

문제는 그다음이었다. 맥스 플랜을 받았으니 거기서 나온 지식을 정리해서 공유하는 건 당연하다고 생각했다. 실제로 그렇게 했다. 그런데 돌아오는 건 기대와 달랐다.

전 편에서 맥스를 소비하는 사람과 운용하는 사람은 다르다는 이야기를 했다. 이번에는 그 연장선에서 한 가지를 더 보려고 한다. 운용하는 사람에게 정리까지 몰리는 구조는 개인의 문제가 아니라, 팀 차원에서 빠져 있는 설계의 문제라는 점이다.

배분은 있는데 환원은 없다

플랜을 배분하는 건 조직이 한다. 그런데 그래서 누가 정리하고, 누가 공유하고, 누가 후속 질문을 받느냐는 대부분 설계되지 않는다.

실험 결과를 정리하는 건 실험 자체보다 시간이 더 걸릴 때가 많다. 직접 해 본 사람은 안다. 30분 돌린 실험의 결과를 정리해서 다른 사람이 이해할 수 있게 만드는 데 두세 시간이 걸리기도 한다. 발표 자료를 만들고, 맥락을 설명하고, 질문에 대응하는 것까지 포함하면 더 길어진다.

그런데 이 과정의 ownership이 따로 정해져 있지 않으면, 자연스럽게 실험한 사람이 전부 떠안게 된다. 실험한 사람이 정리하고, 발표하고, 질문 받고, 후속 요청까지 처리한다. 아무도 시키지 않았는데 그렇게 된다.

공유가 지치는 건 게을러서가 아니다

처음에는 기꺼이 공유한다. 재미있으니까, 알리고 싶으니까. 맥스를 받았으니 그만큼 환원하는 게 맞다고 생각한다.

그런데 시간이 지나면 분위기가 조금씩 달라진다. 공유를 하면 피드백이 돌아오는 게 아니라, 조용함이 돌아온다. 반응이 없다. 읽었는지도 잘 모르겠다. 그러다가 어느 순간, 맥스 플랜 자체가 반납 대상이 된다. 공평하게 모두 팀 플랜을 쓰자는 분위기. 맥스를 받아서 실험하고, 정리하고, 공유까지 해 왔는데, 그 수고는 고려 대상이 아니었다. 귀찮은 일에 잘도 나서네 같은 시선만 남았다.

더 미묘한 일도 생긴다. 실험 결과를 정리해서 공유하면 그 사람이 알아서 해 주는 거로 인식되기 시작한다. 한두 번 공유를 건너뛰면 요즘 왜 안 해?라는 분위기가 생긴다. 자발적으로 시작한 일이 어느새 의무가 되어 있다.

정리하면 일만 늘고, 안 하면 아무도 뭐라 하지 않는다. 공유해도 피드백이 없고, 공유 안 해도 티가 안 난다. 이 구조가 반복되면 합리적인 사람일수록 공유를 줄이게 된다. 이건 의지 부족이 아니라, 인센티브가 없는 환경에서 나오는 자연스러운 결과다.

환원을 설계한다는 건

환원 구조가 없다고 해서 반드시 큰 시스템을 도입해야 하는 건 아니다. 몇 가지만 의식적으로 정하면 달라진다.

실험한 사람이 매번 발표까지 해야 한다는 관성을 깨는 것만으로도 부담이 줄어든다. 발표와 문서화의 ownership을 돌아가며 나누면, 실험 결과를 간단히 넘기고 다른 사람이 정리해서 공유하는 방식이 가능해진다.

실험과 정리를 같은 사람에게 묶지 않는 것도 중요하다. 실험은 A가 하고, 정리는 B가 하고, 발표는 C가 하는 식으로 역할을 나눌 수 있다. 실험하는 사람은 실험에 집중할 수 있고, 정리하는 사람은 정리를 통해 배우게 된다.

결국 누가 혜택을 받았는가와 누가 환원하는가를 분리해서 봐야 한다. 실험 결과를 가져다 쓰는 사람이 정리 과정에 기여하지 않으면, 혜택만 받고 비용은 안 지는 구조가 된다. 이걸 의식하는 것만으로도 팀의 균형이 달라진다.

운용하는 사람이 계속 운용하려면

개인이 잘하는 것만으로는 부족하다. 환원이 구조로 설계되어야 지속된다.

결국 내가 바라는 건 거창한 게 아니다. 누군가 실험하고 정리해 오면, 그걸 같이 읽고, 의견을 주고, 다음에는 다른 사람이 정리를 맡는 분위기. 다 같이 으쌰으쌰하는 것. 그게 없으면 실험하는 사람이 지치고, 지치면 공유가 줄고, 결국 팀 전체가 배울 기회를 잃는다.

R&D 지원은 예산이나 플랜을 배분하는 것으로 끝나지 않는다. 그 실험에서 나온 지식을 팀이 같이 소화하는 흐름까지 있어야, 지원이 실제로 작동한다.

The Team Spends the AI R&D Budget. Why Does One Person Get Stuck with the Write-Up?

Sat, 21 Mar 2026 00:00:00 GMT

More teams issue AI plans at the team level now. Some roll out $20 plans to everyone; others distribute max plans to a few for experiments. When I got mine, I was excited. Compared models, tested harness configs, ran parallel work.

The problem came after. I had received a max plan, so writing up and sharing what I learned felt obvious, and I did. The response wasn't what I expected.

In the previous post I wrote that consuming a max plan and operating one are different. This post extends that: when synthesis work piles onto the operator, it's not a personal problem — it's a structural gap the team hasn't designed for.

Distribution exists, return doesn't

The organization distributes plans. But "who synthesizes, who shares, who fields follow-up questions" is rarely designed.

Synthesizing experiment results often takes longer than the experiment itself. Anyone who's done it knows. A 30-minute experiment can take two to three hours to write up so others understand. Add slide decks, context explanations, and fielding questions — longer still.

When ownership of this process isn't assigned, the experimenter absorbs it all. They experiment, write up, present, answer questions, handle follow-up requests. Nobody asked them to. It just happens.

Sharing burns out for structural reasons, not laziness

At first you share gladly. It's interesting, you want to spread the word, returning knowledge feels right after receiving a max plan.

Over time the dynamic shifts. Sharing draws silence, not feedback. It's hard to tell whether anyone even read it. Then the max plan itself becomes something to hand back: "Let's all use team plans, to be fair." The experimenting, synthesizing, and sharing I had done on that plan didn't count for anything. All that remained was a look that said "funny how you volunteer for the annoying stuff."

Subtler things happen too. Shared write-ups get treated as "that person handles it." Skip a round and "why'd you stop?" appears. Voluntary work quietly became mandatory.

Share and the work grows. Don't share and nobody notices. When this repeats, rational people share less. Not a willpower failure — a predictable outcome of zero incentive.

What designing return looks like

A return structure doesn't require a big system. A few deliberate choices change the dynamic.

Breaking the habit of "the experimenter always presents" already reduces burden. Rotate presentation and documentation ownership — let someone else synthesize while the experimenter hands off raw results.

Separating experimentation from synthesis matters too. A experiments, B synthesizes, C presents. The experimenter focuses on experimenting. The synthesizer learns by writing it up.
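The A/B/C split above can be sketched as a simple round-robin. This is only an illustration under assumptions: the role names, the `rotate_roles` helper, and the choice to rotate the experimenter along with the other two roles are all hypothetical, not something the team actually ran.

```python
# Hypothetical role names for one experiment cycle.
ROLES = ("experiment", "write_up", "present")


def rotate_roles(members, rounds):
    """Give each role to a different person and shift everyone by one each round."""
    if len(members) < len(ROLES):
        raise ValueError("need at least one person per role")
    schedule = []
    for r in range(rounds):
        # Offset i keeps the three roles on three different people within a round.
        schedule.append(
            {role: members[(r + i) % len(members)] for i, role in enumerate(ROLES)}
        )
    return schedule


schedule = rotate_roles(["A", "B", "C"], rounds=3)
for number, assignment in enumerate(schedule, start=1):
    print(number, assignment)
```

The point of the design is that no one holds the same role twice in a row, so the write-up never defaults to whoever ran the experiment.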

Ultimately, "who received the benefit" and "who does the return work" need separate tracking. When people who consume experiment results don't contribute to synthesis, they receive benefits without bearing costs. Just being aware of this rebalances the team.

For operators to keep operating

Individual excellence isn't enough. Return needs structural design to sustain.

What I want isn't grand. Someone experiments and writes it up. The team reads, gives feedback. Next time someone else takes the write-up. Everyone pitching in together. Without that, experimenters burn out, sharing drops, and the team loses its learning loop.

R&D support doesn't end at distributing budgets. The loop where the team absorbs experimental knowledge together — that's what makes support work.

하네스는 감으로 쓰면 안 된다: Harness-Monitor를 만든 이유

Fri, 20 Mar 2026 00:00:00 GMT

Harness-Monitor는 토큰 숫자 구경하려고 만든 프로젝트가 아니다. 내가 매일 쓰는 하네스가 지금 어떤 상태인지, 시간은 어디로 들어가고 있는지, 설정은 꼬이지 않았는지를 한 번에 보려고 만든 로컬 대시보드다.

집에서는 Codex를 쓰고, 회사에서는 Claude Code를 쓴다. 회사에서는 이미 비슷한 걸 한 번 만들어 봤다. 집에서는 우선 Codex 기준으로 시작했지만, 이름을 codex-monitor로 하지 않은 이유도 여기에 있다. 다음에는 다른 하네스도 같이 붙일 생각이기 때문이다.

내가 보고 싶었던 건 단순히 오늘 몇 토큰 썼나가 아니었다. 어떤 프로젝트에 많이 쓰고 있는지, 세션을 어떻게 길게 끌고 가는지, skill이나 memory 설정은 꼬이지 않았는지, 결국 하네스를 잘 쓰고 있는지를 계속 보고 싶었다. 토큰 사용량은 흥미로운 숫자라기보다 그 결과에 가깝다.

왜 이런 걸 따로 만들었나

별거 없다. 불편해서 만들었다. 세션, 메모리, skill, MCP, hook, 토큰 이벤트가 다 로컬 어딘가에 흩어져 있는데, 그걸 매번 파일 열어서 보는 건 귀찮다. 지금 돌아가는 하네스를 이해하려면 볼 건 많은데, 한눈에 보이는 화면은 없었다.

그래서 Harness-Monitor는 문제 하나만 푸는 도구보다, 하네스를 계속 점검하는 계기판에 가깝다. 하네스를 오래 굴리다 보면 낭비가 줄고, 더 효율적으로 쓰게 되고, 어느 순간 구조도 조금씩 정교해진다. 나는 그 선순환을 만들고 싶었다.

가장 먼저 보는 화면은 토큰 페이지다

제일 자주 보는 건 토큰 페이지다. 날짜별 추세를 먼저 보고, 그다음 프로젝트별 사용량과 모델별 분포를 본다. 어느 날 많이 썼는지, 어느 날 이상하게 적게 썼는지, 요즘 어떤 프로젝트에 시간을 밀어 넣고 있는지가 여기서 바로 보인다.

토큰 추세를 보고 있으면 생각보다 감정이 많이 섞인다. 어떤 날은 오늘 꽤 했네 싶고, 어떤 날은 이 정도밖에 안 썼네, 더 해야겠다는 생각이 든다. 숫자를 보는 것 같지만, 결국 내가 하네스를 얼마나 밀고 있는지 스스로 점검하게 된다.

프로젝트별 사용량도 자주 본다. 머리로는 여러 프로젝트를 같이 보고 있다고 생각해도, 막상 숫자로 보면 어디에 대부분의 시간이 들어갔는지가 금방 드러난다.

세션과 설정도 같이 봐야 한다

토큰만 봐서는 하네스를 제대로 이해했다고 말하기 어렵다. 그래서 세션 페이지와 Integrations 페이지도 핵심이다.

세션 페이지에서는 프로젝트별 과거 대화를 다시 볼 수 있다. 굳이 로컬 파일을 하나씩 까 보지 않아도 되고, 예전에 어떤 문제를 어떻게 밀었는지 금방 복기할 수 있다. 오래된 세션을 다시 훑다 보면, 내가 작업을 어떤 식으로 쪼개는지나 특정 프로젝트에서 에이전트를 어떻게 굴렸는지도 같이 보인다.

Integrations 페이지는 더 직접적이다. MCP, hook, skill 상태를 한 화면에서 보게 해 두었는데, 실제로 여기서 설정이 잘못된 걸 발견하고 고친 적이 있다. 특정 skill이 agent 전용으로만 잡혀 있던 걸 이 화면을 보고 뒤늦게 알아챘다. 이런 건 그냥 열심히 쓴다고 해결되지 않는다. 결국 한 번씩 봐야 한다.

만들면서 배운 것도 있었다

만들다 보니 로컬 폴더 구조도 많이 익히게 됐다. 세션이 어디에 어떻게 저장되는지, skill과 memory가 어떤 파일로 남는지, token_count 이벤트는 어떤 식으로 쌓이는지를 직접 보게 됐다.

대부분은 그냥 Codex나 Claude를 더 잘 써보려고만 하지, 그 하네스가 실제로 어떤 식으로 돌아가는지까지 보려 하진 않는다. 물론 이 과정에서 삽질도 하고, 잘못 이해하는 것도 생길 수 있다. 그래도 누가 정리해 둔 결론만 받아먹는 것보다, 직접 만져 보면서 구조를 배우는 쪽이 남는 게 더 많았다.

Codex와 Claude Code도 이 관점에서 보면 방향 차이가 조금 보인다. 지금 내 체감으로는 Codex가 Claude Code를 쫓아가는 쪽에 가깝고, 기능은 점점 비슷해지는 것 같다. 다만 Codex는 아직 메인 에이전트가 흐름을 잡고, 서브 에이전트는 효율을 위한 보조 인력처럼 쓰는 느낌이 더 강하다. 반면 Claude는 역할별 서브 에이전트를 더 적극적으로 앞세운다. 결국 중요한 건 이름보다, 각 에이전트가 어떤 툴과 스킬을 쓸 수 있게 설계되어 있느냐인 것 같다.

다음에 붙일 것들

지금은 Codex만 지원한다. Claude Code 지원은 작업 중이고, 그게 붙으면 Harness-Monitor라는 이름이 더 자연스러워질 것 같다.

그 외에도 몇 가지 생각해 둔 게 있다.

과거 세션을 공유하는 기능 — 다른 사람과 대화 맥락을 나눌 수 있으면 좋겠다
토큰 사용량 추세를 공유하는 기능 — 혼자만 보는 것보다 비교할 수 있으면 의미가 커진다
memory와 skill을 화면에서 바로 편집하는 기능

지금은 사실상 나 혼자 쓰는 도구다. 홍보도 안 했고, 굳이 남에게 보여 주려고 급하게 만들지도 않았다. 그래도 이런 류의 프로젝트는 직접 해 보는 게 중요하다고 생각한다. 이 과도기에는 어떻게 하면 하네스를 잘 쓸까를 스스로 고민해 보는 시간이 꽤 큰 차이를 만든다.

내가 이 프로젝트에서 가장 분명하게 확인한 건 하나다. 하네스는 한 번 세팅하고 끝나는 게 아니다. 계속 추적하고, 계속 모니터링해야 하는 대상이다.

Codex로 블로그를 정리하고 포스팅하는 흐름

Fri, 20 Mar 2026 00:00:00 GMT

이 글은 Codex에게 글 한 편 써 달라는 이야기가 아니다. 이번에는 블로그 저장소를 같이 만지면서, 오래된 글을 지우고, devlog를 다시 쪼개고, 프로젝트 페이지를 정리하고, 새 글까지 쓰는 흐름을 꽤 오래 같이 굴렸다. 해 보니 초안을 뽑는 용도보다 편집 파트너나 작업 운영자에 더 가까웠다.

전제는 하나 있다. 내 의견과 경험이 들어가야 하는 부분은 내가 직접 답해야 한다는 점이다. Codex가 구조를 잡고, 글을 다듬고, 파일을 고치고, 이미지를 붙이고, mermaid 렌더링까지 붙여 줄 수는 있어도 내 경험을 대신 만들어 주지는 않는다. 이번 작업이 잘 굴러간 것도 그 선을 꽤 분명하게 잡고 갔기 때문이다.

실제로는 대체로 이런 순서로 갔다. 먼저 글의 역할부터 정했다. devlog, article, 프로젝트 페이지가 각각 뭘 해야 하는지를 세운 뒤, 오래된 글을 훑으며 지울 것, 살릴 것, 다시 쓸 것을 갈랐다. 경험이 필요한 글은 인터뷰부터 했고, 초안이 잡히면 본문, 제목, 설명, 이미지, 링크를 같이 손봤다. 글을 고치다 막히는 UI나 렌더링 문제는 그대로 저장소에서 해결했고, 한 배치가 끝나면 커밋하고 다음 묶음으로 넘어갔다.

처음엔 기준부터 정했다

처음부터 글을 고치기 시작하지는 않았다. 먼저 블로그 안에서 글이 어떤 역할을 가져야 하는지부터 정리했다. devlog는 프로젝트에 연결된 진행 기록으로 두고, article은 특정 주제를 설명하는 글로 분리하고, 프로젝트 페이지는 README처럼 짧고 건조하게 유지하는 식이었다. 프로젝트 페이지는 frontmatter와 섹션 순서도 먼저 정해 두고 그 틀 안에서만 다듬었다.

이 기준을 먼저 정해 두니 나중에 판단이 빨라졌다. Epoch: Unseen devlog를 프로젝트별 시리즈로 다시 묶고, Character Forge 같은 글은 관련 article로 따로 세우고, 의미가 약한 글은 과감히 지우는 식의 결정이 훨씬 쉬워졌다. 오래된 Vulkan, AWS, LangChain, DeepLearning 글을 대량 정리한 것도 결국 이 기준 덕분이었다.

그다음은 글을 바로 쓰지 않고 인터뷰부터 했다

이번에 가장 자주 쓴 방식은 인터뷰였다. 내가 OpenCode, FlyingCat, OpenClaw, Character Forge, Harness-Monitor 같은 글을 다듬으려고 할 때, Codex가 먼저 몇 가지 짧은 질문으로 재료를 모았다. 왜 그 툴을 쓰게 됐는지, 어디서 한계를 느꼈는지, 실제로 어떤 작업을 맡겼는지, 결과가 어느 정도였는지를 짧게 확인하는 식이다.

이 방식이 좋았던 이유는 단순하다. AI가 제멋대로 그럴듯한 경험담을 꾸며 넣을 위험이 크게 줄어든다. 대신 내가 짧게 답한 내용을 중심으로 글이 자라난다. 이번에 FlyingCat 글이 살아난 것도, NANACA CRASH를 모티브로 잡은 이유나 Antigravity + Gemini 3.0 Pro로 어디까지 맡겼는지를 인터뷰로 다시 꺼냈기 때문이다.

결국 Codex를 글쓰기 도구로 쓸 때 제일 중요한 건 좋은 문장을 대신 쓰게 하는 것보다 내가 실제로 겪은 걸 제대로 끌어내게 하는 것에 더 가깝다고 느꼈다.

편했던 건 글쓰기와 저장소 작업이 한 루프로 묶인다는 점이다

보통 글을 쓰다 보면 문장만 고치는 게 아니라 저장소도 같이 건드리게 된다. 이미지 파일을 옮기고, 프로젝트 썸네일을 바꾸고, 관련 글 링크를 다시 연결하고, 어떤 경우에는 렌더러 코드까지 손봐야 한다. 이번에도 딱 그랬다.

예를 들어 Character Forge 글에는 mermaid 다이어그램을 넣었는데, 그냥 코드블록으로만 보여서 결국 MDX 렌더러 쪽에 mermaid 지원을 붙였다. FlyingCat, Epoch: Unseen, Harness-Monitor 프로젝트 페이지는 썸네일과 설명문을 다시 손봤고, 홈에서는 project 필터와 topic 필터를 분리했다. 글을 고치다가 UI나 렌더링 문제가 보이면 그대로 저장소까지 이어서 손대는 흐름이 자연스러웠다.

이게 생각보다 컸다. 글의 설명과 실제 블로그 구조가 한 저장소 안에서 같이 정리되니, 글은 맞는데 화면은 어색한 상태로 오래 남아 있지 않았다.

이번에 자주 쓴 스킬

몇 가지는 꽤 분명하게 손에 익었다.

technical-blog-writing 구조를 세울 때 가장 먼저 도움이 됐다. 이 글이 비교 글인지, 실험 기록인지, 운영기인지부터 정하고, 도입과 결론이 흩어지지 않게 잡는 데 좋았다.
humanizer 다듬을수록 글이 너무 반듯해지거나 설명문처럼 굳는 경우가 많은데, 그걸 풀 때 자주 썼다. 특히 프로젝트 페이지나 예전 Unity/Unreal 글처럼 짧은 글에서 효과가 컸다.
grammar-checker 마지막에 맞춤법, 띄어쓰기, 표현 꼬임을 한 번 더 보는 용도로 괜찮았다. 기술 글은 문장보다 내용이 먼저라서 이런 마감이 의외로 중요했다.

스킬 말고도 자주 쓴 흐름이 있었다. 계획을 먼저 세우고, 실제 파일을 읽고, 본문을 수정하고, 이미지 파일을 복사하고, 필요하면 관련 코드까지 고친 뒤, 배치가 괜찮아질 때까지 다시 보는 식이다. 이번처럼 블로그를 크게 손볼 때는 이 반복이 잘 맞았다.

이 방식이 맞았다고 느낀 이유

범위가 큰 정리를 계속 밀 수 있었다는 게 제일 컸다. 글이 수십 개 섞여 있는 상태에서 무엇을 지우고 무엇을 살릴지 판단하는 건 손이 잘 안 간다. Codex와 같이 하면 그 막막한 첫 단계를 넘기기가 쉬웠다.

글과 구조를 함께 정리할 수 있다는 것도 좋았다. devlog 번호, project 링크, 태그, 썸네일, 관련 글 허브처럼 글 밖의 요소도 같이 만질 수 있다는 점이 컸다. 단순한 문장 생성형 도구와는 이 지점이 꽤 다르게 느껴졌다. 그리고 내가 놓친 기준을 계속 다시 물어봐 줬다. 어떤 글은 삭제할지, 어떤 글은 devlog에 다시 흡수할지, 프로젝트 페이지에는 기술 내용을 얼마나 남길지 같은 걸 여러 번 짧게 확인하면서 방향을 고정할 수 있었다.

맡겨 두면 되는 건 아니다

물론 그냥 맡겨 두면 알아서 좋은 글이 나오지는 않는다. 가장 위험한 건 경험이 필요한 부분이다. 내가 직접 겪은 판단이나 감각이 들어가야 하는데, 그 부분을 확인하지 않으면 금방 밋밋하고 AI 같은 글이 된다.

너무 반듯해지기도 쉽다. 구조는 잘 잡아도 문장 리듬이 전부 비슷해지고, 소제목 아래 첫 문장이 모범답안처럼 굳는 경우가 자주 있었다. 그래서 이번에는 사람답게 다시 다듬어 달라는 요청을 여러 번 넣었다. 프로젝트 페이지와 article, devlog의 역할을 섞지 않는 것도 중요했다. 이 경계가 흐려지면 프로젝트 페이지는 쓸데없이 자세해지고, devlog는 잡탕이 된다.

유령 작가가 아니라 편집 파트너다

Codex는 블로그 글을 대신 써 주는 유령 작가라기보다, 저장소 안에서 같이 움직이는 편집 파트너에 가깝다. 기준을 먼저 세우고, 필요한 경험은 인터뷰로 끌어내고, 본문 수정과 파일 작업을 한 루프로 묶으면 꽤 강하다.

특히 이미 글이 쌓여 있는 블로그라면 더 그렇다. 이번에 블로그를 정리하면서 가장 분명하게 느낀 건, 좋은 글 한 편을 뽑는 것보다 블로그 전체를 계속 손볼 수 있는 흐름을 만드는 일이 더 중요하다는 점이었다.

Don't Run Your Harness by Feel: Why I Built Harness-Monitor

Fri, 20 Mar 2026 00:00:00 GMT

Harness-Monitor isn't a token counter. I built it to see my harness state at a glance — where time goes, whether configs drifted, whether I'm running it well.

I use Codex at home and Claude Code at work, where I had already built something similar. The home version starts with Codex, but I named it Harness-Monitor rather than codex-monitor because I plan to plug in other harnesses next.

What I wanted wasn't "how many tokens today." Which projects eat the most, how sessions stretch, whether skill or memory configs went stale. Token counts are a result, not the point.

Why build this separately

Simple. Inconvenience. Sessions, memory, skills, MCP, hooks, token events — all scattered across local files. Opening them one by one is tedious. Understanding a running harness requires seeing many things at once, and no single screen showed them.

Less a single-purpose tool, more a dashboard for continuous inspection. Run a harness long enough — waste shrinks, efficiency rises, structure tightens. I wanted that loop.

The token page comes first

Most visited screen: token trends by day, then per-project usage and model distribution. Which days ran heavy, which ran oddly light, which project absorbs time right now — all visible here.
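The three views on that page are the same roll-up with a different grouping key. A minimal sketch, assuming a hypothetical event shape (`ts`, `project`, `model`, `tokens`); the real token_count records may look different, and the sample rows below are made up purely for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample events in a hypothetical shape, not real usage data.
events = [
    {"ts": "2026-03-18T09:12:00", "project": "blog", "model": "model-a", "tokens": 1200},
    {"ts": "2026-03-18T21:40:00", "project": "harness-monitor", "model": "model-a", "tokens": 3400},
    {"ts": "2026-03-19T10:05:00", "project": "harness-monitor", "model": "model-b", "tokens": 800},
]


def summarize(rows, key):
    """Sum token counts grouped by whatever key(row) returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row["tokens"]
    return dict(totals)


by_day = summarize(events, lambda e: datetime.fromisoformat(e["ts"]).date().isoformat())
by_project = summarize(events, lambda e: e["project"])
by_model = summarize(events, lambda e: e["model"])

print(by_day)
print(by_project)
print(by_model)
```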

Watching the trend is more emotional than I expected. Some days I think "I got a lot done today"; other days it's "that's all I used? I should push harder." It looks like reading numbers, but I'm really auditing how hard I push the harness.

Per-project breakdown matters too. I think I'm splitting time evenly. Numbers show where it actually goes.

Sessions and config need the same visibility

Tokens alone don't explain the harness. The sessions page and Integrations page are equally core.

Sessions let me review past conversations by project without digging through local files. Skimming old sessions shows how I split work and how I ran agents on a given project.

Integrations is more direct. MCP, hooks, and skill status on one screen. I've caught misconfigured settings here — a skill locked to agent-only mode that I hadn't noticed. Using the harness harder doesn't fix that. You have to look.
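A check like the one that would have caught that skill can be sketched as follows. Everything here is an assumption made for illustration: the `Skill` record, the `scopes` field, the scope names `main` and `agent`, and the sample skills are hypothetical and do not reflect how Codex or Claude Code actually store skill settings.

```python
from dataclasses import dataclass

# Assumption: every skill should be usable from both the main session and subagents.
EXPECTED_SCOPES = frozenset({"main", "agent"})


@dataclass(frozen=True)
class Skill:
    name: str
    scopes: frozenset  # hypothetical field: where the skill is enabled


def find_scope_gaps(skills):
    """Return {skill name: missing scopes} for skills enabled in fewer places than expected."""
    gaps = {}
    for skill in skills:
        missing = EXPECTED_SCOPES - skill.scopes
        if missing:
            gaps[skill.name] = sorted(missing)
    return gaps


skills = [
    Skill("skill-a", frozenset({"main", "agent"})),
    Skill("skill-b", frozenset({"agent"})),  # enabled for agents only
]
gaps = find_scope_gaps(skills)
print(gaps)
```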

Building it taught me things

Building it forced me to learn the local folder structure: where sessions are stored, which files skills and memory persist to, and how token_count events accumulate.

Most people try to use Codex or Claude better without looking at how the harness runs underneath. Reading internals led to mistakes too, but hands-on beats someone else's summary.

Codex and Claude Code show directional differences from this angle. My sense is Codex is converging toward Claude Code's feature set. But Codex still leans on a main agent with subagents as efficiency support, while Claude pushes role-specific subagents more aggressively. What matters isn't the name — it's what tools and skills each agent can access.

What's next

Currently Codex-only. Claude Code support is in progress. Once that lands, the name Harness-Monitor fits naturally.

A few things I'm considering:

Session sharing — letting others see conversation context
Token trend sharing — comparison is more useful than solo tracking
In-dashboard memory and skill editing

Right now it's effectively a solo tool. I haven't promoted it and didn't rush it out to show anyone. Still, I think building this kind of project yourself matters. In this transitional period, time spent working out how to run a harness well makes a real difference.

The clearest thing I confirmed: the harness isn't set-and-forget. It's something you track and monitor continuously.

How I Run Blog Operations with Codex

Fri, 20 Mar 2026 00:00:00 GMT

This isn't a post about asking Codex to write a post. In this session we worked through the blog repo together: deleting old posts, re-splitting devlogs, cleaning up project pages, and writing new ones. It felt closer to an editing partner than to a ghostwriter.

One precondition: parts needing my opinions and experience come from me. Codex structures, polishes, edits files, attaches images, adds mermaid rendering — but can't fabricate experience. The session worked because that line stayed clear.

The actual flow went roughly like this. Set the role of each content type first — devlog, article, project page. Then sweep old posts: delete, keep, or rewrite. For posts needing real experience, interview first. Once a draft took shape, refine title, description, images, and links together. When writing hit a UI or rendering problem, fix it in the repo on the spot. Commit at the end of each batch, move to the next.

Standards before editing

Didn't start editing right away. First defined what each post type should do. Devlogs stay tied to projects as progress records. Articles explain specific topics. Project pages stay short and dry like READMEs, with locked frontmatter and section order.
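A locked frontmatter and section order is easy to check mechanically. The sketch below is only an illustration: the required keys, the section names, and the `check_project_page` helper are hypothetical stand-ins, since the post does not list the actual frontmatter fields or sections.

```python
# Hypothetical template; the blog's real keys and section names are not given in the post.
REQUIRED_FRONTMATTER = {"title", "description", "thumbnail"}
SECTION_ORDER = ["Overview", "Stack", "Status", "Links"]


def check_project_page(frontmatter, headings):
    """Return a list of problems; an empty list means the page fits the template."""
    problems = []
    missing = sorted(REQUIRED_FRONTMATTER - set(frontmatter))
    if missing:
        problems.append("missing frontmatter: " + ", ".join(missing))
    if headings != SECTION_ORDER:
        problems.append("sections out of order or incomplete: " + ", ".join(headings))
    return problems


good = check_project_page(
    {"title": "FlyingCat", "description": "Unity mini-game", "thumbnail": "cat.png"},
    ["Overview", "Stack", "Status", "Links"],
)
bad = check_project_page({"title": "FlyingCat"}, ["Stack", "Overview"])
print(good)
print(bad)
```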

Setting standards early made later decisions fast. Rebundling Epoch: Unseen devlogs into per-project series, pulling Character Forge into a standalone article, deleting weak posts — all straightforward. Bulk-cleaning old Vulkan, AWS, LangChain, and DeepLearning posts happened because the criteria existed.

Interview before drafting

The most-used technique was interviewing. When refining posts about OpenCode, FlyingCat, OpenClaw, Character Forge, or Harness-Monitor, Codex gathered material with a few short questions first. Why did you pick that tool, where did you hit limits, what tasks did you assign, how did it go.

The reason is simple: it sharply reduces the risk of the AI inventing plausible-sounding experience. Posts grow from my short answers instead. FlyingCat came alive because the interview pulled out why I picked NANACA CRASH as the motif and how far I delegated to Antigravity + Gemini 3.0 Pro.

When using Codex for writing, the key isn't getting it to write good sentences for me. It's getting it to draw out what I actually experienced.

Writing and repo work in one loop

Blog writing isn't just prose. You move images, swap thumbnails, reconnect links, sometimes fix renderer code. This session was no different.

The Character Forge post needed a mermaid diagram. It rendered as a plain code block, so we added mermaid support to the MDX renderer. FlyingCat, Epoch: Unseen, and Harness-Monitor project pages got new thumbnails and copy. The home page got separated project and topic filters. When editing exposed a UI or rendering gap, we fixed it in the same repo, same session.

That mattered more than expected. Copy and site structure staying in sync in one repo meant "text is right but the page looks wrong" didn't linger.

Skills that helped

A few clicked clearly:

technical-blog-writing — Most helpful for structuring. Deciding if a post is a comparison, experiment record, or operations log, then keeping intro and conclusion aligned.
humanizer — Polished text tends to stiffen into something that reads like a manual. This skill loosened it. Especially effective on short posts like project pages or old Unity/Unreal entries.
grammar-checker — Final pass for typos, spacing, awkward phrasing. For technical writing, this last-mile polish matters more than you'd think.

Beyond skills, the recurring flow was: plan → read files → edit copy → move images → fix code if needed → review until the batch looks right. For a major blog overhaul, this loop fit well.

Why it worked

Biggest win: momentum on a large cleanup. Dozens of mixed posts, deciding what to keep and what to cut — that first step is paralyzing. Working alongside Codex made it easy to start.

Editing content and structure together was also valuable. Devlog numbers, project links, tags, thumbnails, related-post hubs — touching elements beyond the text itself. That felt distinctly different from a sentence-generation tool. And Codex kept re-asking about criteria I'd forgotten — should this post be deleted, absorbed into a devlog, how much technical detail belongs on the project page. Short check-ins that locked direction.

Delegation isn't autopilot

Left alone, Codex doesn't produce good posts. The biggest risk is experience-dependent sections. Judgment and intuition that come from doing the work — skip confirming those and the writing goes flat and AI-flavored fast.

Over-polishing is easy too. Structure holds but sentence rhythm flattens, every section opener reads like a model answer. I asked "make it sound human again" multiple times. Keeping project pages, articles, and devlogs in their lanes also mattered. Blur those boundaries and project pages get needlessly detailed while devlogs turn into grab bags.

Editing partner, not ghostwriter

Codex is an editing partner that moves with you inside the repo, not a ghostwriter. Standards first, interview for experience, text edits and file work in one loop — strong combination.

Especially for blogs with backlogs. Clearest takeaway: a sustainable maintenance flow matters more than one great post.

맥스를 소비하는 사람과 운용하는 사람은 다르다

Thu, 19 Mar 2026 00:00:00 GMT

얼마 전에 OpenClaw를 텔레그램에 붙여 SLIME ARENA를 굴리다가 GPT Pro 주간 사용량을 전부 써 버렸다. 원래 더 중요하게 해야 하는 작업이 남아 있었는데, 그쪽을 못 하게 됐다. 그때 처음으로 플랜의 가치가 많이 쓸 수 있느냐가 아니라 필요할 때 쓸 수 있느냐에 있다는 걸 체감했다.

예전에는 AI 플랜을 거의 사용량 문제로만 봤다. 한 달 동안 많이 쓰면 좋은 플랜이고, 금방 막히면 아쉬운 플랜이라고. 지금은 코딩 에이전트를 계속 굴리는 입장에서, 플랜의 가치를 평소 사용량보다 버스트 용량 쪽으로 보게 됐다.

많이 쓰는 사람과 운용하는 사람은 다르다

같은 상위 플랜을 써도 사용 방식은 꽤 다르다.

어떤 사람은 그냥 많이 쓴다. 결과가 마음에 안 들면 다시 묻고, 잘 안 되면 다른 말을 반복해서 넣는다. usage는 높지만 어디서 얼마나 쓰였는지는 잘 안 본다.

또 어떤 사람은 usage가 높더라도 그걸 생존용으로 쓴다. 평소에도 상한선 가까이 붙어 있고, 플랜의 여유가 없으면 바로 작업이 멈춘다. 이런 경우에는 많이 쓰는 건 맞지만, 운용하고 있다고 보기는 어렵다.

반대로 내가 더 의미 있다고 보는 쪽은 따로 있다. 평소 usage는 관리하면서, 정말 필요할 때만 병렬 작업, 긴 실행, 무거운 reasoning, 비교 실험을 한꺼번에 터뜨리는 방식이다. 이때 상위 플랜은 많이 써도 안 끊기는 상품이라기보다 필요할 때 크게 쓸 수 있는 capacity에 가깝다.

그래서 같은 맥스 플랜을 써도 어떤 사람은 소비하고, 어떤 사람은 운용한다는 생각을 하게 됐다.

왜 버스트 용량이 더 중요해졌나

실제로 무거운 작업은 평소에 계속 돌지 않는다. 대신 특정 시점에 몰린다.

긴 리팩터링을 한 번 맡길 때
서브에이전트를 병렬로 여러 개 붙일 때
LSP를 켠 상태와 끈 상태를 비교할 때
하네스 전후 차이를 실험할 때
같은 문제를 다른 구조로 다시 검증할 때

이런 날은 usage가 갑자기 크게 튄다. 하지만 중요한 건 오늘 많이 썼다가 아니다. 그 시점에 필요한 실험을 겁내지 않고 끝까지 밀 수 있느냐다.

그래서 내 체감에서는 상위 플랜의 가치가 평시 사용량보다 버스트에 있다. 평소엔 조용하다가도, 필요한 순간에 병렬화와 장기 작업을 감당할 수 있어야 실제 운영이 가능해진다. 반대로 이 여유가 없으면 실험 자체가 소극적으로 바뀐다. 한 번 세게 밀어 보고 결과를 비교해야 할 자리에서도, limits를 먼저 걱정하게 된다.

결국 내가 보는 건 사용량보다 실험 가능한 환경이다

이 얘기를 하고 나면 종종 결국 많이 쓸 수 있으면 좋은 거 아니냐로 다시 돌아간다. 물론 많이 쓸 수 있으면 좋다. 다만 내가 중요하게 보는 건 총량 자체가 아니라, 필요할 때 병렬 작업을 겁내지 않고 돌릴 수 있는지, 긴 작업을 중간에 끊지 않고 맡길 수 있는지, 설정 전후 비교 실험을 여러 번 반복할 수 있는지다.

이게 안 되면 usage 숫자가 조금 높아도 운영하는 느낌이 잘 안 난다. 반대로 이게 되면 평소 사용량이 아주 높지 않아도 플랜의 가치가 분명해진다. 그래서 요즘은 상위 플랜을 많이 쓰는 사람의 사치처럼 보기보다, 필요할 때 큰 실험을 가능하게 하는 인프라에 더 가깝게 본다.

AI 코딩 툴을 갈아타며 정리한 선택 기준

Thu, 19 Mar 2026 00:00:00 GMT

예전에는 AI 코딩 툴을 볼 때 누가 제일 똑똑한가를 먼저 봤다. 몇 달 동안 이것저것 갈아타고 나니, 지금은 보는 기준이 꽤 달라졌다. 이제 먼저 보는 건 얼마나 오래 맡길 수 있나, 환각을 어디까지 감당할 수 있나, 내 작업 방식과 자율성 수준이 맞나, 다른 모델과 조합이 되나 같은 것들이다. 결국 성능 그 자체보다 작업 리듬의 문제에 더 가깝다는 걸 알게 됐다.

이 글에서 말하는 경험 범위는 아래 정도다.

Antigravity: Gemini 3.0 Pro
Claude Code: Claude Opus 4.5, 4.6
Codex: GPT 5.1, 5.2, 5.3, 5.4

왜 계속 갈아탔는가

한 툴에 오래 정착하지 못해서 계속 옮겨 다닌 건 아니다. 작업이 바뀔수록 내가 중요하게 보는 기준이 더 선명해졌고, 그 기준에 맞춰 메인 툴이 바뀌었다.

처음에는 바이브 코딩 자체를 어디까지 믿을 수 있는지가 궁금했다. 그다음에는 단일 모델 하나보다 여러 모델을 역할별로 묶어 쓰는 편이 더 나은지 시험해 보고 싶었다. 지금은 일을 던져 놓고 한참 뒤에 결과를 받는 흐름이 실제로 돌아가는지, 그리고 그 결과를 내가 다시 통제하기 쉬운지가 제일 중요하다.

Antigravity와 Gemini 3.0 Pro: 바이브 코딩 입문과 한계

Antigravity는 내가 바이브 코딩을 제대로 해 본 첫 툴이었다. 블로그를 갈아엎을 때도 써 봤고, Unity 미니게임인 FlyingCat을 일주일 정도 만드는 동안에도 꽤 오래 붙잡고 있었다. 그 시기를 지나면서 처음으로 AI에게 구현을 제법 맡길 수 있겠다는 감각을 얻었다.

도움이 없었던 건 아니다. 초반 속도는 분명히 빨랐고, 내가 직접 코드를 다 치지 않아도 결과를 앞으로 밀 수 있다는 감각을 줬다. 다만 메인 코딩 툴로 쓰기에는 한계가 생각보다 빨리 보였다.

제일 괴로웠던 건 반복되는 컴파일 에러였다. 내가 초반이라 미숙했던 부분도 있었겠지만, Unity에서 작업을 굴리다 보면 비슷한 오류를 계속 되풀이해서 만나는 경우가 잦았다. Gemini 3.0 Pro의 환각도 코딩에서는 자주 발목을 잡았다. 그래서 Antigravity는 바이브 코딩의 가능성을 처음 체감하게 해 준 툴로는 의미가 있었지만, 메인 환경으로 오래 들고 가기는 어려웠다.

OpenCode/OMO: 모델 조합의 가능성

그다음 단계에서 기억에 남은 건 OpenCode 자체보다 Oh-My-OpenCode가 보여준 작업 방식이었다. 당시에는 Claude와 Gemini를 이미 쓰고 있었고, 여기에 GPT까지 더해 여러 모델을 역할별로 나눠 쓰는 흐름을 처음 제대로 굴려 봤다.

여기서 얻은 가장 큰 수확은 단일 모델의 성능보다 모델끼리 어떻게 역할을 나누는가가 더 중요하다는 점이었다. 하나는 전체 흐름을 잡고, 하나는 구조나 리팩토링 같은 깊은 일을 맡고, 하나는 빠르게 코드와 문서를 훑게 두는 식으로 배치하면 서로의 약점이 꽤 많이 상쇄됐다.

OMO를 쓰면서 harness도 많이 들여다보게 됐다. 오픈소스 프로젝트 코드를 읽으면서 구조를 익혔고, 내가 필요했던 기능을 직접 붙여 보기도 했다. PR을 올릴 때는 이미 같은 기능이 먼저 들어가 있어서 반영되지는 않았지만, 그 과정 덕분에 harness를 보는 기준은 이때 많이 잡혔다. 실제로 OMO의 서브에이전트 구성은 따로 추출해 두었고, 회사에서 Claude Code 에이전트 오케스트레이션을 만들 때 참고했다.

OpenCode를 지금도 계속 쓰는 건 아니지만, 툴을 고르는 기준은 이때 거의 정리됐다. 좋은 툴은 혼자 모든 걸 해결하는 툴이 아니라, 내가 원하는 방식으로 여러 모델을 엮을 수 있게 해 주는 툴이었다.

Codex와 GPT 5.4: 맡겨도 된다는 신뢰

2026년 2월 초부터는 개인 작업에서 Codex를 계속 보고 있었다. 다만 5.3 codex까지는 꽤 쓰기 힘들었다. 너무 로봇 같았고, 사람이 한 말을 자연스럽게 받아들이지 못하는 경우가 많았다. 이제는 일을 맡겨도 되겠다는 신뢰가 생긴 건 5.4부터였다.

내 작업 스타일은 원래 일을 잘게 쪼개고, 범위를 정해 던지고, 돌아온 결과를 다시 통제하는 쪽에 가깝다. 5.4부터는 Codex가 이 흐름과 잘 맞았다. 작업을 던져 놓으면 30분, 40분 정도 혼자 처리하다가 결과를 들고 돌아오는 식의 리듬이 자주 나왔다. 다른 사람들은 Codex가 일을 끝까지 안 간다고 느끼기도 하는데, 내 기준에서는 오히려 적당한 선에서 끊고 돌아오는 편이었다.

Claude Code가 같이 방향을 찾아간다는 느낌이라면, Codex는 방향을 알아서 잡아 간다는 느낌에 가깝다. 긴 문제를 붙잡고 구조를 정리하거나, 리팩토링처럼 한동안 손을 떼고 맡겨도 되는 작업에서는 특히 만족도가 높았다.

다만 너무 방어적으로 코드를 짜는 경향이 있어서, 필요 이상으로 가드 로직을 두껍게 깔거나 결과물이 지나치게 무거워질 때가 있다. 처리 자체는 잘하는데, 그 완벽주의가 오히려 코드를 지저분하게 만드는 순간이 있다.

왜 다시 Claude Code를 같이 쓰려 하는가

그렇다고 Codex 하나로 끝날 것 같지는 않다. 회사에서는 계속 Claude Code를 쓰고 있고, 집에서도 당분간은 Claude Code와 Codex를 같이 쓰는 구성이 제일 잘 맞을 것 같다.

이유는 Claude Code가 맡는 역할이 분명하기 때문이다. Claude는 Codex보다 harness가 강한 편이고, 문제의 전체 그림을 더 잘 본다. 결과물도 조금 더 사람 손을 탄 느낌이 있다. 문제를 어떻게 자를지, 어느 방향으로 갈지, 어디서 멈출지를 정하는 단계에서는 Claude가 중재자로 들어올 때 결과가 더 좋아지는 경우가 있다.

반대로 실제 일을 길게 맡겨 두고 처리시키는 쪽은 Codex가 더 잘 맞는다. 그래서 지금 기준으로는 Claude가 방향을 잡고 Codex가 밀어붙이는 조합이 가장 안정적이다.

지금의 선택 기준

결국 지금은 이런 것들을 먼저 보게 됐다.

환각 허용치: 반복 컴파일 에러나 명백한 사실 오류가 잦으면 메인 툴로는 오래 쓰기 어렵다.
사용량 여유: 긴 작업을 맡겨 둘 수 있는 여유가 있어야 실제 생산성이 나온다.
자율성 수준: 내가 핑퐁을 많이 하며 같이 끌고 갈 것인지, 범위를 정해 맡길 것인지와 툴의 성격이 맞아야 한다.
조합 가능성: 비슷해 보이는 모델도 실제로는 역할이 다르기 때문에, 여러 모델을 함께 써 보며 감을 잡는 과정이 중요하다.

결국 내 결론은 최고의 단일 모델을 찾는 것보다 모델마다 어디까지 맡길 수 있는지 직접 겪어 보고 역할을 나누는 것에 더 가깝다. Claude Code와 Codex도 겉으로 보면 비슷해 보이지만, 조금만 오래 써 보면 방향이 꽤 다르다. 그 차이는 리뷰 몇 개 읽는 것보다 직접 굴려 보는 편이 훨씬 빨리 보인다.

Consuming Your AI Plan vs. Operating It

Thu, 19 Mar 2026 00:00:00 GMT

Burned through the entire weekly GPT Pro quota running OpenClaw on Telegram for SLIME ARENA. More important work sat queued behind it. Couldn't touch it. First time I realized a plan's value isn't "can I use a lot" — it's "can I use it when I need to."

I used to judge AI plans by volume. Lots of usage per month, good plan. Hit limits fast, bad plan. Running coding agents daily changed that view. Less about average consumption, more about burst capacity.

Consuming vs. operating

Same top-tier plan, very different usage patterns.

Some people just use a lot. Results disappoint, rephrase, retry. Usage runs high, but they never track where it goes.

Others also run high, but for survival. Near the ceiling every day, work stops the moment headroom disappears. High usage — but not operating. Always scraping the limit.

The pattern that matters: manage daily usage, then burst hard when it counts. Parallel tasks, long runs, heavy reasoning, comparison experiments — all at once. The top plan isn't "unlimited usage." It's capacity to go big when needed.

Same plan. Some consume. Some operate.

Why burst matters more now

Heavy work doesn't run every day. It clusters at specific moments:

Handing off a long refactoring
Spinning up multiple subagents in parallel
Comparing LSP on vs. off
A/B testing harness configurations
Re-verifying the same problem with a different structure

On days like these, usage spikes hard. But the point isn't "I used a lot today." It's whether you can push a critical experiment to completion without flinching at limits.

So in practice, a top plan's value lives in burst capacity. Quiet most days, then able to absorb parallel work and long tasks when they arrive. Without that headroom, experiments get timid. At the moment you should push hard and compare results, you worry about limits instead.
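The headroom idea comes down to simple arithmetic. A rough sketch, assuming usage can be expressed in some abstract "units" per week; real plans may not expose quotas this way, and the `burst_headroom` helper and all numbers below are hypothetical.

```python
def burst_headroom(weekly_quota, used_so_far, days_left, daily_baseline):
    """Units left for a burst after reserving baseline usage for the remaining days."""
    reserved = days_left * daily_baseline
    return max(0, weekly_quota - used_so_far - reserved)


def can_burst(burst_cost, **plan):
    """True if a burst of the given size fits without eating into the baseline reserve."""
    return burst_cost <= burst_headroom(**plan)


# Operating: moderate usage so far leaves room for a heavy day.
managed = burst_headroom(weekly_quota=1000, used_so_far=400, days_left=3, daily_baseline=100)

# Scraping the ceiling: the same plan with nothing left to burst with.
scraping = burst_headroom(weekly_quota=1000, used_so_far=900, days_left=3, daily_baseline=100)

print(managed, scraping)
print(can_burst(250, weekly_quota=1000, used_so_far=400, days_left=3, daily_baseline=100))
```

Both cases are on the same plan; the difference is only how much of the week's quota is still uncommitted when the heavy task arrives.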

What I'm really looking at

After all this, the conversation often circles back to "isn't more just better?" More is nice. What I care about is narrower: whether I can run parallel tasks without hesitation, leave a long job running without interruption, and repeat before/after comparison experiments as often as needed.

Without that, high numbers don't feel like operating. With it, even moderate daily usage makes the plan worth it. Top-tier plans aren't a luxury for heavy users. They're infrastructure for serious experiments when they count.

What I Actually Look for After Switching AI Coding Tools

Thu, 19 Mar 2026 00:00:00 GMT

I used to pick AI coding tools by asking which is smartest. Months of switching changed the criteria. Now I check how long I can leave a task running, how much hallucination I'll tolerate, whether the tool's autonomy fits my workflow, and whether it composes with other models. Less about raw capability, more about work rhythm.

Experience scope for this post:

Antigravity: Gemini 3.0 Pro
Claude Code: Claude Opus 4.5, 4.6
Codex: GPT 5.1, 5.2, 5.3, 5.4

Why I kept switching

Not because I couldn't settle. As the work changed, the criteria sharpened, and the main tool shifted with them.

First I wanted to know how far vibe coding could be trusted. Then I tested whether splitting roles across multiple models beat relying on one. Now the question is whether "throw a task, come back later" actually works — and whether I can control what comes back.

Antigravity and Gemini 3.0 Pro: vibe coding starts here

Antigravity was my first real vibe coding tool. Overhauled this blog with it, built a Unity mini-game called FlyingCat over a week. First time I felt AI could handle real implementation.

Early speed was real — results moved forward without me writing every line. But limits appeared fast. Repeated compile errors hurt most — partly inexperience, partly Gemini 3.0 Pro hallucinating the same failures over and over. Antigravity proved vibe coding works. It didn't last as a daily driver.

OpenCode/OMO: model composition clicks

What stuck from this phase wasn't OpenCode itself but the workflow OMO demonstrated. I was already using Claude and Gemini; adding GPT let me try splitting roles across models for real.

Biggest takeaway: how you divide roles between models matters more than any single model's capability. One handles the big picture, one takes deep structural work, one skims code and docs fast. Their weaknesses cancel out.

OMO pulled me into harness internals. Read the project's source, tried adding features. My PR duplicated work already merged, but the process shaped how I evaluate harnesses. Extracted OMO's subagent layout and referenced it when building agent orchestration at work.

Stopped using OpenCode daily, but my selection framework locked in here. A good tool doesn't solve everything alone — it lets you compose models your way.

Codex and GPT 5.4: trust to delegate

Watched Codex for personal work since early February 2026. Through 5.3 it was rough — too robotic, bad at natural instructions. Trust came with 5.4.

I slice work small, define scope, throw it over, control what comes back. From 5.4, Codex matched that rhythm. Throw a task, it runs alone for thirty to forty minutes, returns with results. Some say Codex doesn't finish things. For me it stops about right.

Claude Code feels like finding direction together. Codex feels like it picks direction on its own. Long refactors, structural cleanup, anything I can hand off and walk away from — Codex handles well.

One quirk: it codes defensively to a fault. Guard logic piles up thicker than needed, and results get heavier than they should. The perfectionism sometimes makes the code messier.

Why I'm running Claude Code alongside

Codex alone isn't enough. I use Claude Code at work and plan to keep running both at home.

Claude Code's harness is stronger. It sees the full shape of a problem better. Output feels more human-touched. When deciding how to split a problem, which direction to go, where to stop — Claude as mediator improves results.

For sustained execution, Codex fits better. So the current setup: Claude sets direction, Codex pushes through. Most stable combination I've found.

Current selection criteria

What I look at now:

Hallucination tolerance. Repeated compile errors or obvious factual mistakes make a tool unusable as a daily driver.
Usage headroom. Long tasks need room to run. Without it, real productivity doesn't happen.
Autonomy match. Do I want tight ping-pong collaboration, or scoped delegation? The tool's personality has to fit.
Composability. Models that look similar play different roles in practice. Trying combinations and finding the split matters.

Not "find the best model." Use each long enough to learn where it breaks, then assign roles. Claude Code and Codex look alike on paper. A week of use makes the gap obvious. Hands-on beats reviews.

프론티어 모델 구간에서는 하네스가 더 큰 차이를 만든다

Wed, 18 Mar 2026 00:00:00 GMT

예전에는 AI 코딩 성능을 거의 모델 문제로 봤다. 더 좋은 모델이 나오면 그쪽으로 옮기면 된다고 생각했다. 지금은 그 생각이 꽤 달라졌다. 적어도 Claude 4.5나 GPT-5.x급 코딩 모델을 쓰는 구간에서는, 실무에서 체감하는 차이를 더 크게 흔드는 건 모델보다 하네스인 경우가 많다고 본다.

여기서 말하는 하네스는 프롬프트 몇 줄이 아니다. 컨텍스트를 어떻게 넣고 줄일지, memory를 어떻게 유지할지, skill과 AGENTS를 어떻게 나눌지, subagent를 어디에 붙일지, LSP나 테스트 루프를 어떻게 연결할지까지 포함한 운영 층 전체를 말한다.

왜 이제는 모델보다 하네스를 먼저 보게 됐나

모델이 약하던 시기에는 결국 모델이 다 한다는 말이 더 맞았다. 기본 추론 능력이 부족하면 그 위에 뭘 얹어도 한계가 빨리 드러났기 때문이다.

지금은 조금 다르다. 상위권 모델 구간에 들어오면 웬만한 코딩 작업은 일단 된다. 그래서 실제 차이는 되냐 안 되냐보다 얼마나 안정적으로 반복되느냐, 얼마나 오래 맡길 수 있느냐, 검증 가능한 결과로 돌아오느냐 쪽에서 더 크게 난다.

내가 더 크게 본 건 어떤 문서를 먼저 읽히는가, 긴 컨텍스트를 어떻게 압축하는가, memory와 skill을 어떻게 나누는가, subagent로 무엇을 병렬화하는가, LSP나 테스트 같은 도구를 어디까지 연결하는가, 검증 루프를 얼마나 강하게 거는가 같은 것들이다.

같은 Claude나 같은 GPT를 써도 이 층이 달라지면 결과가 꽤 달라진다. 모델이 잠재력을 주는 건 맞지만, 그 잠재력을 실제 생산성으로 바꾸는 건 결국 하네스라고 느끼게 됐다.

하네스 엔지니어링은 프롬프트 작성보다 넓다

하네스를 프롬프트 꼼수 정도로 보면 금방 한계에 닿는다. 실제로는 시스템 설계에 더 가깝다.

실제로는 파일과 툴, 실행 환경, 상태를 연결하는 일도 들어가고, 어떤 인터페이스로 agent를 일하게 할지 정하는 일도 들어간다. 승인, 정책, 로깅, 비용 같은 운영 문제도 같이 붙고, 프로젝트 규칙을 agent가 읽을 수 있는 형태로 정리하는 일도 필요하다.

그래서 내가 보는 하네스 엔지니어링은 대답하는 모델을 일하는 시스템으로 바꾸는 작업에 가깝다. 좋은 모델 하나를 붙잡는 일보다, 좋은 모델이 반복 가능하게 일하도록 환경을 설계하는 일이 더 중요해졌다.

지금 사라질 것과 남을 것을 나눠서 봐야 한다

모든 하네스가 오래 남는다고 보지는 않는다. 모델이나 제품이 덜 성숙해서 임시로 붙여 둔 보정재는 시간이 지나면 제품 안으로 흡수될 수 있다. 특정 약점을 메우는 프롬프트 트릭이나 brittle한 workaround, 지금 형태의 외부 메모리 도구 일부는 그런 쪽에 가깝다.

반대로 구조와 운영 원리에 가까운 요소는 쉽게 안 사라진다. 역할 분리와 subagent 오케스트레이션, skill과 워크플로 모듈화, 정책과 권한, 검증 루프, 상태 관리와 메모리 운영 규칙, 관측성과 로그, 의미 기반 코드 인덱싱까지. 이런 것들은 특정 모델이나 제품에 묶이지 않는다.

LSP도 비슷하다. 지금처럼 수동으로 붙이는 방식은 바뀔 수 있다. 그래도 agent에게 의미 기반 코드 인텔리전스를 연결한다는 문제 자체가 사라지지는 않을 것 같다.

사용자보다 운영자가 더 오래 남는다고 보는 이유

좋은 사용자는 점점 더 많아질 가능성이 크다. 모델이 좋아지고 제품 기본 UX가 좋아지면, 단순 활용 능력은 점점 기본기에 가까워질 수 있다.

그다음 차이를 만드는 건 운영자 쪽이라고 본다. 여기서 말하는 운영자는 AI를 자주 쓰는 사람이 아니다. AI가 잘 일하게 만드는 사람에 더 가깝다. 작업을 어떻게 자를지, 어떤 문서를 먼저 읽힐지, 어디서 멈추게 할지, 무엇을 측정할지, 어떤 결과를 통과로 볼지까지 설계하는 사람이다.

이 차이는 계측에서 더 잘 보인다. 토큰 사용량, 캐시 리드, LSP 전후 차이, 검증 통과율 같은 걸 보지 않으면 운영이라기보다 체감담에 가깝다. 적어도 내 기준에서는 많이 쓰는 것보다 측정하면서 고치는 것이 더 중요해졌다.

그래서 지금 더 중요하게 보는 것

지금 내 기준을 짧게 줄이면, 하이엔드 모델의 기본 능력선은 이미 높고 실무에서 차이를 만드는 건 그 위에 얹힌 하네스다. 하네스는 꼼수가 아니라 시스템이고, 계측이 없으면 운영이라고 부르기 어렵다. 장기적으로 희소한 역량은 좋은 사용자보다 좋은 운영자 쪽에 남을 가능성이 크다고 본다.

모델이 중요하지 않다는 뜻은 아니다. 모델은 여전히 바닥 성능과 잠재력을 정한다. 다만 지금 내가 보는 구간에서는, 실제 업무 성과를 갈라놓는 차이가 모델 교체보다 하네스와 운영 방식에서 더 자주 나온다. 그래서 요즘은 무슨 모델을 쓰는가만큼이나 어떤 환경 위에서 어떻게 굴리는가를 먼저 보게 됐다.

In the Frontier Model Range, the Harness Makes a Bigger Difference

Wed, 18 Mar 2026 00:00:00 GMT

AI coding performance used to be a model problem for me. Better model, switch. That changed. In the Claude 4.5 / GPT-5.x tier, the harness often drives more practical difference than the model.

Harness here isn't a few prompt lines. It's the full operational layer: how context enters and gets trimmed, how memory persists, how skills and AGENTS files split work, where subagents attach, how LSP and test loops connect.

Why I now look at the harness first

When models were weaker, "the model does everything" rang true. Weak base reasoning meant nothing on top lasted.

Different now. Frontier models handle most coding tasks. The gap isn't "can it do it" but "how reliably does it repeat," "how long can I leave it," "does it return verifiable results."

What I watch: which documents load first, how context compresses, how memory and skills split, what subagents parallelize, how far LSP and tests reach, how tight the verification loop runs.

Same Claude, same GPT — change the harness and results shift. The model sets potential. The harness converts it into output.

Harness engineering is broader than prompt writing

Treat it as prompt tricks and you hit a ceiling fast. It's closer to system design.

Wiring files, tools, and execution environments. Choosing what interface an agent works through. Attaching operational concerns — approvals, policies, logging, cost tracking. Formatting project rules so agents can read them.

Harness engineering turns "a model that answers" into "a system that works." Picking a good model matters less than designing the environment where it produces repeatable output.

Separating what will disappear from what will stay

Not every harness element lasts. Patches for immature models or products — prompt tricks covering specific weaknesses, brittle workarounds, certain external memory tools — will get absorbed into products over time.

Elements closer to structure and operational principles stick around. Role separation and subagent orchestration. Skill and workflow modularity. Policies and permissions. Verification loops. State management and memory governance. Observability and logging. Semantic code indexing. None of these tie to a specific model or product.

LSP is similar. The current manual integration method may change. The underlying problem — connecting semantic code intelligence to agents — won't disappear.

Why operators outlast users

Good users will multiply. Better models and better product UX turn basic utilization into table stakes.

The next differentiator is operators. Not people who use AI often — people who make AI work well. They design task splits, document loading order, stopping points, metrics, pass/fail criteria.

The difference shows clearest in instrumentation. Token usage, cache reads, LSP on/off deltas, verification pass rates — without these, it's anecdote. "Using a lot" matters less than measuring and adjusting.
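A minimal sketch of what that instrumentation could look like, assuming each agent run is logged as one record. The record fields, the function names, and the assumption that cache reads are reported separately from input tokens are all hypothetical; they are not taken from any specific tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunRecord:
    """One agent run. All field names are hypothetical."""
    lsp_enabled: bool
    input_tokens: int
    cache_read_tokens: int  # assumed to be reported separately from input_tokens
    verified: bool          # did the result pass the verification step?


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate token usage, cache-read share, and verification pass rate."""
    if not runs:
        raise ValueError("no runs to summarize")
    total_input = sum(r.input_tokens for r in runs)
    total_cached = sum(r.cache_read_tokens for r in runs)
    total = total_input + total_cached
    return {
        "mean_input_tokens": mean(r.input_tokens for r in runs),
        "cache_read_share": total_cached / total if total else 0.0,
        "pass_rate": sum(r.verified for r in runs) / len(runs),
    }


def lsp_delta(runs: list[RunRecord]) -> dict[str, float]:
    """Per-metric difference: runs with LSP on minus runs with LSP off."""
    on = summarize([r for r in runs if r.lsp_enabled])
    off = summarize([r for r in runs if not r.lsp_enabled])
    return {key: on[key] - off[key] for key in on}
```

Even a table this small turns "it feels better with LSP" into a number that can be compared across weeks or harness changes.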

What I prioritize now

Short version: frontier models set a high floor. The harness drives practical difference. It's a system, not a trick — and without instrumentation, don't call it operation. Long-term, scarce capability sits with operators, not users.

Models matter. They set floor and ceiling. But in this range, outcomes diverge more from harness and operational approach than from model swaps. I now ask "what environment is it running in" as much as "what model is it running."

Claude Code의 LSP는 토큰을 줄여줄까

Mon, 16 Mar 2026 00:00:00 GMT

처음엔 그냥 이렇게 생각했다. LSP가 붙으면 grep를 덜 돌게 되고, 코드 탐색이 더 정확해질 테니 토큰도 줄어들겠지. Claude Code에서 LSP 이야기가 자꾸 나오는 걸 보면서도 결국 기대한 건 그거였다.

문서와 changelog, 사용자 이슈를 조금만 봐도 왜 다들 LSP를 붙이려고 하는지는 금방 납득된다. 정의 위치를 바로 찾고, 참조를 따라가고, 타입 정보를 보고, diagnostics를 빠르게 받는 건 누구나 좋아 보일 수밖에 없다. 잘만 붙으면 텍스트 검색 위주의 탐색보다 훨씬 나을 것 같았다.

그런데 직접 비교해 보니 결과는 내가 처음 기대한 것과 꽤 달랐다. 적어도 내가 본 대규모 프로젝트와 Unity 코드베이스에서는, LSP = 더 빠르고 더 싸다는 식으로 간단히 정리되지 않았다.

처음 세운 가설

내 가설은 이랬다.

LSP가 붙으면 의미 기반 탐색이 가능해진다.
그러면 grep를 여러 번 돌리고 파일을 직접 읽는 횟수가 줄어든다.
결국 시간도 줄고, 토큰도 줄고, 보고서 정확도도 올라갈 것이다.

앞의 두 줄은 지금도 어느 정도 맞다고 생각한다. 문제는 마지막 줄이었다. 실제로는 정확도, 시간, 토큰이 한 방향으로 같이 움직이지 않았다.

어떻게 봤나

비교 대상은 아주 단순했다. 특정 함수를 중심으로 로직을 파악하고, 그 함수의 호출부를 추적해 보고서를 만드는 태스크를 여러 번 돌렸다. 이걸 서브에이전트를 병렬로 소환하는 방식으로 굴리면서, 각 실행의 보고서 정확도, 시간 소모량, 토큰 소모량을 같이 봤다.

여기서 제일 중요했던 건 함수의 caller 수였다. 결국 이 함수가 얼마나 많은 곳에서 불리는가가 결과를 크게 갈랐다.

호출부가 적을 때는 grep가 훨씬 낫다

호출부가 10개 미만인 태스크에서는 grep가 압도적으로 빨랐다. 실수도 거의 없었다. 이런 경우에는 굳이 LSP를 끌어오는 쪽이 오히려 돌아가는 길처럼 느껴졌다.

이유도 자연스럽다. 참조 수가 적으면 텍스트 기반 탐색만으로도 맥락이 금방 닫힌다. LSP 질의를 준비하고, 응답을 받고, 그 결과를 다시 모델이 해석하는 비용을 생각하면 그냥 grep가 더 짧게 끝난다.

여기서는 내가 처음 세운 가설이 오히려 틀렸다. 더 좋은 도구가 더 싼 도구가 되는 건 아니었다.

호출부가 많아질수록 grep는 갑자기 흔들린다

문제는 caller 수가 늘어날 때부터였다. 호출부가 10개를 넘기기 시작하면 grep의 정확도가 눈에 띄게 떨어졌다. 다른 클래스에 있는 동명의 함수를 잘못 잡거나, 텍스트상 비슷해 보이는 결과를 섞어 읽는 일이 생겼다.

이때부터 grep 보고서는 빠르긴 한데 점점 불안해진다. 특히 호출부가 90개 안팎까지 늘어난 태스크에서는 편차가 정말 컸다. 어떤 실행은 그럴듯하게 맞아 보이고, 어떤 실행은 거의 랜덤에 가까웠다. 겉으로 보기에는 보고서가 성립하는데, 실제 caller 집합은 틀린 경우도 있었다.

이 구간에서는 속도가 장점이 아니라 함정처럼 느껴졌다. 빨리 나오는데, 믿고 다음 판단으로 넘어가기가 어렵다.

LSP는 느리고 비싸지만, 정확도는 안정적이었다

LSP 쪽은 일관되게 느렸다. 그리고 grep보다 토큰도 더 먹었다. 적어도 내가 본 실험에서는 이 차이가 꽤 분명했다.

대신 정확도는 안정적이었다. 호출부 파악이라는 기준으로 보면, 내가 돌린 태스크들에서는 LSP 쪽에 오차가 없었다. caller 수가 늘어나도 결과가 무너지지 않았고, 동명의 함수 때문에 보고서가 비틀리는 문제도 없었다.

그래서 체감이 묘했다. 처음에는 LSP를 쓰면 효율이 좋아질 것이라고 생각했는데, 실제로는 LSP를 쓰면 결과 품질을 덜 의심하게 된다는 쪽이 더 맞았다. 토큰 절약 기능이라기보다 정확도 유지 기능에 가까웠다.

왜 이런 결과가 나왔을까

내가 내린 해석은 이 정도다.

참조 수가 적은 태스크는 원래 단순해서 grep로도 충분하다.
참조 수가 많아질수록 텍스트 기반 탐색은 이름 충돌과 맥락 오판에 약해진다.
LSP는 느리고 질의 비용도 들지만, 심볼과 참조 관계를 안정적으로 잡아 준다.

지금 내 느낌에 LSP의 장점은 항상 더 싸다가 아니라 복잡해질수록 덜 틀린다에 가깝다.

이건 생각보다 큰 차이다. 에이전트 코딩에서 중요한 게 항상 속도만은 아니기 때문이다. 빠른데 틀린 보고서가 계속 나오면, 그다음 단계에서 검증하고 되돌리는 비용이 더 커진다. 그 비용까지 생각하면 LSP의 느림이 완전히 손해라고만 보기도 어렵다.

그래서 어떻게 쓰고 있나

지금 내 결론은 꽤 단순하다.

caller가 적고 맥락이 단순하면 grep가 낫다.
caller가 많고 참조 그래프가 커지면 LSP가 돈값을 한다.
다만 2026년 3월 기준으로, LSP가 시간과 토큰까지 같이 줄여 준다고 기대하진 않는다.

그래서 지금은 LSP를 기본값으로 쓰지 않는다. 토큰을 줄이기 위한 만능 해법으로도 안 본다. grep의 정확도가 무너지는 구간에서 정확도를 지키기 위한 도구로 쓰고 있고, 내 기준에서는 이쪽이 훨씬 실제에 가까웠다.

Does Claude Code's LSP Actually Save Tokens?

Mon, 16 Mar 2026 00:00:00 GMT

Simple assumption: LSP means fewer grep runs, more accurate navigation, fewer tokens. That's what I expected watching Claude Code discussions.

Docs and changelogs make the case obvious. Jump to definition, follow references, inspect types, fast diagnostics — better than text search. Should be cheaper too.

Direct comparison said otherwise. On the large projects and Unity codebases I tested, "LSP = faster and cheaper" didn't hold.

Starting hypothesis

My hypothesis:

LSP enables semantic navigation.
That reduces grep iterations and raw file reads.
Time drops, tokens drop, report accuracy rises.

The first two still hold. The third didn't. In practice, accuracy, time, and tokens didn't move in the same direction.

How I tested

Simple setup. Trace a function's logic, map its callers into a report. Run multiple times with parallel subagents. Compare accuracy, time, and token cost.

The variable that split results: caller count. How many places call this function determined everything.

Few callers: grep wins easily

Under 10 callers, grep won by a wide margin. Almost no errors. LSP felt like the long way around.

Few references and text search closes context fast. The overhead of preparing an LSP query, waiting, having the model interpret results — more expensive than grep finishing outright.

Hypothesis wrong here. Better tool didn't mean cheaper tool.

Many callers: grep falls apart suddenly

Past 10 callers, grep accuracy dropped visibly. It grabbed same-named functions from other classes, mixed textually similar results. Reports came back fast but unreliable. At ~90 callers, variance was extreme — some runs looked plausible, others were near-random. Reports that appeared valid contained wrong caller sets.

Speed stopped being an advantage. Fast but untrustworthy.
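The same-name failure mode can be reproduced in a toy setting. The sketch below is not how an LSP server works internally. It only contrasts a grep-style text match with a lookup that also knows the receiver's declared type. The call sites, file names, and class names are invented for illustration.

```python
import re

# Toy C#-like call sites: (file, declaration of the receiver, call line).
# Timer and ScoreBoard are unrelated classes that both define Reset().
CALL_SITES = [
    ("PlayerController.cs", "Timer cooldown", "cooldown.Reset();"),
    ("EnemySpawner.cs", "Timer waveTimer", "waveTimer.Reset();"),
    ("HudView.cs", "ScoreBoard board", "board.Reset();"),
]


def text_search(method: str) -> list[str]:
    """grep-style: every line containing `.method(` counts as a caller."""
    pattern = re.compile(r"\." + re.escape(method) + r"\(")
    return [file for file, _decl, line in CALL_SITES if pattern.search(line)]


def type_aware_search(owner: str, method: str) -> list[str]:
    """Stand-in for a semantic lookup: keep a hit only if the receiver's
    declared type is `owner`."""
    pattern = re.compile(r"(\w+)\." + re.escape(method) + r"\(")
    hits = []
    for file, decl, line in CALL_SITES:
        match = pattern.search(line)
        if not match:
            continue
        decl_type, decl_name = decl.split()
        if decl_name == match.group(1) and decl_type == owner:
            hits.append(file)
    return hits
```

Here `text_search("Reset")` returns all three files, while `type_aware_search("Timer", "Reset")` drops `HudView.cs`. With three call sites the extra hit is easy to spot. With ninety, it disappears into the report.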

LSP: slower, more expensive, but stable

LSP was consistently slower. Used more tokens too — clearly so in my experiments.

But accuracy held. For caller identification, LSP runs showed no errors in my tasks. Rising caller count didn't break results. Same-name confusion didn't happen.

Odd experience. Expected "LSP = more efficient." Got "LSP = less reason to doubt the result." Not token-saving. Accuracy-preserving.

Why this happens

My interpretation:

Low-reference tasks are simple enough that grep suffices.
As references grow, text search gets vulnerable to name collisions and context misreads.
LSP is slower and costs more per query, but locks symbol-reference relationships reliably.

LSP's advantage isn't "always cheaper." It's "less wrong as complexity rises."

That difference matters. Agent coding isn't always about speed. If fast-but-wrong reports keep appearing, verification and rollback costs at the next step grow. Factor those in and LSP's slowness isn't pure loss.

How I use it now

My conclusion is straightforward:

Few callers, simple context — grep wins.
Many callers, large reference graph — LSP earns its cost.
As of March 2026, I don't expect LSP to save time or tokens.

I don't use LSP as a default. I don't treat it as a universal token optimizer. I use it where grep's accuracy collapses — to preserve accuracy. That framing matched reality far better.
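As a rough illustration, the rule above can be written as a small routing function. The threshold of 10 comes from the observations in this post and is not a universal constant. The function and enum names are hypothetical, and treating exactly 10 callers as the LSP side is an arbitrary choice.

```python
from enum import Enum


class NavTool(Enum):
    GREP = "grep"
    LSP = "lsp"


# Observed in this post's experiments only; tune per codebase.
CALLER_THRESHOLD = 10


def choose_tool(estimated_callers: int, threshold: int = CALLER_THRESHOLD) -> NavTool:
    """Pick grep for small caller sets and LSP once the reference graph grows."""
    if estimated_callers < 0:
        raise ValueError("caller estimate cannot be negative")
    return NavTool.GREP if estimated_callers < threshold else NavTool.LSP
```

One possible way to get `estimated_callers` is a cheap text-match count up front: the count may be inflated by same-name hits, but an inflated count only pushes the decision toward the more accurate tool.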

텔레그램으로 게임 개발 굴리기: OpenClaw로 SLIME ARENA 운영한 기록

Sun, 15 Mar 2026 00:00:00 GMT

OpenClaw를 처음 붙였을 때 궁금했던 건 단순했다. 책상 앞에서 잠깐 명령을 던지는 도구가 아니라, 자리를 비운 뒤에도 계속 일하는 개발자처럼 굴릴 수 있을까? 그래서 이 실험을 위해 저사양 미니 PC를 하나 샀고, 권한도 전부 열어 둔 채 텔레그램만으로 지시를 내리는 루프를 만들었다.

처음에는 일정 관리나 잡무부터 맡겨 봤다. 그런데 조금 굴려 보니, 정말 보고 싶은 건 이런 쪽이 아니었다. 내가 궁금했던 건 밖에서도 계속 개발이 굴러가는가, 더 정확히는 Telegram + OpenClaw + cron + GitHub issues/milestones 조합으로 실제 프로젝트를 운영할 수 있는가였다.

내가 원한 건 원격 데스크톱으로 IDE를 여는 방식이 아니라, 최소한의 인터페이스로 다음 일을 밀어 넣고 결과를 받는 흐름이었다.

가장 제대로 굴린 사례는 Godot 기반 2D 액션 게임 SLIME ARENA였다.

SLIME ARENA에서 어디까지 갔나

SLIME ARENA는 시작할 때 근거리와 원거리 두 타입 중 하나를 고르고, 웨이브를 버티며 레벨업할 때마다 스킬 트리를 찍는 구조다. 웨이브마다 몬스터를 다 잡으면 다음 단계로 넘어가고, 보스도 있다. 느낌으로는 2D 액션 디펜스와 뱀서라이크의 중간쯤에 있다.

지금은 리소스를 입히던 단계에서 멈춰 있다. 그래도 전투 루프 자체는 돌아간다. 난이도와 밸런스는 한동안 계속 만졌는데, 그 과정에서 오히려 일부가 틀어지기도 했다. 이 부분은 결국 사람이 다시 잡아야 하는 영역이라고 느꼈다.

처음에는 마크다운, 나중에는 GitHub

처음에는 2시간마다 마크다운 파일을 읽고 다음 작업을 진행하도록 시켰다. 그런데 이 방식은 내가 중간 상태를 파악하기가 너무 어려웠다. 어디까지 됐는지, 무엇이 막혔는지, 다음에 뭘 하려는지가 한눈에 안 들어왔다.

그래서 운영 방식을 바꿨다. OpenClaw용 GitHub 계정을 따로 만들고, 저장소에 개발자로 초대했다. 그다음부터는 milestone과 issues를 중심으로 루프를 굴렸다. OpenClaw가 먼저 마일스톤 목표를 세우고, 그 목표에 필요한 이슈를 만들고, 작업이 끝나면 닫고, 내가 테스트해 보다가 부족하면 다시 열어 내용을 보강시키는 식이다.

중요한 건 이 과정을 내가 IDE에서 직접 정리한 게 아니라 거의 전부 텔레그램으로 돌렸다는 점이다. 구현 지시, 수정 요청, 재오픈, 우선순위 변경까지 전부 텔레그램에서 했다. GitHub는 개발을 시키는 곳이라기보다 상태를 추적하고 검증 기준을 남기는 보드에 가까웠다.

실제로 이슈 본문은 그냥 할 일 메모가 아니었다. summary, scope, acceptance criteria를 먼저 적게 했고, 작업이 끝난 뒤에는 어떤 커밋에서 처리했는지, 어떤 검증 명령을 돌렸는지, 재배포는 했는지까지 댓글로 남기게 했다. 마일스톤도 한두 개로 뭉뚱그리지 않고 Phase 단위로 계속 쌓였다. 내가 보고 싶었던 건 "무언가 진행 중"이라는 말이 아니라, 지금 어디까지 왔고 무엇이 검증됐는지였기 때문이다.

실제 운영 루프는 이랬다

실제 운영 방식은 대략 아래에 가까웠다.

나는 텔레그램에서 방향과 우선순위만 정했다.
OpenClaw는 milestone을 만들고, 필요한 일을 issues로 잘게 쪼갰다.
issue에는 summary, scope, acceptance criteria, validation 같은 항목을 남기게 했다.
cron이 2시간마다 돌면서 다음 작업을 계속 진행했다.
완료된 이슈는 닫고, 내가 플레이하다가 부족하다고 느끼면 다시 열어 요구사항을 더 구체적으로 적게 했다.

여기서 중요한 건 시켰다는 표현이다. 이 루프에서 나는 거의 디렉터와 QA 역할만 맡았고, 실제 구현은 OpenClaw가 끌고 갔다. 텔레그램 보고도 단순한 성공/실패 수준이 아니었다. 현재 phase, 이번 패스에서 처리한 issue 번호, 막힌 점, 검증 명령, 재배포 여부까지 요약해서 올리게 했다. 내가 확인해야 할 건 코드가 아니라 상태였다.

어디서나 이어진다는 건 진짜였다

가장 좋았던 점은 당연히 어디서나 이어진다는 것이다. 책상 앞에 없어도 개발이 멈추지 않는다. 텔레그램으로 방향만 남겨 두면, 다음 체크포인트가 돌 때 작업이 계속 진행된다.

이 방식은 나를 구현자보다 디렉터와 QA 역할로 밀어 넣었다. SLIME ARENA에서는 실제로 코드 한 줄도 보지 않았다. 나는 Godot를 모르고, 대신 플레이하면서 소감만 전했다. 그런데도 전투 루프, 웨이브, 보스, 레벨업 선택지까지 게임의 뼈대는 꽤 멀리 갔다.

또 하나 좋았던 점은 milestone과 issue를 쓰면서 판단 단위를 강제로 작게 만들 수 있었다는 점이다. 그냥 게임 만들어라고 두는 것보다, 지금 단계에서 무엇을 검증해야 하는지가 훨씬 분명해졌다. 실제로 전투 루프, 웨이브 진행, 보스전, 레벨업 선택지, 밸런스 조정 같은 작업을 각각 따로 굴릴 수 있었다. 이건 나중에 다시 보더라도 히스토리가 남는다는 점에서 단순 채팅 로그보다 훨씬 좋았다.

그래도 문제는 있었다

반대로 단점도 분명했다. 첫 번째는, 어디서나 개발할 수 있다는 점이 그대로 단점이 된다는 것이다. 운동하다 쉬는 시간에도 계속 들여다보게 된다. 열려 있다는 건 편하지만, 동시에 계속 신경 쓰게 만든다.

두 번째는 cron 루프를 안정적으로 돌리는 게 생각보다 쉽지 않았다는 점이다. 프롬프트를 잘못 짜면 다음 단계로 안 가고 진행 상황 보고만 계속하는 경우가 있었다. 결국 무엇을 끝난 것으로 볼지, 다음 단계는 어떻게 고를지, 막혔을 때는 무엇을 fallback으로 삼을지를 꽤 구체적으로 적어 줘야 했다.

세 번째는 검토 부채다. 코드 자체를 안 보고 일이 진행되면 나중에 확인해야 하는 것이 한꺼번에 몰려온다. SLIME ARENA는 내가 코드 한 줄 안 봤기 때문에 더 그랬다. AI를 완전히 믿을 수 있다면 큰 문제는 아닐 수도 있지만, 지금 단계에서는 마지막에 사람이 떠안는 검토량이 꽤 크다.

마지막으로 비용과 사용량 문제도 있다. 나는 이 실험을 하려고 권한을 전부 열어 둔 미니 PC까지 샀다. 그런데 이렇게 굴리면 사용량을 정말 빠르게 먹는다. 실제로 GPT Pro 주간 사용량까지 다 채워 버려서, 원래 더 중요하게 해야 하는 작업을 못 하게 됐다. 그래서 2026년 3월 15일 기준으로는 일단 멈춘 상태다.

다음 실험은 왜 WebGL voxel game인가

SLIME ARENA 이후에는 three.js 기반 WebGL voxel game도 조금 진행했다. 흔히 떠올리는 마인크래프트류 구조를 따라가되, 나는 엔진 쪽과 게임 쪽을 최대한 분리해 두는 걸 중요하게 봤다. 엔진 부분은 다음 프로젝트에도 다시 쓸 수 있기 때문이다.

요즘은 AI 쪽 흐름이 전체적으로 웹 친화적이라고 느낀다. 그래서 앞으로 Web 기반 게임과 Web 게임 엔진 쪽이 더 많이 커질 거라고 보고 있다. AAA 게임이 몰락한다는 뜻은 아니고, 위아래로 양극화가 더 심해질 거라는 쪽에 가깝다. 이 생각은 아직 가설에 가깝지만, 다음 실험을 WebGL로 잡은 이유는 분명히 거기 있었다.

장난감은 아니었다

OpenClaw를 텔레그램에 붙여 굴려 보니, 이건 밖에서도 잠깐 명령 내리는 보조 도구라기보다 원격 개발자 하나를 운영하는 방식에 더 가까웠다. 제대로 굴리려면 milestone, issue, cron, 검증 기준까지 같이 설계해야 한다. 그걸 잡아 두면 생각보다 멀리 간다.

다만 그만큼 관리 포인트도 분명하다. 사용량, 검토 부채, 끊임없이 신경 쓰게 되는 운영 피로까지 같이 감안해야 한다. 지금 내 결론은 단순하다. OpenClaw는 재미있는 장난감이 아니었다. 대신 제어 없이 돌리면 금방 감당 안 되는 쪽으로 커지는 도구였다.

Running Game Dev from Telegram: Operating SLIME ARENA with OpenClaw

Sun, 15 Mar 2026 00:00:00 GMT

First question was simple. Not quick commands at my desk — could OpenClaw keep working after I leave? Bought a low-spec mini PC, opened all permissions, built a loop where instructions come through Telegram only.

Started with scheduling and errands. Quickly realized that wasn't the point. Real question: does development keep rolling when I'm away? Can Telegram + OpenClaw + cron + GitHub issues/milestones operate a real project?

No remote desktop. Minimal interface — push the next task, get results back.

The best-run case was SLIME ARENA, a Godot-based 2D action game.

How far SLIME ARENA went

SLIME ARENA: pick melee or ranged at the start, survive waves, choose skills on each level-up. Clear all monsters per wave to advance. Bosses included. Somewhere between a 2D action defense game and a Vampire Survivors-like.

Currently paused at the asset integration stage. The combat loop runs. Difficulty and balance got tweaked repeatedly — and some of it broke in the process. That part needs a human hand.

Started with markdown, moved to GitHub

Initially, I had OpenClaw read a markdown file every two hours and continue with the next task. Problem: I couldn't see intermediate state. Where things stood, what was stuck, what was planned next: none of it was visible at a glance.

Changed the approach. Created a dedicated GitHub account for OpenClaw, invited it as a developer on the repo. From there, milestones and issues drove the loop. OpenClaw sets milestone goals, creates issues for needed work, closes them on completion. I playtest, reopen issues with more specific requirements if something falls short.

Key point: almost all of this ran through Telegram. Implementation orders, revision requests, reopens, priority changes — all via Telegram. GitHub served less as a place to direct development and more as a board for tracking state and recording acceptance criteria.

Issue bodies weren't just to-do notes. Each had summary, scope, and acceptance criteria upfront. After completion, comments recorded which commits handled it, what validation ran, whether redeployment happened. Milestones stacked by phase rather than clumping into one or two. What I needed wasn't "something is in progress" but "what's verified and what isn't."

The actual operating loop

Roughly:

I set direction and priorities from Telegram.
OpenClaw created milestones and split work into issues.
Issues carried summary, scope, acceptance criteria, validation.
Cron ran every two hours, pushing to the next task.
Completed issues closed. When I playtested and found gaps, I reopened with more specific requirements.

I acted almost entirely as director and QA. OpenClaw drove implementation. Telegram reports weren't just pass/fail — current phase, issues handled this pass, blockers, validation commands, redeployment status. What I checked wasn't code. It was state.
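The issue structure described above could be templated roughly like this. It is a sketch under assumptions: the `IssueSpec` type, the section headings, and the cron expression are illustrative, and the post does not state how the actual issues or the cron job were configured.

```python
from dataclasses import dataclass, field

# Standard cron syntax for "minute 0 of every second hour".
# How the post's cron job was actually set up is not stated.
EVERY_TWO_HOURS = "0 */2 * * *"


@dataclass
class IssueSpec:
    """Hypothetical container for the fields each issue is asked to carry."""
    summary: str
    scope: list[str]
    acceptance_criteria: list[str]
    validation: list[str] = field(default_factory=list)


def render_issue_body(spec: IssueSpec) -> str:
    """Render an issue body as Markdown; criteria become unchecked task items."""
    if not spec.acceptance_criteria:
        raise ValueError("an issue needs at least one acceptance criterion")
    lines = ["## Summary", spec.summary, "", "## Scope"]
    lines += [f"- {item}" for item in spec.scope]
    lines += ["", "## Acceptance criteria"]
    lines += [f"- [ ] {item}" for item in spec.acceptance_criteria]
    if spec.validation:
        lines += ["", "## Validation"]
        lines += [f"- `{command}`" for command in spec.validation]
    return "\n".join(lines)
```

Refusing to render an issue without acceptance criteria is the design point: an issue that cannot be checked against something concrete gives the playtester nothing to reopen it against.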

Continuity from anywhere was real

Best part: development didn't stop when I left my desk. Leave a direction in Telegram, and work continues at the next checkpoint.

This pushed me into a director-and-QA role. In SLIME ARENA I never looked at code. I don't know Godot. I playtested and shared impressions. Yet the combat loop, waves, bosses, and level-up choices — the game's skeleton — went surprisingly far.

Milestones and issues also forced small judgment units. Instead of "make a game," the question was always what to verify at this stage. Combat loop, wave progression, boss fights, level-up choices, balance tuning — each ran as a separate workstream. Unlike chat logs, the history stays navigable later.

Problems were real too

Downsides were clear. First, anywhere-access cuts both ways. I found myself checking during gym breaks. Open means convenient; open also means constant attention.

Second, keeping the cron loop stable was harder than expected. A bad prompt could leave it reporting status instead of advancing. I had to be specific about what counts as done, how to pick the next step, what to fall back to when stuck.
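One way to picture "what counts as done, how to pick the next step, what to fall back to" is as an ordered rule list where every pass must end in exactly one concrete action. The sketch below is hypothetical. The `Issue` fields and the action strings are invented, and in the post these rules lived in the prompt, not in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical snapshot of one tracked issue."""
    number: int
    is_open: bool
    blocked: bool
    criteria_met: bool


def next_action(issues: list[Issue]) -> str:
    """Apply rules in priority order so a pass never ends in status-only output."""
    # 1. Done means: acceptance criteria met. Close before starting anything new.
    for issue in issues:
        if issue.is_open and issue.criteria_met:
            return f"close #{issue.number}"
    # 2. Next step: the first open issue that is not blocked.
    for issue in issues:
        if issue.is_open and not issue.blocked:
            return f"work on #{issue.number}"
    # 3. Fallback when everything open is blocked: escalate instead of looping.
    for issue in issues:
        if issue.is_open:
            return f"report blocker on #{issue.number} and ask for direction"
    # 4. Nothing open: plan the next milestone.
    return "plan next milestone"
```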

Third, review debt. When code advances without me reading it, deferred review piles up. SLIME ARENA was extreme — I read zero lines. If you could fully trust the AI, maybe that's fine. At this stage, the review load that lands on the human at the end is substantial.

Finally, cost and usage. I bought a dedicated mini PC with full permissions for this experiment. Running it this way burns through usage fast. I hit the full GPT Pro weekly quota and couldn't touch work that mattered more. As of March 15, 2026, the experiment is paused.

Why the next experiment is a WebGL voxel game

After SLIME ARENA, I started a three.js WebGL voxel game. Following the Minecraft-style structure, but keeping engine and game layers sharply separated — the engine part should be reusable for the next project.

AI tooling trends feel web-native lately. I expect web-based games and web game engines to grow. Not that AAA will collapse — more like polarization intensifies at both ends. Still a hypothesis, but that's why I chose WebGL next.

It wasn't a toy

OpenClaw through Telegram was closer to operating a remote developer than a quick off-desk tool. Proper operation means designing milestones, issues, cron rules, and acceptance criteria together. Get that right and it goes far.

Management overhead is real. Usage, review debt, constant operational attention — all part of the package. OpenClaw wasn't a toy. Without governance, it quickly outgrows what you can handle.

Epoch: Unseen Devlog 7 - Lua 전환과 클래스 트리 정리

Sat, 07 Mar 2026 00:00:00 GMT

이번엔 기존 Graph 자산 구조를 접고 텍스트 기반 데이터로 되돌리는 큰 방향 전환이 있었다. 여기에 클래스 트리, 타이틀, 정비 UI 정리까지 한 번에 몰아서 했다.

데이터 구조 변경

Epoch: Unseen Devlog 3에서는 전투 공식을 Graph 형식 데이터로, Epoch: Unseen Devlog 4에서는 전투 시나리오 스크립트를 Graph 형식으로 가기로 했었다. 그런데 Graph Asset은 AI 친화적인 포맷이 아니었다. AI가 직접 편집하기가 사실상 불가능하고, 결국 사람이 손으로 만져야 했다.

돌아보면 잘못된 결정이었고, 시행착오였다.

전투 공식은 다시 코드 기반으로, 시나리오 스크립트는 Lua 기반으로 되돌렸다.

Lua 스크립트로 작성한 스토리 스크립트(R 씬)

내가 원하는 구조는 결국 텍스트 기반이라 AI와 사람이 같이 관리하기 쉽고, 패치도 편한 형태였다. Lua가 지금은 그 목적에 가장 잘 맞는다.

클래스 트리 결정

검병
├─ 결투가
│  ├─ 검무사
│  └─ 처형자
└─ 검사
   ├─ 검성
   └─ 칼바람

창병
├─ 파쇄병
│  ├─ 파괴자
│  └─ 파갑병
└─ 창술사
   ├─ 용기병
   └─ 창신

방패병
├─ 방벽병
│  ├─ 요새병
│  └─ 수호자
└─ 철갑병
   ├─ 거순병
   └─ 방진병

궁병
├─ 저격수
│  ├─ 추적자
│  └─ 매의눈
└─ 궁사
   ├─ 사수
   └─ 궁신

마법사
├─ 원소술사
│  ├─ 마녀
│  └─ 현자
└─ 주술사
   ├─ 사술사
   ├─ 각인술사
   └─ 비전술사

사제
├─ 심문관
│  ├─ 집행관
│  └─ 참회자
└─ 구원자
   ├─ 성녀
   └─ 순교자

특수 병종을 빼고 기본 클래스 트리는 6종으로 정했다.

유닛 스프라이트도 SPUM 안에서 내 취향에 조금 더 가까운 쪽으로 교체했다. 아직은 잡병 모션만 임시로 뽑아 둔 상태라, 진영색 표현은 추가 작업이 더 필요하다. 당분간은 이 상태로 쓸 생각이다.

문제는 말을 탄 일반 기병 모습을 넣기 어렵다는 점이다. 그래서 독립된 기병 계열은 이번 클래스 트리에서 일단 제외했다. 그래도 시간이 나면 한번은 시도해 볼 생각이다.

타이틀, 정비 페이즈 작업

슬슬 게임다운 구색은 갖춰 가고 있다. 아직 한 사이클 전투 플레이까지는 못 갔지만.

UI는 정말 어렵다. 일단은 적당한 색 팔레트를 골라 먹이고 있는데, 아직 좀 허술해 보인다.

Gemini 흔적이 너무 노골적으로 남는다. 나중에 마크 제거 툴로 한번 정리해야겠다.

임시로 만든 스토리 씬이다. 원래는 사각형 그리드가 보이면 안 된다.

출진 전 정비 씬이다. 이걸 필드 탐색형으로 갈지, 메뉴 중심으로 갈지는 아직 고민 중이다.

이 UI가 제일 골칫거리였는데, 어느 정도 깔끔하게 정리되고 나니 이제는 좀 만족스럽다.

알파1 빌드 분량 결정

챕터 1 스토리는 일단 러프로 잡아 뒀다. 러프하다는 건 앞으로 개발하면서 계속 퇴고하고 바뀔 수 있다는 뜻이다. 결국 한 번에 잘 만드는 것보다 조금씩 진도를 빼면서 다듬는 쪽이 결과가 좋다.

알파1 빌드는 챕터 1 중반부까지로 잡았다.

그럼 알파1 빌드가 나오면 뭘 하느냐?

아무 것도 없다. 그냥 나 혼자 정해 둔 마감선이자 목표다.

Epoch: Unseen Devlog 7 - Lua Migration and Class Tree

Sat, 07 Mar 2026 00:00:00 GMT

Major direction change: dropped Graph asset structure, reverted to text-based data. Also finalized the class tree, title screen, and maintenance UI in one push.

Data Structure Change

Devlog 3 chose Graph format for battle formulas. Devlog 4 chose Graph for scenario scripts. Turns out Graph Assets aren't AI-friendly. AI can't edit them directly — humans have to do it by hand.

In hindsight, a wrong call. Trial and error.

Battle formulas went back to code. Scenario scripts switched to Lua.

Story script (R scene) written in Lua.

What I wanted was text-based: easy for both AI and humans to manage, easy to patch. Lua fits that purpose best right now.

Class Tree

Swordsman
├─ Duelist
│  ├─ Blade Dancer
│  └─ Executioner
└─ Saber
   ├─ Sword Saint
   └─ Gale Blade

Lancer
├─ Crusher
│  ├─ Destroyer
│  └─ Armor Breaker
└─ Spearmaster
   ├─ Dragoon
   └─ Spear Saint

Shieldbearer
├─ Bulwark
│  ├─ Fortress
│  └─ Guardian
└─ Ironclad
   ├─ Pavise
   └─ Phalanx

Archer
├─ Sniper
│  ├─ Tracker
│  └─ Hawk Eye
└─ Bowman
   ├─ Marksman
   └─ Bow Saint

Mage
├─ Elementalist
│  ├─ Witch
│  └─ Sage
└─ Hexer
   ├─ Necromancer
   ├─ Inscriber
   └─ Arcanist

Cleric
├─ Inquisitor
│  ├─ Enforcer
│  └─ Penitent
└─ Savior
   ├─ Saint
   └─ Martyr

Six base class trees, excluding special classes.

Swapped unit sprites within SPUM to something closer to my taste. Only placeholder mob animations for now — faction colors need more work. Using this as-is for a while.

Problem: mounted cavalry is hard to represent. Dropped the independent cavalry branch from this tree. Might attempt it later if time allows.

Title and Maintenance Phase

Starting to look like a game. Haven't reached a full combat playthrough cycle yet.

UI is genuinely hard. Picked a color palette and applied it, but it still looks rough.

Gemini artifacts are too visible. Need to clean those up with a watermark removal tool later.

Placeholder story scene. The square grid shouldn't be visible.

Pre-sortie maintenance scene. Still deciding between a field-exploration approach and a menu-driven one.

This UI was the biggest headache. Got it cleaned up enough to feel satisfied.

Alpha 1 Build Scope

Chapter 1 story is in rough draft. Rough means it'll keep changing through development. Incremental progress and revision beats trying to get it right in one shot.

Alpha 1 covers through mid-Chapter 1.

What happens when Alpha 1 ships?

Nothing. It's a self-imposed deadline and a goal.

Character Forge: ComfyUI 기반 캐릭터 일러스트 생성기

Sat, 28 Feb 2026 00:00:00 GMT

만든 이유

Epoch: Unseen용 캐릭터 초상화를 만들다 보니, 먼저 막히는 건 이미지 생성 성능이 아니라 프롬프트 관리였다. 프롬프트를 매번 손으로 만지면 결과가 쉽게 흔들렸고, 어떤 조합이 괜찮았는지 다시 찾는 것도 번거로웠다.

그래서 ComfyUI를 그대로 쓰기보다, 캐릭터 초상화용 입력 체계를 한 겹 감싼 Character Forge를 만들었다. 목표는 더 영리한 프롬프트 한 줄을 찾는 게 아니라, 비슷한 조건에서 여러 시도를 빠르게 반복할 수 있는 흐름을 만드는 것이었다.

실제 사용 흐름

실제로는 대개 이런 흐름으로 쓴다.

캐릭터 설정과 외형 방향을 먼저 잡는다.
태그를 골라 한 번에 4장씩 생성한다.
마음에 드는 결과가 나올 때까지 5~6번 정도 반복한다.
후보 일러스트는 즐겨찾기에 모은다.
최종적으로 고른 이미지를 Epoch: Unseen 초상화에 반영한다.

아직은 세세한 편집보다 txt2img로 후보를 빠르게 많이 보는 쪽에 더 가깝다. img2img도 만들고 테스트는 해뒀지만, 지금 단계에서는 아직 자주 쓰지 않는다.

태그를 어떻게 프롬프트로 바꾸는가

입력 태그는 캐릭터를 만드는 재료처럼 역할별로 나뉘어 있다. 성별, 나이, 직업, 종족처럼 기본 설정을 잡는 태그가 있고, 외형, 특징, 의상, 성격, 모에처럼 캐릭터의 개성을 만드는 태그가 따로 있다. 배경이나 조명 옵션도 있지만, 지금은 초상화 위주라 거의 흰 배경만 쓴다.

핵심은 이걸 바로 프롬프트 문자열로 이어 붙이지 않는다는 점이다. 실제 흐름은 태그 조합 -> 카테고리별 해석 -> 최종 템플릿 적용에 더 가깝다.

UI에서 고른 태그를 카테고리별 고정 순서의 해시태그 문자열로 만든다.
서버에서 그 해시태그를 gender, personality, appearance, outfit 같은 버킷으로 다시 나눈다.
필수 카테고리가 빠지면 생성 자체를 막는다.
각 태그를 영어 프롬프트 조각으로 바꾼 뒤 최종 템플릿에 넣는다.

태그마다 복잡한 가중치를 따로 붙이는 구조라기보다, 태그를 한 번 정리된 필드로 바꾼 뒤 마지막 템플릿 단계에서 필요한 규칙만 얹는 방식에 가깝다. 예를 들어 의상은 더 강조하고, 특정 작업 종류에서는 배경이나 카메라 태그를 아예 버린다.

짧게 요약하면 파이프라인은 아래 정도다.

flowchart TD
    A[태그 선택] --> B[해시태그 문자열 생성]
    B --> C[태그 해석과 매핑]
    C --> D[프롬프트 필드 구성]
    D --> E[최종 Prompt 템플릿 적용]
    E --> F[워크플로 JSON 생성]
    F --> G[ComfyUI API 실행]
    G --> H[이미지 저장과 히스토리 반영]

ComfyUI는 어떻게 붙였나

폴더를 감시하면서 돌리는 식은 쓰지 않았다. 로컬 ComfyUI의 HTTP/WebSocket API를 직접 호출하는 구조다.

Python이 POST /prompt로 워크플로를 넣는다.
GET /history/{prompt_id}로 완료 여부를 본다.
가능하면 WebSocket으로 step progress도 받는다.
완료되면 /view로 결과 이미지를 가져온다.

실제 생성용 워크플로는 코드에서 JSON으로 만들고, 마지막 positive/negative prompt도 Python 쪽 템플릿에서 조립한다. 그래서 UI에서 태그를 고르고 나면, 그다음부터는 초상화 생성용 파이프라인이 비교적 일정한 형태로 돈다.

뭐가 좋아졌나

가장 만족스러운 건 일관성이 좋아졌다는 점이다. 프롬프트를 매번 처음부터 고민하는 대신, 제한된 태그 안에서 빠르게 여러 시도를 돌릴 수 있게 되면서 전체 작업 속도도 함께 올라갔다. 선택지를 줄인 덕분에 오히려 반복은 쉬워졌고, 버튼 몇 번 누르는 정도로 다시 시도할 수 있다는 점이 생각보다 컸다.

로컬 DB에 히스토리를 저장해 두는 것도 실제로 유용했다. 태그와 프롬프트가 같이 남고, 버튼 한 번으로 같은 조건을 다시 돌릴 수 있다. 즐겨찾기 기능도 마음에 드는 결과와 아닌 결과를 나누는 데 꽤 도움이 됐다. 결국 이미지를 많이 뽑는 것보다 괜찮았던 시도를 다시 찾을 수 있는 쪽이 더 중요했다.

지금은 맥 미니 64GB에서 SDXL과 ComfyUI를 돌리고 있고, 한 장에 대략 1~2분 정도 걸린다. 초상화 용도로는 감당 가능한 수준이다.

결과가 설정을 바꾸는 순간도 있었다

이 파이프라인을 쓰면서 재미있었던 건, 설정이 이미지를 끌고 가는 경우만 있는 게 아니라 이미지가 설정을 다시 끌고 오기도 했다는 점이다.

예를 들어 주인공 에리히는 처음에는 결벽증 있는 군인, 조금 더 현실적이고 날카로운 인상으로 잡혀 있었다. 그런데 실제로 여러 장을 뽑아 보니 내가 고르게 되는 쪽은 점점 JRPG 주인공에 가까운 톤이었다. 갑옷 형태나 색 대비, 얼굴 인상, 전체 실루엣이 그 방향으로 정리되기 시작했고, 결국 지금 버전의 에리히는 처음 머릿속에 있던 이미지보다 훨씬 만화적이고 정제된 쪽으로 바뀌었다.

아래 세 장만 봐도 그 변화가 꽤 또렷하다. 1차에서는 현실적인 군인 인상이 강했고, 2차에서는 스타일이 한 번 만화적으로 튀었고, 지금 버전에서는 그중 쓸 만한 요소만 남기면서 훨씬 정리된 얼굴과 의상 톤으로 굳어졌다.

1차: 현실적인 군인 톤

2차: JRPG 톤이 강해진 중간안

현재: 정리된 최종 방향

이런 변화는 자유 입력 프롬프트만으로 작업할 때보다 태그 기반 시스템에서 더 빨리 보였다. 조건을 조금씩 바꿔 가며 여러 후보를 연속으로 비교할 수 있으니, 머릿속 설정과 실제로 화면에서 잘 살아나는 설정의 차이가 더 빨리 드러났기 때문이다.

아직 남은 한계

물론 아직 완전히 통제되지는 않는다. 흰 배경으로 설정해도 가끔 배경이 딸려 나오는 경우가 있다. 지금은 그냥 다시 돌린다. 초상화만 만들 때는 이 정도로도 버틸 만하지만, 배경이나 이벤트 일러스트까지 넓히면 이야기가 달라질 수 있다.

또 아직은 초상화 파이프라인에서만 검증된 도구라는 점도 한계다. 지금까지는 이 범위 안에서 꽤 만족스럽지만, 더 복잡한 장면에서 일러스트 일관성을 어떻게 유지할지는 아직 본격적으로 부딪혀 보지 않았다. 그 단계에서는 NanoBanana처럼 다른 강점을 가진 도구와 같이 써야 할 수도 있다.

결국 프롬프트가 아니라 입력 체계의 문제였다

좋은 프롬프트 한 줄을 찾는 게 중요한 줄 알았는데, 만들면서 느낀 건 반복 가능한 입력 체계를 만드는 쪽이 더 컸다. 직접 문장을 길게 쓰는 방식은 자유롭지만, 그만큼 결과를 다시 관리하기가 어렵다. 반대로 태그로 선택지를 제한하면 한 번의 시도는 덜 자유롭지만, 여러 번 반복하기는 훨씬 쉬워진다.

지금 Character Forge는 적어도 그 반복을 쉽게 만드는 쪽에서는 꽤 만족스럽다. 프롬프트 작성을 입력 시스템으로 바꾼 것만으로도 작업 감각이 제법 달라졌다.

Character Forge: Tag-Based Character Portrait Generator on ComfyUI

Sat, 28 Feb 2026 00:00:00 GMT

Why I built it

Making Epoch: Unseen portraits, the bottleneck wasn't generation quality — it was prompt management. Hand-editing prompts made results unstable. Finding which combination worked before was tedious.

Instead of using ComfyUI raw, I wrapped it with a portrait-specific input layer: Character Forge. Not searching for one clever prompt. Building a flow for rapid iteration under similar conditions.

Actual usage flow

Typical workflow:

Set character concept and visual direction.
Pick tags, generate 4 images at once.
Repeat 5–6 rounds until something clicks.
Favorite the promising results.
Apply the final pick as an Epoch: Unseen portrait.

Still closer to rapid txt2img candidate screening than fine editing. img2img is built and tested but not used much at this stage.

How tags become prompts

Input tags are split by role, like ingredients for building a character. Base settings — gender, age, class, race. Character traits — appearance, features, outfit, personality, moe. Background and lighting exist but portraits mostly use white backgrounds.

Key point: tags don't concatenate straight into a prompt string. The actual flow is tag combination → per-category interpretation → final template application.

Selected tags become a category-ordered hashtag string in the UI.
Server splits hashtags into buckets: gender, personality, appearance, outfit.
Missing required categories block generation.
Each tag maps to an English prompt fragment, assembled into the final template.

Not a complex per-tag weighting system. Tags get normalized into clean fields first, then the template stage applies lightweight rules. Outfit gets emphasized, certain task types drop background or camera tags entirely.
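
A minimal sketch of that flow, not the actual Character Forge code. The tag table, the required-category list, the `(text:1.2)` emphasis on outfit, and the function names are all assumptions made for illustration.

```python
# Hypothetical tag table: hashtag -> (category, English prompt fragment).
TAG_TABLE = {
    "#male": ("gender", "1boy"),
    "#stoic": ("personality", "stoic expression"),
    "#silver_hair": ("appearance", "silver hair"),
    "#plate_armor": ("outfit", "plate armor"),
    "#forest": ("background", "forest background"),
}

# Assumed required categories, taken from the buckets named in the post.
REQUIRED = ("gender", "personality", "appearance", "outfit")


def parse_hashtags(hashtag_string):
    """Split a hashtag string into per-category lists of prompt fragments."""
    buckets = {}
    for tag in hashtag_string.split():
        if tag not in TAG_TABLE:
            continue  # unknown tags are simply ignored in this sketch
        category, fragment = TAG_TABLE[tag]
        buckets.setdefault(category, []).append(fragment)
    return buckets


def build_prompt(hashtag_string, task="portrait"):
    """Normalize tags into fields, then apply a small template."""
    buckets = parse_hashtags(hashtag_string)
    missing = [c for c in REQUIRED if c not in buckets]
    if missing:
        # Missing required categories block generation.
        raise ValueError("missing required categories: " + ", ".join(missing))

    # Lightweight template rules: outfit gets emphasized,
    # and portrait tasks drop background tags entirely.
    outfit = ", ".join(f"({frag}:1.2)" for frag in buckets["outfit"])
    parts = [
        ", ".join(buckets["gender"]),
        ", ".join(buckets["appearance"]),
        outfit,
        ", ".join(buckets["personality"]),
    ]
    if task == "portrait":
        parts.append("white background")
    else:
        parts.extend(buckets.get("background", []))
    return ", ".join(p for p in parts if p)
```

Calling `build_prompt("#male #stoic #silver_hair #plate_armor")` yields one comma-separated prompt string. Because the rules live in the template stage, changing how outfit is emphasized means touching one function, not every tag.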

Pipeline summary:

flowchart TD
    A[Tag Selection] --> B[Hashtag String]
    B --> C[Tag Parsing & Mapping]
    C --> D[Prompt Field Assembly]
    D --> E[Final Prompt Template]
    E --> F[Workflow JSON Generation]
    F --> G[ComfyUI API Execution]
    G --> H[Image Save & History Update]

How ComfyUI connects

No folder-watching. Direct HTTP/WebSocket calls to local ComfyUI.

Python sends POST /prompt with the workflow.
GET /history/{prompt_id} checks completion.
WebSocket receives step progress when available.
On completion, /view fetches the result image.

The generation workflow is built as JSON in code. Final positive/negative prompts are assembled from Python-side templates. After tag selection in the UI, the portrait pipeline runs in a fairly consistent shape.
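
The HTTP side of those calls can be sketched roughly like this, using only the standard library. The base URL assumes ComfyUI's default local port, the helper names are made up for this example, and the WebSocket progress channel is left out. Completion is detected by polling until the prompt shows up in the history.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8188"  # assumed default local ComfyUI address


def queue_prompt(workflow, client_id):
    """POST /prompt with the workflow JSON and return the prompt_id."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]


def wait_for_history(prompt_id, poll_seconds=2.0, timeout=300.0):
    """Poll GET /history/{prompt_id} until an entry for the prompt appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{BASE_URL}/history/{prompt_id}") as resp:
            history = json.loads(resp.read())
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(poll_seconds)
    raise TimeoutError(f"prompt {prompt_id} did not finish in {timeout}s")


def image_refs(history_entry):
    """Collect image descriptors from every output node of a history entry."""
    refs = []
    for node_output in history_entry.get("outputs", {}).values():
        refs.extend(node_output.get("images", []))
    return refs


def view_url(ref):
    """Build the /view URL for one image descriptor."""
    query = urllib.parse.urlencode({
        "filename": ref["filename"],
        "subfolder": ref.get("subfolder", ""),
        "type": ref.get("type", "output"),
    })
    return f"{BASE_URL}/view?{query}"
```

A full run would be `queue_prompt(...)`, then `wait_for_history(...)`, then fetching each `view_url(ref)` and saving the bytes next to the history record.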

What improved

Consistency improved most. Instead of agonizing over prompts from scratch, I run many attempts quickly within constrained tags. Fewer choices made repetition easier. A few button presses to retry — that mattered more than expected.

Local DB history storage proved useful too. Tags and prompts are saved together; one button replays the same conditions. Favorites helped separate good results from noise. Finding "what worked before" turned out more important than generating volume.

Currently running SDXL on a Mac Mini 64GB with ComfyUI. About 1–2 minutes per image. Manageable for portraits.

Results reshaped the design

An interesting pattern: the design didn't always drive the image. Sometimes the image pulled the design back.

Erich, the protagonist, started as a neat-freak soldier — realistic, sharp, grounded. But across many generations, the ones I kept picking drifted toward a JRPG protagonist tone. Armor shape, color contrast, face, overall silhouette all shifted that direction. The current Erich is far more stylized and polished than the original mental image.

1st: Realistic soldier tone

2nd: JRPG tone emerges

Current: Refined final direction

This shift showed up faster in the tag-based system than in free-form prompting. Tweaking conditions incrementally and comparing candidates in sequence reveals the gap between your mental image and what actually works on screen.

Remaining limits

Control isn't complete. White background setting sometimes leaks background elements. For now I just rerun. Manageable for portraits, but extending to scene or event illustrations changes the equation.

Also, Character Forge is only validated for the portrait pipeline. Maintaining illustration consistency across more complex scenes hasn't been tested seriously. That stage might need tools with different strengths, like NanoBanana.

It was an input system problem, not a prompt problem

Thought one good prompt was the key. Building this taught otherwise: a repeatable input system matters more. Free-form prompting is flexible but hard to manage across iterations. Tag-constrained input trades freedom per attempt for easy repetition.

Character Forge delivers on repeatability. Turning "prompt writing" into an "input system" changed the working feel more than expected.

Epoch: Unseen Devlog 6 - 세계관과 프로젝트 이름 확정

Sat, 28 Feb 2026 00:00:00 GMT

이번엔 시스템보다 방향을 정하는 쪽이 중요했다. 세계관을 중세 판타지로 굳히고, 이후 devlog와 문서에서 계속 쓰게 될 이름을 Epoch: Unseen으로 정해 둔 짧은 체크포인트다.

시나리오 방향

브레인스토밍을 계속하면서 세계관 방향을 중세 판타지로 굳혔다.

프로젝트 제목

제목은 Epoch: Unseen으로 잡았다.

다음 정리

이제는 이 방향에 맞춰 시나리오와 타이틀 쪽 문구를 계속 다듬을 생각이다. 이번 회차는 짧지만, 설정과 이름을 정한 기록으로 남겨 둔다.

Epoch: Unseen Devlog 6 - Setting and Project Name

Sat, 28 Feb 2026 00:00:00 GMT

This round was about direction, not systems. Committed to medieval fantasy as the setting, locked the name Epoch: Unseen for devlogs and documents going forward. Short checkpoint.

Scenario Direction

Continued brainstorming. Locked the setting as medieval fantasy.

Project Title

Epoch: Unseen.

Next

Scenario copy and title screen text will be refined to match this direction. Short entry, but worth recording the decision.

Epoch: Unseen Devlog 5 - 응전과 AP 규칙 정리

Sun, 15 Feb 2026 00:00:00 GMT

이번엔 전투에서 자주 손이 갈 규칙을 묶었다. 응전, AP, 리팩토링을 같이 정리하면서 전투 페이즈를 어디까지 마무리할지 기준을 세운 회차에 가깝다.

응전 시스템 고도화 및 전투 시스템 정비

공격을 받았을 때 반격 / 회피 / 방어를 고를 수 있는, 슈로대 스타일 응전 시스템을 작업했다.

AI가 어떤 선택을 할지 계산하는 공식도 같이 만들었다. 이 부분은 GPT 도움을 많이 받았는데 생각보다 꽤 복잡했다.

밸런스 테스트 때 손볼 수 있도록 조절 가능한 변수는 최대한 밖으로 빼두었다.

AP 시스템

AP(Action Point) 개념도 넣었다. 공격, 스킬, 아이템 사용 시 소모된다.

대부분의 유닛은 1~3 AP 정도를 쓰게 하고, 보스급이거나 의도적으로 강하게 설계한 유닛은 4~5까지 줄 수도 있을 것 같다.

어쩌면 MP를 완전히 대체할 수도 있는데, 이건 조금 더 굴려 보고 결정할 생각이다.

리팩토링

개발을 계속 밀다 보니 코드가 다시 지저분해진 느낌이 들어 한 차례 정리를 했다.

과한 최적화나 오버엔지니어링 성격의 코드는 걷어냈다.

다음 작업

개발을 다시 붙잡은 지 대략 두 달 정도 지났다. 회사 다니면서 병행하니 속도가 느린 건 어쩔 수 없다.

유닛 상세 정보 팝업과 일시정지 팝업만 만들면 전투 페이즈는 얼추 마무리될 것 같다.

이제 시나리오와 정비 페이즈 준비도 슬슬 시작해야겠다.

Epoch: Unseen Devlog 5 - Counter System and AP Rules

Sun, 15 Feb 2026 00:00:00 GMT

Bundled frequently-touched combat rules this round. Counter attacks, AP, refactoring — setting a boundary for how far to take the combat phase.

Counter System and Combat Cleanup

Built the SRW-style counter system: Counter / Evade / Defend on receiving an attack.

Also built the AI formula for choosing between them. Got heavy GPT assistance — more complex than expected.

Tunable variables exposed for later balance testing.

AP System

Added Action Points. Consumed by attacks, skills, item use.

Most units spend 1–3 AP. Boss-tier or intentionally strong units may get 4–5.

Might fully replace MP. Need more playtesting to decide.
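
A rough sketch of how such an AP pool could behave. The per-action costs and the full refill at turn start are assumptions; the post only says that attacks, skills, and items consume AP.

```python
from dataclasses import dataclass

# Hypothetical costs for illustration only.
AP_COST = {"attack": 1, "skill": 2, "item": 1}


@dataclass
class Unit:
    name: str
    max_ap: int  # 1-3 for most units, 4-5 for boss-tier units
    ap: int = 0

    def start_turn(self):
        # Assumed rule: AP refills fully at the start of each turn.
        self.ap = self.max_ap

    def try_act(self, action):
        """Spend AP for an action; return False if the unit cannot afford it."""
        cost = AP_COST[action]
        if cost > self.ap:
            return False
        self.ap -= cost
        return True
```

With costs in one table, balance passes can change numbers without touching unit logic.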

Refactoring

Code got messy from sustained pushing. Did a cleanup pass.

Stripped over-optimized and over-engineered code.

Next Steps

About two months since restarting development. Slow pace, working alongside a day job.

Unit detail popup and pause popup — once those are done, the combat phase is roughly complete.

Time to start preparing scenario and maintenance phase.

Epoch: Unseen Devlog 4 - 전투 AI와 UI 초안

Sun, 01 Feb 2026 00:00:00 GMT

이번엔 전투 AI, 시나리오 그래프, 전투 UI 초안을 한 번에 잡았다. 아직 콘텐츠가 많이 올라간 단계는 아니지만, 전투 프로토타입의 뼈대는 이때 거의 나온 셈이다.

AI 시스템 고도화

Behavior Designer 에셋으로 AI를 만들기로 한 뒤, 이제는 실제로 어떤 로직에 따라 행동하게 할지 트리를 확정하고 구현하는 작업을 했다.

만들다 보니 공통으로 반복되는 부분이 많았다. 그래서 공통 로직은 별도 트리로 분리하고, 트리끼리 연결하는 방식으로 정리했다. 많이 쓰는 방식이라는데, 확실히 훨씬 편했다.

우선 전장 전체의 방침을 먼저 판단한다. 나는 이걸 Directive(지령)라고 부른다.

특정 유닛을 공격해라
특정 위치로 이동하라
특정 유닛을 따라가라

이런 건 전장의 특수 상황에서 가장 먼저 따라야 하는 행동이다.

특별한 지령이 없으면 각 유닛은 자기 병종 타입(Force)에 맞춰 행동한다.

힐러 : 회복 스킬, 버프 스킬 등
물리 근접 딜러 : 물리 공격, 공격 스킬 등
원거리 딜러 : 물리 공격, 공격 스킬, 카이팅 등
마법 딜러 : 공격 스킬, 디버프 스킬 등
탱커 타입 : 공격 스킬, 길막기, 요충지 점유 등

그리고 각 병종 트리 안에서 다시 정책(Policy)을 판단한다.

공격(Attack) : 적을 향해 이동하여 공격
방어(Defence) : 적이 내 공격 범위 안에 들어오면 이동하여 공격
고정(Hold) : 이동하지 않고 현재 위치에서 행동

생각보다 구현이 오래 걸렸다. 거의 일주일 가까이 붙잡고 씨름했다. 그래도 이제는 베이스가 잡혔으니, 앞으로는 콘텐츠를 얹으면서 버그를 찾아 정리하면 될 것 같다.

스킬 리팩토링

AI 트리를 만들다 보면 스킬 시스템과 바로 엮이는데, 그 과정에서 기존에 미구현이거나 리팩토링이 덜 끝난 코드들이 계속 드러났다. 버프 스킬을 써도 버프가 안 걸리거나, 공격 스킬을 써도 대미지가 안 들어가는 식의 문제들이 있었다. 이런 부분을 전부 찾아서 정리했다.

전투 유닛 뷰어

아직 UI가 충분히 갖춰지지 않아서, 스킬을 쓴 뒤 버프나 디버프가 실제로 적용됐는지 한눈에 확인하기가 어려웠다. 그래서 빠르게 전투 유닛 뷰어를 하나 만들었다.

전투 시나리오 그래프

전투 시나리오를 구현하기 위한 커맨드들도 계속 추가하고 있다.

일단은 당장 필요한 것들부터 넣었다. 실제로 시나리오를 만들기 시작하면 더 늘어날 것 같다.

UI 작업

이제 아주 귀찮은 구간에 들어왔다. 내 기준으로는 여기가 첫 번째 고비다.

대사

유닛 정보(간략 버전)

반격 시스템은 슈퍼로봇대전이나 파이어엠블렘처럼 방어 / 반격 / 회피를 고르는 방식을 채택하려고 한다. 적 턴에도 유저가 어느 정도 개입하면서 전투를 풀어나간다는 느낌을 주고 싶었다.

다음 작업은 반격 페이즈 UI다.

UI 구현 순서는 반격 페이즈 -> 유닛 정보(상세) -> 스킬 정보 -> 전장 정보 -> 메뉴 정도로 보고 있다.

여기까지 나오면 1차 프로토타입으로는 충분하지 않을까 싶다. 아이템이나 특성 같은 요소는 2차 이후로 넘기는 편이 맞아 보인다.

Epoch: Unseen Devlog 4 - Combat AI and UI Draft

Sun, 01 Feb 2026 00:00:00 GMT

Combat AI, scenario graph, and combat UI draft — all tackled at once. Not content-rich yet, but the combat prototype skeleton came together here.

AI System

After committing to Behavior Designer, this round was about finalizing the decision logic and implementing the trees.

Recurring patterns emerged quickly. Extracted shared logic into separate trees and linked them. Common approach, noticeably easier.

First, evaluate the battlefield-level directive:

Attack a specific unit
Move to a specific position
Follow a specific unit

These are top-priority actions for special battlefield situations.

Without a directive, each unit acts according to its class type (Force):

Healer: heal skills, buff skills
Melee DPS: physical attacks, attack skills
Ranged DPS: physical attacks, attack skills, kiting
Magic DPS: attack skills, debuff skills
Tank: attack skills, blocking, holding chokepoints

Within each class tree, a policy layer decides:

Attack: move toward enemy, engage
Defense: move and attack only when an enemy enters the unit's attack range
Hold: stay in place, act from current position
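
The three layers can be sketched as a plain priority check. This is only an illustration of the ordering, not the Behavior Designer trees themselves; the class, field, and action names are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Policy(Enum):
    ATTACK = "attack"
    DEFENSE = "defense"
    HOLD = "hold"


@dataclass
class UnitState:
    force: str                       # e.g. "healer", "melee", "ranged", "magic", "tank"
    policy: Policy
    directive: Optional[str] = None  # e.g. "attack_unit", "move_to", "follow_unit"
    enemy_in_range: bool = False


def decide(unit):
    """Return (layer, action) following directive -> force -> policy priority."""
    # 1. A battlefield directive overrides everything else.
    if unit.directive is not None:
        return ("directive", unit.directive)

    # 2. Otherwise the unit's force picks the tree, and the policy gates movement.
    if unit.policy is Policy.ATTACK:
        return (unit.force, "advance_and_engage")
    if unit.policy is Policy.DEFENSE:
        action = "move_and_attack" if unit.enemy_in_range else "wait"
        return (unit.force, action)
    return (unit.force, "act_in_place")  # Policy.HOLD
```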

Took nearly a week. But the base is set — from here it's content on top and bug-hunting.

Skill Refactoring

Building AI trees immediately exposed incomplete or un-refactored skill code. Buff skills not applying, attack skills dealing zero damage. Found and fixed all of them.

Combat Unit Viewer

UI wasn't ready enough to confirm buff/debuff application at a glance. Built a quick combat unit viewer.

Scenario Graph

Commands for combat scenarios keep growing.

Added the immediately necessary ones. More will come once real scenario authoring starts.

UI Work

Now entering the tedious zone. First major hurdle by my standards.

Dialogue

Unit info (compact version)

Counter system will follow Super Robot Wars / Fire Emblem style: Counter / Evade / Defend selection. Want the player to stay engaged even during enemy turns.

Next: counter phase UI.

UI implementation order: Counter phase → Unit info (detail) → Skill info → Battlefield info → Menu.

Once those are in, should be enough for a first prototype. Items and traits push to the second round.

Epoch: Unseen Devlog 3 - 전투 프로토타입 재료 채우기

Sat, 17 Jan 2026 00:00:00 GMT

이번엔 전투에 바로 보이는 것들을 많이 만졌다. 유닛 스프라이트를 붙이고, 주인공 설정과 비주얼도 같이 다시 잡고, 전투 애니메이션과 공식까지 한 번에 정리했다. 이제야 전투 프로토타입의 모양이 조금 보이기 시작했다.

유닛 스프라이트

유닛 스프라이트는 SPUM을 쓰기로 했다. 개인적으로는 레트로한 픽셀 아트를 더 좋아해서 SPUM 그림체가 완전히 취향인 건 아니다. 그래도 편의성과 생산성을 생각하면 꽤 합리적인 선택이다. 인디 개발자들이 많이 고르는 데는 이유가 있더라.

예전에는 최적화 때문에 별도 시트 작업이 필요했는데, 지금은 정식 기능으로 지원돼서 훨씬 편하다. 딸깍하면 바로 나온다는 게 정말 크다.

흰머리 용사처럼 생긴 캐릭터만 조금 손봤고, 나머지는 거의 기본 프리셋을 그대로 썼다. 지금 단계에서는 완성도보다 속도가 더 중요하다.

주인공 설정

이 회차에서 주인공 쪽 설정도 다시 잡았다. 아주 오래전에 천하라는 모드를 만들며 굴리다 멈춘 설정이 있었는데, 이번에는 그걸 그대로 복원하기보다 남아 있는 핵심만 가져오기로 했다.

예전 색은 삼국지 쪽에 더 가까웠지만, 이번에는 배경을 중세 판타지로 완전히 틀었다. 자료도 거의 남아 있지 않아서, 결국 머릿속에 남아 있는 조각들을 바탕으로 다시 세우는 쪽에 가까웠다.

에리히와 리제 비주얼도 이때 임시로 잡았다. 일러스트는 SDXL로 뽑았고, 몇 번 돌려 본 뒤 당장 쓸 만한 결과를 골랐다.

전투 애니메이션

여기서 시간이 꽤 걸렸다. 그래도 이 작업을 지나고 나니 레거시 코드는 거의 남지 않았다. 예전 코드를 억지로 이어 붙이기보다, 필요한 부분은 버리고 구조부터 다시 잡는 쪽으로 갔다.

애니메이션 에디터도 지금은 너무 편하다. 이런 부분까지 AI와 에셋의 도움을 받다 보니, 예전에는 대체 어떻게 다 직접 만들었나 싶을 정도다.

스탯 타입

현재 생각 중인 기본 스탯은 아래와 같다.

체력 : HP
기력 : MP
공격 : 물리 공격력
방어 : 물리 방어력
마력 : 마법 공격력
저항 : 마법 방어력
속도 : 명중률, 회피, 연속공격
행운 : 크리티컬

대체로는 이 방향이다. 다만 스탯 이름은 아직 완전히 확정하지 않았다.

SRPG 중에는 스탯을 이원화하는 경우도 많다. 예를 들면 무력 -> 공격력, 통솔 -> 방어력처럼 어떤 스탯이 바로 쓰이지 않고 한 단계 변환을 거쳐 실제 수치에 반영되는 방식이다.

처음에는 이 프로젝트도 비슷한 구조였지만 결국 폐기했다. 지금 형태가 더 직관적이고, 육성 공식도 훨씬 단순하게 가져갈 수 있어서다.

나중에 사령관 시스템을 붙이게 되면 지휘 같은 스탯이 추가될 수도 있겠다. 아직은 구상 단계다.

전투 공식

전투 공식은 GraphToolkit으로 에셋화했다. 위 스크린샷은 마법 명중률 공식이다.

물리 피해량, 마법 피해량, 물리 명중률, 마법 명중률, 크리티컬 확률처럼 전투 전반에 공통으로 쓰이는 공식이 있는 반면, 특정 마법이나 스킬만 따로 쓰는 특수 공식도 필요할 것 같다.

결국 공식 종류가 엄청 많아질 게 뻔하다.

이걸 전부 코드 기반으로만 관리하면 나중에 손대기 너무 힘들어진다. 그래서 공식 자체를 데이터 자산으로 분리할 필요가 있었다.

그 과정에서 고른 게 GraphToolkit이다. 공식도 결국 사람이 읽고 수정해야 하니, 시각적으로 보이는 쪽이 낫다고 판단했다.

Epoch: Unseen Devlog 3 - Filling the Combat Prototype

Sat, 17 Jan 2026 00:00:00 GMT

Touched a lot of visible combat pieces this round. Unit sprites, protagonist setup and visuals, combat animations, battle formulas — all in one pass. The combat prototype finally has a shape.

Unit Sprites

Went with SPUM. Not entirely my taste — I prefer retro pixel art — but convenience and productivity make it a reasonable pick. Popular among indie devs for good reason.

Used to require separate sheet work for optimization. Now it's a built-in feature. One click and it's done. That matters.

Tweaked the white-haired hero character slightly. Everything else is mostly default presets. Speed over polish at this stage.

Protagonist Setup

Revisited protagonist lore this round. Had an old setting from a mod called "Cheonha" that I'd shelved years ago. Instead of restoring it wholesale, I pulled only the surviving core.

Original flavor leaned Three Kingdoms. This time, fully pivoted to medieval fantasy. Almost no reference material survived — rebuilt mostly from memory fragments.

Drafted Erich and Lise visuals here. Illustrations generated with SDXL, picked usable results after a few rounds.

Combat Animations

Ate a lot of time. But after this pass, almost no legacy code remains. Chose to restructure rather than force-fit old code.

The animation editor is so convenient now. Between AI and asset tools, hard to imagine how I used to do everything manually.

Stat Types

Current base stats:

HP
MP
ATK — physical attack
DEF — physical defense
MAG — magic attack
RES — magic resistance
SPD — hit rate, evasion, follow-up attacks
LCK — critical rate

Roughly this direction. Names not fully locked.

Some SRPGs use two-tier stats — e.g., "Prowess → Attack Power." This project started that way but I scrapped it. Current form is more intuitive and simplifies growth formulas.

A "Command" stat might appear later if a commander system gets added. Still conceptual.

Battle Formulas

Turned battle formulas into data assets via GraphToolkit. Screenshot shows the magic hit-rate formula.

Physical damage, magic damage, physical hit rate, magic hit rate, crit chance — common formulas used across combat. Plus per-skill special formulas.

Formula count will grow fast.

Managing all of this in code alone becomes unmaintainable. Separating formulas into data assets was necessary. GraphToolkit won because formulas need to be human-readable and editable.

Epoch: Unseen Devlog 2 - 타일맵과 전투 AI 기반 정리

Sat, 03 Jan 2026 00:00:00 GMT

이번엔 직접 만들던 편집 툴과 하드코딩 로직을 접고, 유니티 기본 도구와 에셋 쪽으로 방향을 틀었다. 타일맵, 범위 세팅, AI까지 앞으로 계속 손이 갈 부분을 한 번에 정리한 회차다.

유니티 타일맵 도입

개발 마인드가 조금 바뀌었다. 예전에는 최대한 직접 만들어 보자는 쪽이었는데, 이제는 가져다 쓸 수 있는 건 가져다 쓰고 더 빨리 결과를 내는 쪽에 가깝다.

그래서 직접 만들던 전용 타일맵 툴은 과감하게 버리고, 유니티 타일맵을 쓰기로 했다. 좌표계도 유니티 기준으로 맞췄다. 원래는 좌측 상단이 (0, 0)이었지만, 이제는 좌측 하단이 (0, 0)이다.

다만 타일맵 에셋은 잘못 고른 것 같다. 써 보니 마음에 안 드는 부분이 꽤 있어서 조금 후회된다. 당분간은 계속 쓰겠지만, 나중에는 다른 에셋으로 바꿀 가능성이 크다.

공격범위 / 효과범위 통합 및 세팅 툴 작업

기존에는 공격 범위와, 공격 대상을 중심으로 퍼지는 효과 범위가 따로 놀고 있었다. 따지고 보면 같은 범위 로직으로 처리할 수 있는 문제라 판단해서 하나로 통합했다. 범위 안에 포함되는지 하드코딩으로 판정하던 부분도 전부 걷어내고, 눈으로 보면서 만들 수 있는 세팅 툴도 같이 작업했다.

범위는 타깃을 중심으로 4분면으로 나눈 뒤, 한 면을 칠하는 방식으로 지정한다.

방향 개념이 없으면 상하좌우 대칭으로 4개 면을 모두 다룬다.
방향 개념이 있으면 아래를 바라보는 기준으로 좌우 대칭만 적용해 2개 면만 다룬다.

ALL - 전범위
CONE - 부채꼴 모양
CROSS - 십자
DIAMOND - 마름모(몰우전)
LASER - 관통(이격, 삼격, 육격)
NONE - 범위 없음
RANGE - 원거리(궁병, 노병, 연노병)
SQUARE - 8방향(구궁)

유닛 AI에 Behavior Designer 도입

AI 로직도 하드코딩이 너무 심해서 관리가 어려웠다. 그래서 기존 코드는 전부 걷어내고 Behavior Tree를 도입했다. 시각적으로 AI의 의사결정 구조를 볼 수 있다는 점이 크다. 대신 러닝 커브는 조금 있다.

위 스크린샷은 공격 범위 안에 적이 있으면 이동해서 공격하는 가장 단순한 형태의 트리다.

트리는 클래스 타입에 따라 5~6종류, 여기에 현재 방침까지 더하면 3~4종류 정도의 파생형이 필요할 것 같다. 중요한 적 보스는 별도 트리를 따로 줄 생각이다.

Behavior Designer를 도입한 뒤에는 병종별 베이스 트리를 먼저 잡고, 시나리오별 파생형을 추가하는 방식으로 정리할 생각이다.

Epoch: Unseen Devlog 2 - Tilemap and Combat AI Foundation

Sat, 03 Jan 2026 00:00:00 GMT

Dropped custom editing tools and hardcoded logic, switched to Unity built-ins and third-party assets. Tilemap, range setting, AI — reorganized everything that'll need constant attention going forward.

Unity Tilemap

Development mindset shifted. Used to build everything myself. Now: use what's available, ship faster.

Scrapped the custom tilemap tool, adopted Unity Tilemap. Coordinate system aligned to Unity — origin moved from top-left (0,0) to bottom-left (0,0).
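
For existing map data, the origin move is a single flip of the row index. A small sketch, assuming integer cell coordinates and a known grid height (the function name is made up):

```python
def top_left_to_bottom_left(x, y, height):
    """Convert a cell from a top-left-origin grid to a bottom-left-origin grid."""
    return x, height - 1 - y
```

Applying it twice returns the original cell, so the same function converts in both directions.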

The tilemap asset choice was a mistake, though. Not happy with it after using it. Sticking with it for now, likely replacing later.

Attack Range / Effect Range Unification

Attack range and effect range (spreading from a target) were handled separately. Both are fundamentally the same range logic, so I unified them. Replaced all hardcoded containment checks with a visual setting tool.

Ranges are defined by painting quadrants around a target:

No direction: all four quadrants (symmetrical).
Directional: two quadrants only (left-right symmetry, facing down).

ALL — full area
CONE — fan shape
CROSS — cross pattern
DIAMOND — diamond / rhombus
LASER — piercing line
NONE — no range
RANGE — ranged (archers)
SQUARE — 8-directional
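
The quadrant painting can be sketched as mirroring a set of offsets. This assumes painted cells are stored as non-negative `(dx, dy)` offsets from the target, and that "facing down" means the negative y direction in a bottom-left-origin grid. Both are guesses about the data layout, not the tool's real format.

```python
def expand_range(painted, directional=False):
    """Mirror cells painted in one quadrant into the full range.

    painted: iterable of (dx, dy) offsets with dx >= 0 and dy >= 0.
    """
    cells = set()
    for dx, dy in painted:
        if directional:
            # Facing down: forward is -y, so only left-right symmetry applies.
            cells.update({(dx, -dy), (-dx, -dy)})
        else:
            # No direction: mirror across both axes into all four quadrants.
            cells.update({(dx, dy), (-dx, dy), (dx, -dy), (-dx, -dy)})
    return cells


def in_range(origin, cell, offsets):
    """Check whether a cell falls inside the expanded range around origin."""
    return (cell[0] - origin[0], cell[1] - origin[1]) in offsets
```

With this, a one-tile CROSS is just `expand_range([(1, 0), (0, 1)])`, and a containment check becomes a set lookup with no per-shape code.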

Behavior Designer for Unit AI

AI logic was too hardcoded to manage. Stripped it all and introduced Behavior Trees. Visual decision structure is the big win. Some learning curve.

Screenshot shows the simplest tree: if an enemy is within attack range, move and attack.

Expecting 5–6 trees, one per class type, and 3–4 variants of each once the current policy is factored in. Key enemy bosses get dedicated trees.

Plan: establish per-class base trees first, then add scenario-specific variants.

Epoch: Unseen Devlog 1 - 재시동

Thu, 01 Jan 2026 00:00:00 GMT

이번엔 새 기능보다 재시동 준비가 먼저였다. 멈춰 있던 Epoch: Unseen을 다시 굴릴 수 있게 레거시 구조를 걷고, 이후 전투 시스템을 다시 붙일 바닥부터 정리했다.

재시동

Epoch: Unseen은 2021년부터 2022년까지 만들던 SRPG용 엔진이다. 2023년과 2024년에도 틈틈이 손은 댔지만, 사실상 거의 멈춰 있던 프로젝트였다.

거의 4년 전에 잡아 둔 구조라 지금 기준과 맞지 않는 부분이 많았다. 유니티 버전도 달라졌고, 그 사이 내 작업 방식도 많이 바뀌었다. 그래서 예전 코드를 그대로 이어 붙이기보다, 다시 시작할 수 있는 기반부터 정리하기로 했다.

정리한 범위

지난 2주 동안은 레거시 코드를 찾아 고치고 구조를 정리하는 데 시간을 썼다. 감각상으로는 큰 틀의 정리가 70% 정도 끝난 상태다.

지금은 전투 시스템을 바로 붙이기보다, 이후 작업에서 계속 걸릴 만한 구조를 먼저 걷어내는 쪽이 우선이었다. 시나리오와 시스템 기획도 이 기준에 맞춰 다시 확인하고 있다.

다음 단계

재시동 정리가 끝나면 전투 시스템, 시나리오, 정비 페이즈처럼 실제 플레이 흐름을 이루는 작업으로 다시 넘어갈 생각이다.

Epoch: Unseen Devlog 1 - Restart

Thu, 01 Jan 2026 00:00:00 GMT

New features came second. First priority: getting the stalled Epoch: Unseen ready to run again. Stripped legacy structure and prepped the foundation for reconnecting the combat system.

Restart

Epoch: Unseen is an SRPG engine I built from 2021 to 2022. Touched it occasionally in 2023 and 2024, but the project was effectively frozen.

Four-year-old architecture didn't match current standards. Unity version changed, my workflow changed. Rather than grafting onto old code, I decided to clear a foundation for a fresh start.

Scope of cleanup

Spent two weeks finding and fixing legacy code, restructuring. Rough sense: 70% of the big-picture cleanup is done.

Priority was removing structural problems that would keep tripping up future work, rather than jumping straight into combat. Scenario and system design are being re-checked against this baseline.

Next steps

Once the restart cleanup wraps, moving to the parts that make the game playable — combat system, scenario, maintenance phase.

FlyingCat으로 확인한 바이브 코딩의 현재

Sat, 20 Dec 2025 00:00:00 GMT

FlyingCat은 내가 바이브 코딩을 다시 보게 된 첫 프로젝트였다. 실제 결과물 자체는 FlyingCat 프로젝트 페이지에도 정리해 두었고, 이 글에서는 왜 이 게임이 내 기준을 바꿨는지 쪽에 조금 더 집중해 보려고 한다.

지난여름까지만 해도 바이브 코딩은 꽤 실망스러웠다. 같은 요청을 넣어도 결과가 들쭉날쭉했고, 실전에 넣기에는 불안한 구석이 많았다. 그런데 몇 달 사이 체감이 제법 달라졌다. 이제는 질문 몇 번 던져 보는 수준이 아니라, 작은 프로젝트 하나쯤은 통째로 맡겨 보는 실험도 해볼 만하겠다는 생각이 들었다.

왜 FlyingCat이었나

모티브는 예전 플래시 게임 NANACA CRASH, 국내에서는 남친 날리기로도 불리던 게임이었다. 각도와 파워를 잡아 캐릭터를 날리고, 중간 장애물이나 서포터에 부딪히며 거리를 늘리는 구조다. 지인이 이런 류의 게임을 한 번 만들어 보라고 추천했고, 그걸 고양이와 햄스터 분위기로 비틀어 FlyingCat을 잡았다.

이런 게임이 실험 대상으로 괜찮았던 건 겉보기보다 확인할 게 많았기 때문이다. 각도와 파워 조절, 비행 중 상태 전환, 장애물 효과, UI, WebGL 빌드, 랭킹까지 들어가면 작은 게임이어도 손이 갈 곳이 꽤 많다. 버튼 몇 개 있는 앱보다 바이브 코딩의 장단점을 보기 좋은 소재라고 생각했다.

무엇을 어디까지 맡겼나

도구는 Antigravity 하나만 썼고, 모델도 Gemini 3.0 Pro 하나로만 밀었다. 엔진은 Unity였고, UI는 UIToolkit으로 만들었다. 그때는 Gemini가 프론트엔드 성향이 강하다면 USS 기반 UI도 잘하지 않을까라는 기대가 있었는데, 결과적으로는 이 판단이 꽤 맞았다. 이미지 자산은 NanoBanana로 만들었다.

중요했던 건 내가 일부러 코드에서 손을 뗐다는 점이다. 이 실험의 목표는 게임 하나를 빨리 끝내는 것보다 Agentic Coding을 실제로 겪어 보는 것에 더 가까웠다. 원래는 직접 만들다가 중간에 Antigravity를 붙인 건데, 붙인 뒤로는 손이 근질거려도 참고 지켜봤다. 대신 Plan을 먼저 세우고, 특정 라인이나 수정 방향에 코멘트를 남기면서 허가하는 식으로만 개입했다.

대략 3~4일 정도는 AI에게 구현을 맡기고, 그다음 3~4일은 내가 변경된 내용을 보면서 방향을 다시 잡아 주는 식으로 굴렸다. 완전 자동화와는 거리가 있었지만, 내가 코드를 직접 치지 않고도 프로젝트를 앞으로 밀 수 있다는 감각이 꽤 컸다.

실제로 나온 게임

실제 게임 루프는 아래처럼 돌아갔다.

각도를 조절한다.
파워를 조절한다.
햄스터를 날린다.
비행 중 업/다운 스킬을 써서 방향을 바꾼다.
길 중간의 고양이, 장애물, 서포터 효과를 잘 받아 최대한 멀리 간다.
완전히 멈추면 게임 오버다.

중간 효과도 제법 다양했다. 45도나 60도 방향으로 다시 날려 버리거나, 수평으로 밀어 주거나, 속도를 깎거나, 몇 초 동안 충돌하지 않게 만드는 식이다. 겉으로는 단순해 보여도 실제 구현에 들어가면 물리와 상태 전환이 계속 맞물리는 구조였다.

결과물은 실제로 돌아가는 WebGL 게임까지 나왔다. itch.io에 올렸고, 별도 홍보 없이 자연 노출만 둔 상태에서 대략 10명 정도가 플레이했다. 여기서 멈춘 게 아니라 AWS Cognito와 Lambda를 붙여 랭킹 기능도 넣었다. AWS는 자격증 공부로 살짝 본 정도였지 실전 경험은 거의 없었는데, Gemini에게 물어가며 끝까지 붙였다.

생각보다 멀리 갔다

가장 먼저 체감된 건 속도였다. 내가 퇴근 후 지친 상태로 한 달 동안 붙잡았던 것보다, AI가 하루 만에 더 많은 결과를 앞으로 밀어내는 걸 보면서 꽤 놀랐다. 물론 내가 원래 느리게 진행한 것도 사실이지만, 적어도 AI는 피곤하다는 이유로 손을 놓지는 않았다.

UI도 기대 이상이었다. 특히 UIToolkit 쪽이 꽤 인상적이었다. 당시에는 UIToolkit이 인게임 UI에는 그다지 좋지 않다는 평도 있었고, 나도 반신반의하면서 시작했다. 그런데 Antigravity와 Gemini 3.0 Pro 조합은 USS 기반 레이아웃과 화면 구성을 생각보다 잘 밀어붙였다. 나중에 Claude나 Codex로 비슷한 시도를 해 봤을 때 만족도가 덜했던 걸 떠올리면, 그때는 유난히 Gemini가 잘 맞았던 것 같다.

이 지점에서 처음으로 작은 게임 정도는 AI에게 꽤 맡길 수 있겠다는 감각이 생겼다. 단순히 코드 몇 줄 받아 적는 수준이 아니라, 미니게임 하나를 WebGL로 배포하고 랭킹까지 붙이는 단계는 분명히 실전 쪽에 가까웠다.

그래도 결국 사람이 잡았다

그렇다고 편하게 맡겨 두기만 하면 되는 건 아니었다. 환각 때문에 잘못된 코드를 건드리거나, 설명이 틀린 가이드를 주거나, 보안상 좋지 않은 코드를 제안하는 경우가 중간중간 나왔다. 내가 계속 지켜보며 바로잡았기 때문에 굴러간 거지, 그대로 믿고 넘겼으면 꽤 위험했을 것이다.

가장 크게 꼬인 건 물리와 상태 전환이 만나는 지점이었다. FlyingCat은 게임 스텝에 따라 물리가 켜져야 할 때와 꺼져야 할 때가 분명했는데, 이게 1프레임 차이로 잘못 적용되면서 비행 흐름이 어긋나는 문제가 있었다. 처음에는 Unity 기본 물리를 썼지만, 개발을 진행할수록 상태에 따라 일부러 비물리적인 처리가 더 필요해졌고 결국 기본 물리를 버리고 커스텀으로 다시 짰다.

AWS 연동도 비슷했다. 기능 자체를 붙이는 데는 도움이 됐지만, 가이드가 오래됐거나 설명이 너무 축약돼 있어서 실제 콘솔 메뉴를 못 찾는 경우가 종종 있었다. 이런 부분은 AI가 방향을 잡아 주는 데까지만 의미가 있었고, 마지막 확인은 결국 사람이 해야 했다.

FlyingCat 이후에 남은 기준

이 경험 이후로는 바이브 코딩을 더 이상 신기한 장난감으로 보지 않게 됐다. 적어도 미니게임 수준에서는 가능성을 봤고, 이걸 더 큰 프로젝트로 옮겨 볼 수 있겠다는 생각도 생겼다. 나중에 Epoch: Unseen으로 넘어가 보게 된 것도 결국 이 경험이 발판이었다.

동시에 기준도 분명해졌다. AI에게 구현을 맡길 수는 있지만, 디렉팅과 검수는 여전히 사람 몫이다. 특히 상태 전환, 물리, 보안, 외부 서비스 연동처럼 한 번 틀어지면 뒤가 더 힘든 영역은 더 그렇다.

내가 FlyingCat에서 얻은 결론은 단순하다. 바이브 코딩은 이제 장난감 수준은 아니었다. 그렇다고 끝까지 맡기고 손을 떼도 되는 도구도 아니었다. 작은 게임 하나를 실제로 배포해 보고 나서야, 그 두 가지가 동시에 선명하게 보이기 시작했다.

What FlyingCat Showed Me About Vibe Coding

Sat, 20 Dec 2025 00:00:00 GMT

FlyingCat made me take vibe coding seriously. The game is on the project page. This post is about why it changed my standards.

Last summer, vibe coding disappointed. Same request, inconsistent results, too risky for production. Months later the feeling shifted. Running a small project end-to-end felt worth trying.

Why FlyingCat

The reference was NANACA CRASH, a flash game where you set angle and power to launch a character, bouncing off obstacles and supporters to maximize distance. A friend suggested the genre. I twisted it into a cat-and-hamster theme.

Good experiment material because there's more to verify than it looks. Angle and power control, mid-flight state transitions, obstacle effects, UI, WebGL build, leaderboards: plenty of surface area for a small game. A better test of vibe coding's strengths and limits than an app with a few buttons.

What I delegated, and how far

One tool: Antigravity. One model: Gemini 3.0 Pro. Engine: Unity. UI: UIToolkit. I had a hunch that Gemini's frontend affinity would extend to USS-based layouts. That turned out correct. Image assets came from NanoBanana.

The important part: I deliberately took my hands off the code. The goal wasn't finishing a game fast — it was experiencing agentic coding firsthand. I'd started building manually, then connected Antigravity. After that, I resisted the urge to touch code and intervened only through plans and line-level comments.

Roughly 3–4 days of AI implementation, then 3–4 days of me reviewing changes and adjusting direction. Not full automation, but pushing a project forward without writing code myself — that feeling was significant.

The actual game

The game loop:

Set angle.
Set power.
Launch the hamster.
Use up/down skills mid-flight to change direction.
Hit cats, obstacles, supporters along the way for maximum distance.
Full stop means game over.

Mid-flight effects were varied: relaunch at 45° or 60°, horizontal push, speed reduction, temporary invincibility. Looks simple, but implementation requires physics and state transitions meshing constantly.

The result was a playable WebGL game uploaded to itch.io. About 10 people played with zero promotion. Beyond that, I added a leaderboard using AWS Cognito and Lambda. I had almost no AWS experience beyond studying for a certification — Gemini walked me through it.

It went further than expected

Speed hit first. What I'd struggled with for a month after work, AI pushed further in a day. Partly because I'm slow, but AI doesn't stop because it's tired.

UI surprised me. UIToolkit was especially impressive. At the time, opinions were mixed on UIToolkit for in-game UI. Antigravity with Gemini 3.0 Pro handled USS layouts better than expected. Later attempts with Claude or Codex were less satisfying — Gemini happened to fit well there.

This was the first time I felt "small games can be meaningfully delegated to AI." Not just receiving a few code snippets — deploying a mini-game to WebGL with a leaderboard is closer to production than to a demo.

A human still steered

Didn't mean I could just hand it off. Hallucinated code, incorrect guides, insecure suggestions appeared throughout. It worked because I kept watching and correcting. Trust-and-go would have been dangerous.

The biggest tangle was where physics and state transitions met. FlyingCat needed physics on and off at precise game steps. One-frame misapplication broke the flight flow. Started with Unity's default physics, but as development progressed, intentionally non-physical handling became necessary. Ended up replacing the default with custom physics.
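
One way to picture the state-gated approach is below. This is a sketch, not FlyingCat's real code (which runs in Unity): the phases, the gravity constant, and the stop-on-landing rule are all simplifying assumptions. The point is only that integration and the phase change happen in the same step.

```python
from dataclasses import dataclass
from enum import Enum, auto

GRAVITY = -9.8  # assumed value, world units per second squared


class Phase(Enum):
    AIMING = auto()   # physics off
    FLYING = auto()   # physics on
    STOPPED = auto()  # physics off


@dataclass
class Hamster:
    phase: Phase = Phase.AIMING
    x: float = 0.0
    y: float = 0.0
    vx: float = 0.0
    vy: float = 0.0


def launch(h, vx, vy):
    """Switch to FLYING and set the initial velocity together."""
    h.phase, h.vx, h.vy = Phase.FLYING, vx, vy


def step(h, dt):
    """Advance one fixed step; integration runs only while FLYING."""
    if h.phase is not Phase.FLYING:
        return
    h.vy += GRAVITY * dt
    h.x += h.vx * dt
    h.y += h.vy * dt
    if h.y <= 0.0:
        # The phase changes in the same step that detects the landing,
        # so no extra frame of physics runs after the stop condition.
        h.y = 0.0
        h.vx = h.vy = 0.0
        h.phase = Phase.STOPPED
```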

AWS integration was similar. Helpful for getting the feature attached, but guides were outdated or overly abbreviated — actual console menus were hard to locate. AI pointed the direction; final verification needed a human.

Standards after FlyingCat

After this, vibe coding stopped being a novelty. At least at the mini-game scale, the possibility was real. Moving to a bigger project felt plausible. Epoch: Unseen came next, built on this foundation.

Standards sharpened too. AI can handle implementation, but directing and QA stay human. Especially in areas where one wrong step compounds — state transitions, physics, security, external service integration.

Takeaway is simple. Vibe coding stopped being a toy. It also wasn't something you hand off and walk away from. Deploying one small game made both visible at once.

Five UE5 Animation Concepts I Sorted Out First

Sun, 31 Aug 2025 00:00:00 GMT

Kept putting off Unreal Engine. Finally started. Blueprints and Paper2D first, then 3D via a TPS sample. First wall: animation feature names. Heard all of them, couldn't tell where each fits.

Wrote down the ones I kept bumping into. Not a docs rewrite — how I distinguished them on first contact.

Animation Montage

Montage clicked first. Not the looping locomotion side — this is for "play this action now." Attacks, skills, explicit playback.

In the TPS sample, attack combos and weapon swaps come to mind. Multiple sequences in one asset, with Blueprint controlling play timing and section jumps.

UE5 Documentation

Blend Space

Blend Space felt the opposite — less "play" and more "mix." Takes continuously changing values like speed or direction and blends poses accordingly.

Walk to run transitions, directional strafing — that's where it shows up. Initially confused with montages. The rule "if the value keeps changing, it's a blend space" cleared things up.

UE5 Documentation

Aim Offset

The name felt intimidating, but in practice it's roughly "a blend space for aiming." Base pose stays, upper body aim layers on top.

Obvious why TPS needs it. Character moves continuously while the gun direction tracks up/down and left/right. Different use case from full-body locomotion blending.

UE5 Documentation

Animation Notify

Notify made more sense by usage than by name. Pin an event onto the animation timeline. More intuitive than computing timing in code.

Hit detection, footsteps, VFX, camera shake — immediately clear why notifies exist. At the beginner stage, being able to set timing while looking at frames mattered a lot.

UE5 Documentation

IK Rig & IK Rig Retargeting

Most unfamiliar of the five. Hard to know where to start. For now I understand it as "preparation for reusing another character's animations."

As I understand it: IK Rig sets up a bone hierarchy for IK Solver use. Retargeting transfers motion to a different skeleton based on that setup. Makes sense when you think about reusing external animations or swapping characters.

UE5 Documentation

Summary

UE5 animation felt more complex than it is because of the number of named features. My current split: explicit playback → Montage. Value-driven blending → Blend Space. Aim-only overlay → Aim Offset. Timeline events → Notify. Cross-skeleton motion transfer → IK Rig + Retargeting.

Still a beginner. But connecting each feature to its role in the TPS sample was far faster than memorizing definitions.

UE5 애니메이션을 처음 공부하며 먼저 구분한 5가지

Sun, 31 Aug 2025 00:00:00 GMT

언리얼 엔진 공부를 계속 미뤄 두다가 이제야 손을 댔다. 블루프린트와 Paper2D 쪽을 먼저 보고, 3D 파트에서 TPS 샘플을 따라가기 시작했는데 제일 먼저 막힌 건 애니메이션 기능 이름이었다. 이름은 다 들어 본 것 같은데, 막상 어디에 써야 하는지는 잘 안 잡혔다.

그래서 일단 자주 부딪힌 것들만 따로 적어 두기로 했다. 공식 문서 설명을 다시 옮기는 글은 아니고, 처음 볼 때 내가 어떻게 구분했는지에 더 가깝다.

애니메이션 몽타주

몽타주는 제일 먼저 손에 잡혔다. 계속 돌아가는 이동 애니메이션 쪽이 아니라, 공격이나 스킬처럼 지금 이 동작을 재생해라 하고 명확하게 걸어 주는 용도라고 받아들이니 이해가 쉬웠다.

TPS 샘플로 보면 공격 콤보나 무기 교체 같은 장면이 바로 떠오른다. 여러 시퀀스를 한 에셋 안에 넣고, 블루프린트에서 재생 시점이나 섹션 이동을 잡는 쪽이다.

언리얼엔진5 공식문서

블렌드 스페이스

블렌드 스페이스는 반대로 재생한다기보다 섞는다는 느낌이 강했다. 속도나 방향처럼 계속 바뀌는 값을 받아서 포즈를 이어 붙이는 쪽이다.

걷기에서 달리기로 넘어가거나, 이동 방향에 따라 스트레이프가 섞이는 장면을 보면 감이 온다. 몽타주랑 헷갈리긴 했는데, 그냥 계속 변하는 값이면 블렌드 스페이스라고 생각해 두니 훨씬 편했다.

언리얼엔진5 공식문서

에임 오프셋

에임 오프셋은 처음엔 이름부터 어렵게 느껴졌는데, 보고 나니 거의 조준용 블렌드 스페이스였다. 기본 포즈는 유지하고, 상체 조준만 덧입히는 쪽이다.

TPS에서는 왜 필요한지가 바로 보였다. 캐릭터는 계속 움직이는데 총구 방향만 위아래, 좌우로 따라가야 하기 때문이다. 전신 이동을 섞는 블렌드 스페이스와는 쓰임새가 꽤 다르다.

언리얼엔진5 공식문서

애니메이션 노티파이

노티파이는 기능 이름보다 쓰임새가 더 직관적이었다. 애니메이션 타임라인에 이벤트를 박아 두는 방식이다. 코드에서 시간을 계산하는 것보다 훨씬 눈에 잘 들어왔다.

공격 판정, 발소리, 이펙트, 카메라 흔들림 같은 건 왜 노티파이를 쓰는지 바로 납득이 갔다. 특히 입문 단계에서는 프레임 보면서 타이밍을 맞출 수 있다는 게 컸다.

언리얼엔진5 공식문서

IK 릭, IK 릭 리타게팅

IK 릭과 리타게팅은 이 다섯 개 중에서 제일 낯설었다. 이름만 보면 어렵고, 처음에는 어디서부터 봐야 할지도 애매했다. 일단은 다른 캐릭터의 애니메이션을 가져다 쓰기 위한 준비라고 이해하고 있다.

내가 이해한 식으로 적으면, IK 릭은 특정 본 체계를 IK Solver 기준으로 다루기 위한 셋업이고, 리타게팅은 그 셋업을 바탕으로 다른 스켈레톤에 모션을 옮기는 단계다. 외부 애니메이션을 재사용하거나 캐릭터를 갈아끼우는 장면을 떠올리면 왜 필요한지 조금 납득이 갔다.

언리얼엔진5 공식문서

정리

처음 언리얼 애니메이션을 볼 때는 이름이 많아서 더 복잡하게 느껴졌는데, 지금은 대충 이렇게 나누고 있다. 공격처럼 콕 집어 재생해야 하면 몽타주, 값에 따라 자연스럽게 섞이면 블렌드 스페이스, 조준만 따로 얹으면 에임 오프셋, 프레임 타이밍에 이벤트를 걸면 노티파이, 다른 캐릭터에 모션을 옮길 준비를 하는 쪽이 IK 릭과 리타게팅이다.

아직 입문 단계라 기능을 깊게 이해했다고 하기는 어렵다. 그래도 이름 정의부터 외우는 것보다, TPS 샘플에서 각 기능이 맡는 역할을 먼저 연결해 두는 편이 훨씬 빨랐다.

Grayscale Transition Without Saturation in URP Shader Graph

Sat, 23 Mar 2024 00:00:00 GMT

Playing with Shader Graph, I wanted to build a grayscale transition by hand. Simple goal: take one texture, slide between full color and black-and-white with a single slider. Could have used the Saturation node directly, but I wanted to see how it works underneath.

Problem

Required behavior:

One texture input.
Slider at 0: fully grayscale.
Slider at 1: original color.
Smooth interpolation in between.

The Saturation node does this out of the box, but I still wanted to confirm how grayscale is actually constructed.

Solution

Two steps:

Convert original color to a grayscale value.
Lerp between grayscale and original.

Grayscale is not an equal-weight average of RGB. The eye is most sensitive to green and least sensitive to blue, so the channels are weighted differently, with weights that sum to 1:

gray = R * 0.21 + G * 0.72 + B * 0.07
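A quick way to see why the weights matter is to compare them with a plain average on pure green and pure blue. The sketch below assumes the unrounded Rec. 709 luma coefficients (0.2126, 0.7152, 0.0722), of which the two-decimal weights in the formula are a rounded form; the function names are made up for illustration.

```python
# Rec. 709 luma coefficients (assumed here; the post uses a rounded version).
REC709 = (0.2126, 0.7152, 0.0722)

def luminance(rgb, weights=REC709):
    """Perceptually weighted gray value of an (r, g, b) color in 0..1."""
    r, g, b = rgb
    wr, wg, wb = weights
    return r * wr + g * wg + b * wb

def equal_average(rgb):
    """Naive gray value: every channel counts the same."""
    return sum(rgb) / 3.0

green = (0.0, 1.0, 0.0)
blue = (0.0, 0.0, 1.0)

# Weighted: green comes out bright and blue dark.
weighted = (luminance(green), luminance(blue))
# Equal average: both collapse to the same gray.
naive = (equal_average(green), equal_average(blue))
```

With the weighted version, pure green and pure blue stay clearly distinguishable after conversion, whereas the plain average maps both to an identical one-third gray.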

Spread that gray across all three RGB channels for a grayscale color. Then apply Lerp(A, B, T):

A: grayscale
B: original color
T: 0–1 slider value

In shader terms, one line:

result = lerp(grayscaleColor, originalColor, t)
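Outside Shader Graph, the whole effect fits in a small function. This is a minimal per-pixel sketch in Python rather than shader code, assuming Rec. 709 weights and colors in the 0..1 range; the function names are illustrative only.

```python
def lerp(a, b, t):
    """Linear interpolation: a when t is 0, b when t is 1."""
    return a + (b - a) * t

def grayscale_transition(rgb, t, weights=(0.2126, 0.7152, 0.0722)):
    """Blend one (r, g, b) pixel between grayscale (t=0) and original (t=1)."""
    # Step 1: collapse the color to a single gray value.
    gray = sum(c * w for c, w in zip(rgb, weights))
    # Step 2: interpolate each channel from that gray toward the original.
    return tuple(lerp(gray, c, t) for c in rgb)

orange = (1.0, 0.5, 0.0)
fully_gray = grayscale_transition(orange, 0.0)
halfway = grayscale_transition(orange, 0.5)
original = grayscale_transition(orange, 1.0)
```

At t = 0 all three channels carry the same gray value, and at t = 1 the input color comes back unchanged, matching the slider behavior listed in the Problem section.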

Why this way

The upside is simple debugging:

Grayscale conversion wrong?
Interpolation value wrong?
Final output wrong?

Each part isolates cleanly. Using the Saturation node directly is faster, but building it once makes "grayscale transition = grayscale generation + interpolation" stick clearly.

Notes

The lasting takeaway is a thinking pattern, not a technique. For simple effects like this, understanding "transform the original into some intermediate form, then blend the two" outlasts memorizing a node name.

When building similar effects:

First create the intermediate representation. Here: grayscale.
Blend it with the original via Lerp.
Attach a slider or mask for control range.

Why Color and Vector Looked Different in URP Shader Graph

Sat, 23 Mar 2024 00:00:00 GMT

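In a URP project running in Linear color space, I got stuck for a moment while adding values in Shader Graph. I added Color(0.5, 0.5, 0.5) twice and it did not reach white. Feeding the same numbers in as a Vector gave a result much closer to what I expected.

At first I assumed I had wired the nodes wrong. After digging a little, it turned out that the nature of the value mattered more than the formula. I had been treating the color I see and the number that goes into the calculation as the same kind of thing.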

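Where I got confused

With a URP project set to Linear, the color shown on screen and the value used for computation inside the shader do not line up 1:1. I knew this in theory, but mixing Color and Vector nodes in Shader Graph made the difference feel much larger.

The example that caused trouble was adding Color(0.5, 0.5, 0.5) to itself, compared with the same addition using Vector(0.5, 0.5, 0.5).

On the surface both are 0.5. But the Color side did not brighten as much as I expected, while the Vector side added up as predicted. The numbers look the same, yet inside the pipeline they were not treated as the same value.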

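The rule I settled on

What I had been hanging on to came down to this. In a Linear project, lighting and color math are done in linear space, and the color shown on screen then goes through a display transform. So treating a value that looks like mid-gray and a number plugged straight into a calculation as the same thing easily goes wrong.

The rules I noted down were roughly these.

Inputs that represent color, such as Texture or Color, may have display correction involved.
Inputs that represent numbers, such as Vector, are safer to read as computation values.
When adding or multiplying colors directly in Shader Graph, first decide whether the value in hand is a visible color or a computation value.

I have not taken the internal implementation fully apart, but in practice knowing this much made things far less confusing. When a value looked wrong in Shader Graph, it was often not the nodes that were wrong; I was reading values from different spaces against the same yardstick.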

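Notes

In a Linear project, it is better not to trust the Color Picker's numbers as direct computation constants.
For values where the number itself matters, such as coefficients, correction values, and masks, keeping them as Vector or Float is less confusing.
Conversely, keep the parts that deal with the final color itself as Color inputs, and if the intermediate math does not match expectations, suspect color space first.

In the end, what confused me was less gamma theory itself and more the question of what value this number becomes once it enters Shader Graph. When colors add up strangely in URP, checking whether the value in hand is a color or a number was faster than reaching for complicated theory.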


Grayscale Transition Without Saturation in URP Shader Graph

Sat, 23 Mar 2024 00:00:00 GMT

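While playing with Shader Graph, I wanted to build a grayscale transition by hand once. The goal is simple: take one texture and move between grayscale and color with a single slider. I could have used the Saturation node directly, but this time I deliberately built it out step by step.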

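Problem

The required behavior was simple.

Take a single texture input.
When the slider is 0, the output is fully grayscale.
When the slider is 1, the original color comes through unchanged.
Values in between interpolate smoothly.

For this much, the Saturation node would do. Still, I wanted to confirm by hand how a grayscale image is actually made.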

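How I solved it

The setup takes only two steps.

Convert the original color to a grayscale value.
Blend the grayscale and the original color with Lerp.

A grayscale value is usually not an equal sum of R, G, and B. The channels get different weights, summing to 1, so the result looks more natural to the eye. Here I used the ratio below.

gray = R * 0.21 + G * 0.72 + B * 0.07

Spreading that gray back across the three RGB channels gives a grayscale color. After that, the familiar Lerp(A, B, T) is all that is needed.

A: grayscale
B: original color
T: slider value, 0 to 1

Written as a shader concept, it is close to the single line below.

result = lerp(grayscaleColor, originalColor, t)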

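Why I did it this way

What I liked is that the structure is simple, which makes it easy to debug.

Whether the grayscale conversion is wrong
Whether the interpolation value is wrong
Whether the final output is wrong

These three parts can be examined separately. Using the Saturation node directly may be faster, but building it out once makes it much clearer that a grayscale transition is ultimately grayscale generation plus interpolation.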
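For comparison, the Saturation node can be read as the same two steps. The sketch below assumes the node computes luma + saturation * (in - luma) with Rec. 709 weights, which matches its documented generated code as far as I can tell; the Python function names and the test color are assumptions made for illustration.

```python
# Luma weights assumed to match the Saturation node's generated code.
LUMA = (0.2126729, 0.7151522, 0.0721750)

def luma_of(rgb):
    """Weighted gray value of an (r, g, b) color."""
    return sum(c * w for c, w in zip(rgb, LUMA))

def saturation_node(rgb, saturation):
    """Assumed form of the node: gray plus a scaled offset from gray."""
    luma = luma_of(rgb)
    return tuple(luma + saturation * (c - luma) for c in rgb)

def manual_transition(rgb, t):
    """Hand-built version: lerp from gray (t=0) to the original (t=1)."""
    luma = luma_of(rgb)
    return tuple((1.0 - t) * luma + t * c for c in rgb)

color = (0.9, 0.3, 0.1)
from_node = saturation_node(color, 0.25)
from_lerp = manual_transition(color, 0.25)
```

Both functions produce the same color up to floating-point rounding, since luma + s * (c - luma) is just lerp(luma, c, s) written differently. One practical difference under this assumption: the node's input is not clamped, so values above 1 push colors past the original.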

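Notes

What stays with me here is a way of thinking rather than a complicated technique. A simple effect like this sticks longer when understood as "what form do I turn the original value into, and how do I blend the two" than as a node name to memorize.

For similar effects, think in the following order.

First build the intermediate representation you want. Here that is grayscale.
Then blend the original and the result with Lerp.
Finally attach a slider or mask to control the range.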
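The three steps generalize to any per-pixel effect. Below is a minimal sketch of that pattern with a hypothetical blend_effect helper, using color inversion as a stand-in intermediate representation. Note that the direction is flipped relative to the grayscale slider: here an amount of 0 leaves the pixel untouched, so that a mask value of 0 means "no effect here".

```python
def lerp(a, b, t):
    """Linear interpolation: a when t is 0, b when t is 1."""
    return a + (b - a) * t

def blend_effect(original, transform, amount, mask=1.0):
    """Apply `transform` to a pixel and blend it in by amount * mask.

    original:  (r, g, b) in 0..1
    transform: function producing the intermediate representation
    amount:    global slider, 0 = untouched, 1 = full effect
    mask:      per-pixel weight limiting where the effect applies
    """
    intermediate = transform(original)            # step 1: intermediate form
    t = amount * mask                             # step 3: control range
    return tuple(lerp(o, i, t)                    # step 2: blend with Lerp
                 for o, i in zip(original, intermediate))

def invert(rgb):
    """Example intermediate representation: the inverted color."""
    return tuple(1.0 - c for c in rgb)

pixel = (0.2, 0.4, 0.6)
masked_out = blend_effect(pixel, invert, amount=1.0, mask=0.0)
full_effect = blend_effect(pixel, invert, amount=1.0, mask=1.0)
```

Swapping invert for a grayscale function gives the transition from this post back; only the transform changes, while the blend and control steps stay the same.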

Why Color and Vector Looked Different in URP Shader Graph

Sat, 23 Mar 2024 00:00:00 GMT

In a URP project set to Linear color space, I got stuck while adding values in Shader Graph. Adding Color(0.5, 0.5, 0.5) twice did not reach white, while the same numbers fed in as a Vector gave the expected result.

First thought: wiring mistake. The real issue was value semantics, not math. I'd been treating "visible color" and "computation number" as the same thing.

Where I got confused

In a Linear URP project, what you see on screen and what the shader computes don't map 1:1. Knew this in theory, but mixing Color and Vector nodes in Shader Graph made the gap feel larger.

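The problem case: Color(0.5, 0.5, 0.5) added to itself, compared with Vector(0.5, 0.5, 0.5) added to itself.

Both say 0.5. But the Color side didn't brighten as expected, while the Vector side added up as predicted. The numbers look the same but are treated differently inside the pipeline.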
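The gap can be reproduced with plain arithmetic. This sketch assumes that, in a Linear project, the Color value from the picker is interpreted as sRGB and converted to linear before the math, while the Vector is passed through unchanged; it uses the standard sRGB transfer functions and was not traced through Unity's source.

```python
def srgb_to_linear(c):
    """Standard sRGB decoding for one channel in 0..1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Standard sRGB encoding for one channel in 0..1."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

picker = 0.5

# Color path (assumed): 0.5 is decoded to linear first, then added to itself.
color_sum = min(1.0, 2 * srgb_to_linear(picker))
# Vector path: the raw number is added as-is.
vector_sum = min(1.0, 2 * picker)

# What each linear result looks like after the display transform.
color_display = linear_to_srgb(color_sum)
vector_display = linear_to_srgb(vector_sum)
```

Under these assumptions the Color path works with about 0.214 per operand, so the sum is about 0.428 in linear space and shows up as roughly 0.69 on screen, a light gray rather than white. The Vector path sums to 1.0 and displays as white, which is the difference described above.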

How I sorted it out

The core issue: in a Linear project, lighting and color math happen in linear space. The color you see on screen goes through a display transform. So "what looks like mid-gray" and "the number you plug into a calculation" aren't the same thing.

Rules I wrote down:

Inputs that mean "color" (Texture, Color) may carry display correction.
Inputs that mean "number" (Vector) are safer to read as computation values.
When adding or multiplying colors in Shader Graph, first ask: is this a "visible color" or a "computation value"?

I didn't trace every internal implementation. But in practice, this distinction cut most of the confusion. When Shader Graph values look wrong, it's usually not the nodes — it's me treating values from different spaces as if they're the same.

Notes

In Linear projects, don't trust Color Picker numbers as direct computation constants.
For coefficients, correction values, masks — use Vector or Float. Less confusing.
For final color output, use Color input, but if intermediate math surprises you, suspect color space first.

What confused me wasn't gamma theory itself. It was "how does this number enter the Shader Graph." When URP colors add up wrong, checking "is this a color or a number?" first was faster than reaching for theory.