An AI agent built a deployable 3D gallery of Paris monuments by chaining two Hugging Face Spaces without any human-written integration code. The project, published June 9 by Hugging Face developer Mishig Davaadorj, uses one Space for text-to-image generation and a second for Gaussian splat 3D reconstruction. The agent handled all cross-Space integration, format conversion, and deployment steps autonomously, per the official Hugging Face project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
Mitchell Hashimoto’s “building block economy” thesis argues that AI is better at gluing together proven, pre-built components than at building tools from scratch. The thesis originally focused on code libraries. Hugging Face is now extending the model to multimedia AI through a plain-text agents.md file included with every Gradio Space, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
The agents.md file documents four exact fields an agent needs to operate the Space: the HTTP call endpoint, polling template for async tasks, file upload path, and authentication hint using the Bearer $HF_TOKEN format. No custom client library or hardcoded integration is required for an agent to interact with the Space, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
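As a rough illustration of how an agent might hold those four fields and turn them into an authenticated request, here is a minimal Python sketch. The class name, attribute names, and example.invalid URLs are hypothetical placeholders, not the actual agents.md layout or a real Space endpoint. The code only builds a request object and sends nothing.

```python
import os
import urllib.request
from dataclasses import dataclass


@dataclass
class SpaceContract:
    """The four fields described above; all attribute names are hypothetical."""
    call_url: str                # HTTP call endpoint
    poll_template: str           # polling template, e.g. ".../{event_id}"
    upload_url: str              # file upload path
    token_env: str = "HF_TOKEN"  # auth hint: Bearer $HF_TOKEN

    def auth_headers(self) -> dict:
        # Read the token from the environment; omit the header if it is unset.
        token = os.environ.get(self.token_env, "")
        return {"Authorization": f"Bearer {token}"} if token else {}

    def build_call(self, payload: bytes) -> urllib.request.Request:
        # Build (but do not send) a POST request to the call endpoint.
        headers = {"Content-Type": "application/json", **self.auth_headers()}
        return urllib.request.Request(
            self.call_url, data=payload, headers=headers, method="POST"
        )

    def poll_url(self, event_id: str) -> str:
        # Fill the polling template with the id returned by the call.
        return self.poll_template.format(event_id=event_id)
```

Because the contract is plain data, an agent could in principle fill it in from any Space's agents.md without a dedicated client library, which is the point the convention makes.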
The Hugging Face Hub hosts thousands of state-of-the-art open-weights models, most of which are deployed as interactive Gradio Spaces. The agents.md convention turns each of these Spaces into a documented, callable building block that agents can discover and compose autonomously, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
This is similar to how developers run parallel AI sessions in separate git worktrees to isolate context. The GitHub Copilot app defaults to worktrees because they let human developers and AI agents work in parallel across isolated task contexts, per the GitHub Blog (https://github.blog/ai-and-ml/github-copilot/what-are-git-worktrees-and-why-should-i-use-them/).
The Paris gallery pipeline chains two Spaces in a sequential workflow. First, an image-generation Space turns text prompts for each Paris monument into clean, dark-background “specimen” style images. For example, it produced a diorama rendering of the Eiffel Tower on a plinth for the gallery, per the official Hugging Face project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
Second, the VAST-AI/TripoSplat Space reconstructs a 3D Gaussian splat file in .ply format from each single input image. The agent then performs four specific glue steps to finalize the gallery, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
The agent first detects TripoSplat’s default Y-down coordinate output and flips all splats upright to correct orientation. It then auto-frames each monument for consistent camera angles across the full gallery, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
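The source does not show how the agent implemented these two steps, so the following Python sketch is only one plausible approach. It handles splat centre positions only (a real Gaussian splat also stores per-splat rotations and scales, which would need the same rotation applied). The function names, default field of view, and margin are assumptions made for illustration.

```python
import math


def flip_y_down(points):
    """Rotate 180 degrees about the X axis: (x, y, z) -> (x, -y, -z).

    Negating y and z together turns a Y-down scene upright without
    mirroring it, so the handedness of the coordinate system is kept.
    """
    return [(x, -y, -z) for x, y, z in points]


def frame_distance(points, fov_degrees=50.0, margin=1.2):
    """Return (centre, distance) so an enclosing sphere fits in the view.

    The centre is the middle of the axis-aligned bounding box, and the
    radius is the largest distance from that centre to any point. A sphere
    of radius r fits a view cone of half-angle a at distance r / sin(a).
    """
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    centre = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    radius = max(math.dist(p, centre) for p in points)
    half_angle = math.radians(fov_degrees) / 2
    return centre, margin * radius / math.sin(half_angle)
```

Applying the same framing function to every monument is what would give the gallery a consistent camera distance relative to each object's size.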
Next, it compresses .ply files to .ksplat format, cutting file sizes by roughly 3x to enable fast browser loading. Finally, it builds a custom Three.js viewer with scroll-to-switch and drag-to-rotate user controls, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
Several of these steps emerged from the agent reacting to the physical constraints of single-view 3D reconstruction rather than from pre-written human instructions. For example, a wide glass structure like the Louvre Pyramid yields poor, fragmented splats when reconstructed from a single view, and a thin vertical structure like an obelisk yields a dull, low-detail model because so little of its surface is visible in the input image, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
The human project lead provided only three specific taste-level guidance prompts during development: “make it zoomed out,” “replace the obelisk,” and “the transition lingers too long.” This outsourced R&D loop, where an agent handles all technical implementation and a human provides only high-level creative direction, is exactly the workflow the building block economy model predicts, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
Once the initial Paris gallery pipeline was built, spinning up new location-specific galleries required roughly one sentence of prompting each. A single prompt to create a similar Space with splats for Japan produced six monument images, six corresponding 3D splats, compressed .ksplat files, a Three.js viewer, and a fully deployed static Space, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
The Japan gallery includes six monuments: Tokyo Tower, Himeji Castle, Kinkaku-ji, Osaka Castle, Great Buddha of Kamakura, and Itsukushima torii. An identical prompt for Egypt generated a gallery of six monuments: the Great Pyramid, the Sphinx, Abu Simbel, Tutankhamun’s funerary mask, the Karnak temple complex, and the Colossi of Memnon, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
The same two underlying Spaces and identical agents.md contract were reused for both new galleries with no custom integration code added. This composability aligns with broader trends in agent-driven development workflows, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
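The reuse described here can be pictured as one pipeline function applied to different monument lists. The Python sketch below is an assumption about the shape of that workflow, not the agent's actual code: the three stage functions are injected by the caller and stand in for the image-generation Space, the TripoSplat Space, and the .ksplat compression step.

```python
def build_gallery(monuments, generate_image, reconstruct_splat, compress):
    """Run every monument through the same three stages, in order.

    The stage functions are hypothetical stand-ins supplied by the caller;
    the pipeline itself does not change between galleries.
    """
    entries = []
    for name in monuments:
        image = generate_image(name)      # text prompt -> image
        ply = reconstruct_splat(image)    # single image -> .ply splat
        entries.append({"name": name, "splat": compress(ply)})  # .ply -> .ksplat
    return entries


# Monument names as listed in the article.
JAPAN = [
    "Tokyo Tower", "Himeji Castle", "Kinkaku-ji",
    "Osaka Castle", "Great Buddha of Kamakura", "Itsukushima torii",
]
```

Swapping JAPAN for a list of Egyptian monuments changes the input data but none of the stages, which mirrors the one-sentence prompts described above.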
Slash commands in CLI-based AI agents such as GitHub Copilot CLI let developers scope agent context to a specific repository with the /cwd command and resume prior agent sessions with the /resume command, per the GitHub Blog (https://github.blog/ai-and-ml/github-copilot/github-copilot-cli-for-beginners-overview-of-common-slash-commands/). These tools are built to support the same parallel, multi-repo development that git worktrees enable for both human and AI collaborators, per the GitHub Blog (https://github.blog/ai-and-ml/github-copilot/what-are-git-worktrees-and-why-should-i-use-them/).
The primary barrier to using state-of-the-art image, video, text-to-speech, or 3D models has never been the model itself, but the integration work required. This includes configuring SDKs, downloading weights, provisioning GPUs, matching input formats, and building polling logic for async tasks. The agents.md convention collapses this integration surface to a single standardized HTTP contract, per the official Hugging Face project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
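The polling logic mentioned above is usually a small loop like the Python sketch below. The status values ("complete", "error") and the dictionary shape are hypothetical; a real Space's agents.md would define the actual response format. The status fetcher, sleep function, and clock are passed in so the loop can be exercised without a network.

```python
import time


def poll_until_done(fetch_status, interval=1.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports completion or time runs out.

    fetch_status is a caller-supplied function returning a dict with a
    "status" key; the key name and its values are assumptions.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        status = result.get("status")
        if status == "complete":
            return result
        if status == "error":
            raise RuntimeError(f"task failed: {result}")
        if clock() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)  # wait before asking again
```

When the contract spells out the polling URL and response format, this loop is the only async handling an agent needs to write, and it is the same for every Space.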
An agent that can read one agents.md file can operate any Gradio Space that follows the same convention. This eliminates the need for custom integration code for each individual model. The Paris gallery is a minimal proof of concept for this model: it uses two existing Spaces, zero custom integration code, and produces a fully deployable, browser-loadable multimedia experience, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
For builders looking to adopt this workflow, the project outlines four specific actionable steps. First, deploy models as public Gradio Spaces to make them agent-addressable primitives instantly. Second, add a valid agents.md file to each Space that documents the full HTTP schema including endpoint, polling, upload, and auth details, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
Third, curate pre-built chains of complementary Spaces, such as image-to-3D, text-to-video-to-audio, or retrieval-to-generation pipelines. Fourth, design Spaces to support agent discovery so agents can find and compose them autonomously, per the official project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
The completed Paris gallery is deployed as a static Space at mishig/monuments-de-paris, loadable in any standard web browser with no client-side GPU required. The end-to-end workflow required no custom integration code written by a human developer: the agent handled all cross-Space communication, format conversion, compression, and viewer assembly autonomously, per the official Hugging Face project documentation (https://huggingface.co/blog/mishig/spaces-agents-md).
