Engineering · Feb 10, 2026 · 12 min read

Introducing Mosaic

Your tests pass. Your app still breaks. We built an in-app testing framework that runs inside your live application — zero mocks, direct store access, and the ability to catch the bugs that every other tool misses.

Your tests pass. Your app still breaks.

We built Orbit, an AI code editor, and we kept shipping bugs that no testing tool could catch.

The first one was session ID remapping. A user starts a conversation, and midway through the AI response stream, the backend reassigns the session ID. Our Zustand store updates. The streaming response keeps writing to the old ID reference. Messages appear in the wrong conversation. Our Playwright tests never saw it — they only check DOM text, not internal session IDs.
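To make the first failure concrete, here is a minimal model of it in plain TypeScript rather than Zustand. Every name (`ChatState`, `remapSession`, `startStreamBuggy`, `startStreamFixed`) is hypothetical, and the store is a plain object. The sketch assumes only two things: messages are keyed by session ID, and the stream writer captured the ID when the stream started. Re-resolving the ID through an alias table on every write is one possible fix. It is not necessarily the one Orbit uses.

```typescript
// Hypothetical model of the stale-session-ID bug; not Orbit's actual store.
type Message = { role: "user" | "agent"; text: string };

type ChatState = {
  sessions: Record<string, Message[]>; // messages keyed by session ID
  aliases: Record<string, string>; // retired session ID -> replacement ID
};

const state: ChatState = { sessions: { "tmp-1": [] }, aliases: {} };

// The backend reassigns the session ID mid-stream: move messages, record an alias.
function remapSession(oldId: string, newId: string): void {
  state.sessions[newId] = state.sessions[oldId] ?? [];
  delete state.sessions[oldId];
  state.aliases[oldId] = newId;
}

// Follow alias links until reaching an ID that has not been retired.
function resolveId(id: string): string {
  let current = id;
  while (state.aliases[current] !== undefined) current = state.aliases[current];
  return current;
}

// Buggy writer: the ID is captured once, so later writes go to the retired ID
// (here they even recreate an orphaned "tmp-1" entry).
function startStreamBuggy(sessionId: string) {
  return (chunk: string): void => {
    (state.sessions[sessionId] ??= []).push({ role: "agent", text: chunk });
  };
}

// Fixed writer: re-resolves the ID on every write.
function startStreamFixed(sessionId: string) {
  return (chunk: string): void => {
    (state.sessions[resolveId(sessionId)] ??= []).push({ role: "agent", text: chunk });
  };
}

const writeBuggy = startStreamBuggy("tmp-1");
const writeFixed = startStreamFixed("tmp-1");
remapSession("tmp-1", "sess-42"); // happens while both streams are open
writeBuggy("orphaned chunk");
writeFixed("correct chunk");
```

In this model, a DOM-level test that only looks for the streamed text would find both chunks rendered *somewhere*. Only a check keyed on the session ID shows that one of them is orphaned.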

The second was sidebar entries vanishing. During an active stream, switching conversations triggers a store update that races with the streaming write. The sidebar re-renders, and the entry for the previous conversation disappears. Our Vitest unit tests pass because they mock the store — mocks don't race.
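The second failure has the shape of a classic lost update. The hypothetical sketch below uses plain TypeScript, not Orbit's store. A stream handler reads a snapshot, yields to the event loop, and then writes the stale snapshot back. That write erases the entry added by a conversation switch in between. A mocked, synchronous store never yields, so it never produces this interleaving. Deriving the next state from the current state at write time avoids this particular race. Zustand's functional `set((state) => ...)` form works the same way.

```typescript
// Hypothetical lost-update model; not Orbit's actual sidebar code.
type SidebarState = { entries: string[]; active: string };

let sidebar: SidebarState = { entries: ["conv-a"], active: "conv-a" };

// Functional update, similar in spirit to Zustand's set((state) => ...).
function update(fn: (s: SidebarState) => SidebarState): void {
  sidebar = fn(sidebar);
}

// Yield to the event loop, as a streaming write does between chunks.
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

function switchToNewConversation(id: string): void {
  update((s) => ({ entries: [...s.entries, id], active: id }));
}

const rename = (entries: string[], title: string) =>
  entries.map((e) => (e === "conv-a" ? title : e));

// Buggy: snapshot, yield, then write the stale snapshot back.
async function finishStreamBuggy(title: string): Promise<void> {
  const snapshot = sidebar;
  await tick();
  sidebar = { ...snapshot, entries: rename(snapshot.entries, title) };
}

// Fixed: derive the next state from whatever is current at write time.
async function finishStreamFixed(title: string): Promise<void> {
  await tick();
  update((s) => ({ ...s, entries: rename(s.entries, title) }));
}

// Start a stream, switch conversations while it is in flight, return the final state.
async function scenario(finish: (title: string) => Promise<void>): Promise<SidebarState> {
  sidebar = { entries: ["conv-a"], active: "conv-a" };
  const pending = finish("Hello world");
  switchToNewConversation("conv-b");
  await pending;
  return sidebar;
}
```

With `finishStreamBuggy`, the final state has lost `conv-b` and silently reverted the active conversation. With `finishStreamFixed`, both entries survive.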

The third was checkpoint mutations being silently dropped. Orbit supports rewinding a conversation to a previous checkpoint. If you rewind while a stream is active, the Immer patches from the in-flight response collide with the checkpoint restore, and data is lost without any error. The UI looks fine until the user scrolls back and finds missing messages.
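One common way to make a collision like this loud instead of silent is an epoch (generation) counter. Each rewind bumps the generation. Writes from a stream that started in an older generation are rejected and recorded instead of being applied. The sketch below is a hypothetical mitigation that uses plain arrays instead of Immer patches. It is not Orbit's checkpoint implementation, and all names in it are illustrative.

```typescript
// Hypothetical mitigation sketch: plain arrays stand in for Immer patches.
type ConversationState = { messages: string[]; generation: number };

let conversation: ConversationState = { messages: ["user: hi"], generation: 0 };
const checkpoints: string[][] = [];
const rejectedWrites: string[] = [];

function saveCheckpoint(): number {
  checkpoints.push([...conversation.messages]);
  return checkpoints.length - 1;
}

// A rewind restores the snapshot and starts a new generation.
function rewindTo(index: number): void {
  conversation = { messages: [...checkpoints[index]], generation: conversation.generation + 1 };
}

// A stream remembers the generation it started in. Writes from an older
// generation are recorded as rejected instead of being applied or lost silently.
function openStream() {
  const startedIn = conversation.generation;
  return (chunk: string): boolean => {
    if (conversation.generation !== startedIn) {
      rejectedWrites.push(chunk);
      return false;
    }
    conversation = { ...conversation, messages: [...conversation.messages, chunk] };
    return true;
  };
}

const checkpoint = saveCheckpoint();
const writeChunk = openStream();
const firstAccepted = writeChunk("agent: part 1");
rewindTo(checkpoint); // the user rewinds while the stream is still active
const secondAccepted = writeChunk("agent: part 2");
```

The point of recording rejected writes, rather than just ignoring them, is that a test can then assert on them.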

These aren't edge cases. These are the bugs that ship to production because the testing ecosystem has a fundamental blind spot.

The gap nobody filled

The frontend testing ecosystem splits into two camps that have never been unified.

Camp 1: State observers. Tools like Redux DevTools, LogRocket, and redux-logger run inside your application. They can see every state mutation in real time. But they have zero test execution capability. No assertions, no sequencing, no pass/fail results. They're observation tools, not testing tools.

Camp 2: Test executors. Tools like Playwright, Cypress, and Vitest have structured test runners with assertions and reporting. But they operate from outside the application's live JavaScript context. Playwright drives the browser from a separate Node.js process over a remote-debugging protocol (the Chrome DevTools Protocol, in Chromium's case). Vitest frontend tests typically run in a simulated DOM such as jsdom, with stores mocked or re-created for each test. They can click buttons and read DOM text, but they're blind to internal state transitions.

We didn't just assume this gap existed. We searched for solutions across npm, GitHub, Google Scholar, ACM Digital Library, IEEE Xplore, arXiv, and engineering blogs from Meta, Google, Netflix, Vercel, and Shopify. The closest web tools, Redux DevTools and LogRocket, score 4/7 on our criteria but have zero testing capability. The closest actual testing tool is Netflix's SafeTest at 3/7. The only conceptual precedent scoring 5/7 is Dear ImGui Test Engine, which is C++-only and tied to Dear ImGui, a GUI library used mostly in game engines and tools.

Two independent academic surveys confirm the gap. Bertolino et al. (2021) surveyed 80 field-testing papers across ACM and IEEE and found zero approaches targeting frontend JavaScript. A 2025 arXiv survey of 300+ web testing papers similarly found no in-app testing frameworks for frontend state management. The academic term for this approach — running tests inside the live application — is in-vivo testing, coined by Murphy and Kaiser at Columbia (2008). Their work targeted Java. Nobody built the JavaScript equivalent.

Until now.

Capability	Mosaic	Playwright	Cypress	Vitest	Redux DevTools
Runs in app context	Yes	No	Iframe	No	Yes
Direct store access	Yes	No	One-time	Mocked	Yes
Mutation tracing	Yes	No	No	No	Yes
Structured test runner	Yes	Yes	Yes	Yes	No
Zero mocks required	Yes	Yes	Yes	No	N/A

What Mosaic actually does

Mosaic is an in-app testing framework. Your test code runs in the same JavaScript context as your application. No separate process, no iframe, no Chrome DevTools Protocol, no jsdom. The test imports your stores directly and subscribes to mutations in real time.
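Zustand's vanilla store exposes `getState`, `setState`, and `subscribe`, and it calls each subscriber with the new state and the previous state. The sketch below hand-rolls that shape to show what subscribing to mutations buys you: the listener sees every transition, including intermediate ones that a one-time snapshot would miss. It is not Zustand. It accepts only partial objects and omits functional updates and the replace flag. `createMiniStore` and `ChatState` are hypothetical names.

```typescript
// Hand-rolled stand-in for a Zustand-style vanilla store (not Zustand itself).
type Listener<S> = (state: S, prev: S) => void;

function createMiniStore<S extends object>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener<S>>();
  return {
    getState: (): S => state,
    // Shallow-merge a partial update, then notify with (new, previous).
    setState(patch: Partial<S>): void {
      const prev = state;
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener(state, prev));
    },
    // Returns an unsubscribe function, like Zustand's subscribe.
    subscribe(listener: Listener<S>): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

type ChatState = { messages: string[]; isStreaming: boolean };

const chat = createMiniStore<ChatState>({ messages: [], isStreaming: false });
const transitions: string[] = [];

// Observe every transition of isStreaming, not just its final value.
const unsubscribe = chat.subscribe((state, prev) => {
  if (state.isStreaming !== prev.isStreaming) {
    transitions.push(`isStreaming: ${prev.isStreaming} -> ${state.isStreaming}`);
  }
});

chat.setState({ isStreaming: true });
chat.setState({ messages: ["hi", "hello"] });
chat.setState({ isStreaming: false });
unsubscribe();
chat.setState({ isStreaming: true }); // not observed: already unsubscribed
```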

Here's what that looks like in practice. This is a simplified version of our internal mega stress test, which runs 18 steps across 3 phases with 4 rewinds:

```typescript
// This runs inside the live application, in the same JS context
import { useChatStore } from '../stores/chat';
import { mosaic, createStateRecorder, waitForAgentComplete } from 'mosaic';

const test = mosaic.define('send-verify-rewind', async (t) => {
  const recorder = createStateRecorder(useChatStore);
  recorder.start();

  // Phase 1: Send a message through the real app
  await t.step('send message', async () => {
    const { sendMessage } = useChatStore.getState();
    // Not awaited: Phase 2 waits for the response through the store
    sendMessage('Write a hello world function');
  });

  // Phase 2: Wait for the real agent response
  await t.step('wait for completion', async () => {
    await waitForAgentComplete(useChatStore, { timeout: 30_000 });
    const { messages } = useChatStore.getState();
    t.assert(messages.length >= 2, 'should have user + agent messages');
  });

  // Phase 3: Rewind and verify state integrity
  await t.step('rewind to checkpoint', async () => {
    const { rewindToCheckpoint } = useChatStore.getState();
    // Awaited in case the rewind is async; awaiting a non-promise is harmless
    await rewindToCheckpoint(0);
    const { messages } = useChatStore.getState();
    t.assert(messages.length === 1, 'should have only the user message');
  });

  // Verify mutation history
  const mutations = recorder.stop();
  t.assert(mutations.length > 0, 'should have recorded mutations');
  t.assert(
    mutations.some((m) => m.type === 'REWIND'),
    'should include a rewind mutation'
  );
});
```

Every line of this test runs inside the live application. The useChatStore import is the same Zustand store the UI renders from. sendMessage triggers the same IPC call to the backend. waitForAgentComplete subscribes to the store and resolves when the streaming response finishes. No mocks. No simulated delays. Real async operations interleaving with real state mutations.
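Mosaic's `waitForAgentComplete` is not public yet, so the following is only a guess at how a helper like it can be built on `subscribe`. It resolves when a predicate over the state becomes true, rejects on timeout, and always unsubscribes. The `isStreaming` flag, the message shape, and the names `waitForState` and `demoStore` are assumptions for illustration. The `{ timeout }` option mirrors the call in the test above.

```typescript
// Minimal contract assumed for the store (matches a Zustand-style subscribe).
interface Subscribable<S> {
  getState(): S;
  subscribe(listener: (state: S, prev: S) => void): () => void;
}

// Resolve once `done(state)` holds; reject after timeoutMs. Always unsubscribes.
function waitForState<S>(
  store: Subscribable<S>,
  done: (state: S) => boolean,
  timeoutMs = 30_000,
): Promise<S> {
  return new Promise<S>((resolve, reject) => {
    const current = store.getState();
    if (done(current)) {
      resolve(current);
      return;
    }
    const timer = setTimeout(() => {
      unsubscribe();
      reject(new Error(`condition not met within ${timeoutMs} ms`));
    }, timeoutMs);
    const unsubscribe = store.subscribe((state) => {
      if (!done(state)) return;
      clearTimeout(timer);
      unsubscribe();
      resolve(state);
    });
  });
}

// Hypothetical chat state: the response is complete when streaming has stopped
// and at least one agent message exists.
type ChatMessage = { role: "user" | "agent"; text: string };
type ChatState = { isStreaming: boolean; messages: ChatMessage[] };

const waitForAgentComplete = (store: Subscribable<ChatState>, opts: { timeout?: number } = {}) =>
  waitForState(store, (s) => !s.isStreaming && s.messages.some((m) => m.role === "agent"), opts.timeout);

// Tiny stand-in store for the example.
function demoStore(initial: ChatState) {
  let state = initial;
  const listeners = new Set<(state: ChatState, prev: ChatState) => void>();
  return {
    getState: () => state,
    set(patch: Partial<ChatState>): void {
      const prev = state;
      state = { ...state, ...patch };
      listeners.forEach((listener) => listener(state, prev));
    },
    subscribe(listener: (state: ChatState, prev: ChatState) => void): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
```

The predicate checks the *combination* of flags rather than `isStreaming` alone. This matters because the store can briefly report "not streaming" before the first chunk arrives.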

The key capabilities that make this work:

In-app execution — test code shares the same window, the same event loop, the same module scope as the application.
Direct store subscription — call useChatStore.subscribe() to watch every mutation in real time, not just read a snapshot.
Mutation tracing — createStateRecorder() logs every state change with timestamps, action types, and before/after diffs.
Zero-mock execution — real IPC calls, real API responses, real file system operations through the app's own code paths.
Multi-phase test sequences — a step runner that bails on first failure, times each step, and produces structured pass/fail results.
Race condition detection — because the test shares the event loop, it observes the exact timing and interleaving that causes real race conditions.
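The multi-phase step runner in the list can be sketched in a few lines. This hypothetical `runSteps` is not Mosaic's API. It times each step and records a structured result. After the first failure, it marks every remaining step as skipped instead of running it. `check` is a small illustrative assertion helper.

```typescript
// Hypothetical step runner; not Mosaic's API.
type StepStatus = "passed" | "failed" | "skipped";
type StepResult = { label: string; status: StepStatus; durationMs: number; error?: string };
type Step = { label: string; run: () => void | Promise<void> };

// Throws with a message when the condition is false.
function check(condition: unknown, message: string): asserts condition {
  if (!condition) throw new Error(message);
}

async function runSteps(steps: Step[]): Promise<StepResult[]> {
  const results: StepResult[] = [];
  let failed = false;
  for (const { label, run } of steps) {
    // Bail on first failure: later steps are reported as skipped, not executed.
    if (failed) {
      results.push({ label, status: "skipped", durationMs: 0 });
      continue;
    }
    const start = Date.now();
    try {
      await run();
      results.push({ label, status: "passed", durationMs: Date.now() - start });
    } catch (err) {
      failed = true;
      results.push({
        label,
        status: "failed",
        durationMs: Date.now() - start,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return results;
}
```

Reporting skipped steps instead of omitting them keeps the report the same length as the plan. In a long sequence like an 18-step stress test, that makes it easy to see exactly where the run stopped.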

How it compares

The difference is easiest to see with code. Here's the same "send a message and verify the response" test in Playwright vs Mosaic:

```typescript
// Playwright: outside the app's JS context, reading the DOM only
import { test, expect } from '@playwright/test';

test('send message', async ({ page }) => {
  await page.fill('[data-testid="input"]', 'Hello');
  await page.click('[data-testid="send"]');
  await page.waitForSelector('.message-content');
  const text = await page.textContent('.message-content');
  expect(text).toContain('Hello');
});
```

```typescript
// Mosaic: inside the app's JS context, reading state directly
import { useChatStore } from '../stores/chat';
import { mosaic, waitForAgentComplete } from 'mosaic';

mosaic.define('send message', async (t) => {
  const { sendMessage } = useChatStore.getState();
  sendMessage('Hello');
  await waitForAgentComplete(useChatStore);
  const { messages, activeSessionId } = useChatStore.getState();
  t.assert(messages.length === 2, 'should have user + agent messages');
  t.assert(activeSessionId === messages[0].sessionId, 'message should belong to the active session');
});
```

The Playwright test verifies that text appeared in the DOM. The Mosaic test verifies the store itself: that the exchange produced exactly two messages, and that the stored messages carry the active session ID. Playwright would pass even if the message appeared in the wrong session. If user messages share the `.message-content` class, it would even pass when the agent never replied, because the user's own "Hello" satisfies the assertion. Mosaic catches the session mismatch.
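The difference comes down to which invariant each test can express. The Mosaic test checks only the first message's session, and a stricter version checks every stored message. In the hypothetical state below, the agent reply was written under a stale session ID. A check over the rendered text still passes, while the state-level invariant reports the violation. The types and the `sessionIntegrityViolations` helper are illustrative. They are not Mosaic's API.

```typescript
// Hypothetical types for illustration; not Orbit's or Mosaic's data model.
type StoredMessage = { id: string; sessionId: string; role: "user" | "agent"; text: string };
type ChatSnapshot = { activeSessionId: string; messages: StoredMessage[] };

// Roughly what a DOM-text assertion can see: the rendered strings.
function domTextContains(snapshot: ChatSnapshot, needle: string): boolean {
  return snapshot.messages.some((m) => m.text.includes(needle));
}

// State-level invariant: every message in view belongs to the active session.
function sessionIntegrityViolations(snapshot: ChatSnapshot): string[] {
  return snapshot.messages
    .filter((m) => m.sessionId !== snapshot.activeSessionId)
    .map((m) => `${m.id} is stored under ${m.sessionId}, active session is ${snapshot.activeSessionId}`);
}

// The agent reply was written under a stale session ID.
const afterBug: ChatSnapshot = {
  activeSessionId: "sess-42",
  messages: [
    { id: "m1", sessionId: "sess-42", role: "user", text: "Hello" },
    { id: "m2", sessionId: "tmp-1", role: "agent", text: "Hi! How can I help?" },
  ],
};

const domCheckPasses = domTextContains(afterBug, "Hello");
const violations = sessionIntegrityViolations(afterBug);
```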

For a deeper technical analysis of why this gap exists and what the academic research says about in-vivo testing, read our companion post: Why Frontend Testing Is Broken and What In-Vivo Testing Fixes.

Where we are

Mosaic started as an internal tool. We built it because we had no other option: the bugs we were hitting in Orbit couldn't be caught by any existing testing framework. After months of internal use across our product suite, we're now packaging Mosaic as a standalone framework.

We're being transparent about the status: Mosaic is under active development. The core runtime (in-app execution, store subscription, mutation tracing, step runner) is stable and battle-tested through internal use. We're currently building the public API surface, documentation, and framework adapters for Zustand, Redux, MobX, and Jotai.

Mosaic is built by Recursive Labs, the same team behind Orbit (AI code editor) and Phractal (AI chatbot). Beta access is coming soon.

Join the waitlist to get early access to Mosaic.

Be the first to test the in-app testing framework that sees what Playwright can't.


FAQ

How do I test race conditions in React?

Traditional tools can't reproduce race conditions because they either mock the state (eliminating real timing) or observe the DOM from outside (missing internal state transitions). Mosaic runs inside your application's JavaScript context and subscribes to store mutations in real time, letting you observe and assert on the exact interleaving that causes race conditions in production.
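Here is a hedged sketch of what asserting on an interleaving can look like. First, record every transition as a list of changed keys with their before and after values, roughly what the post describes `createStateRecorder()` doing. Then check an ordering invariant over the log. The recorder (`recordTransitions`), the state shape, and the invariant itself (no `messages` write may land between a rewind and the end of the stream) are all assumptions for illustration. The store is a minimal stand-in for a Zustand-style store.

```typescript
// Minimal stand-in for a Zustand-style store (not Zustand itself).
type Listener<S> = (state: S, prev: S) => void;
function createMiniStore<S extends object>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener<S>>();
  return {
    getState: (): S => state,
    setState(patch: Partial<S>): void {
      const prev = state;
      state = { ...state, ...patch };
      listeners.forEach((l) => l(state, prev));
    },
    subscribe(l: Listener<S>): () => void {
      listeners.add(l);
      return () => {
        listeners.delete(l);
      };
    },
  };
}

// One log entry per transition: which top-level keys changed, with before/after values.
type KeyChange = { key: string; before: unknown; after: unknown };
type LogEntry = { seq: number; at: number; changes: KeyChange[] };

// Hypothetical recorder, loosely modeled on the described createStateRecorder().
function recordTransitions<S extends object>(store: { subscribe(l: Listener<S>): () => void }) {
  const entries: LogEntry[] = [];
  const unsubscribe = store.subscribe((state, prev) => {
    const changes: KeyChange[] = [];
    for (const key of Object.keys(state) as Array<keyof S & string>) {
      if (!Object.is(state[key], prev[key])) changes.push({ key, before: prev[key], after: state[key] });
    }
    entries.push({ seq: entries.length, at: Date.now(), changes });
  });
  return {
    stop: (): LogEntry[] => {
      unsubscribe();
      return entries;
    },
  };
}

// Hypothetical invariant: no `messages` write between a rewind and the end of the stream.
function lateWritesAfterRewind(entries: LogEntry[]): LogEntry[] {
  const rewindAt = entries.findIndex((e) => e.changes.some((c) => c.key === "rewinds"));
  if (rewindAt === -1) return [];
  const endAt = entries.findIndex(
    (e, i) => i > rewindAt && e.changes.some((c) => c.key === "streaming" && c.after === false),
  );
  return entries
    .slice(rewindAt + 1, endAt === -1 ? entries.length : endAt)
    .filter((e) => e.changes.some((c) => c.key === "messages"));
}

type ChatState = { messages: string[]; streaming: boolean; rewinds: number };
const chat = createMiniStore<ChatState>({ messages: ["user"], streaming: false, rewinds: 0 });
const recorder = recordTransitions<ChatState>(chat);
chat.setState({ streaming: true });
chat.setState({ messages: ["user", "agent: part 1"] });
chat.setState({ messages: ["user"], rewinds: 1 }); // rewind mid-stream
chat.setState({ messages: ["user", "agent: part 2"] }); // in-flight write lands after the rewind
chat.setState({ streaming: false });
const entries = recorder.stop();
const late = lateWritesAfterRewind(entries);
```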

How do I test Zustand stores without mocks?

Mosaic imports your Zustand store directly: the same instance your UI renders from. Call useChatStore.getState() to read state and useChatStore.subscribe() to watch mutations in real time. No mock setup, no test doubles, no re-creating state shape.
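Reacting to every state change is often too noisy. Usually you care about one slice, such as the active session ID. Zustand ships a `subscribeWithSelector` middleware for this. The hypothetical `subscribeToSlice` helper below is not that middleware's API. It hand-rolls the same idea on top of a plain `subscribe(state, prev)` listener and uses a minimal stand-in store so the example is self-contained.

```typescript
// Minimal stand-in for a Zustand-style store (not Zustand itself).
type Listener<S> = (state: S, prev: S) => void;
function createMiniStore<S extends object>(initial: S) {
  let state = initial;
  const listeners = new Set<Listener<S>>();
  return {
    getState: (): S => state,
    setState(patch: Partial<S>): void {
      const prev = state;
      state = { ...state, ...patch };
      listeners.forEach((l) => l(state, prev));
    },
    subscribe(l: Listener<S>): () => void {
      listeners.add(l);
      return () => {
        listeners.delete(l);
      };
    },
  };
}

// Hypothetical helper: fire only when the selected slice changes (compared with Object.is).
function subscribeToSlice<S, T>(
  store: { subscribe(listener: Listener<S>): () => void },
  select: (state: S) => T,
  onChange: (next: T, prev: T) => void,
): () => void {
  return store.subscribe((state, prev) => {
    const next = select(state);
    const before = select(prev);
    if (!Object.is(next, before)) onChange(next, before);
  });
}

type ChatState = { activeSessionId: string; streamedText: string };
const chat = createMiniStore<ChatState>({ activeSessionId: "tmp-1", streamedText: "" });
const sessionChanges: string[] = [];
const unsubscribe = subscribeToSlice<ChatState, string>(
  chat,
  (s) => s.activeSessionId,
  (next, prev) => {
    sessionChanges.push(`${prev} -> ${next}`);
  },
);

chat.setState({ streamedText: "Hel" }); // ignored: slice unchanged
chat.setState({ activeSessionId: "sess-42" }); // reported
chat.setState({ streamedText: "Hello" }); // ignored
unsubscribe();
```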

What is in-vivo testing?

In-vivo testing means running tests inside the live application rather than in an isolated environment. The term was coined by Murphy and Kaiser (2008) for Java applications. To our knowledge, Mosaic is the first in-vivo testing framework for frontend JavaScript applications.

Playwright vs Cypress vs Mosaic?

Playwright and Cypress are E2E testing tools that drive the application from outside its own JavaScript context: Playwright from a separate Node.js process, Cypress from a separate frame in the browser. They verify DOM output. Mosaic runs inside the application and verifies internal state. They're complementary: use Playwright or Cypress for user-journey E2E tests, and Mosaic for state integrity, race condition, and lifecycle tests.

Does Mosaic replace Vitest?

No. Vitest is excellent for unit testing pure logic, utility functions, and component rendering in isolation. Mosaic targets a different layer: testing state management behavior, async operation interleaving, and multi-step workflows inside the running application. Use both.

What state management libraries does Mosaic support?

Mosaic works with any JavaScript state management library that exposes a subscribe API. We're building first-class adapters for Zustand, Redux, MobX, and Jotai. The core runtime is store-agnostic.
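In practice, "store-agnostic" mostly means normalizing each library's notification style. Redux, for example, calls every subscriber with no arguments after each dispatch, whether or not the state changed. The hypothetical `fromRedux` adapter below tracks the previous state itself and forwards only real changes. The `StoreAdapter` interface is an assumption, not Mosaic's adapter API, and a tiny Redux-shaped store stands in for the real library.

```typescript
// Subset of a Redux store's shape: listeners receive no arguments.
interface ReduxLike<S> {
  getState(): S;
  subscribe(listener: () => void): () => void;
}

// Hypothetical normalized contract; not Mosaic's adapter API.
interface StoreAdapter<S> {
  getState(): S;
  subscribe(listener: (state: S, prev: S) => void): () => void;
}

function fromRedux<S>(store: ReduxLike<S>): StoreAdapter<S> {
  return {
    getState: () => store.getState(),
    subscribe(listener: (state: S, prev: S) => void): () => void {
      let prev = store.getState();
      return store.subscribe(() => {
        const next = store.getState();
        // Redux notifies after every dispatch, even when nothing changed.
        if (Object.is(next, prev)) return;
        const before = prev;
        prev = next;
        listener(next, before);
      });
    },
  };
}

// Tiny Redux-shaped store for the example (not the redux package).
type TodoAction = { type: "add"; text: string } | { type: "noop" };

function createTinyStore(reducer: (s: string[], a: TodoAction) => string[], initial: string[]) {
  let state = initial;
  const listeners = new Set<() => void>();
  return {
    getState: () => state,
    dispatch(action: TodoAction): void {
      state = reducer(state, action);
      listeners.forEach((listener) => listener());
    },
    subscribe(listener: () => void): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}

const todos = createTinyStore((s, a) => (a.type === "add" ? [...s, a.text] : s), []);
const seen: string[] = [];
const stopWatching = fromRedux(todos).subscribe((next, prev) => {
  seen.push(`${prev.length} -> ${next.length}`);
});
todos.dispatch({ type: "add", text: "write tests" });
todos.dispatch({ type: "noop" }); // notified by the store, filtered out by the adapter
todos.dispatch({ type: "add", text: "ship" });
stopWatching();
```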

Can I test my app without mocking the backend?

Yes. Mosaic tests run through your application's real code paths — real API calls, real IPC, real WebSocket connections. This is how we catch bugs that mocks hide: timing issues, serialization errors, and response ordering problems that only appear with real backends.