Off-chain state: events, logs, and indexers

Your Solana program produces three kinds of output. Account data is what other programs read. Program logs are what off-chain consumers read. Transaction metadata records what happened. None of these are interchangeable. This lecture covers what each one is, what Anchor gives you for emitting events, why "emit an event for every important state change" is the rule, and how the data actually leaves the chain to reach your frontend.

The on-chain / off-chain boundary

Solana draws a sharper line between on-chain and off-chain data than the EVM does. Two facts make this true.

First, account data is the only thing programs can read from each other. There's no on-chain API for reading another program's logs, no way to subscribe to events at the program level, no shared memory between executions. If your program needs to react to something another program did, you read that program's account state. Logs and events don't enter the picture.

Second, account storage is expensive. Every byte you store has to be backed by a rent-exempt deposit. A 1 KB account locks up about 8 million lamports (roughly 0.008 SOL). Storing the full history of every action a user has taken would mean creating new accounts indefinitely, which gets prohibitively expensive once that history runs into the thousands of operations.
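The deposit can be estimated from the rent parameters. This is a minimal sketch, assuming the long-standing default parameters (3,480 lamports per byte-year, a two-year exemption threshold, 128 bytes of per-account overhead). The function name is illustrative, and the live figure for a cluster comes from the getMinimumBalanceForRentExemption RPC method.

```python
# Rent parameters: assumed to be the long-standing cluster defaults.
# Query getMinimumBalanceForRentExemption for the authoritative value.
LAMPORTS_PER_BYTE_YEAR = 3_480
EXEMPTION_THRESHOLD_YEARS = 2
ACCOUNT_STORAGE_OVERHEAD = 128  # bytes of metadata charged for every account
LAMPORTS_PER_SOL = 1_000_000_000


def rent_exempt_minimum(data_len: int) -> int:
    """Lamports that must stay in an account holding `data_len` bytes of data."""
    billable_bytes = ACCOUNT_STORAGE_OVERHEAD + data_len
    return billable_bytes * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_THRESHOLD_YEARS


one_kib = rent_exempt_minimum(1024)
print(one_kib)                             # 8017920 lamports
print(1_000 * one_kib / LAMPORTS_PER_SOL)  # SOL locked by a thousand such accounts
```

Under these assumptions, a thousand 1,024-byte history accounts lock up about 8 SOL, and the figure grows linearly with every action you record on-chain.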

These two facts together create a strict division of labor. State that other programs need to read goes into account data. Everything else (the user-facing log of what happened, analytical data, historical traces, search-friendly indexes) lives off-chain and is built by reading the chain.

Figure: How data leaves the chain

On chain: what your program produces

  • Account data: readable on demand, structured, costs rent to store
  • Program logs: written by msg!() and emit!(), free to write, not readable on-chain
  • Transaction data: instructions, signers, accounts touched, success/failure

Three paths from chain to application:

  1. Direct RPC reads (getAccountInfo, getProgramAccounts, getTransaction): small-scale, single-point queries
  2. Third-party indexer (Helius webhooks, Triton Yellowstone, SubQuery, The Graph): production apps, paid service
  3. Self-hosted (your worker polls RPC, decodes accounts/logs, writes to your DB): full control, cheap to start

Off chain: what your application consumes

  • Frontend UI showing user balances, recent activity, leaderboards
  • Notifications when something the user cares about happens
  • Historical analytics and reporting that doesn't fit on chain
  • Search by content (find all positions over $1M, all NFTs in a collection, etc.)

The choice is mostly economic: a paid third party is fast to start, while self-hosting gives you control and ownership of the pipeline.

Why "emit for every important state change" is the rule

Here's the rule that catches new Solana developers: anything you want to display, alert on, search by, or analyze later has to be emitted as an event. Reading the account data tells you the current state. It does not tell you how the account got there.

Concretely, suppose your staking program has a StakePosition account with amount and staked_at fields. An off-chain consumer can read the current state of any position by fetching the account. But the consumer cannot answer questions like:

  • When was this position opened?
  • Did this user previously open and close a position?
  • How many positions across the protocol opened in the last 24 hours?
  • What was the largest stake amount opened today?

The first question is partly answered by staked_at, but only because you happened to store the timestamp on the account. The other questions can't be answered from current state at all. Once a position is closed, the account is gone, and its history is unrecoverable from chain state.

The fix is to emit an event at every meaningful state change: position opened, position claimed, position closed via unstake. The events go into the transaction's log output, which is stored with the transaction in the ledger history. Indexers can read it there long after the account itself is gone, from any RPC node that retains that history. Now any off-chain consumer with access to the historical event stream can reconstruct the full history of every position, even ones that were closed years ago.
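To make the difference concrete, here is a minimal sketch of what a stored event stream lets an off-chain consumer answer. The StakeEvent shape, the "opened"/"claimed"/"closed" kind names, and the sample data are hypothetical stand-ins for whatever your program actually emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StakeEvent:
    kind: str       # "opened", "claimed", or "closed" (hypothetical names)
    position: str   # address of the StakePosition account
    owner: str
    amount: int
    timestamp: int  # Unix seconds


def opened_since(events, cutoff):
    """Positions opened at or after `cutoff`."""
    return [e for e in events if e.kind == "opened" and e.timestamp >= cutoff]


def had_closed_position(events, owner):
    """True if this owner ever closed a position, even if the account no longer exists."""
    return any(e.kind == "closed" and e.owner == owner for e in events)


def largest_open_since(events, cutoff):
    """Largest stake amount opened at or after `cutoff` (0 if none)."""
    return max((e.amount for e in opened_since(events, cutoff)), default=0)


# Made-up event history: posA was opened and closed, so its account is gone.
events = [
    StakeEvent("opened", "posA", "alice", 500, 1_000),
    StakeEvent("closed", "posA", "alice", 500, 5_000),
    StakeEvent("opened", "posB", "alice", 900, 90_000),
    StakeEvent("opened", "posC", "bob", 300, 95_000),
]
day_ago = 100_000 - 86_400

print(len(opened_since(events, day_ago)))     # positions opened in the last 24 hours
print(had_closed_position(events, "alice"))   # did alice previously open and close one?
print(largest_open_since(events, day_ago))    # largest stake opened in that window
```

None of these three answers can be computed from the accounts that exist today, because posA no longer exists and the surviving accounts carry no record of when other positions were opened.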

There's a temptation to skip events because "I can always read the account." Don't. The moment your account closes or its fields change, the historical view is gone unless you emitted events. Add the event when you write the handler, before you forget what state changes matter.

Third-party indexers: Helius and Triton

The fastest way to get off-chain data into your application is to use a third-party indexer. Several services dominate the Solana indexing space.

Helius offers webhooks and an enhanced RPC. You can register a webhook with a list of program IDs or specific accounts, and Helius will push transactions touching those targets to your backend in near-real-time. Their parsed-transaction API decodes Anchor IDLs automatically, so events arrive as structured JSON rather than raw base64.

Triton's Yellowstone offers a gRPC streaming interface, also called Yellowstone gRPC, where you subscribe to filters on accounts, slots, transactions, or programs. You get a steady stream of updates over the wire and process them in your backend however you want. The interface is lower-level than Helius's webhooks but gives you more control and typically better throughput.

The Graph and SubQuery offer indexing frameworks where you define schemas and event handlers, and the framework runs the indexer for you against the chain. These are higher-level abstractions that work well for query-by-content patterns like "find all users with stake amount over X".

The trade-off across all of these is the same. You write your code against their interface and let them run the infrastructure. You pay them. You get fast time-to-launch in exchange for being a customer of their pipeline.

This is the right choice for most applications. The infrastructure to reliably index a Solana program in production is non-trivial, and renting it from a specialist is usually cheaper than building it yourself, especially in the early stages of an application's life.

Running your own indexer

What most tutorials skip: you can also run your own, and it's much less infrastructure than it sounds. You don't need to run a validator. You don't need a streaming gRPC plugin. The standard pattern is a small worker process that polls a regular RPC endpoint, decodes what it finds, and writes the result into your database. That's the whole thing.

The shape of a typical polling indexer:

  1. A worker runs on a schedule, say every 10 seconds.
  2. On each tick, the worker asks the RPC for whatever happened recently. This is usually some combination of getSignaturesForAddress for your program's transaction history, getBlock or getTransaction to pull full transaction data for the new signatures, and getProgramAccounts or getAccountInfo for current account state when you need a snapshot.
  3. The worker decodes the returned data using your program's IDL: parses out the events from each transaction's logs, decodes account data into typed structs.
  4. The worker writes the decoded results into your database, typically with an upsert keyed by signature or account address so reruns are idempotent.
  5. The worker remembers where it left off, usually by recording the last processed slot or signature, so the next tick picks up from there.
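Steps 2, 4, and 5 above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not a production worker. It uses SQLite in place of Postgres, the table and function names are hypothetical, and it only records signatures rather than fetching and decoding full transactions. getSignaturesForAddress and its limit and until options are part of the standard Solana JSON-RPC API.

```python
import json
import sqlite3
import urllib.request

# Hypothetical schema: one row per transaction, plus a single-row checkpoint.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transactions (
    signature TEXT PRIMARY KEY,
    slot      INTEGER NOT NULL,
    failed    INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS checkpoint (
    id             INTEGER PRIMARY KEY CHECK (id = 1),
    last_signature TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def http_rpc(url):
    """Return a callable that sends one JSON-RPC request to `url`."""
    def call(method, params):
        body = json.dumps({"jsonrpc": "2.0", "id": 1, "method": method, "params": params})
        request = urllib.request.Request(
            url, data=body.encode(), headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)["result"]
    return call


def load_checkpoint(db):
    row = db.execute("SELECT last_signature FROM checkpoint WHERE id = 1").fetchone()
    return row[0] if row else None


def poll_once(rpc, db, program_id, limit=1000):
    """One tick: fetch signatures newer than the checkpoint, store them, advance it.

    `rpc(method, params)` is any callable returning the JSON-RPC `result` field.
    Returns the number of signatures fetched on this tick.
    """
    options = {"limit": limit}
    checkpoint = load_checkpoint(db)
    if checkpoint is not None:
        options["until"] = checkpoint  # stop at the last signature already processed
    entries = rpc("getSignaturesForAddress", [program_id, options])  # newest first
    if not entries:
        return 0
    with db:  # one database transaction: rows and checkpoint commit together
        for entry in reversed(entries):  # oldest first
            # Keyed by signature, so re-running a tick overwrites instead of duplicating.
            db.execute(
                "INSERT OR REPLACE INTO transactions (signature, slot, failed) VALUES (?, ?, ?)",
                (entry["signature"], entry["slot"], int(entry["err"] is not None)),
            )
        db.execute(
            "INSERT OR REPLACE INTO checkpoint (id, last_signature) VALUES (1, ?)",
            (entries[0]["signature"],),
        )
    return len(entries)
```

A scheduler would call poll_once(http_rpc(your_endpoint), db, your_program_id) on every tick. One gap in this sketch is worth knowing about: if more than `limit` new signatures arrive between two ticks, a real worker has to page backwards with the `before` option until it reaches the checkpoint, otherwise it silently skips the overflow.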

That's it. No validator. No plugin. No streaming infrastructure. A typical self-hosted indexer runs as a Node.js, Python, or Go process behind a regular RPC endpoint (your own, a public one, or a paid provider like Helius or QuickNode), with Postgres or a similar database holding the results.
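The decoding step of the worker is the least obvious part, so here is a sketch of it. Anchor's emit! writes each event as a "Program data: " log line containing base64 bytes: an 8-byte discriminator (the first 8 bytes of sha256("event:<EventName>")) followed by the Borsh-serialized fields. The PositionOpened event and its field layout (owner: Pubkey, amount: u64, staked_at: i64) are hypothetical, so substitute the layout from your own IDL.

```python
import base64
import hashlib
import struct

LOG_PREFIX = "Program data: "


def event_discriminator(name: str) -> bytes:
    """First 8 bytes of sha256("event:<Name>"), which Anchor prepends to each event."""
    return hashlib.sha256(f"event:{name}".encode()).digest()[:8]


def decode_position_opened(log_messages):
    """Decode every (hypothetical) PositionOpened event found in a transaction's logs."""
    discriminator = event_discriminator("PositionOpened")
    expected_len = 8 + 32 + 8 + 8  # discriminator + Pubkey + u64 + i64
    decoded = []
    for line in log_messages:
        if not line.startswith(LOG_PREFIX):
            continue  # ordinary msg!() output or runtime bookkeeping
        try:
            raw = base64.b64decode(line[len(LOG_PREFIX):], validate=True)
        except ValueError:
            continue  # not valid base64, so not an event payload
        if len(raw) < expected_len or raw[:8] != discriminator:
            continue  # some other event, or data logged by another program
        owner = raw[8:40]
        # Borsh stores integers little-endian: u64 amount, then i64 staked_at.
        amount, staked_at = struct.unpack_from("<Qq", raw, 40)
        decoded.append({"owner": owner.hex(), "amount": amount, "staked_at": staked_at})
    return decoded
```

The log lines come from the meta.logMessages array that getTransaction returns. Two caveats apply to this sketch. Any program invoked in the transaction can write "Program data: " lines, so a careful decoder also tracks the surrounding "Program <id> invoke" lines to attribute each event to the right program. The owner is returned as hex here only because base58 encoding is not in the Python standard library.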

The polling interval is your knob. Polling every 10 seconds means your data lags the chain by up to about 10 seconds, which is fine for dashboards, analytics, and most user-facing features. Lower latency is possible by shortening the interval or by switching to a streaming connection if your RPC provider offers one, such as Helius webhooks or Triton's gRPC stream. The trade-off is that streaming is more code to maintain and often costs more, while polling at 10-second intervals is cheap and rarely needs touching.

There are three reasons protocols decide it's worth running their own indexer instead of using a third-party service.

The first is unit economics. Once a third-party indexer bill starts adding up, replacing it with a worker process plus a Postgres database is usually much cheaper. The RPC requests to fetch the data are charged separately, but RPC pricing is generally much cheaper per request than indexer-product pricing.

The second is data sovereignty. Some applications need to guarantee they can index data without depending on any external indexing service. A regulated financial protocol, a forensics tool, or an internal analytics system for a company that doesn't want its query patterns visible to a third party are all examples. Your own indexer means no provider sees your queries or your stored data. You still rely on an RPC provider to read the chain, but the indexed view is yours.

The third is custom semantics. Third-party indexers index everything generically and let you filter. Your own pipeline can do custom processing inline: decoding specific Anchor IDLs and writing typed rows, joining account state with transaction logs at write time, computing derived values that your application needs but the generic indexer doesn't produce. The flexibility is real, even if you don't use it on day one.

The practical decision

For a typical application, the path is:

  1. Start with direct RPC. Just call getAccountInfo and getProgramAccounts from your frontend or backend. Works for small-scale apps with simple state. You'll outgrow it.
  2. Move to a third-party indexer for production. Helius or Triton, whichever fits your access pattern. You pay them, they give you reliable real-time data and good APIs. This carries most apps from launch through their first year or two.
  3. Build your own indexer when the bill or the limits start to bite. A polling worker against an RPC endpoint, writing into your own database. More code than option 2, far less infrastructure than people assume.
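Option 1 in the list above is just a JSON-RPC call. A minimal sketch of building a getAccountInfo request and unpacking the reply follows. The function names are illustrative, and decoding the returned bytes into a typed struct is left to your IDL.

```python
import base64
import json


def account_info_request(address: str, request_id: int = 1) -> bytes:
    """JSON-RPC body asking for one account, with its data base64-encoded."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "getAccountInfo",
        "params": [address, {"encoding": "base64", "commitment": "confirmed"}],
    }).encode()


def parse_account_info(response_body: bytes):
    """Return (owner, lamports, data_bytes), or None if the account does not exist."""
    value = json.loads(response_body)["result"]["value"]
    if value is None:
        return None  # never created, or already closed
    encoded, encoding = value["data"]
    if encoding != "base64":
        raise ValueError(f"unexpected account data encoding: {encoding}")
    return value["owner"], value["lamports"], base64.b64decode(encoded)
```

POST the request body to your RPC endpoint with a Content-Type of application/json and hand the response bytes to parse_account_info. The None case shows where direct reads run out: a closed position looks exactly like one that never existed.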

You don't have to commit to one path forever. The data shape, meaning the events your program emits and the account structures defined by your IDL, stays the same across all three paths. The implementation underneath can change as your needs evolve. Picking the right tier for where you are right now matters more than picking the "correct" architecture up front.

The constant across all three paths is the events your program emits. Get those right, meaning comprehensive, well-typed, and emitted at every meaningful state change, and any of the indexing options will work. Skip events or emit them inconsistently, and no amount of fancy infrastructure on top can reconstruct what your program didn't tell anyone happened.