Solana Data Streaming: How to Power Your DApp with Real-Time Data

This guide explores the various approaches to streaming data on Solana, from native WebSocket methods to advanced solutions like Geyser plugins and specialized APIs.

As a high-throughput blockchain, Solana generates vast amounts of data through transactions, account updates, and other events. Developers can stream this data from Solana nodes and other API services to build efficient, scalable, and responsive applications.

This guide explores the various approaches to streaming data on Solana, from native WebSocket methods to advanced solutions like Geyser plugins and specialized APIs. We'll examine when to use different streaming techniques, dive into available tools, and provide practical insights for implementing data streaming in your Solana applications.

What is Streaming?

Streaming refers to the continuous flow and processing of real-time data as it is generated. Streaming enables highly efficient, low-latency data pipelines and workflows by reorienting data ingestion and processing to handle increments of data in real time. These performance optimizations are essential for high-throughput and time-sensitive applications such as financial exchanges, IoT networks, and many modern web applications.

Streaming as a Data Processing Model

As a data processing model, stream processing is often contrasted with batch processing, which involves collecting data over a period of time and then processing it all at once. While batch processing is useful for in-depth analysis and handling large datasets, it typically results in higher latency, as data is analyzed only after it accumulates.

For applications built on Solana, batch and stream processing can be appropriate for different use cases. Batch processing is best suited for scenarios where historical data analysis and large-scale aggregations are required. For example, a dashboard that analyzes a DeFi protocol’s transactions over the past year might use batch processing to aggregate the latest day’s data every 24 hours and add it to the report.

Streaming, on the other hand, is particularly well suited to Solana: because of the blockchain's high throughput, applications often need to process many on-chain events continuously and in real time.

For instance, for NFT trading platforms like Sniper, streaming ensures that NFT listings, delistings, and sales are processed in real time across various marketplaces, as described in our case study. This enables users to observe and react immediately to changing market conditions. By leveraging streaming, applications on Solana can maintain responsiveness and scalability in handling the rapid influx of transactions and events.
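To make the contrast between the two processing models concrete, here is a minimal sketch. The event shape (a slot number paired with a lamport amount) is a hypothetical stand-in chosen for illustration, not a Solana data type.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape for illustration: (slot, lamports_transferred).
Event = Tuple[int, int]


def batch_total(events: Iterable[Event]) -> int:
    """Batch model: collect every event first, then process them all at once."""
    collected: List[Event] = list(events)  # no result until the batch is complete
    return sum(amount for _, amount in collected)


def streaming_totals(events: Iterable[Event]) -> Iterator[Tuple[int, int]]:
    """Streaming model: update the result incrementally as each event arrives."""
    running = 0
    for slot, amount in events:
        running += amount
        yield slot, running  # an up-to-date result exists after every event


events = [(100, 5), (101, 7), (103, 2)]

print(batch_total(events))  # a single answer once the whole batch is in
for slot, total in streaming_totals(events):
    print(slot, total)  # one answer per event
```

Both functions reach the same final total. The difference is when results become available: the batch version answers once, while the streaming version can feed a downstream consumer after every event.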

When to Stream Data on Solana

Determining whether to use streaming for data retrieval on Solana revolves around the specific on-chain events your use case depends on and the responsiveness required. It is worth reviewing the different methods of retrieving event data on Solana and the respective design patterns those methods support.

Data Retrieval on Solana

Solana applications typically use Remote Procedure Call (RPC) to read data from and write data to the Solana blockchain. For a comprehensive overview of Solana RPC, see our previous blog post on Understanding Solana RPC. Solana’s native RPC methods use either an HTTP-based (request and response) protocol or the WebSocket (subscription) protocol.

RPC providers such as Syndica offer an elastic-node architecture that automatically scales RPC nodes and can load-balance requests among them. This architecture is specifically designed for Solana RPC.

There are many types of on-chain events that applications may need to process. They can generally be broken down into three categories:

  • Transactions and Logs: Transactions involve operations where accounts execute instructions, often outputting logs as a byproduct. Logging can verify specific operations or monitor state changes in real time.

Examples: The getTransaction HTTP method returns the details of a specific historical transaction. By contrast, the logsSubscribe WebSocket method lets clients stream transaction logs in real time as they occur on-chain.

  • Slots and Blocks: Solana operates by processing blocks of transactions during short (~400ms) time windows called slots. Monitoring slot and block data is important for applications that need to track the progression of the blockchain, validate confirmations, or synchronize state changes.

Examples: The getSlot HTTP method returns the most recent slot that has reached a given commitment level (finalized by default). The slotSubscribe WebSocket method streams a notification each time the validator processes a new slot.

  • Account Updates: Solana accounts can be updated frequently based on transactions. Keeping track of these individual updates can be critical for applications that rely on account state.

Examples: The getAccountInfo HTTP method returns the latest information associated with a specific account. The accountSubscribe WebSocket method will stream notifications when the lamports or data for a given account's public key change.
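As a rough illustration of the three categories, the sketch below builds the JSON-RPC 2.0 request bodies for each HTTP method and its WebSocket counterpart. The account key is borrowed from the ChainStream sample later in this guide, the signature is a placeholder, and the chosen encoding and commitment options are assumptions you would adjust for your own use case.

```python
import json
from itertools import count
from typing import Any, List, Optional

_ids = count(1)  # JSON-RPC request ids; any value unique per connection works

ACCOUNT = "vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg"  # example key only
SIGNATURE = "<transaction-signature>"  # placeholder, not a real signature


def rpc_request(method: str, params: Optional[List[Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request for the given Solana RPC method."""
    body = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        body["params"] = params
    return json.dumps(body)


# Sent as the body of an HTTP POST: one request, one response.
http_requests = {
    "transaction": rpc_request(
        "getTransaction",
        [SIGNATURE, {"encoding": "json", "maxSupportedTransactionVersion": 0}],
    ),
    "slot": rpc_request("getSlot", [{"commitment": "confirmed"}]),
    "account": rpc_request("getAccountInfo", [ACCOUNT, {"encoding": "base64"}]),
}

# Sent once over an open WebSocket: one request, many notifications.
ws_requests = {
    "logs": rpc_request(
        "logsSubscribe", [{"mentions": [ACCOUNT]}, {"commitment": "confirmed"}]
    ),
    "slot": rpc_request("slotSubscribe"),  # takes no parameters
    "account": rpc_request(
        "accountSubscribe",
        [ACCOUNT, {"encoding": "base64", "commitment": "confirmed"}],
    ),
}

for name, payload in {**http_requests, **ws_requests}.items():
    print(name, payload)
```

The request envelope is identical in both cases. What differs is the transport and the lifetime: an HTTP request returns a single result, while a subscription request returns a subscription id followed by a stream of notifications.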

As you can see, among the native RPC methods on Solana, only the WebSocket RPC methods can be used for streaming on-chain events. However, it is essential to note that HTTP-based methods can also be used to listen to on-chain events using a polling design pattern.

Streaming vs. Polling Event Data

Polling can be better suited for scenarios that do not require low latency and can tolerate some delay. For example, consider the use case of monitoring a simple account update event, such as when a specific account’s SOL balance changes. You could use either a streaming or polling approach. Below, we contrast the two methods.

Streaming Approach

The accountSubscribe WebSocket method delivers notifications when a given account's SOL balance (or account data) changes. The client opens a WebSocket connection to a Solana RPC node (or RPC provider API) and sends an accountSubscribe request for the account's public key.

  • Pros:
    • Low latency: minimal time between event and notification.
    • Network efficiency: no redundant requests, which can make it easier to scale to higher throughput.
    • Event-driven: receiving events as they occur fits naturally into a larger event-driven architecture. This architectural pattern is common among modern applications built around microservices, where other services can consume the event data immediately upon arrival.
  • Cons:
    • Complexity: Managing the WebSocket connection lifecycle (opening, subscribing, closing, reconnecting, timeouts, etc.) is more complicated than making simple, one-off HTTP requests.
    • Scalability: Scalability can be limited depending on how the specific WebSocket method is implemented. For example, accountSubscribe covers only one account per subscription, so tracking many accounts means managing many simultaneous subscriptions.
    • Interruption: WebSocket connections can be interrupted, introducing the possibility of missed notifications. Additional streams, verification, or backfilling may be necessary to make the stream fault-tolerant.
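The client-side bookkeeping for the streaming approach might look like the sketch below. It deliberately leaves out the WebSocket transport itself (you would feed it messages from whichever WebSocket client library you use) and only shows how to build the subscription request, recognize the subscription acknowledgment, and dispatch accountNotification messages. The class name, callback signature, and backoff values are our own illustrative choices.

```python
import json
from typing import Callable, Iterator, Optional


def account_subscribe_request(pubkey: str, request_id: int = 1) -> str:
    """Build the accountSubscribe request sent after the socket opens."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "accountSubscribe",
        "params": [pubkey, {"encoding": "base64", "commitment": "confirmed"}],
    })


class AccountStream:
    """Hypothetical helper that tracks one accountSubscribe subscription."""

    def __init__(self, on_update: Callable[[int, int], None], request_id: int = 1):
        self.on_update = on_update          # called with (slot, lamports)
        self.request_id = request_id
        self.subscription_id: Optional[int] = None
        self.last_slot: Optional[int] = None

    def handle_message(self, raw: str) -> None:
        msg = json.loads(raw)
        # The reply to our subscribe request carries the subscription id.
        if msg.get("id") == self.request_id and "result" in msg:
            self.subscription_id = msg["result"]
            return
        if msg.get("method") != "accountNotification":
            return
        params = msg["params"]
        if params["subscription"] != self.subscription_id:
            return  # notification for some other subscription
        result = params["result"]
        self.last_slot = result["context"]["slot"]
        self.on_update(self.last_slot, result["value"]["lamports"])


def backoff_delays(base: float = 0.5, cap: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing reconnect delays, capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)
```

After a reconnect, the old subscription id is no longer valid, so the client has to resubscribe and may need to backfill anything that changed in the meantime, for example with a one-off getAccountInfo call. Keeping last_slot around gives you a reference point for that check.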

Polling Approach

Use the getBalance HTTP method to return an account's latest SOL balance. In a polling setup, the client periodically (e.g., every 30 seconds) issues a new HTTP request to check if the balance has changed.

  • Pros:
    • Simplicity: can be easier to implement and understand; involves simple HTTP requests and retries for intermittent errors.
    • Control: allows better control over when and how often the checks for data are made.
  • Cons:
    • Limited granularity: polling only captures the state at the specific slot when the request is made, which means any intermediate states between polling intervals won’t be captured.
    • Redundant requests: if the state of the account isn’t constantly changing, then the requests for some periods will be redundant.
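For comparison, a polling loop can be sketched with nothing but the standard library. The endpoint URL below is Solana's public mainnet RPC endpoint and is only a default assumption; in practice you would substitute your provider's URL. The poll_balance helper and its parameters are illustrative names, not part of any SDK.

```python
import json
import time
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

RPC_URL = "https://api.mainnet-beta.solana.com"  # assumption: use your own endpoint


def fetch_balance(pubkey: str, url: str = RPC_URL) -> Tuple[int, int]:
    """Call getBalance over HTTP and return (slot, lamports)."""
    body = json.dumps({
        "jsonrpc": "2.0", "id": 1, "method": "getBalance", "params": [pubkey],
    }).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        result = json.load(resp)["result"]
    return result["context"]["slot"], result["value"]


def poll_balance(
    pubkey: str,
    fetch: Callable[[str], Tuple[int, int]] = fetch_balance,
    interval: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[int, int]]:
    """Yield (slot, lamports) whenever the polled balance differs from the last one seen."""
    last: Optional[int] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        slot, lamports = fetch(pubkey)
        polls += 1
        if lamports != last:  # the first reading always counts as a change
            last = lamports
            yield slot, lamports
        if max_polls is None or polls < max_polls:
            sleep(interval)  # unchanged readings in between were redundant requests
```

With slots of roughly 400 ms, a 30-second interval spans about 75 slots, so any balance changes that happen and revert inside that window are invisible to this loop. That is the limited-granularity trade-off in practice.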

Choosing a streaming versus polling approach depends on your application's requirements and how you wish to balance latency, complexity, and resource consumption.

In the context of the examples above, a trading bot may need to react as quickly as possible to every account balance change as it occurs over time, so streaming may be a better option. On the other hand, a front-end web component that displays a user’s wallet balance may require only the latest balance of the account within a reasonable interval, which makes polling the natural choice.

While the native WebSocket RPC methods allow for basic data streaming on Solana, they have major shortfalls, such as a limited set of event types and limited scalability. To overcome these hurdles, developers might explore more advanced solutions like Geyser plugins.

Geyser Plugins

What is a Geyser Plugin?

A Geyser plugin in Solana is a modular extension designed to streamline the flow of blockchain data handled by validators. The solana-geyser-plugin-interface enables the creation of plugins that redirect on-chain data directly from a validator client, including account information, slots, blocks, and transactions.

Developers can use the interface to create custom plugins that facilitate efficient and flexible data streaming to various external systems. The main drawback of employing Geyser plugins is their complexity in setup and maintenance. To effectively utilize a Geyser plugin, you must run it on a Solana validator node.

The Geyser Plugin Interface

The GeyserPlugin trait defines the interface for creating custom plugins that interact with the Solana validator runtime. Plugins implementing this trait can configure logging, handle initialization and cleanup tasks, and specify which data types they want to receive through enabling methods. Key callbacks provide information on account changes, transaction processing, PoH entry information, and block updates, enabling plugins to act on detailed blockchain data as it flows through the validator.

To maintain flexibility and future-proofing, the trait uses versioned data structures for accounts, transactions, and blocks, ensuring compatibility even as the underlying data formats evolve. Additionally, a set of standardized errors allows plugins to consistently handle common issues, such as configuration failures or data update errors.

A validator with the right Geyser plugin can provide real-time Solana data for all downstream services and applications.

Open Source Geyser Plugins

Below is a summary of notable open-source Geyser plugins and related tools, each offering unique capabilities for streaming and managing Solana blockchain data.

Yellowstone Dragon's Mouth (yellowstone-grpc): This provides a gRPC interface for Solana, allowing developers to stream real-time blockchain data from validators with advanced control. It supports multiple programming languages, offers advanced filtering, and includes features to maintain persistent connections.

solana-accountsdb-plugin-postgres: This streams account and transaction data directly into a PostgreSQL database, enabling developers to store and query blockchain data using SQL. Features include connection pooling, multi-threaded operations, batch inserts, customizable data selection, historical account tracking, and secure SSL connectivity. It also provides schema management scripts and serves as a reference for creating custom Solana data integration plugins.

solana-accountsdb-plugin-bigtable: This streams Solana blockchain data directly into Google Cloud Bigtable, integrating real-time data with Google's scalable NoSQL database. It offers customizable account selection and is optimized for high throughput and low latency through connection pooling and data compression.

Waverider: This Solana Geyser plugin streams account data to a PostgREST server. PostgREST simplifies the process by providing a RESTful API over PostgreSQL, allowing data management via familiar SQL queries and HTTP requests.

solana-accountsdb-sqs Plugin: This streams Solana account and transaction data directly to Amazon SQS, integrating blockchain data into AWS services. It handles SQS payload size limits with data compression and offers advanced filters and real-time configuration changes via Redis without restarting the validator.

Digital Asset Validator Plugin: This plugin extracts and streams Solana blockchain data for digital asset management, serving as a core component of Metaplex's Digital Asset RPC API. It's message bus agnostic, uses FlatBuffers for efficient serialization, and minimizes the load on the validator for high-performance data streaming.

quic_geyser_plugin: This plugin enhances data streaming from Solana validators using the QUIC protocol and HTTP/3 for high-speed, low-latency access to real-time blockchain data. It is designed to outperform traditional TCP-based methods and allows clients to customize data subscriptions with specific filters.

Holaplex Indexer: This combines on-chain and off-chain data processing to provide fast access to Solana blockchain information, especially for the NFT ecosystem. The indexer consumes data via a Geyser plugin, processes it, and stores it in PostgreSQL, exposing enriched data through a GraphQL API.

solana-accountsdb-plugin-kafka: This streams Solana data into Kafka topics, leveraging Kafka's distributed streaming and processing capabilities. It offers advanced filtering and message wrapping for precise data control and operates efficiently without impacting validator performance.

If you want to integrate Solana data into your systems or enhance real-time data processing, exploring these plugins is an excellent starting point. However, remember that running a validator node and maintaining the infrastructure to efficiently support data transfer out of Geyser plugins will involve more hardware, development time, and maintenance. We recommend looking into API solutions for streaming data to simplify setup and focus time on building your core application.

ChainStream API

To recap, the standard ways of streaming data on Solana are the native Solana WebSocket RPC methods and Geyser plugins. However, both come with severe limitations: the native WebSocket methods permit only a limited range of events and notification formats, and Geyser plugins can involve substantially more work in terms of infrastructure setup and maintenance.

Developers can leverage custom APIs from Solana infrastructure providers to achieve the best of both worlds: ease of use, flexibility, and scalability. In this section, we'll introduce Syndica's ChainStream API, our highly reliable, high-throughput data streaming solution.

What is ChainStream API?

ChainStream API utilizes an RPC PubSub WebSocket, allowing applications to subscribe to and receive immediate updates on various blockchain activities, including transaction, slot, and block updates. Key features include:

  • Reliability: ChainStream ensures high reliability by consolidating updates from multiple validators. This aggregation process guarantees that no notifications are missed, making it particularly valuable for applications where high availability is crucial.
  • Scalability: The API is supported by Syndica’s best-in-class Web3 infrastructure. ChainStream scales using an elastic node architecture, allowing applications to scale their streaming throughput easily.
  • Flexible Filtering: ChainStream supports subscriptions to different types of on-chain events and offers flexible filtering options for notifications. Developers can combine multiple criteria (all, oneOf, and exclude) to precisely filter account keys within a single stream.
  • Real-Time Logging and Analytics: ChainStream users can take advantage of the Syndica platform for real-time logging and analytics. Developers can access detailed views of connection, subscription, and notification activity, making it easier to optimize application performance and understand activity.

How to Use ChainStream API

ChainStream supports subscriptions to the following types of notifications, as they are processed on-chain by validators:

  1. Transaction Notifications: stream full transaction data (instructions, balances, logs, etc.)
  2. Slot Notifications: stream updates to slots as they reach higher commitment levels (processed, confirmed, finalized)
  3. Block Notifications (coming soon): stream high-level block information (rewards, transaction count, etc.)

To create a subscription, customize the types of notifications you wish to receive using the JSON-RPC 2.0 specification.

For instance, this sample JSON-RPC request subscribes to transaction notifications with several filtering options:

{
	"jsonrpc": "2.0",
	"id": 123,
	"method": "chainstream.transactionsSubscribe",
	"params": {
		"network": "solana-mainnet",
		"verified": true,
		"filter": {
			"excludeVotes": false,
			"accountKeys": {
				"all": [
					"vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg",
					"4fYNw3dojWmQ4dXtSGE9epjRGy9pFSx62YypT7avPYvA"
				],
				"oneOf": [
					"vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg",
					"4fYNw3dojWmQ4dXtSGE9epjRGy9pFSx62YypT7avPYvA"
				],
				"exclude": ["BGsqMegLpV6n6Ve146sSX2dTjUMj3M92HnU8BbNRMhF2"]
			}
		}
	}
}

network: The initial parameter network specifies which Solana network (e.g., solana-mainnet) you want to subscribe to for receiving updates.

verified: By setting verified to true, the API will only send notifications verified by multiple validators. This option will introduce a one-slot delay in notifications but eliminate any noise in notifications from a single validator.

filter: Various optional filters allow you to customize the subscription:

  • excludeVotes: When set to false, vote transactions are included in the notifications. Setting it to true will exclude them.
  • accountKeys: This filter can be used to specify which account keys are of interest:
    • all: Requires all listed account keys to be present in a transaction.
    • oneOf: At least one of the listed account keys should be present.
    • exclude: Transactions containing the specified account keys will be excluded.
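To reason about what a given filter will let through, it can help to model the accountKeys rules locally. The sketch below builds a subscription request in the shape shown above and evaluates the all/oneOf/exclude rules against a transaction's account keys. It is our reading of the descriptions in this section, not ChainStream's implementation; in particular, treating the three criteria as combined with AND, and ignoring empty criteria, are assumptions. The helper names are hypothetical.

```python
import json
from typing import Iterable, Sequence


def transactions_subscribe_request(
    all_of: Sequence[str] = (),
    one_of: Sequence[str] = (),
    exclude: Sequence[str] = (),
    exclude_votes: bool = False,
    verified: bool = True,
    network: str = "solana-mainnet",
    request_id: int = 1,
) -> str:
    """Build a chainstream.transactionsSubscribe request like the sample above."""
    account_keys = {}
    if all_of:
        account_keys["all"] = list(all_of)
    if one_of:
        account_keys["oneOf"] = list(one_of)
    if exclude:
        account_keys["exclude"] = list(exclude)
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "chainstream.transactionsSubscribe",
        "params": {
            "network": network,
            "verified": verified,
            "filter": {"excludeVotes": exclude_votes, "accountKeys": account_keys},
        },
    })


def matches_account_keys(
    tx_keys: Iterable[str],
    all_of: Sequence[str] = (),
    one_of: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> bool:
    """Local model of the filter (assumption: criteria are ANDed together)."""
    keys = set(tx_keys)
    if any(k not in keys for k in all_of):
        return False  # "all": every listed key must be present
    if one_of and not any(k in keys for k in one_of):
        return False  # "oneOf": at least one listed key must be present
    if any(k in keys for k in exclude):
        return False  # "exclude": no listed key may be present
    return True


VINES = "vines1vzrYbzLMRdu58ou5XTby4qAqVRLmqo36NKPTg"
OTHER = "4fYNw3dojWmQ4dXtSGE9epjRGy9pFSx62YypT7avPYvA"
BLOCKED = "BGsqMegLpV6n6Ve146sSX2dTjUMj3M92HnU8BbNRMhF2"

print(matches_account_keys([VINES, OTHER], all_of=[VINES, OTHER], exclude=[BLOCKED]))
print(matches_account_keys([VINES, OTHER, BLOCKED], all_of=[VINES], exclude=[BLOCKED]))
```

A local model like this is mainly useful in tests: you can check that a filter configuration selects the transactions you expect before opening a live subscription.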

Logging and metrics for your ChainStream subscriptions can be easily accessed on the Syndica platform.

For complete examples, see the ChainStream API documentation: https://docs.syndica.io/platform/chainstream-api.

Conclusion

Developers building applications on Solana should start by familiarizing themselves with the available streaming options—from native WebSockets to advanced solutions like Geyser plugins and APIs such as ChainStream. Begin with simple methods like polling or basic WebSocket subscriptions to get your application running.

As your needs for scalability and real-time data grow, explore more sophisticated tools that offer higher throughput, flexibility, and reliability. The key is to align the complexity of your streaming solution with your application's evolving requirements, ensuring efficiency and ease of maintenance.

–

Syndica empowers principal enterprises and architects advancing the Solana ecosystem. In building the Cloud of Web3, we provide best-in-class RPC node infrastructure, Developer APIs, a robust end-to-end platform, and the next cutting-edge Solana client validator, Sig, written in Zig. Our team offers the dedicated support, reliability, and user-focused orientation for seamless integration with Solana, allowing you to focus solely on building your enterprise.

Get started for free today at syndica.io, and check out our documentation.