Parsing a Solana Transaction
Accounts only store the current state. To get any kind of historical data you need to parse past transactions and the instructions they contain.
You've got the following hierarchy:
Message = Instruction[]
Signatures = Signature[]
Transaction = Signatures + Message
Each program is stateless, so the accounts it works on are passed in. A message can contain arbitrary instructions whose data is just raw bytes; those bytes are handed to the target program, which is expected to decode them however it sees fit.
A byte is 8 bits of information, and each bit is either a 1 or a 0.
Solana operates on binary (base 2) numbers so it wants the binary representation:
1010 1111
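As a quick illustration, the same byte can be written in binary, decimal, and hexadecimal. This is only a sketch for intuition; the variable names are made up for the example.

```python
# One byte, written three ways (illustrative only).
value = 0b10101111               # the binary value 1010 1111 from the text

as_bits = format(value, "08b")   # zero-padded 8-character bit string
as_hex = format(value, "02x")    # two hexadecimal digits

print(as_bits, value, as_hex)
```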
These bytes (numbers) make up the bytecode (many numbers) that is read by the Solana virtual machine. The specific format is a custom variant of eBPF called Solana Bytecode Format (sBPF).
Now, raw binary is awkward to display, copy, and paste, so it is usually shown to humans as Base58 text. Instead of writing the data out as a long string of 1s and 0s (which is hard to read), we can convert it to something shorter and more human-friendly, like Base58.
Base58 is a textual encoding that uses just 58 characters:
123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
It excludes characters that are easily confused (like 0 vs O, or I vs l) to make it easier for humans to read and copy.
Let’s say we have a byte in binary: 10101111 (which is 175 in decimal).
To encode this in Base58, we treat the byte as a number and convert it directly: 175 = 3 × 58 + 1, and the characters at positions 3 and 1 of the alphabet (counting from 0) are 4 and 2. So 175 in Base58 becomes 42.
So:
Binary: 10101111 -> 8 bits
Decimal: 175
Base58: 42 -> 2 characters, more compact and easier to handle
Base58 doesn’t map directly to fixed binary chunks like hexadecimal does (hex maps exactly 4 bits per digit), but it is more compact than hex for large binary blobs such as Solana addresses or content hashes: a 32-byte address takes 64 hex characters but at most 44 Base58 characters. The text is still longer than the raw bytes, but it avoids ambiguous characters, making it ideal for human use.
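A minimal sketch of the conversion described above, assuming the 58-character alphabet listed earlier. The function names `b58encode` and `b58decode` are made up for this example; in practice you would probably reach for an existing library instead.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as Base58 text."""
    # Treat the whole byte string as one big-endian integer.
    number = int.from_bytes(data, "big")
    digits = []
    while number > 0:
        number, remainder = divmod(number, 58)
        digits.append(ALPHABET[remainder])
    # Each leading zero byte is written as a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    """Decode Base58 text back into bytes."""
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)
    # Each leading '1' stands for one leading zero byte.
    leading_ones = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body


single_byte = b58encode(bytes([0b10101111]))   # the byte 175 from the text
all_zero_address = b58encode(bytes(32))        # 32 zero bytes
print(single_byte, all_zero_address)
```

Thirty-two zero bytes encode to thirty-two `1` characters, which is the address of the System Program.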
Each instruction specifies:
- The program ID (smart contract handling it)
- The accounts involved
- The instruction data
We group these instructions up and call it a message. Add some signatures and baby you got a stew going (a transaction).
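The hierarchy can be sketched as plain data types. This is a deliberately simplified model, not the wire format: a real message also carries a header, a table of account keys, and a recent blockhash, and instructions refer to accounts by index into that table. The class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instruction:
    program_id: str       # Base58 address of the program that handles it
    accounts: List[str]   # Base58 addresses of the accounts passed in
    data: bytes           # opaque bytes, decoded by the program itself


@dataclass
class Message:
    instructions: List[Instruction]


@dataclass
class Transaction:
    signatures: List[bytes]   # one 64-byte Ed25519 signature per signer
    message: Message


# Placeholder values only: the System Program address and an all-zero signature.
ix = Instruction(program_id="1" * 32, accounts=[], data=b"\x00")
tx = Transaction(signatures=[bytes(64)], message=Message(instructions=[ix]))
print(len(tx.message.instructions), len(tx.signatures[0]))
```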
When we talk about stored data on Solana we really mean accounts. All data is stored inside accounts, which can hold up to 10 MiB (10,485,760 bytes) of data; 10,240 bytes is the limit on how much an account's data can grow within a single instruction. Each byte is 8 bits (1's and 0's). Accounts act as buffers (calloc on the blockchain).
The instructions, in contrast, are ephemeral as far as account state is concerned. Their arguments are never written into an account; they live only in the transaction itself. The only reason we can read them later is that RPC nodes keep recent transactions and their logs around for a limited period of time. Only archive nodes have the capability to go further back in time.
These accounts and instructions can hold anything as binary (represented as Base58), which means the interfaces are all opaque. Unless you know ahead of time what you are looking for and what it looks like, you won't be able to decode it.
However, things are not quite so dire, because most programs that want to be read are written with a framework called Anchor, which is like the Ruby on Rails of Solana. This framework stores a discriminator at the start of each instruction's data.
The discriminator for an instruction is the first 8 bytes of the SHA-256 hash of the prefix global: followed by the instruction name.
So we hash global:the_function_name and get back a hash:
sha256("global:initialize")
Would produce:
The discriminator         The data
v                         v
[af af 6d 1f 0d 98 9b ed] [d4 6a 95 07 32 81 ad c2 1b b5 e0 e1 d7 73 b2 fb bd 7a b5 04 cd d4 aa 30]
The first 8 bytes can be converted from hexadecimal to decimal:
af = 175
af = 175
6d = 109
1f = 31
0d = 13
98 = 152
9b = 155
ed = 237
Which would match the discriminator listed in the program's IDL:
{
  "instructions": [
    {
      "name": "initialize",
      "discriminator": [175, 175, 109, 31, 13, 152, 155, 237],
      ...
    }
  ]
}
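The discriminator can be recomputed with nothing but a SHA-256 implementation. The sketch below uses Python's `hashlib`; the helper name `anchor_discriminator` is made up for this example, and it assumes the instruction name is passed in the same form Anchor hashes it (the snake_case function name).

```python
import hashlib


def anchor_discriminator(instruction_name: str) -> bytes:
    """First 8 bytes of sha256("global:<instruction_name>")."""
    preimage = f"global:{instruction_name}".encode("utf-8")
    return hashlib.sha256(preimage).digest()[:8]


disc = anchor_discriminator("initialize")
print(disc.hex(" "))   # the 8 bytes as space-separated hex
print(list(disc))      # the same bytes as decimal numbers, as listed in an IDL
```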
Anchor programs serialize instruction arguments and account data with Borsh, a compact binary serialization format that converts between raw bytes and the typed values of a programming environment like Rust.
So let's say you are parsing transactions for pump.fun.
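To give a feel for what Borsh decoding involves, here is a hand-rolled sketch for a hypothetical argument struct with two fields, `amount: u64` and `name: String`. That layout is invented for illustration and does not come from any real program. Borsh writes integers in little-endian order and strings as a 4-byte little-endian length followed by UTF-8 bytes.

```python
import struct


def decode_args(payload: bytes):
    """Decode a hypothetical (amount: u64, name: String) argument struct."""
    # u64: 8 bytes, little-endian.
    (amount,) = struct.unpack_from("<Q", payload, 0)
    # String: u32 little-endian byte length, then that many UTF-8 bytes.
    (name_len,) = struct.unpack_from("<I", payload, 8)
    name = payload[12:12 + name_len].decode("utf-8")
    return amount, name


# Build a matching payload by hand so the example is self-contained.
payload = struct.pack("<Q", 1_000_000) + struct.pack("<I", 5) + b"hello"
amount, name = decode_args(payload)
print(amount, name)
```

In a real Anchor instruction these argument bytes would follow the 8-byte discriminator, and the field list would come from the program's IDL.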
You filter for the transactions you are interested in, either by listening to the chain over websockets or by crawling through an RPC node.
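For the crawling route, a sketch of the two JSON-RPC request bodies you would typically send: `getSignaturesForAddress` to list recent signatures for an address, then `getTransaction` for each one. Nothing is sent here; the code only builds the payloads. The address is a placeholder (the System Program), the signature is a dummy string, and the helper name `rpc_request` is made up.

```python
import json


def rpc_request(method: str, params: list, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body as a string."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


# Placeholder address: substitute the program you actually want to follow.
PROGRAM_ID = "1" * 32

# Step 1: list the most recent signatures that mention the address.
list_body = rpc_request("getSignaturesForAddress", [PROGRAM_ID, {"limit": 10}])

# Step 2: fetch one transaction by signature (dummy value shown here).
tx_body = rpc_request(
    "getTransaction",
    ["<signature>", {"encoding": "json", "maxSupportedTransactionVersion": 0}],
)

print(list_body)
print(tx_body)
```

Each body would be sent with an HTTP POST to your RPC endpoint.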
Typically the information stored in these accounts (at least the ones you will be interested in reading) translate.
When a transaction comes in, you enumerate the instructions and read the first 8 bytes.
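That step can be sketched as a lookup table keyed by discriminator. The instruction names `buy` and `sell` are hypothetical placeholders rather than names taken from any real IDL, and the code assumes the instruction data has already been turned into raw bytes (RPC responses with JSON encoding return it as a Base58 string).

```python
import hashlib


def anchor_discriminator(name: str) -> bytes:
    """First 8 bytes of sha256("global:<name>")."""
    return hashlib.sha256(f"global:{name}".encode("utf-8")).digest()[:8]


# Hypothetical instruction names; in practice, load these from the program's IDL.
KNOWN = {anchor_discriminator(n): n for n in ("initialize", "buy", "sell")}


def identify(instruction_data: bytes):
    """Split instruction data into (instruction name or None, argument bytes)."""
    if len(instruction_data) < 8:
        return None, b""
    discriminator, args = instruction_data[:8], instruction_data[8:]
    return KNOWN.get(discriminator), args


# Two fake instructions: one known, one that matches nothing in the table.
instructions = [anchor_discriminator("buy") + b"\x01\x02", b"\x00" * 12]
results = [identify(data) for data in instructions]
print(results)
```

Anything that comes back as `None` either belongs to a different program or does not follow the Anchor convention, and can be skipped.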