Solana transactions

Transactions consist of 2 components:

A message
A list of signatures

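A message is essentially [header + list_of_accounts + recent_blockhash + list_of_instructions], where the header records how many of the accounts are signers and how many are read-only.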

The most basic operating unit on Solana is an instruction.

Each instruction consumes compute units (CU). By default an instruction can use up to 200k CU, and a transaction can request a higher limit of up to 1.4 million CU.
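
As rough arithmetic, the limits above can be sketched like this. The function name is made up for illustration, and the rule that the default budget scales with the number of instructions up to the cap is an assumption rather than something stated in these notes.

```python
DEFAULT_CU_PER_INSTRUCTION = 200_000   # default budget per instruction
MAX_CU_PER_TRANSACTION = 1_400_000     # highest limit that can be requested


def effective_cu_limit(num_instructions, requested=None):
    """Return the compute unit limit a transaction would run with (sketch)."""
    if requested is not None:
        # An explicit request is honoured, but never above the cap.
        return min(requested, MAX_CU_PER_TRANSACTION)
    # Assumed default: 200k per instruction, clamped to the cap.
    return min(num_instructions * DEFAULT_CU_PER_INSTRUCTION, MAX_CU_PER_TRANSACTION)
```

For example, effective_cu_limit(2) gives 400_000, while any request above the cap is clamped to 1_400_000.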

The basic flow of a transaction is:

The user or app submits a transaction with one or more instructions to a node that accepts RPC requests.
The transaction is then forwarded, according to the leader schedule, to the next leader.
The leader validates the transaction, processes it, and includes it in a new block. This block is then broadcast to all other validators, who also validate and process the transaction.
During the transaction processing, instructions are executed by the previously deployed programs, and relevant accounts are modified accordingly.

When you read a transaction you will see a bunch of accounts listed at the top level. These are the accounts required for the entire transaction.

Then you will have a list of instructions. Each instruction lists a subset of those accounts, perhaps reordered, that pertain to that specific instruction. In addition, there is the data passed to the instruction itself, usually Borsh-encoded bytes.

An instruction contains:

{
  program_id: pubkey,
  accounts: [pubkey],
  instruction_data: [byte]
}

Programs process the instruction data to determine what actions to take on which accounts that have been provided in the instruction.
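
To make the relationship between the top-level account list and the per-instruction subsets concrete, here is a minimal sketch. The Instruction class and compile_instructions helper are hypothetical names, the pubkeys are plain strings, and the real privilege-based ordering of accounts is ignored.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    program_id: str          # pubkey of the program to invoke
    accounts: list           # pubkeys the instruction reads or writes
    instruction_data: bytes  # opaque bytes interpreted by the program


def compile_instructions(instructions):
    """Build one shared account list and rewrite instructions to use indexes into it."""
    account_keys = []

    def index_of(key):
        # Each pubkey appears once at the top level, however often it is used.
        if key not in account_keys:
            account_keys.append(key)
        return account_keys.index(key)

    compiled = []
    for ix in instructions:
        account_indexes = [index_of(key) for key in ix.accounts]
        compiled.append({
            "program_id_index": index_of(ix.program_id),
            "accounts": account_indexes,
            "data": ix.instruction_data,
        })
    return account_keys, compiled
```

An account shared by two instructions (a fee payer, say) is stored once and referenced by the same index from both.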

A serialized transaction (signatures plus message) can be at most 1232 bytes.
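
The 1232 byte figure and its practical impact can be estimated with a short sketch. It assumes the legacy wire layout (compact-u16 length prefixes, 64-byte signatures, a 3-byte header, 32-byte pubkeys and blockhash); the helper names are made up for illustration.

```python
SIGNATURE_BYTES = 64
PUBKEY_BYTES = 32
BLOCKHASH_BYTES = 32
HEADER_BYTES = 3
# IPv6 minimum MTU minus the IPv6 header and the fragment header.
MAX_TRANSACTION_BYTES = 1280 - 40 - 8


def compact_u16_len(n):
    """Bytes used by a compact-u16 length prefix for the value n."""
    if n < 0x80:
        return 1
    if n < 0x4000:
        return 2
    return 3


def legacy_transaction_size(num_signatures, num_accounts, instructions):
    """Estimate serialized size; instructions is a list of (num_account_indexes, data_len)."""
    size = compact_u16_len(num_signatures) + num_signatures * SIGNATURE_BYTES
    size += HEADER_BYTES
    size += compact_u16_len(num_accounts) + num_accounts * PUBKEY_BYTES
    size += BLOCKHASH_BYTES
    size += compact_u16_len(len(instructions))
    for num_account_indexes, data_len in instructions:
        size += 1  # program id index
        size += compact_u16_len(num_account_indexes) + num_account_indexes
        size += compact_u16_len(data_len) + data_len
    return size
```

Each extra account costs 32 bytes at the top level, so the account list is usually what pushes a transaction toward the limit.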

Transactions are atomic: if a single instruction fails, the whole transaction will fail and have no effect on the global state.
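
Atomicity can be pictured as running every instruction against a scratch copy of the state and committing only if all of them succeed. This is a toy model with hypothetical names (run_transaction, transfer), not how the runtime is actually implemented.

```python
import copy


def run_transaction(state, instructions):
    """Apply instructions to a scratch copy; return (new_state, ok)."""
    scratch = copy.deepcopy(state)
    try:
        for instruction in instructions:
            instruction(scratch)
    except Exception:
        # Any failure discards the scratch copy, so nothing is committed.
        return state, False
    return scratch, True


def transfer(src, dst, amount):
    """Build a toy instruction that moves balance between two accounts."""
    def instruction(accounts):
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount
    return instruction
```

If the second of two transfers fails, the first one is rolled back as well, because only the scratch copy was ever modified.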

You must forward-declare every account you intend to read from or write to, which may include:

The system program
The token program
Your account
Your token accounts
Any necessary program accounts
System variables like Clock

Accounts are passed in as an array so you need to get the order right. Usually this is abstracted away in the client to make this easier and less error prone.

Writing to an account requires authority, which is usually provided by signing the full message with the private key that controls the account.

There are also non-signed accounts that can be writable, for instance if owned by the program your instruction calls.
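
The signer and writable flags are what the message header summarizes. The sketch below assumes the usual ordering (writable signers, read-only signers, writable non-signers, read-only non-signers) and the three header counts; AccountMeta and order_accounts are illustrative names, with pubkeys as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountMeta:
    pubkey: str
    is_signer: bool
    is_writable: bool


def order_accounts(metas):
    """Merge duplicate accounts, sort by privilege, and derive the header counts."""
    merged = {}
    for meta in metas:
        prev = merged.get(meta.pubkey)
        if prev is None:
            merged[meta.pubkey] = meta
        else:
            # An account keeps the strongest privileges requested anywhere.
            merged[meta.pubkey] = AccountMeta(
                meta.pubkey,
                prev.is_signer or meta.is_signer,
                prev.is_writable or meta.is_writable,
            )
    # Signers first, and writable before read-only within each group.
    ordered = sorted(merged.values(), key=lambda m: (not m.is_signer, not m.is_writable))
    header = {
        "num_required_signatures": sum(m.is_signer for m in ordered),
        "num_readonly_signed_accounts": sum(m.is_signer and not m.is_writable for m in ordered),
        "num_readonly_unsigned_accounts": sum(not m.is_signer and not m.is_writable for m in ordered),
    }
    return [m.pubkey for m in ordered], header
```

A writable account that is not a signer (a program-owned vault, for instance) ends up in the third group and contributes to none of the three counts.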

A transaction is uniquely identified by its first signature.

Finally, a message contains a recent blockhash, which plays a role similar to an Ethereum nonce in preventing replays. The blockhash also acts as a TTL: it must be no more than 150 blocks old, which works out to about 1 minute 19 seconds (currently).

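There is also a durable nonce mode, which replaces the recent blockhash with a nonce stored in an on-chain account so the transaction does not expire. This enables airgapped signing.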

Most of the time we want to implement multiple different instruction calls in the same program. For example the SPL Token Program implements:

InitializeAccount
InitializeMint
Transfer

Each of these requires different call arguments. A common pattern is to encode a discriminator in the first byte of the instruction_data argument in the entrypoint function.

Then at the beginning of the program we decode the first byte which tells us which instruction the user is trying to call. With just one byte we can differentiate up to 256 different instructions.

We can encode the instruction arguments in the remaining bytes of the instruction_data byte slice.
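
A minimal sketch of the discriminator pattern, shown client-side in Python for brevity. The tag values and the single u64 amount argument are assumptions for illustration; check the actual program before relying on any of them.

```python
import struct

# Hypothetical one-byte tags, one per instruction the program supports.
INITIALIZE_ACCOUNT = 1
TRANSFER = 3


def encode_transfer(amount):
    """First byte is the tag, remaining bytes are the arguments (u64, little-endian)."""
    return struct.pack("<BQ", TRANSFER, amount)


def decode_instruction(instruction_data):
    """Dispatch on the first byte, then decode the rest as that instruction's arguments."""
    if not instruction_data:
        raise ValueError("empty instruction data")
    tag, rest = instruction_data[0], instruction_data[1:]
    if tag == TRANSFER:
        (amount,) = struct.unpack("<Q", rest)
        return ("Transfer", {"amount": amount})
    if tag == INITIALIZE_ACCOUNT:
        return ("InitializeAccount", {})
    raise ValueError(f"unknown instruction tag {tag}")
```

An on-chain program does the same thing in its entrypoint: match on the first byte, then deserialize the remaining bytes into the argument struct for that variant.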

Transactions have moved on from the legacy format to versioned transactions (VersionedTransaction). The only version so far is v0.

Versioned transactions add support for Address Lookup Tables. Instead of listing every account address directly in the message, a transaction can reference addresses by index in a lookup table. The table's addresses are stored in an on-chain account and resolved at runtime.
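
Roughly, resolving a lookup works like the sketch below. It assumes each lookup names a table plus writable and read-only indexes, and that loaded addresses are appended after the static keys (writable first, then read-only). The function name and the string addresses are illustrative.

```python
def resolve_account_keys(static_keys, lookups, tables):
    """Expand table lookups into the full account list seen by instructions.

    lookups: list of (table_address, writable_indexes, readonly_indexes)
    tables:  dict mapping table_address -> list of addresses stored in that table
    """
    writable, readonly = [], []
    for table_address, writable_indexes, readonly_indexes in lookups:
        addresses = tables[table_address]
        writable += [addresses[i] for i in writable_indexes]
        readonly += [addresses[i] for i in readonly_indexes]
    # Static keys come first, then every loaded writable, then every loaded read-only.
    return list(static_keys) + writable + readonly
```

The saving comes from size: a looked-up address costs a one-byte index in the message instead of a 32-byte pubkey, on top of the table address itself.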

Instructions have a limit on the number of parameters. It's undocumented, but I've seen people complain that it is around 18.

Instructions only have access to the instruction data, accounts and sysvars.

We can't pass accounts inside the instruction data as ordinary arguments, because the runtime needs to know every account a transaction touches before it executes in order to run non-conflicting transactions in parallel.

This is why, even though the system program, SPL programs, etc. are all effectively constants, we still need to pass them in.
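
Because the account lists are declared up front, a scheduler can tell whether two transactions may run in parallel without executing either. A minimal sketch of that check, assuming each transaction is described by hypothetical "writable" and "readonly" sets of pubkeys:

```python
def conflicts(tx_a, tx_b):
    """Two transactions conflict if either one writes an account the other touches."""
    a_writes_b = tx_a["writable"] & (tx_b["writable"] | tx_b["readonly"])
    b_writes_a_read = tx_b["writable"] & tx_a["readonly"]
    return bool(a_writes_b or b_writes_a_read)
```

Two transfers between disjoint pairs of accounts do not conflict even though both list the system program, because it is only read. Two transactions writing to the same account have to run one after the other.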

Confirming transactions

Solana secures itself using Proof of History, which relies on a long chain of sequential SHA-256 hashes to build a trusted clock.

Block producers (validators) hash transaction IDs into the stream to record which transactions were processed in their block.

next_hash = hash(prev_hash, hash(transaction_ids))

This becomes a trusted clock because each hash must be produced sequentially. Each block contains a blockhash and a list of hash checkpoints called ticks so that validators can verify the full chain of hashes in parallel, proving that time has actually passed.
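
A toy version of the hash chain and its checkpoint-based verification might look like this. The helper names and the entry layout (number of hashes plus optional mixed-in data) are assumptions for illustration. Threads are used only to show that segments are independent; real validators use much heavier parallelism.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def next_hash(prev_hash, num_hashes, mixin=b""):
    """Hash sequentially num_hashes times (at least 1), mixing data into the final hash."""
    state = prev_hash
    for _ in range(num_hashes - 1):
        state = hashlib.sha256(state).digest()
    return hashlib.sha256(state + mixin).digest()


def build_chain(seed, entries):
    """entries: list of (num_hashes, mixin). Producing checkpoints is inherently sequential."""
    checkpoints, state = [], seed
    for num_hashes, mixin in entries:
        state = next_hash(state, num_hashes, mixin)
        checkpoints.append(state)
    return checkpoints


def verify_chain(seed, entries, checkpoints):
    """Each segment needs only its starting checkpoint, so segments can be checked independently."""
    starts = [seed] + checkpoints[:-1]

    def check(i):
        num_hashes, mixin = entries[i]
        return next_hash(starts[i], num_hashes, mixin) == checkpoints[i]

    with ThreadPoolExecutor() as pool:
        return all(pool.map(check, range(len(entries))))
```

The asymmetry is the point: the producer must compute the hashes one after another, while a verifier with many cores can split the chain at the checkpoints and check all segments at once.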

Each tick is 12,500 hashes. A slot has 64 ticks (800k hashes) and an epoch has 432,000 slots.
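
Spelling out the arithmetic, and assuming the 400ms target slot time for the epoch length:

```python
HASHES_PER_TICK = 12_500
TICKS_PER_SLOT = 64
SLOTS_PER_EPOCH = 432_000
SLOT_MS = 400  # assumed target slot time in milliseconds

hashes_per_slot = HASHES_PER_TICK * TICKS_PER_SLOT   # 800_000 hashes
epoch_seconds = SLOTS_PER_EPOCH * SLOT_MS / 1000      # 172_800 seconds
epoch_days = epoch_seconds / 86_400                   # 2 days
```

So at the target slot time an epoch lasts about two days. Slower slots stretch it accordingly.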

When you submit a transaction, to keep the fees deterministic, it is required to include a recent blockhash. If the blockhash is not recent enough, the transaction is rejected.

Validators look up the slot number of each transaction's blockhash before processing it in a block. If that slot is 151 or more slots behind the current slot, the transaction is rejected.

Slots are configured to last about 400ms but can fluctuate between 400-600ms. So a single blockhash is usually valid for about 60 to 90 seconds (~150 slots).
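
The expiry rule and the resulting time window can be sketched as follows. The constant and function names are made up, and the 150 slot maximum age is the figure used in these notes.

```python
MAX_PROCESSING_AGE = 150  # slots a blockhash stays usable


def is_blockhash_expired(blockhash_slot, current_slot):
    """A blockhash is rejected once it is more than MAX_PROCESSING_AGE slots old."""
    return current_slot - blockhash_slot > MAX_PROCESSING_AGE


def validity_window_seconds(slots=MAX_PROCESSING_AGE, slot_ms_range=(400, 600)):
    """Return (shortest, longest) lifetime in seconds for the given slot time range."""
    low_ms, high_ms = slot_ms_range
    return slots * low_ms / 1000, slots * high_ms / 1000
```

With the defaults, validity_window_seconds() returns (60.0, 90.0), which is where the 60 to 90 second range comes from.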

Expiring transactions helps validators avoid processing the same transaction twice: they only need to check whether a new transaction is in a relatively small set of recently processed transactions.
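
The idea can be sketched as a cache of signatures grouped by the slot of their blockhash, pruned as blockhashes expire. This is a toy model with hypothetical names, not the validator's actual data structure.

```python
MAX_PROCESSING_AGE = 150  # slots a blockhash stays usable


class RecentSignatureCache:
    def __init__(self):
        self._by_blockhash_slot = {}  # blockhash slot -> set of seen signatures

    def check_and_record(self, signature, blockhash_slot, current_slot):
        """Return True if the transaction is new and still within its lifetime."""
        # Entries under expired blockhashes can never be replayed, so drop them.
        expired = [s for s in self._by_blockhash_slot if current_slot - s > MAX_PROCESSING_AGE]
        for slot in expired:
            del self._by_blockhash_slot[slot]
        if current_slot - blockhash_slot > MAX_PROCESSING_AGE:
            return False  # blockhash too old, reject without tracking
        seen = self._by_blockhash_slot.setdefault(blockhash_slot, set())
        if signature in seen:
            return False  # duplicate within the window
        seen.add(signature)
        return True

    def size(self):
        return sum(len(sigs) for sigs in self._by_blockhash_slot.values())
```

Because anything older than the window is rejected outright, the cache never has to remember more than about 150 slots of signatures.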

A side effect of this is that validators only need about 150MB of memory to track transactions.

Forking

Solana doesn't wait for all validators to agree on a block before the next block is produced. It is common for two different blocks to be chained to the same parent.

We call each conflicting chain a fork. Validators need to vote on one of these forks to reach agreement on which one to choose. Only one fork will be finalized by the cluster and all competing blocks are discarded.

Because of this you should use the confirmed commitment level for your RPC requests. It is only a few slots behind the processed commitment and has a low chance of belonging to a dropped fork.

Roughly 5% of blocks do not end up being finalized by the cluster, so a blockhash fetched with processed has a real chance of belonging to a block that gets dropped.

Using finalized eliminates the risk of landing on a dropped fork, but there is typically at least a 32 slot gap between the most recent confirmed block and the most recent finalized block. This reduces the time before your transaction expires by about 13 seconds, and by more during unstable cluster conditions.
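
The 13 second figure is just the slot gap multiplied by the slot time. A quick check, assuming 400ms slots and the 150 slot blockhash lifetime:

```python
SLOT_MS = 400                   # assumed target slot time
FINALIZED_LAG_SLOTS = 32        # typical gap between confirmed and finalized
BLOCKHASH_LIFETIME_SLOTS = 150

lag_seconds = FINALIZED_LAG_SLOTS * SLOT_MS / 1000
remaining_seconds = (BLOCKHASH_LIFETIME_SLOTS - FINALIZED_LAG_SLOTS) * SLOT_MS / 1000
```

Here lag_seconds comes to 12.8 and remaining_seconds to 47.2, so a blockhash fetched at finalized starts with roughly 47 seconds of life instead of 60.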

Preflight

Sometimes you can run into issues with different RPC nodes lagging behind one another.

When an RPC node receives a transaction, it will attempt to determine the expiration block using the most recent finalized block or that selected by the preflightCommitment parameter.

A very common issue is that a transaction's blockhash was produced after the block used to calculate the expiration of that transaction. If the node cannot determine when your transaction expires, it will only forward the transaction one time and then drop the transaction.

The same thing happens when simulating transactions: the simulation fails with a "blockhash not found" error.

Even if you use skipPreflight, you should always set preflightCommitment to the same commitment level that was used to fetch your transaction's blockhash. This applies to both sendTransaction and simulateTransaction.
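
One way to keep the levels in sync is to build every request from a single commitment value. The sketch below only constructs JSON-RPC request bodies and sends nothing. The helper names are made up, and the field names follow my understanding of the JSON-RPC API, where simulateTransaction takes the level as commitment rather than preflightCommitment.

```python
import json

COMMITMENT = "confirmed"  # single source of truth for every request below


def rpc_request(method, params):
    return {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}


def get_latest_blockhash_request(commitment=COMMITMENT):
    return rpc_request("getLatestBlockhash", [{"commitment": commitment}])


def send_transaction_request(encoded_tx, commitment=COMMITMENT, skip_preflight=False):
    config = {
        "encoding": "base64",
        "skipPreflight": skip_preflight,
        # Must match the level used to fetch the blockhash.
        "preflightCommitment": commitment,
    }
    return rpc_request("sendTransaction", [encoded_tx, config])


def simulate_transaction_request(encoded_tx, commitment=COMMITMENT):
    config = {"encoding": "base64", "commitment": commitment}
    return rpc_request("simulateTransaction", [encoded_tx, config])


# encoded_tx would be the base64 serialized, signed transaction (placeholder here).
body = json.dumps(send_transaction_request("AQID", skip_preflight=True))
```

Threading one constant through all three helpers makes it hard to fetch the blockhash at one level and send or simulate at another.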