Solana accounts

Every address on Solana uniquely identifies an account. An account is essentially a buffer for arbitrary data, which is stored inside its data field.

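struct Account {
  lamports: u64,     // balance, in lamports
  owner: Pubkey,     // the program that owns this account
  executable: bool,  // true if the data is a program
  data: Vec<u8>,     // raw bytes
  rent_epoch: u64    // epoch used for rent bookkeeping
}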

An account is a claim to a fixed amount of data storage on the blockchain.

The maximum data size is 10 MiB. Each account pays rent for the privilege of storing data. As an incentive, an account that holds a balance worth 2 years of rent is rent-exempt and owes no further payments.
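To make the "2 years of rent" figure concrete, here is a minimal sketch of the arithmetic. The three constants are assumptions based on the commonly cited cluster defaults (128 bytes of per-account overhead, 3,480 lamports per byte-year, a 2 year threshold). In real code you would ask an RPC node for the current value instead of hard-coding it.

```rust
/// Assumed default: bytes of bookkeeping overhead charged for every account,
/// on top of the length of its data.
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;
/// Assumed default rent rate, in lamports per byte per year.
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
/// Years of rent an account must hold to be rent-exempt.
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;

/// Estimate the rent-exempt minimum balance for `data_len` bytes of data.
pub fn rent_exempt_minimum(data_len: u64) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len) * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_THRESHOLD_YEARS
}
```

Under these assumptions an account with no data needs 890,880 lamports, which is why even an empty account is not free.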

When you create an account you tell the blockchain how much space it will need. The size is fixed at creation, but the owning program can adjust it later using realloc.

We can access the data from an account by asking an RPC node to fetch it for us. We would provide the public key of the account.
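As a rough illustration of what "asking an RPC node" means, the sketch below builds the JSON body of a getAccountInfo request. The function name is made up for this example, and sending the body over HTTP is left out. It assumes the address is a base58 string, so no JSON escaping is done.

```rust
/// Build the JSON body of a `getAccountInfo` JSON-RPC request.
/// `address` is the account's base58-encoded public key. Base58 contains no
/// characters that need escaping, so it is inserted as-is.
pub fn get_account_info_request(address: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":1,"method":"getAccountInfo","params":["{}",{{"encoding":"base64"}}]}}"#,
        address
    )
}
```

You would POST this body to the node's HTTP endpoint. The response carries the account's lamports, owner, and base64-encoded data.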

Anyone can read account data, but only an account's designated owner (a program) can modify it.

This means that if we ask a program to modify one of the accounts it owns, the program will typically only agree to it if the authority for that account (the holder of the relevant private key) signed the transaction.
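The ownership rule can be modeled in a few lines. This is a toy model, not the runtime's actual code: the Account struct is cut down to two fields, and the names write_data and WriteError are invented for the example.

```rust
/// 32-byte address, standing in for a Solana public key.
pub type Address = [u8; 32];

/// Simplified account: just the fields needed to show the ownership rule.
#[derive(Debug, Clone, PartialEq)]
pub struct Account {
    pub owner: Address,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum WriteError {
    /// The executing program does not own the account.
    NotOwner,
    /// The new data does not match the account's allocated size.
    SizeMismatch,
}

/// Toy version of the rule: only the owning program may write account data.
pub fn write_data(
    account: &mut Account,
    executing_program: &Address,
    new_data: &[u8],
) -> Result<(), WriteError> {
    if &account.owner != executing_program {
        return Err(WriteError::NotOwner);
    }
    if new_data.len() != account.data.len() {
        return Err(WriteError::SizeMismatch);
    }
    account.data.copy_from_slice(new_data);
    Ok(())
}
```

Whether the right person signed is a separate question, and it is the owning program's job to check that before it writes.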

Programs (the smart contracts) are stateless: they hold no data of their own. When we call a program, we give it the list of accounts that actually hold the data.

To create an account you first generate a private/public keypair. Then you need to register that account to the blockchain by invoking the create account instruction on the system program.

In addition, you will authorize the system program to debit an account for the new account's rent, ideally paying for 2 years to become exempt from paying any more later.

Creating an account

When you want to create a new account on Solana you do two things:

Create the account and allocate its space on the blockchain
Initialize the account with its initial data, which is done by the owner

To create our account first we need to announce it to an RPC node and pay for its rent. We do all this through some kind of client.

A client is anything that talks to the blockchain but originates off chain. It is a client of the Solana blockchain. Usually these are written in a higher-level language like JavaScript, but there are plenty of Rust clients as well.

Here is what it looks like to create an account with JavaScript:

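import {
  Connection,
  Keypair,
  LAMPORTS_PER_SOL,
  SystemProgram,
  Transaction,
  sendAndConfirmTransaction,
} from "@solana/web3.js"

// The describe callback must be synchronous; only the tests themselves are async.
describe("Create account", () => {
  const connection = new Connection("...", "confirmed")
  // createKeypairFromFile is assumed to be a local helper that loads a funded keypair.
  const payer = createKeypairFromFile("...")
  const newAccountKeypair = Keypair.generate()

  it("Creates the account", async () => {
    // Build the system program instruction that creates the account.
    const instruction = SystemProgram.createAccount({
      fromPubkey: payer.publicKey,
      lamports: LAMPORTS_PER_SOL,
      newAccountPubkey: newAccountKeypair.publicKey,
      programId: SystemProgram.programId,
      space: 0
    })

    // Both the payer and the new account must sign.
    const signers = [payer, newAccountKeypair]
    const transaction = new Transaction().add(instruction)

    await sendAndConfirmTransaction(connection, transaction, signers)
  })
})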

SystemProgram.createAccount is a function that builds an instruction for us. We set the programId to be the system program.

This sets the owner of the new account. The owner is important: it decides who can actually mutate the underlying data the account holds, as well as who can send lamports (SOL) from the account.

This example uses @solana/web3.js, which is the most common library for building clients. Unfortunately, Solana is going through a period of transition to a newer library, @solana/kit, which promises more efficient tree-shaking for clients (resulting in a smaller JavaScript bundle).

Here is what it looks like on kit:

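// The describe callback must be synchronous; only the tests themselves are async.
describe("Create account", () => {
  // createDefaultSolanaClient is assumed to be a helper defined elsewhere that builds the RPC clients.
  const { rpc, rpcSubscriptions } = createDefaultSolanaClient()

  it("Creates the account", async () => {
    // Create signers. The payer must hold enough SOL to cover rent and fees.
    const [payer, mint] = await Promise.all([generateKeyPairSigner(), generateKeyPairSigner()]);

    // Values the instructions and the transaction message below depend on.
    const { value: latestBlockhash } = await rpc.getLatestBlockhash().send();
    const space = BigInt(getMintSize());
    const lamports = await rpc.getMinimumBalanceForRentExemption(space).send();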

    // Create the instructions.
    const createAccount = getCreateAccountInstruction({
      payer, // <- TransactionSigner
      newAccount: mint, // <- TransactionSigner
      space,
      lamports,
      programAddress: TOKEN_PROGRAM_ADDRESS,
    });
    const initializeMint = getInitializeMintInstruction({
      mint: mint.address,
      mintAuthority: address("1234..5678"),
      decimals: 2,
    });

    // Create the transaction.
    const transactionMessage = pipe(
      createTransactionMessage({ version: 0 }),
      (tx) => setTransactionMessageFeePayerSigner(payer, tx), // <- TransactionSigner
      (tx) => setTransactionMessageLifetimeUsingBlockhash(latestBlockhash, tx),
      (tx) => appendTransactionMessageInstructions([createAccount, initializeMint], tx),
    );

    // Sign the transaction.
    const signedTransaction = await signTransactionMessageWithSigners(transactionMessage);

    // Create a send and confirm function from your RPC and RPC Subscriptions objects.
    const sendAndConfirm = sendAndConfirmTransactionFactory({ rpc, rpcSubscriptions });

    // Read the signature (the transaction ID), then send and confirm the signed transaction.
    const transactionSignature = getSignatureFromTransaction(signedTransaction);
    await sendAndConfirm(signedTransaction, { commitment: "confirmed" });
  })
})

Kit tends to be a lot more verbose, but it's usually expected that you would combine these individual functions into your own, more familiar utility functions.

Like we said before, accounts are really just buffers. create_account is basically calloc on the blockchain. Just as calloc allocates memory for an array of X objects of Y size and zeroes it, create_account reserves the requested number of bytes and initializes all bits to zero.

There is no required structure for the account data; it is just an array of bytes of a specific size.
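A small sketch of the calloc analogy: a plain Vec<u8> stands in for the account's data field, and the owning program decides what lives at which offset. The function names are invented for this example.

```rust
/// Allocate a zero-initialized buffer, the way a freshly created account's
/// data looks before the owning program writes to it.
pub fn allocate_account_data(space: usize) -> Vec<u8> {
    vec![0u8; space]
}

/// Write a u32 at `offset` in little-endian order. Returns None if it does not fit.
pub fn write_u32_le(data: &mut [u8], offset: usize, value: u32) -> Option<()> {
    let end = offset.checked_add(4)?;
    data.get_mut(offset..end)?.copy_from_slice(&value.to_le_bytes());
    Some(())
}

/// Read a little-endian u32 from `offset`. Returns None if out of bounds.
pub fn read_u32_le(data: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(data.get(offset..end)?);
    Some(u32::from_le_bytes(buf))
}
```

Nothing in the buffer records that bytes 4 to 7 hold a u32. That layout exists only in the program's code.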

Bytes are a chunk of bits (1's and 0's) of a fixed size, in this case 8 bits in a single byte. Data inside of accounts is stored as an array of these bytes.

let number: u32 = 42;
// Convert to little-endian byte array
let number_bytes = number.to_le_bytes();

This would give us an array of 4 bytes.

Why 4? Well it was a 32 bit number. Each byte is 8 bits.

So 8 * 4 = 32!

Here is what it would look like in bytes:

[0x2A, 0x00, 0x00, 0x00] (Little Endian)

Little endian means that the least significant byte is stored first.

                           Behold our number!
42 as a `u32`:             v
00000000 00000000 00000000 00101010
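The two views of the number 42 can be checked with the standard library alone. The bit diagram above is written most significant byte first (big-endian order), while the byte array uses little-endian order. The helper names below are made up for this sketch.

```rust
/// Return the little-endian and big-endian byte layouts of a `u32`.
pub fn byte_layouts(n: u32) -> ([u8; 4], [u8; 4]) {
    (n.to_le_bytes(), n.to_be_bytes())
}

/// Render bytes as space-separated groups of 8 bits.
pub fn to_bit_string(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:08b}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Both layouts decode back to the same number as long as the reader uses the matching function (from_le_bytes or from_be_bytes).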

Accounts are owned by programs, and the owner program has permission to mutate them. So how do you own an account if the owner is actually a program?

That's why all human accounts (like wallets) are owned by the system program. The system program can create more accounts and send lamports. An account can only be assigned a new owner by its current owner, and only while its data is empty or all zeroes. The owner is always a program, with permission to control the flow of data and lamports.

Clients use private keys to sign transactions, which marks the corresponding accounts as signers during program execution. This signed state does not have any special semantics in the runtime. It's up to the program to give this signed status meaning.

For example, the SPL Token Program:

Each token account is owned by the SPL Token program
Only the token program can change its values
In each token account, the token program stores a field for the address of the authority account which can spend these tokens
When you transfer, you provide a signature which corresponds to the authority field
The program checks during execution whether the authority signed, and allows the transfer if it did
The runtime checks that the SPL Token program owns the token account
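The authority check in the list above can be sketched as a toy model. This is not the SPL Token program's real code: TokenAccount is reduced to two fields, and the list of signers is passed in directly instead of coming from the runtime.

```rust
/// 32-byte address, standing in for a Solana public key.
pub type Address = [u8; 32];

/// Simplified token account: who may spend, and how much there is.
#[derive(Debug, PartialEq)]
pub struct TokenAccount {
    pub authority: Address,
    pub amount: u64,
}

#[derive(Debug, PartialEq)]
pub enum TransferError {
    MissingAuthoritySignature,
    InsufficientFunds,
    Overflow,
}

/// Move `amount` tokens, but only if the source's authority signed.
pub fn transfer(
    source: &mut TokenAccount,
    destination: &mut TokenAccount,
    signers: &[Address],
    amount: u64,
) -> Result<(), TransferError> {
    // The program, not the runtime, gives the signature its meaning.
    if !signers.contains(&source.authority) {
        return Err(TransferError::MissingAuthoritySignature);
    }
    let new_source = source
        .amount
        .checked_sub(amount)
        .ok_or(TransferError::InsufficientFunds)?;
    let new_destination = destination
        .amount
        .checked_add(amount)
        .ok_or(TransferError::Overflow)?;
    source.amount = new_source;
    destination.amount = new_destination;
    Ok(())
}
```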

You create an account with a declared size in bytes and you can store arbitrary binary data in it. Accounts can also hold lamports (the smallest unit of SOL).

Account addresses are ed25519 pubkeys. Authority is a fancy way of saying the holder of the private key, which makes it a confusing second meaning of owner.

When you sign a transaction using an account's private key, that account is marked as a signer by the runtime. Programs can use this information to implement ownership and authority functionality.

Accounts can be created with an allocated size of 0, which means they store no data. This lets you use them purely for authority and ownership functionality.

Programs are just data stored in accounts. These accounts are flagged as executable and ownership is transferred to a BPF loader program.

Accounts are usually used to store some kind of data. They hold arbitrary bytes but most developers will be using some kind of standard serialization method like Borsh.

#[derive(BorshDeserialize, BorshSerialize, Debug)]
pub struct AddressInfo {
  pub name: String,
  pub house_number: u8,
  pub street: String,
  pub city: String
}

Borsh is a binary serialization format. It's how we go from raw bytes back into something Rust is going to understand.
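To see what those bytes look like, here is a hand-rolled sketch of the layout Borsh produces for this struct, without the borsh crate. It relies on two rules from the Borsh format: a string is written as a little-endian u32 byte length followed by its UTF-8 bytes, and a u8 is a single byte. The function names are invented for the example.

```rust
pub struct AddressInfo {
    pub name: String,
    pub house_number: u8,
    pub street: String,
    pub city: String,
}

/// Borsh writes a string as a little-endian u32 byte length, then the UTF-8 bytes.
fn put_string(out: &mut Vec<u8>, value: &str) {
    out.extend_from_slice(&(value.len() as u32).to_le_bytes());
    out.extend_from_slice(value.as_bytes());
}

/// Encode the fields one after another, in declaration order.
pub fn encode_address_info(info: &AddressInfo) -> Vec<u8> {
    let mut out = Vec::new();
    put_string(&mut out, &info.name);
    out.push(info.house_number); // a u8 is a single byte
    put_string(&mut out, &info.street);
    put_string(&mut out, &info.city);
    out
}
```

There are no field names or type tags in the output, so the reader must know the struct definition to decode it.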

You would take this type and use it to write the data into the account:

pub fn store_address_info(
  program_id: &Pubkey,
  accounts: &[AccountInfo],
  address_info: AddressInfo
) -> ProgramResult {
  // ...

  // Borsh-serialize the struct into the account's data buffer.
  address_info.serialize(&mut &mut address_info_account.data.borrow_mut()[..])?;

  Ok(())
}

On the client you would need to also serialize the Borsh representation:

describe("Account data", () => {
  const connection = new Connection("http://localhost:8899", "confirmed")
  const payer = createKeypairFromFile("...")
  const PROGRAM_ID = new PublicKey("...")
  const addressInfoAccount = Keypair.generate()

  class Assignable {
    constructor(properties) {
      // Copy every property of the plain object onto the instance.
      Object.keys(properties).forEach((key) => {
        this[key] = properties[key]
      })
    }
  }

  class AddressInfo extends Assignable {
    toBuffer() { return Buffer.from(borsh.serialize(AddressInfoSchema, this)) }

    static fromBuffer(buffer: Buffer) {
      return borsh.deserialize(AddressInfoSchema, AddressInfo, buffer)
    }
  }

  // Declared after the class, because a class cannot be referenced before its declaration.
  const AddressInfoSchema = new Map([
    [AddressInfo, {
      kind: "struct",
      fields: [
        ["name", "string"],
        ["house_number", "u8"],
        ["street", "string"],
        ["city", "string"]
      ]
    }]
  ])

  it("creates the address info account", async () => {
    const instruction = new TransactionInstruction({
      keys: [...],
      programId: PROGRAM_ID,
      // Instruction data must be raw bytes, so Borsh-serialize the object.
      data: (
        new AddressInfo({
          name: "Joe C",
          house_number: 123,
          street: "Fake Street",
          city: "Swagsville"
        })
      ).toBuffer()
    })

    const transaction = new Transaction().add(instruction)
    const signers = [payer, addressInfoAccount]

    await sendAndConfirmTransaction(connection, transaction, signers)
  })

  it("reads the account data", async () => {
    const accountInfo = await connection.getAccountInfo(addressInfoAccount.publicKey)
    const readAddressInfo = AddressInfo.fromBuffer(accountInfo.data)

    console.log(`Name : ${readAddressInfo.name}`)
  })
})

PDA (Program Derived Addresses)

A PDA address just looks like a public key (the type in Rust is a lie). They don't have any corresponding private key. Solana lets the program that derived the PDA "sign" during cross-program invocations using invoke_signed.

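If you are trying to initialize an account that lives at a PDA, you have to do it via CPI from the program that derived it, rather than in a separate instruction issued from a client, because no client holds a private key that can sign for that address.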

PDAs are usually used as a way for programs to own mutable accounts and easily sign for them during transactions.

If you want your program to be able to directly mutate an account's data, then it should be the owner.

Let's say you wanted to store a count of the number of users of your app. Normally someone would need to sign to affect the account that holds this data.

Instead of having a human sign, we can make a program the owner of the account. That way it can affect and make changes without human intervention.

To do this we derive an address that can be found deterministically from a set of seeds:

               --- The seed
               v
Address = hash(["some_string" + "another_string"])

These accounts, by design, don't have private keys. Instead, programs can use these seeds to algorithmically sign for transactions and modify data in accounts that are PDAs.

So PDAs are a special kind of account that a program can sign for without a private key. Instead, they can only be signed by a program. These provide authority and ownership capabilities for your programs.

They enable things like:

Allowing programs to own tokens
Allowing a token vault where only the program can withdraw from the vault

A PDA is created by generating an address that is not on the ed25519 curve. This means not every combination of seeds and program ID is usable. There are built-in functions that take a nonce (the bump) and keep decrementing it until the resulting address falls off the curve.

To have a program transfer lamports to a user account, you can have the program sign for its own PDA account and use invoke_signed on the system program.

This is in contrast to how you would usually do CPI using just invoke with your data. When you sign with a PDA using invoke_signed, you are using the PDA's seeds, including the bump, to sign.

A common pattern is to use addresses derived from a namespace and a user pubkey for efficient key/value mappings, keyed off the user.

A PDA is derived using:

A program ID (the controlling smart contract).
A set of seeds (arbitrary byte arrays).
A bump (a single-byte value, 0–255).

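// seed_a and seed_b stand in for your own seeds (each a &[u8]); the bump goes last.
let pda = Pubkey::create_program_address(&[seed_a, seed_b, &[bump]], &program_id)?;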

When your program owns an account, such as a PDA it created, it can borrow mutable access to lamports directly using try_borrow_mut_lamports instead of going through the normal CPI invocation. A program may only debit accounts it owns, but it can credit any writable account.

**ctx.accounts.recipient.to_account_info().try_borrow_mut_lamports()? += amount;
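The rule behind this direct lamport adjustment can be modeled as follows. This is a toy model with invented names, not runtime code: it only shows that the debit side must be owned by the executing program and that the arithmetic should be checked.

```rust
/// 32-byte address, standing in for a Solana public key.
pub type Address = [u8; 32];

/// Simplified account: just an owner and a lamport balance.
pub struct Account {
    pub owner: Address,
    pub lamports: u64,
}

#[derive(Debug, PartialEq)]
pub enum LamportError {
    SourceNotOwnedByProgram,
    InsufficientLamports,
    Overflow,
}

/// Move lamports directly between balances, as a program might do for accounts it owns.
pub fn move_lamports(
    program_id: &Address,
    from: &mut Account,
    to: &mut Account,
    amount: u64,
) -> Result<(), LamportError> {
    // A program may only debit accounts it owns; crediting is unrestricted.
    if &from.owner != program_id {
        return Err(LamportError::SourceNotOwnedByProgram);
    }
    let debited = from
        .lamports
        .checked_sub(amount)
        .ok_or(LamportError::InsufficientLamports)?;
    let credited = to
        .lamports
        .checked_add(amount)
        .ok_or(LamportError::Overflow)?;
    from.lamports = debited;
    to.lamports = credited;
    Ok(())
}
```

The total number of lamports across the two accounts is unchanged by the move.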

Bumps

When you use seeds to derive a public key, there is a chance that the resulting public key could, in theory, have an associated private key. If it could have a corresponding private key, it is considered on curve.

So, Solana tacks an additional integer (the bump) onto your seeds list to make sure the address is bumped off of the ed25519 elliptic curve.

The tricky part is that you have to manage this bump throughout the program, so it's good to stick it inside the account data.

Bumps are deterministic values used to find program-derived addresses (PDAs) without requiring a private key. They ensure that a PDA is valid (i.e., does not collide with an actual keypair) while allowing the same address to be derived consistently.

In practice, you can think of your seeds + bump as Solana's version of a hash dictionary:

[b"token", authority.key().as_ref()]

In these seeds we are using the byte string b"token" and the authority of the transaction to create a unique pairing. This could get us the token address for this particular authority if we initialized an account with those seeds.

By knowing the seeds and the bump you can deterministically recreate the address without having to store the address anywhere. We can find any user's account by reusing the same seeds.

So if you wanted to say, store a big list of addresses you could use these seeds to quickly lookup the address you need without knowing it (storing it) ahead of time, which can be expensive and limiting on the blockchain.

Finding these addresses from seeds is an iterative process: we start at a bump value of 255 and work our way down toward zero until we find the first bump that pushes the address off the curve:

let seeds: &[&[u8]] = &[b"my_seed", user_pubkey.as_ref()];
let (pda, bump) = Pubkey::find_program_address(seeds, &program_id);

b"my_seed" is a static seed.
user_pubkey ensures uniqueness per user.
bump is found automatically and returned by the iterating function
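The search loop itself can be sketched with toy stand-ins. The real derivation uses SHA-256 and an ed25519 curve check; neither is in the standard library, so this example swaps in std's DefaultHasher and pretends that even hashes are "on curve". Only the shape of the loop carries over, and every name here is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for the real hash: mixes the seeds, the bump and the program ID.
pub fn toy_hash(seeds: &[&[u8]], bump: u8, program_id: &[u8; 32]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for seed in seeds {
        seed.hash(&mut hasher);
    }
    bump.hash(&mut hasher);
    program_id.hash(&mut hasher);
    hasher.finish()
}

/// Toy stand-in for the curve check: pretend even hashes are "on curve".
pub fn toy_is_on_curve(candidate: u64) -> bool {
    candidate % 2 == 0
}

/// Try bumps from 255 downward and return the first candidate that is off the toy curve.
pub fn find_toy_program_address(seeds: &[&[u8]], program_id: &[u8; 32]) -> Option<(u64, u8)> {
    for bump in (0..=u8::MAX).rev() {
        let candidate = toy_hash(seeds, bump, program_id);
        if !toy_is_on_curve(candidate) {
            return Some((candidate, bump));
        }
    }
    None
}
```

Because the loop always starts at 255 and stops at the first hit, the same seeds and program ID always give the same address and bump.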

It can be expensive to derive these every time, so a useful pattern is to store the bump inside the account you create. That way we do not need to search for it again every time we want to sign a transaction.

pub struct MyAccount {
    pub bump: u8,
    pub data: u64,
}

Storing the bump allows validation later when signing transactions in frameworks like Anchor.
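As a sketch of how a stored bump gets reused, the helper below rebuilds the seed list that would be handed to invoke_signed. The method name signer_seeds and the "my_seed" prefix are assumptions carried over from the earlier example, and a plain byte array stands in for the user's pubkey.

```rust
pub struct MyAccount {
    pub bump: u8,
    pub data: u64,
}

impl MyAccount {
    /// Rebuild the seeds for this PDA from the stored bump, with no search loop.
    /// The bump goes last, as a one-byte slice.
    pub fn signer_seeds<'a>(&'a self, user: &'a [u8; 32]) -> [&'a [u8]; 3] {
        [b"my_seed", user, std::slice::from_ref(&self.bump)]
    }
}
```

On chain, this array would be wrapped in one more slice and passed as the signer seeds. If the stored bump were wrong, the derived address would not match the account and the call would fail.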