Protect your Solana program accounts with a discriminator!

Thanks for reading; my name is Sergio Flores and I’m a Solana developer. Today, I want to talk about “Discriminators” and what problem they solve for us in the Solana stack.

First, a little background. In Solana, programs have ephemeral memory: they hold their internal state only while they execute, which can be a very short time. Long-term data storage is achieved through “Accounts”, which can be thought of as “files”: they have an address, may contain data, and programs can use them.

In Solana, data is stored separately from the program. When a program runs, it needs to know what data it is going to use, and you tell it by passing the addresses of the accounts it needs to do its job. If the program is going to write to an account, it must own that account.

Now, there are some rules to using these accounts:

They can hold a maximum of 10 MB of data each.
Only the program that owns them can write to them.
The data itself they contain is public; anybody can read those bytes.
The number of valid addresses is very large.

THE PROBLEM

The problem is that, because of the way Solana works, your program may receive a different account than the one it expects. The code that decides which accounts to pass to the program, and in what order, runs outside the chain's control (in the client that builds the transaction), so it is subject to both errors and malicious intent.

Let’s say your program executes two tasks, A and B, each expecting a data account of the same name, A and B. If these accounts have different byte sizes, the program’s code can reject the wrong one, but if they are the same size, we may have a problem when someone switches them around.

Although the data in an account can be up to 10 MB in size, the Solana program can only read a small portion of it at any given point in time, around 30 KB, which means it will sometimes need to read the data in parts, opening the door to “interpretation” of the data.

For example, suppose our accounts A and B are 8 bytes in size each. This could become a problem in the following way:

The accounts are owned by the program, so they start out filled with zeros, and only the program can write to them.
The program executes task A1 and writes two 32-bit numbers into the 8 bytes available in data account A. These numbers are counters, say, tracking the number of red items and the number of blue items, so they are non-negative integers.
Account B is instead supposed to hold a 64-bit Unix epoch timestamp (the number of seconds elapsed since January 1st, 1970).
Remember that the program has no long-term memory, so it cannot know on its own whether it has written to a given data account before.
In another transaction, the program executes task B and writes the timestamp to account B.
A hacker takes account B and sends it to the program’s read function A2. This function expects an account A, but since account B is the same size in bytes, can be interpreted in a way that is valid for A, and is owned by the program, A2 accepts it.
Now A2 interprets the Unix epoch timestamp as two separate smaller integers and treats them as the counters of reds and blues!
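
The reinterpretation in the last step can be reproduced in a few lines of plain Rust. This is only a minimal sketch, not real on-chain code: the names `write_timestamp` and `read_counters` are hypothetical stand-ins for tasks B and A2, and the little-endian byte layout is an assumption.

```rust
/// Hypothetical layout of account A: two 32-bit counters.
#[derive(Debug, PartialEq)]
struct Counters {
    red: u32,
    blue: u32,
}

/// Stand-in for task B: stores a 64-bit Unix timestamp in the 8 bytes
/// of the account (little-endian is an assumption of this sketch).
fn write_timestamp(data: &mut [u8; 8], unix_seconds: u64) {
    data.copy_from_slice(&unix_seconds.to_le_bytes());
}

/// Stand-in for task A2: reads the same 8 bytes as two counters.
/// Nothing in the bytes says which task wrote them, so this always "works".
fn read_counters(data: &[u8; 8]) -> Counters {
    let red = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
    let blue = u32::from_le_bytes([data[4], data[5], data[6], data[7]]);
    Counters { red, blue }
}
```

If task B stores the timestamp 1,700,000,000, `read_counters` reports 1,700,000,000 red items and 0 blue items, because the whole value fits in the low 32 bits. No error is raised anywhere.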

Therefore, a clever hacker could find smart ways to manipulate the inputs and manage to produce accounts with the exact data they need to accomplish their purposes.

In the best case, the above produces unpredictable consequences; in the worst case, it produces very predictable ones.

THE DISCRIMINATOR

A discriminator is a piece of information, stored as a number of bytes in the data accounts, that tells the program what type of account it is. Back to our example, our accounts would have 1 extra byte for this purpose.

Program task A1 receives an account, notices it is empty, and initializes it with a discriminator of value 1.
Task B will do the same, but with a value of 2.
So, when the hacker passes account B to task A2, task A2 reads the discriminator and checks that it is 1, which signifies a data account of type A. Since it is actually 2, A2 rejects the operation and avoids the problem.
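
The steps above could be sketched as follows in plain Rust. This is one possible shape under stated assumptions, not a definitive implementation: the discriminator is assumed to be the first byte of the account data, 0 is assumed to mean "not yet initialized", and the names `initialize`, `payload_of` and `AccountError` are hypothetical.

```rust
// Assumed discriminator values; 0 marks a fresh, zeroed account.
const DISC_UNINITIALIZED: u8 = 0;
const DISC_A: u8 = 1;
const DISC_B: u8 = 2;

#[derive(Debug, PartialEq)]
enum AccountError {
    TooSmall,
    AlreadyInitialized,
    WrongAccountType,
}

/// Tasks A1 and B: stamp an empty account with its type, exactly once.
fn initialize(data: &mut [u8], discriminator: u8) -> Result<(), AccountError> {
    let first = data.first_mut().ok_or(AccountError::TooSmall)?;
    if *first != DISC_UNINITIALIZED {
        return Err(AccountError::AlreadyInitialized);
    }
    *first = discriminator;
    Ok(())
}

/// Task A2 (or any reader): check the first byte before touching the rest.
/// Returns the bytes after the discriminator only if the type matches.
fn payload_of(data: &[u8], expected: u8) -> Result<&[u8], AccountError> {
    match data.split_first() {
        None => Err(AccountError::TooSmall),
        Some((&found, rest)) if found == expected => Ok(rest),
        Some(_) => Err(AccountError::WrongAccountType),
    }
}
```

With this in place, calling `payload_of(account_b, DISC_A)` returns `WrongAccountType` instead of handing the timestamp bytes to the counter logic.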

Anchor does this automatically for you, using an 8-byte discriminator. However, when programming in plain Rust, you need to take care of this yourself:

First off, if your data accounts are a different size for each task, a size check is enough and you don’t need a discriminator.
Second, if you only use a small number of accounts, your discriminator can be a single byte, with a very quick check and simple code.
Third, I think it’s also worthwhile to add “state” to the account, in the form of an extra byte on top of the discriminator, which tells the program which sets of functions are allowed on the account at any point in time... but that is another topic.
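
To make the third point a little more concrete, here is a rough sketch of a two-byte header: a discriminator byte followed by a state byte. Everything in it is an assumption made for illustration, including the `State` values, the `check_header` function and the byte order of the header.

```rust
/// Hypothetical account states; the values are arbitrary, with 0 left
/// unused so that a zeroed account has no valid state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Open = 1,
    Closed = 2,
}

impl State {
    fn from_byte(byte: u8) -> Option<State> {
        match byte {
            1 => Some(State::Open),
            2 => Some(State::Closed),
            _ => None,
        }
    }
}

#[derive(Debug, PartialEq)]
enum CheckError {
    TooSmall,
    WrongAccountType,
    UnknownState,
    NotAllowedInState(State),
}

/// Assumed layout: byte 0 = discriminator, byte 1 = state, rest = payload.
/// A function passes the discriminator it expects and the states it accepts.
fn check_header(
    data: &[u8],
    expected_disc: u8,
    allowed: &[State],
) -> Result<State, CheckError> {
    if data.len() < 2 {
        return Err(CheckError::TooSmall);
    }
    if data[0] != expected_disc {
        return Err(CheckError::WrongAccountType);
    }
    let state = State::from_byte(data[1]).ok_or(CheckError::UnknownState)?;
    if allowed.contains(&state) {
        Ok(state)
    } else {
        Err(CheckError::NotAllowedInState(state))
    }
}
```

A function that should only run on open accounts would call `check_header(data, 1, &[State::Open])` and stop if it gets an error back.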

Thank you again for reading. I hope you have found this post useful, and see you next time!

Are you interested in learning more about this technology or exploring how it could be applied to your particular needs? Please send me a message using the form below for a no-commitment, exploratory talk about your situation.