Advantageous encoding
15 Nov 2021 | 6 minutes to readInformation is ethereal, and “disappears” with nothing to contain it. The method in which information is stored can not only influence our interpretation of the information, but what we are able to do with it. Special storage formats can allow us to more easily glean additional information from what is stored.
Following is a series of interesting or novel storage/encoding methods, in order of confidence.
Binary / Base 2 #
We take numbers (2
, 7
, 2893479283749283
) and encode them using 1
and 0
(on/off).
This format gives us a few benefits that base 10 numbers don’t.
- Easily tell if number is odd or even by looking at least significant bit (
0x11
= odd,0x10
= even). - Easily multiply/divide by 2 by shifting bits left/right. Faster than using the multiplier instruction (for integers at least)
- Data compression, can represent much larger numbers with less characters
0x0010010 |
18 |
x | - |
0x0001001 |
9 |
x/2 | » |
0x0100100 |
36 |
x*2 | « |
Polish notation #
Way of encoding mathematical operations so that it fits much better into a stack. As a result, made it much easier for early calculators to parse and perform math operations. It’s also special in that the order of operations is determined by the order of
Normal | Polish |
3+4 | +34 |
(3+2)-5/6 | -+32/56 |
https://prettyboytellem.com/writing/2019/Apr/HP-50G%20and%20RPN.html
Homomorphic Encryption #
This is a special method of encrypting data, such that mathematical operations can be performed on the encrypted data without needing to decrypt.
An example could be a datastore of population statistics. Homomorphic encryption would allow researchers performed queries against the data and not leak private information (if configured correctly).
Other methods of encoding to perform specific mathematical operations.
Types[^1]:
- Partially homomorphic encryption encompasses schemes that support the evaluation of circuits consisting of only one type of gate, e.g., addition or multiplication.
- Somewhat homomorphic encryption schemes can evaluate two types of gates, but only for a subset of circuits.
- Leveled fully homomorphic encryption supports the evaluation of arbitrary circuits composed of multiple types of gates of bounded (pre-determined) depth.
- Fully homomorphic encryption (FHE) allows the evaluation of arbitrary circuits composed of multiple types of gates of unbounded depth, and is the strongest notion of homomorphic encryption.
https://github.com/Microsoft/SEAL
https://people.csail.mit.edu/vinodv/FHE/FHE-refs.html
Floating point separation and computation #
Encode each side of the decimal point as a separate integer, then perform the operations on them separately before joining them back together into the floating point result.
Numbers less than 0 have different “behaviors” than integers,
Difficult math operations: roots and squares #
change the format based on the value?
Base 2 numbers allow us to square the number 2 easily.
2^3 == 8 == 0x10 shift left twice -> 0x1000
16 ^ 2 == 256 == 0x10h shift left once -> 0x100h
- encode number
n
in basek
to gete
- To compute
n * k
, shifte
left once
Goal: comput 352 * 16
1. | Have math you want to perform | n * k |
352 * 16 |
2. | Convert n to base k format |
n' * k |
0x160h * 16 |
3. | Shift n' once to multiply by k |
n' * k => n' << 1 |
0x160h << 1 => 0x1600h |
4. | Convert back to the target base (usually base 10) | 0x1600h => 5632 |
To multiply two numbers, reduce number to primes.
2253 * 18 => 2253 * (3 * 3 * 2)
Perform each conversion individually, using the output of the previous conversion.
2253 * (3 * 3 * 2)
6759 * (3 * 2)
20277 * (2)
40554 == 2253 * 18
Simple to see for small numbers, also probably faster than converting back and forth. Good for very large numbers, where it could be difficult to multiply together due to hardware constraints (numbers larger than registers).
The conversion base would usually be prime numbers, but it would also work with composites. Converting with base 4 could be used to perform the same operations as base 2, but twice as fast. Because 4 is twice 2, half as many shifts need to be made. To multiply by 4, you could either convert to base 4 and shift once or convert to base 2 and shift twice.
This could be used to optimize calculations, especially if the numbers are extremely large. Large bases like 64 might be nice so you don’t have to repeat base 2 shifts 5 times, just a base 64 shift once. I assume that smaller bases perform less “work”, but number can more easily be converted to and from that base. It’s much quicker to convert a number to base 2 than base 64, especially because of hardware properties.
Prime notation #
This would be an encoding that would signal if a number is prime or not. Encoded to tell if a number is prime by the way it is encoded. It would be similar to determining wether a binary-encoded number is odd or even based on the least significant bit.
Factor notation #
Encode numbers so that the factors of a number are included. This would allow a number to be factored much quicker because the hard work has already been done, a program simply needs to perform a lookup based on the number. While not that useful for smaller numbers, it may prove useful for operations performed on larger numbers.
“Extra information” storage, meta-encoding #
When storing values, pre-calculate properties about the value being stored, depending on it’s usage.
Store information like the value’s factors, version of the value so you don’t have to calculate them later, or make any calculations you run with the value perform quicker.
If you’re able to make a good guess about calculations that will take place over a set of data, then you can potentially start calculating them before it’s needed.
A good example could be yearly financial records for a company. Once the year is over, no additional data is being added for that year. The company will want to use that data in the future to analyze how they performed and finding takeaways from the data. At a minimum, different metadata could be generated to identify trends, averages, and other statistical information.
Storing the metadata alongside the raw data makes it much easier for a “consumer” program to analyze the raw data and take shortcuts generated by the metadata. By pre-computing on the data, we can save that from needing to be computed again in the future, as long as the calculations required are the same.
Other Computing shortcuts #
http://www.lomont.org/papers/2003/InvSqrt.pdf
Takes advantage of how floating point numbers are stored combined with magic number 0x5f3759df
to calculate an inverse square root - 1/sqrt(x)
.