Research Report | Sub-5M LLM Optimization

The Spark Revolution

Pushing the boundaries of ultra-small language models through the Llama architecture and deep convergence training.

Model Parameters: 4.98M (Llama architecture, tied)
Final Eval Loss: 3.108 (step 15000, epoch 1.42)
Inference RAM: ~50 MB (FP16 execution)
Train Epochs: 1.42 (15000 steps)
Current Version: v4 (fourth version of Spark)
Model Type: Base (pretrained base Llama model)
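
As a rough sanity check on the memory figure, the raw weight size follows directly from the parameter count and the numeric precision. The sketch below is a back-of-the-envelope calculation, not a measurement: the helper name `weight_megabytes` is hypothetical, and it counts weights only. The gap between the roughly 10 MB of FP16 weights and the reported ~50 MB presumably comes from runtime overhead (framework, activations, cache); that is an assumption, as the report does not break the figure down.

```python
# Back-of-the-envelope weight memory for a 4.98M-parameter model.
# Counts raw weights only; runtime overhead is NOT included.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_megabytes(n_params: int, dtype: str = "fp16") -> float:
    """Return the size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


N_PARAMS = 4_980_000  # "4.98M" from the summary figures

fp16_mb = weight_megabytes(N_PARAMS, "fp16")
fp32_mb = weight_megabytes(N_PARAMS, "fp32")

print(f"FP16 weights: {fp16_mb:.2f} MB")
print(f"FP32 weights: {fp32_mb:.2f} MB")
```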

Core Benchmark Progression

[Chart: Loss Convergence]

[Chart: LAMBADA Perplexity (log scale)]
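
The two charts are related by a simple identity: perplexity is the exponential of the mean cross-entropy loss. The sketch below assumes the reported losses are mean per-token cross-entropy in nats (the usual convention in PyTorch-based training, but not stated in this report). It also shows why a log axis suits the LAMBADA values, which span several orders of magnitude across versions. Note that the final loss and the LAMBADA perplexity are measured on different data, so they are not expected to convert into each other.

```python
import math


def loss_to_perplexity(loss_nats: float) -> float:
    """Perplexity implied by a mean cross-entropy loss given in nats."""
    return math.exp(loss_nats)


def perplexity_to_loss(ppl: float) -> float:
    """Inverse mapping: the loss (in nats) that corresponds to a perplexity."""
    return math.log(ppl)


# Final losses as reported in the comparison table.
final_loss = {"Spark v2": 3.310, "Spark v3": 3.266, "Spark v4": 3.108}
for name, loss in final_loss.items():
    print(f"{name}: loss {loss:.3f} -> perplexity {loss_to_perplexity(loss):.1f}")

# LAMBADA perplexities as reported; the log compresses the huge v2 value.
lambada_ppl = {"Spark v2": 23_840_569, "Spark v3": 957.51, "Spark v4": 588.26}
for name, ppl in lambada_ppl.items():
    print(f"{name}: LAMBADA PPL {ppl:,.2f} -> ln(PPL) {perplexity_to_loss(ppl):.2f}")
```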

Detailed Comparison

| Metric | Spark v2 (2.17M) | Spark v3 (4.98M) | Spark v4 (4.98M) | Apex 1 Base (350M) | Improvement (v3 → v4) |
|---|---|---|---|---|---|
| PIQA (Acc Norm) | 0.5011 | 0.5310 | 0.5593 | 0.6507 | +0.62% |
| HellaSwag (Acc Norm) | 0.2475 | 0.2693 | 0.2695 | 0.3856 | +0.07% |
| LAMBADA (PPL) | 23,840,569 | 957.51 | 588.26 | 156 | -38.56% |
| Final Loss | 3.310 | 3.266 | 3.108 | 2.8008 | -4.84% |
| Output quality | 1/10 | 3/10 | 5/10 | 8/10 | +2 points |
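
The improvement column appears to be a relative change from v3 to v4, that is, (v4 - v3) / v3. The sketch below recomputes it from the v3 and v4 scores listed in the table; the function name `relative_change` is introduced here for illustration. Under that reading, the HellaSwag, LAMBADA and Final Loss rows reproduce the listed values, while the PIQA scores as listed give roughly +5.33% rather than +0.62%, so one of the PIQA entries may have been transcribed incorrectly.

```python
def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (v3 score, v4 score) pairs copied from the comparison table.
rows = {
    "PIQA (Acc Norm)": (0.5310, 0.5593),
    "HellaSwag (Acc Norm)": (0.2693, 0.2695),
    "LAMBADA (PPL)": (957.51, 588.26),
    "Final Loss": (3.266, 3.108),
}

changes = {name: relative_change(v3, v4) for name, (v3, v4) in rows.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

For accuracy metrics a positive change is an improvement; for perplexity and loss a negative change is.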


Example outputs

Spark v2 (2.17M)

The main concept of physics is the most commonly used, and the magnetic dynamics is a good thing that is the most commonly used in some cases. The links of the physician is a liquid or therapies of the treatment.
The lines in the body is one of the most commonly used in the body, and someone else in the body.
The lines of physician is the only way to use it to identify the lines of physician that has been used to help the physician.
The first step of physician is to determine the physician.
The lines of physician is the most commonly used in

Spark v3 (4.98M)

The main concept of physics is the ability to build a new technique. The use of physics is an important technique in the future. In the past, the development of the technique can be used to generate a new technique based upon a method called anatomy. It is essential to know the technique and its possible use. These technique can be applied to a given material. It allows anatomy research to be made with the new technique.
Content on the use of chemistry is the form of a technique that is in the form of a physics. In particular, it is a technique used in the treatment of the material. It is used in the production and in the designation of a mechanical orientation, as it is used in the manufacturing of electronic equipment. The method is used in various applications, in the manufacturing process.
It is used to develop the technique that allows for the use of the physicist to

Spark v4 (4.98M)

The main concept of physics is that it gives some unlimited means to think about the universe. It helps us not only to think about how the universe is created but also how we think about the universe. In this way, an inner universe can be made to our own universe. This is because it is not a matter of fact and that is the object of the universe. It can take a lot of time to understand how it is created and why it must be made.
In the first place, the Universe is a complex and interesting part of it. It can be a kind of a real, creative, and universal part of our universe. It can be just that the universe was created. It could be a kind of universe. It can be a kind of kind of complex concept. That could be something that does something that really needs to be a kind of universe, or something that


Spark Model Series Comparison

| Aspect | Spark v2 (2.17M) | Spark v3 (4.98M) | Spark v4 (4.98M) |
|---|---|---|---|
| Params | 2.17 million | 4.98 million | 4.98 million |
| Training data amount | 100k docs | 100k docs | 500k docs (0.7B tokens) |
| Training epochs | 5 | 5 | 1.42 |
| Architecture | GPT-2 | Llama | Llama |
| Framework | nanoGPT | HF Transformers | HF Transformers |
| Status | Done | Done | Done |
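
The v4 figures are enough to estimate the training budget. The sketch below is plain arithmetic under two assumptions that the report does not state: that "0.7B tokens" is the size of one full pass over the 500k documents, and that all 15000 steps process the same number of tokens. The derived numbers are therefore estimates, not reported results.

```python
# Rough training-budget arithmetic for Spark v4.
# Assumptions: 0.7B tokens = one epoch; every step sees the same token count.
DATASET_TOKENS = 0.7e9   # tokens in one pass over the training data
EPOCHS = 1.42            # epochs completed at step 15000
STEPS = 15_000           # optimizer steps
N_PARAMS = 4.98e6        # model parameters

tokens_seen = DATASET_TOKENS * EPOCHS        # total tokens processed
tokens_per_step = tokens_seen / STEPS        # implied tokens per step
tokens_per_param = tokens_seen / N_PARAMS    # tokens seen per parameter

print(f"Tokens seen:      {tokens_seen / 1e9:.3f}B")
print(f"Tokens per step:  {tokens_per_step:,.0f}")
print(f"Tokens per param: {tokens_per_param:.1f}")
```

Under these assumptions v4 saw about 0.99B tokens, or roughly 200 tokens per parameter.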