🌻 Sunflower Quantized Inference
The Sunflower models are available in 14B and 32B parameter sizes, and both support 8-bit and 4-bit quantized inference. They handle high-quality translation, text generation, and conversational tasks while running efficiently on GPUs with limited memory.
8-bit Quantization
Balanced memory usage (~16 GB of VRAM for the 14B model) with high accuracy.
4-bit Quantization
Low memory usage (~10 GB of VRAM for the 14B model) and faster inference.
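As a sketch of how these two modes are typically selected at load time with Hugging Face transformers and bitsandbytes. The repository name Sunbird/Sunflower-14B is an assumption; substitute the actual model ID.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Sunbird/Sunflower-14B"  # assumed repository name -- replace with the real one

# 8-bit: balanced memory usage (~16 GB for the 14B model) with high accuracy.
bnb_8bit = BitsAndBytesConfig(load_in_8bit=True)

# 4-bit: lowest memory usage (~10 GB for the 14B model); NF4 with double
# quantization is a common inference configuration.
bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_8bit,  # swap in bnb_4bit for 4-bit inference
    device_map="auto",             # place layers on available GPUs automatically
)
```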
Model Variants
Sunflower 14B
- 14B 8-bit: Balanced memory and accuracy, suitable for most GPUs.
- 14B 4-bit: Optimized for memory-limited GPUs and faster inference.
Sunflower 32B
- 32B 8-bit: Highest accuracy; needs substantial VRAM (roughly 32 GB for the weights alone at one byte per parameter, plus activation overhead).
- 32B 4-bit: About half the weight memory (~16 GB), with slightly lower accuracy and faster inference.
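Continuing the sketch above, a minimal generation call might look as follows. The chat-template prompt and the Luganda translation request are illustrative, not an official prompt format.

```python
# Assumes the tokenizer ships a chat template; the prompt wording is illustrative.
messages = [
    {"role": "user", "content": "Translate to Luganda: How are you today?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```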
Best Practices
- Monitor GPU memory for large inputs or batch processing (see the sketch after this list).
- Adjust inference parameters (e.g., sequence length, max_new_tokens, batch size) to fit your hardware limits.
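A minimal sketch of the memory-monitoring practice above, using PyTorch's built-in CUDA statistics; it reuses the model and input_ids from the earlier sketches.

```python
import torch

torch.cuda.reset_peak_memory_stats()  # clear the running peak counter

_ = model.generate(input_ids, max_new_tokens=256)

# Report the peak allocation observed during generation on the current device.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory during generation: {peak_gib:.2f} GiB")
```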

