Discussion about this post

Cyril de Catheu:

Hey,

> Firstly, the gzip + KNN approach is lightweight and economical, especially when compared to cumbersome deep learning models.

As you mention later, it is not really economical, given that it requires a full kNN computation at each inference. In effect, this costs a lot of CPU and memory. I'm trying to reproduce the results, and it seems that even reproducing AG_NEWS takes around 24 hours.
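
For concreteness, the paper's method boils down to something like this (a rough sketch; k, the space-joined concatenation, and the tie-breaking are illustrative details, not the paper's exact code):

```python
import gzip
from collections import Counter

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance: how much extra gzip needs
    # to encode x and y together vs. separately.
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + b" " + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(test_text: str, train: list[tuple[str, str]], k: int = 2) -> str:
    # kNN over NCD: each prediction compresses the test text against
    # EVERY training sample, hence the heavy CPU cost at inference.
    t = test_text.encode()
    dists = sorted((ncd(t, text.encode()), label) for text, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

The cost is the inner loop: one classify call on AG_NEWS means ~120k gzip compressions, repeated for every test sample.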

> Unlike state-of-the-art models that necessitate training, this model is non-parametric — meaning it does not contain parameters that need to be learned, which significantly reduces costs.

Yes, but this misses the idea that some compression algorithms do build a kind of parametric model when they build a compression dictionary. The dictionary is a form of learning and could be reused at inference.

I tried to build a demo of such an approach here:

https://github.com/cyrilou242/ftcc

It uses zstd to build compression dictionaries.
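
The core idea looks roughly like this (a minimal sketch, not the exact FTCC implementation; dict_size, the per-class setup, and the label types are simplifications):

```python
import zstandard as zstd

def train_class_compressors(texts_by_class: dict[str, list[str]],
                            dict_size: int = 16 * 1024) -> dict[str, zstd.ZstdCompressor]:
    # "Training" = building one zstd dictionary per class from that class's
    # texts; the dictionary captures the class's recurring patterns.
    compressors = {}
    for label, texts in texts_by_class.items():
        samples = [t.encode() for t in texts]
        dictionary = zstd.train_dictionary(dict_size, samples)
        compressors[label] = zstd.ZstdCompressor(dict_data=dictionary)
    return compressors

def classify(text: str, compressors: dict[str, zstd.ZstdCompressor]) -> str:
    # Inference is one compression per class (not per training sample):
    # predict the class whose dictionary compresses the text best.
    data = text.encode()
    return min(compressors,
               key=lambda label: len(compressors[label].compress(data)))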

The dictionary-building cost is paid once at training time, so overall training + inference is multiple orders of magnitude faster than the gzip approach, and performance seems to be similar.

I'd be curious to hear your feedback on this.

Abhinav Upadhyay:

I wrote a follow-up article that dissects various compression algorithms to analyze what exactly happens during compression to produce this unexpected text-classification performance: https://codeconfessions.substack.com/p/lz77-is-all-you-need
