TL;DR:
My iPhone 16 Pro Max produces garbage output when running MLX LLMs. An iPhone 15 Pro runs the same code perfectly, and so does a MacBook Pro. The tensor outputs on the 16 show numerical values that are an order of magnitude off. I suspect it points to a problem with this particular device rather than the code.
I think you missed the point of his post. His issue is that the numeric operations the phone executes to run the LLM are producing garbage. Arguably this could break all kinds of neural networks, such as voice transcription models. He’s not complaining that the LLMs themselves are unable to do math properly.