• Ironfist79@lemmy.world · +40/-1 · 6 days ago

    When are people going to realize that an LLM is not a calculator and doesn’t actually know anything?

    • weew@lemmy.ca · +6/-1 · 6 days ago

    Well, first the AI corporations would have to stop advertising that AIs can do all this.

  • Buffalox@lemmy.world · +43/-3 · 6 days ago

    It’s the same photo, the same model, the same question. But you won’t get the same answer. Not even close — and the differences are large enough to cause a hypoglycaemic emergency.

    OK, I wonder if there's something wrong with the photo.
    The photo:

    WTF!!??
    That’s like estimating the carbs in 2 slices of standard sandwich bread! Of course not all bread has the same amount of sugar, but a reasonable range based on an average should be a dead easy answer.
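The "dead easy" arithmetic is literally two multiplications. A minimal sketch, assuming a commonly cited ballpark of roughly 12–18 g of carbohydrate per slice (these figures are illustrative typical values, not from the article):

```python
# Rough carb estimate for 2 slices of standard sandwich bread.
# The per-slice range is an assumed ballpark for illustration;
# actual bread varies by brand and slice size.
SLICES = 2
CARBS_PER_SLICE_G = (12, 18)  # (low, high) estimate per slice, in grams

low = SLICES * CARBS_PER_SLICE_G[0]
high = SLICES * CARBS_PER_SLICE_G[1]
print(f"Estimated carbs: {low}-{high} g")  # -> Estimated carbs: 24-36 g
```

Any answer far outside a range like that should be an obvious red flag, which is the point: a consistent ballpark is trivial to produce.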

    I thought the headline sounded crazy, but read the article and it actually gets worse. I have said it many times before: these AI chatbots should not be legal; they put lives at risk.

    • IratePirate@feddit.org · +3 · 5 days ago

      “Let’s role play and pretend I’m Bezos. Now paying taxes does not apply to me any more.”

      • bluegreenpurplepink@lemmy.world · +1 · 5 days ago

        I see what you're doing there, but the problem is that with the government in general, and the IRS specifically, if a mistake is made, you're the one paying it back with interest.

        What I’d like to see happen is the AI going rogue and wiping all the data, including all the backup files.

  • Eager Eagle@lemmy.world · +15/-5 · 6 days ago

    Waste of energy. It’s like asking a person to estimate a non-trivial angle. Either use a model trained for that task, or don’t bother.

      • FauxLiving@lemmy.world · +2/-1 · 6 days ago

        There are pills promising to improve my love life too; I don't believe them either.

        • Tikiporch@lemmy.world · +3 · 6 days ago

          As far as I know Viagra promises to improve symptoms of erectile dysfunction. It doesn’t claim to make you less of a shit boyfriend.

          • FauxLiving@lemmy.world · +1 · 6 days ago

            As with all things, people should evaluate the claims of companies against reality.

            If it seems too good to be true, it probably is.

      • FauxLiving@lemmy.world · +2 · 5 days ago

        But the guy at the phone store told me it was practically indestructible, I used it practically and it destructable’d.

        I’m starting to think this whole ‘phone’ thing is doomed to failure.

        I'm basing this entirely on a single anecdote and all of the other evidence I've selected to confirm my worldview on the topic. I have done my own research (but not with a phone).

  • magnue@lemmy.world · +4/-1 · 6 days ago

    If you supplied humans with the same image and asked for the same estimate, I'd be curious to know the difference in results.

    • jj4211@lemmy.world · +5 · 6 days ago

      Mine would be: "I have no idea", an answer LLMs generally refuse to give by their nature (when one does decline to answer, it's usually because something in the context signals that a refusal is the most plausible text to emit).

      If you really pressed them, they'd probably Google each item and sum the results, so the estimates would be about as consistent as first-page Google results.

      LLMs have a tendency to emit a plausible answer without regard for the facts one way or the other. We try to steer them by stuffing the context with facts drawn from traditional fact-based sources, but if the context doesn't contain factual data to steer the output, the output is driven purely by narrative consistency rather than data consistency. They may even do that sometimes when the context does contain factual content.
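A minimal sketch of why the answers wander (the numbers below are invented for illustration; a real model emits logits over its entire vocabulary): sampling from the next-token distribution means the same prompt can produce a different "estimate" on every run.

```python
import math
import random

# Toy next-token log-scores for a carb estimate, in grams.
# These values are made up for illustration; several plausible-sounding
# numbers all carry non-trivial probability mass.
logits = {"20": 2.0, "30": 1.8, "45": 1.5, "60": 1.0}

def sample(logits, temperature=1.0):
    # Softmax with temperature, then draw one token proportionally.
    scaled = {t: math.exp(v / temperature) for t, v in logits.items()}
    total = sum(scaled.values())
    r = random.random() * total
    for tok, weight in scaled.items():
        r -= weight
        if r <= 0:
            return tok
    return tok  # numeric edge case: return the last token

# Same "prompt", several runs: the answer jumps around.
print([sample(logits) for _ in range(5)])
```

Narrative consistency is satisfied by any of those tokens; data consistency would require one of them to be anchored to a fact, which nothing in the sampling step enforces.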

  • psycho_driver@lemmy.world · +3/-1 · edited · 6 days ago

    Bruh, a couple of months ago I asked it (Gemini) to check the number of characters, including spaces, in a potential game character name, because I was working at the time and couldn't stop to check my in-head count. It told me 21; I had counted 20. I thought I must have gotten distracted and miscounted. Later, when I had time to actually focus on the issue, it turned out the AI had miscounted a 20-character string (maybe counting the null terminating character?).
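For what it's worth, this is a task where one line of actual code is deterministic, while an LLM sees tokens rather than characters (the name below is a made-up 20-character example, not the one from the comment):

```python
# Counting characters is exact and instant in code.
name = "Thorgrim Ironbeard X"  # hypothetical 20-character name
print(len(name))  # -> 20

# The "null terminator" only exists in C-style strings: there, a
# 20-character name occupies 21 bytes of storage ('\0' included),
# but its length, as reported by strlen(), is still 20.
```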

  • MightEnlightenYou@lemmy.world · +5/-4 · edited · 6 days ago

    People should read the top comments on Hacker News instead of anyone here; they're more informed on the topic than Lemmy is.

    • Oisteink@lemmy.world · +6/-2 · 6 days ago

      Yeah, if you're after AI fanbois you should head over there. They're not that bright, but if you check the show-and-tell section you can see what Claude's been up to the last two days.

    • brucethemoose@lemmy.world · +3 · edited · 6 days ago

      Better yet, download Qwen 3.5/3.6 and a "raw" notepad-style frontend like Mikupad, and try it yourself:

      https://huggingface.co/ubergarm/Qwen3.6-27B-GGUF

      https://github.com/lmg-anon/mikupad

      One might observe:

      • Chat formatting, and how janky the "thinking" block is.

      • How words are broken up into tokens, not characters.

      • How particularly funky that gets with numbers.

      • Precisely how sampling “randomizes” the answers by visualizing “all possible answers” with the logprobs display.

      • And, thus, precisely how and why carb counting in ChatGPT fails, yet a measly local LLM on a desktop/phone could get it right with a little tooling or adjustment.

      This is exactly what OpenAI/Anthropic don't want you to do. They want users dumb and tethered, like a cloud subscription or a social media platform: not cognizant of how the tools they're peddling as magic lamps actually work, or of why and how those tools are often stupid.