• Ironfist79@lemmy.world · +40/-1 · 6 days ago

    When are people going to realize that an LLM is not a calculator and doesn’t actually know anything?

    • weew@lemmy.ca · +6/-1 · 6 days ago

    Well, first the AI corporations would have to stop advertising that AIs can do all this.

  • Buffalox@lemmy.world · +43/-3 · 6 days ago

    It’s the same photo, the same model, the same question. But you won’t get the same answer. Not even close — and the differences are large enough to cause a hypoglycaemic emergency.

    OK, I wonder if there's something wrong with the photo.
    The photo:

    WTF!!??
    That’s like estimating the carbs in 2 slices of standard sandwich bread! Of course not all bread has the same amount of sugar, but a reasonable range based on an average should be a dead easy answer.
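The "dead easy" arithmetic is literally two multiplications. A minimal sketch, assuming a commonly cited ballpark of roughly 12–18 g of carbohydrate per slice (these figures are illustrative typical values, not from the article):

```python
# Rough carb estimate for 2 slices of standard sandwich bread.
# The per-slice range is an assumed ballpark for illustration;
# actual bread varies by brand and slice size.
SLICES = 2
CARBS_PER_SLICE_G = (12, 18)  # (low, high) estimate per slice, in grams

low = SLICES * CARBS_PER_SLICE_G[0]
high = SLICES * CARBS_PER_SLICE_G[1]
print(f"Estimated carbs: {low}-{high} g")  # -> Estimated carbs: 24-36 g
```

Any answer far outside a range like that should be an obvious red flag, which is the point: a consistent ballpark is trivial to produce.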

    I thought the headline sounded crazy, but read the article and it actually gets worse. I have said it many times before: these AI chatbots should not be legal; they put lives at risk.

    • IratePirate@feddit.org · +3 · 5 days ago

      “Let’s role play and pretend I’m Bezos. Now paying taxes does not apply to me any more.”

      • bluegreenpurplepink@lemmy.world · +1 · 5 days ago

        I see what you're doing there, but the problem is that with the government in general, and the IRS specifically, if a mistake is made, you're the one paying it back with interest.

        What I’d like to see happen is the AI going rogue and wiping all the data, including all the backup files.

  • Eager Eagle@lemmy.world · +15/-5 · 6 days ago

    Waste of energy. It’s like asking a person to estimate a non-trivial angle. Either use a model trained for that task, or don’t bother.

      • FauxLiving@lemmy.world · +2/-1 · 6 days ago

        There are pills promising to improve my love life too; I don't believe them either.

        • Tikiporch@lemmy.world · +3 · 6 days ago

          As far as I know Viagra promises to improve symptoms of erectile dysfunction. It doesn’t claim to make you less of a shit boyfriend.

          • FauxLiving@lemmy.world · +1 · 6 days ago

            As with all things, people should evaluate the claims of companies against reality.

            If it seems too good to be true, it probably is.

      • FauxLiving@lemmy.world · +2 · 5 days ago

        But the guy at the phone store told me it was practically indestructible, I used it practically and it destructable’d.

        I’m starting to think this whole ‘phone’ thing is doomed to failure.

        I'm basing this entirely on a single anecdote and all of the other evidence I've selected to confirm my worldview on the topic. I have done my own research (but not with a phone).

  • magnue@lemmy.world · +4/-1 · 6 days ago

    If you supplied humans with the same image and asked for the same estimate, I'd be curious to know the difference in results.

    • jj4211@lemmy.world · +5 · 6 days ago

      Mine would be: "I have no idea", an answer LLMs generally refuse to give by their nature (when one does decline to answer, it's usually because something in the context signals that a refusal is the most plausible text to emit).

      If you really pressed them, they'd probably Google each item and sum the results, so the estimates would be about as consistent as first-page Google results.

      LLMs have a tendency to emit a plausible answer without regard for the facts one way or the other. We try to steer them by stuffing the context with facts drawn from traditional fact-based sources, but if the context doesn't contain factual data to steer the output, the output is driven purely by narrative consistency rather than data consistency. They may even do that sometimes when the context does contain factual content.
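A minimal sketch of why the answers wander (the numbers below are invented for illustration; a real model emits logits over its entire vocabulary): sampling from the next-token distribution means the same prompt can produce a different "estimate" on every run.

```python
import math
import random

# Toy next-token log-scores for a carb estimate, in grams.
# These values are made up for illustration; several plausible-sounding
# numbers all carry non-trivial probability mass.
logits = {"20": 2.0, "30": 1.8, "45": 1.5, "60": 1.0}

def sample(logits, temperature=1.0):
    # Softmax with temperature, then draw one token proportionally.
    scaled = {t: math.exp(v / temperature) for t, v in logits.items()}
    total = sum(scaled.values())
    r = random.random() * total
    for tok, weight in scaled.items():
        r -= weight
        if r <= 0:
            return tok
    return tok  # numeric edge case: return the last token

# Same "prompt", several runs: the answer jumps around.
print([sample(logits) for _ in range(5)])
```

Narrative consistency is satisfied by any of those tokens; data consistency would require one of them to be anchored to a fact, which nothing in the sampling step enforces.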

  • psycho_driver@lemmy.world · +3/-1 · edited · 6 days ago

    Bruh, a couple of months ago I asked it (Gemini) to check the number of characters, including spaces, in a potential game character name, because I was working at the time and couldn't stop to check my in-head count. It told me 21; I had counted 20. I thought I must have gotten distracted and miscounted. Later, when I had time to actually focus on the issue, it turned out the AI had miscounted a 20-character string (maybe counting the null terminating character?).
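For what it's worth, this is a task where one line of actual code is deterministic, while an LLM sees tokens rather than characters (the name below is a made-up 20-character example, not the one from the comment):

```python
# Counting characters is exact and instant in code.
name = "Thorgrim Ironbeard X"  # hypothetical 20-character name
print(len(name))  # -> 20

# The "null terminator" only exists in C-style strings: there, a
# 20-character name occupies 21 bytes of storage ('\0' included),
# but its length, as reported by strlen(), is still 20.
```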

  • MightEnlightenYou@lemmy.world · +5/-4 · edited · 6 days ago

    People should read the top comments on Hacker News instead of anyone here; they're more informed on the topic than Lemmy is.

    • Oisteink@lemmy.world · +6/-2 · 6 days ago

      Yeah, if you're after AI fanbois you should head over there. They're not that bright, but if you check the show-and-tell section you can see what Claude's been up to the last two days.

    • brucethemoose@lemmy.world · +3 · edited · 6 days ago

      Better yet, download Qwen 3.5/3.6 and a "raw" notepad-style frontend like Mikupad, and try it yourself:

      https://huggingface.co/ubergarm/Qwen3.6-27B-GGUF

      https://github.com/lmg-anon/mikupad

      One might observe:

      • Chat formatting, and how janky the "thinking" block is.

      • How words are broken up into tokens, not characters.

      • How particularly funky that gets with numbers.

      • Precisely how sampling “randomizes” the answers by visualizing “all possible answers” with the logprobs display.

      • And, thus, precisely how and why carb counting in ChatGPT fails, yet a measly local LLM on a desktop/phone could get it right with a little tooling or adjustment.

      This is exactly what OpenAI/Anthropic don't want you to do. They want users dumb and tethered, like a cloud subscription or a social media platform: not cognizant of how the tools they're peddling as magic lamps actually work, or of why and how those tools are often stupid.