• 0 Posts
  • 144 Comments
Joined 2 years ago
cake
Cake day: June 13th, 2023

help-circle







  • “Up until the semi finals, it seemed like nothing would be able to stop Grok 4 on its way to winning the event,” Pedro Pinhata, a writer for Chess.com, said in its coverage. “Despite a few moments of weakness, X’s AI seemed to be by far the strongest chess player… But the illusion fell through on the last day of the tournament.” He said Grok’s “unrecognizable” and “blundering” play enabled o3 to claim a succession of “convincing wins”.

    I think the main takeaway is that these models are fundamentally inconsistent, and you can never assume they’re going to be reliable based on past performance.



  • AbouBenAdhem@lemmy.worldtoTechnology@lemmy.world*Permanently Deleted*
    link
    fedilink
    English
    arrow-up
    5
    ·
    edit-2
    4 months ago

    “Researchers in the field sometimes describe our goal as to pass the ‘Visual Turing Test,’” said Suyeon Choi […] “A visual Turing Test then means, ideally, one cannot distinguish between a physical, real thing as seen through the glasses and a digitally created image being projected on the display surface,” Choi said.

    So they just came up with a needlessly opaque synonym of “verisimilitude”.







  • Critical paragraph:

    Our research highlights the importance of Germany’s unique institutional context, characterized by strong labor protections, extensive union representation, and comprehensive employment legislation. These factors, combined with Germany’s gradual adoption of AI technologies, create an environment where AI is more likely to complement rather than displace worker skills, mitigating some of the negative labor market effects observed in countries like the US.





  • AbouBenAdhem@lemmy.worldtoTechnology@lemmy.world*Permanently Deleted*
    link
    fedilink
    English
    arrow-up
    3
    ·
    edit-2
    6 months ago

    There was a recent paper claiming that LLMs were better at avoiding toxic speech if it was actually included in their training data, since models that hadn’t been trained on it had no way of recognizing it for what it was. With that in mind, maybe using reddit for training isn’t as bad an idea as it seems.