Can someone who understands this better explain to me how this thing actually places the order into whatever POS they use? Like if LLMs are just advanced auto-complete, I get how they can do “fuzzy” tasks like answering questions or carrying on a conversation, but how do they do rigid tasks like entering the tacos into whatever system the cash register and kitchen use?
The LLM isn’t limited to just generating text on its own. It can interact with other programs.
There are a ton of speech recognition systems available, and almost all of them predate this LLM bubble. There’s already an API for interacting with the ordering system. So it’s just down to having the LLM pull out what was ordered and trigger the corresponding action for the order.
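Roughly something like this, just to show the shape of it (the endpoint, prompt, and stubbed-out functions here are all made up, not from any real deployment):

```python
# Minimal sketch of the pipeline: speech-to-text -> LLM -> ordering API.
# transcribe() and llm() are stand-ins for whatever systems are actually used.
import json
import urllib.request

def transcribe(audio_bytes: bytes) -> str:
    """Stand-in for any speech recognition system (these predate LLMs)."""
    ...

def llm(prompt: str) -> str:
    """Stand-in for a call to whatever LLM the chain uses."""
    ...

def take_order(audio_bytes: bytes) -> None:
    text = transcribe(audio_bytes)
    # The LLM's only job here: turn messy speech into structured items.
    reply = llm(
        'Extract the ordered items as a JSON list of '
        '{"item": ..., "quantity": ...} objects.\n\nCustomer said: ' + text
    )
    items = json.loads(reply)
    # The ordering API does the rigid part, same as a register would.
    req = urllib.request.Request(
        "https://pos.example.internal/orders",  # hypothetical endpoint
        data=json.dumps({"items": items}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```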
This is so simple it doesn’t require anything nearly as complicated as an LLM. The older voice assistants like Siri and Alexa could do this type of thing. It’s literally the same as telling Alexa to place an order for something, and that’s been possible for years.
So the output from the LLM is just a text description that’s fed into another, smarter piece of software that interprets that text into an order? What task is the LLM actually doing in this case?
The LLM is taking the order: interpreting what people say into that simple text description. Not everyone talks the same or describes things the same way. That, I believe, is where the LLM is doing the bulk of the work. Then I’m sure there’s some background stock management and health checking it manages as well.
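As a toy example of that interpreting step (the prompt format is invented for illustration): many phrasings in, one rigid format out.

```python
# Sketch of the "interpreting" step: the LLM collapses varied speech into
# a canonical order line that everything downstream can handle rigidly.
PROMPT = """Rewrite the customer's words as a canonical order line.

Customer: "lemme get uhh two of them crunchy tacos"
Order: 2x Crunchy Taco

Customer: "can I do a couple crunchy tacos please"
Order: 2x Crunchy Taco

Customer: "{utterance}"
Order:"""

def canonicalize(utterance: str, llm) -> str:
    # All the fuzziness lives in this one call; the rest of the system
    # only ever sees the same "Nx Item" format regardless of how it was said.
    return llm(PROMPT.format(utterance=utterance)).strip()
```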
What’s wrong with an input machine with buttons or touch screen?
Not futuristic enough or something.
They’re not able to answer questions or be changed simply via a software update.
We have apps for that, and they’re typically a PITA. They certainly take longer than just talking through your order.
Yeah, unlike a human, who understands that a customer saying “one pizzaburger, that’s all” means the order is complete, the app just keeps asking obviously unwanted, cringey upsell questions like “buy two, you’ll save a few cents on the second one?” or “what will you drink with that?” or “is that a big menu?”…
I think the role of the LLM is just to make the system understand the order more accurately.
It’s just an API.
There’s a few ways they could go about it. They could have part of the prompt be something like “when the customer is done ordering, create a JSON file with the order contents” and set up what is essentially a dumb register that watches for those files and rings up each order like a standard POS would.
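The “dumb register” half of that could be as simple as a polling loop (the paths and field names here are made up):

```python
# Sketch of a watcher that rings up order files the LLM was prompted to write.
import json
import time
from pathlib import Path

ORDERS_DIR = Path("/var/spool/orders")  # hypothetical drop directory

def submit_to_pos(order: dict) -> None:
    """Stand-in for whatever the real POS ingestion call is."""
    print("ringing up:", order["items"])

def watch_for_orders() -> None:
    while True:
        for f in ORDERS_DIR.glob("*.json"):
            order = json.loads(f.read_text())
            submit_to_pos(order)  # add it like a standard POS would
            f.unlink()            # consume the file so it isn't re-rung
        time.sleep(1)
```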
They could spell out a tutorial in the prompt, “to order a number 6 meal, type ‘system.order.meal(6)’”, calling the same functions that a POS system would, and have that output go right to a terminal.
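The terminal side of that just has to pattern-match the exact commands you taught the model to type, for example (using the made-up command from above, not a real POS API):

```python
# Sketch: scan the LLM's output for "system.order.meal(N)" patterns and
# dispatch them onto real functions, ignoring any surrounding chatter.
import re

def order_meal(n: int) -> None:
    print(f"added meal #{n} to the ticket")  # stand-in for the real POS call

COMMANDS = {
    "system.order.meal": order_meal,
}

def dispatch(llm_output: str) -> None:
    # Only execute the exact patterns we taught it; the model may wrap
    # commands in conversational text, so everything else is ignored.
    for name, arg in re.findall(r"(system\.order\.\w+)\((\d+)\)", llm_output):
        if name in COMMANDS:
            COMMANDS[name](int(arg))

dispatch("Sure thing! system.order.meal(6) Anything else?")
```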
They could have their POS system open on an internal screen, use a model that can process images, and have it specify a coordinate pair to simulate a touch screen, manually entering the order the way an employee would.
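A rough sketch of that last one, assuming a vision-capable model that can return screen coordinates (pyautogui is a real library for faking the clicks; the model call is a placeholder):

```python
# Sketch: screenshot the POS screen, ask a vision model where to tap,
# then simulate the touch.
import pyautogui  # pip install pyautogui

def ask_vision_model(screenshot, instruction: str) -> tuple[int, int]:
    """Stand-in: send the screenshot + instruction, get back (x, y) to tap."""
    ...

def tap_order_button(item: str) -> None:
    shot = pyautogui.screenshot()
    x, y = ask_vision_model(shot, f"Where is the button for '{item}'?")
    pyautogui.click(x, y)  # simulate the touch an employee would make
```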
There’s lots of ways to hook up the AI, and it’s not actually that different from hooking up a normal POS system in the first place, although just because one method does allow an AI to interact doesn’t mean it’ll go about it correctly.
Probably something like this, except not trained to be a rebellious troll. Part of her training set is his chat, hehe. Though despite this one being “evil” Neuro, I think normal Neuro-sama is more of a troll now, lol.
https://youtu.be/AFtryxMDJQs
This is clipped segments from a live stream, so it jumps ahead at times. It has links to the source channel if you would prefer a full video. This one is probably already too long for most people though.
He does end up figuring out why she has so much trouble correctly inserting code in the right places later.
Edit: also, every time she says “filtered”, it means whatever she was going to say would have broken YouTube or Twitch rules. He has two filters, one on the generated text and one on the text-to-speech. If the text one catches it, it just outputs “filtered” instead; if the speech one catches it, she’ll still type something terrible, but only say roughly the first syllable or two before the speech is cut off.