@itsnotlupus

itsnotlupus@lemmy.world · edited 2 years ago

You can list every man page installed on your system with man -k . (the trailing dot is the search pattern, and it matches every entry), or just apropos .
But that’s a lot of random junk. If you only want “executable programs or shell commands”, restrict the search to section 1 with apropos -s 1 .
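A minimal sketch of pulling just the page names out of that listing. It assumes the man-db style of output, "name (section) - description"; other apropos implementations may format lines differently. The helper name page_names and the deduplication step are my additions, not part of any tool.

```shell
# Hypothetical helper: read apropos-style lines on stdin, print unique page names.
# Assumes each line looks like "pwd (1) - print name of current/working directory".
page_names() {
  awk '{ print $1 }' | sort -u
}

# Usage on a real system:
#   apropos -s 1 . | page_names
```

Using awk rather than cut means runs of spaces or tabs before the section number don't matter.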

You can get the path of a man page by using whereis -m pwd (replace pwd with your page name.)
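For scripting, it helps to know that whereis prints the name followed by a colon and then zero or more paths on one line. A small sketch of splitting such a line, where first_man_path is a made-up helper name and the sample path in the comment is only an illustration:

```shell
# Hypothetical helper: read "name: path [more paths...]" lines on stdin,
# as printed by whereis, and print "name<TAB>first-path" for each.
# Lines with no path (nothing found) are skipped.
first_man_path() {
  while read -r id path rest; do
    [ -n "$path" ] || continue
    printf '%s\t%s\n' "${id%:}" "$path"   # ${id%:} drops the trailing colon
  done
}

# Usage on a real system:
#   whereis -m pwd | first_man_path
```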

You can convert a man page to HTML with man2html (may require apt install man2html or whatever equivalent applies to your distro.)
That tool adds a couple of useless lines at the beginning of each file, so we’ll want to pipe its output through tail -n +3 to get rid of them.
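To make the tail step concrete: -n +3 means "start output at line 3", so the first two lines are dropped and everything else passes through. A tiny sketch, with strip_header as a made-up name; the sample input in the usage note assumes the two extra lines are a CGI-style Content-type header and a blank line, which may not hold for every man2html build.

```shell
# Hypothetical helper: drop the first two lines of stdin, pass the rest through.
strip_header() {
  tail -n +3
}

# Usage on a real system (the page path is a placeholder):
#   man2html /path/to/page.1 | strip_header > page.html
```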

Combine all of these together in a questionable incantation, and you might end up with something like this:

mkdir -p tmp && cd tmp
apropos -s 1 . | cut -d' ' -f1 | while read -r page; do whereis -m "$page" ; done |
  while read -r id path rest; do [ -n "$path" ] && man2html "$path" | tail -n +3 > "${id::-1}.html"; done

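List every command in section 1 and keep only its name (the id). For each one, get a file path with whereis. For each id and file path (ignoring the rest of the line), convert the page to HTML and save it as a file named $id.html. The ${id::-1} bit strips the trailing colon that whereis prints after the name, and it only works in bash.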
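If the one-liner gets unwieldy, the conversion step can be pulled into a function. This is only a sketch under a few assumptions: convert_pages and the MAN2HTML override variable are names I made up, the input is whereis-style "name: path ..." lines, and the first two lines of the converter's output are the ones to discard. It sticks to POSIX sh syntax, so ${id%:} replaces the bash-only ${id::-1}.

```shell
# Hypothetical sketch: read "name: path ..." lines (whereis -m output) on stdin
# and write one HTML file per page into the directory given as $1.
# MAN2HTML is an assumed override hook so the converter can be swapped out.
convert_pages() {
  outdir=$1
  mkdir -p "$outdir" || return 1
  while read -r id path rest; do
    [ -n "$path" ] || continue     # whereis found no man page for this name
    name=${id%:}                   # "pwd:" -> "pwd"
    # </dev/null keeps the converter from eating the loop's own input.
    "${MAN2HTML:-man2html}" "$path" < /dev/null | tail -n +3 > "$outdir/$name.html"
  done
}

# Usage on a real system:
#   apropos -s 1 . | cut -d' ' -f1 |
#     while read -r page; do whereis -m "$page"; done |
#     convert_pages tmp
```

Skipping names with no path avoids running the converter on an empty argument and leaving behind empty .html files.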

It might take a little while to run, but then you could run firefox . or whatever and browse the resulting mess.

Or keep tweaking all of this until it’s just right for you.

itsnotlupus@lemmy.world · 2 years ago

No True Christian would ever activate a fully automated sentry killbot that doesn’t use at least one of its compute cores to pray to the Almighty on a loop.

itsnotlupus@lemmy.world · 2 years ago

Presumably because they don’t have a single delivery employee. They just provide “tech” that lets drivers and customers find each other.

Of course if those companies were to become responsible for providing a living wage to their “gig workers”, then it becomes harder to still call them mere “tech” companies (and some might argue that an article using that label to describe them is in fact implicitly picking a side in that lawsuit.)

itsnotlupus@lemmy.world · 2 years ago

I’ll note that there are plenty of models out there that aren’t LLMs and that are also being trained on large datasets gathered from public sources.

Image generation models, music generation models, etc.
Heck, it doesn’t even need to be about generation. Music recognition and image recognition models can also be trained on the same sort of datasets, and arguably come with similar IP rights questions.

It’s definitely a broader topic than just LLMs, and attempting to exhaustively enumerate the flavors of AIs/models/whatever that should be part of this discussion is fairly futile given the fast-evolving nature of the field.