If you could standardise a file format for a specific task, what would you pick and why?

jackpot@lemmy.ml · 2 years ago

If you could standardise a file format for a specific task, what would you pick and why?

raubarno 🇱🇹@lemmy.ml · 2 years ago

OpenDocument Format (.odt) for all documents, in all public institutions (it’s already a NATO standard for documents).

That’s because the Microsoft Word formats (.doc, .docx) are practically unusable outside the Microsoft Office ecosystem. I feel outraged every time I need to edit a .docx file because the layout breaks so easily. And some older .doc files won’t even open in Microsoft Word.

Actually, IMHO, there should be a better alternative to .odt as well: something declarative/scripted like LaTeX, but still WYSIWYG. LaTeX (and XeTeX, for my use cases) is too messy for me to work with, especially when a package is Byzantine. And a document can turn out non-reproducible when I share or reuse it somewhere else.

Something has to be made with document files.

monobot@lemmy.ml · 2 years ago

It is unbelievable that we do not have a standard document format.

DigitalJacobin@lemmy.ml · 2 years ago

What’s messed up is that, technically, we do. Originally, OpenDocument was the ISO standard document format. But then, baffling everyone, Microsoft got ISO to also approve .docx (Office Open XML) as a standard. So now we have two competing document standards, the second of which is simply worse.

erogenouswarzone@lemmy.ml · 2 years ago

Bro, trying to add padding in MS Word, when you know… YOU KNOOOOW… it can convert to HTML. It drives me up the wall.

And don’t get me started on excel.

Kill em all, I say.

DigitalJacobin@lemmy.ml · 2 years ago

This is the kind of thing I think about all the time, so I have a few.

Archive files: .tar.zst
- Produces better compression ratios than the DEFLATE compression algorithm (used by .zip and gzip/.gz) and does so faster.
- By separating the jobs of archiving (.tar), compressing (.zst), and (if you so choose) encrypting (.gpg), .tar.zst follows the Unix philosophy of “Make each program do one thing well.”
- .tar.xz is also very good and seems more popular (probably because it was released 6 years earlier, in 2009), but when tuned to its maximum compression level, .tar.zst can achieve a compression ratio pretty close to LZMA (used by .tar.xz and .7z) and do it faster^[1].

  > zstd and xz trade blows in their compression ratio. Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup.
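For a feel of the DEFLATE-vs-LZMA part, here is a rough Python sketch that compresses the same input with `zlib` (DEFLATE, as in .zip/.gz) and `lzma` (the .xz format) at their default levels and reports size and time. zstd is left out only because most Python versions don't ship it in the standard library; the third-party `zstandard` package could be added as a third entry. The input is synthetic and the helper names are made up, so treat whatever it prints as illustrative; real ratios depend heavily on the data.

```python
import lzma
import random
import time
import zlib


def sample_text(n_words: int = 20_000, seed: int = 0) -> bytes:
    """Repetitive, text-like test data drawn from a small vocabulary (a stand-in for real files)."""
    rng = random.Random(seed)
    vocab = ["archive", "compress", "tar", "zip", "format", "file", "data",
             "ratio", "speed", "level", "window", "block", "stream", "header"]
    return " ".join(rng.choice(vocab) for _ in range(n_words)).encode()


def compare(data: bytes) -> dict:
    """Map codec name -> (compressed size in bytes, seconds taken) at default levels."""
    codecs = {
        "deflate (zlib)": lambda d: zlib.compress(d, 6),
        "lzma (xz)": lambda d: lzma.compress(d, preset=6),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        out = compress(data)
        results[name] = (len(out), time.perf_counter() - start)
    return results


data = sample_text()
for name, (size, secs) in compare(data).items():
    print(f"{name:15} {size:>7} bytes  {len(data) / size:5.1f}x smaller  {secs:.3f}s")
```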
Image files: JPEG XL/.jxl
- “Why JPEG XL”
- Free and open format.
- Can handle lossy images, lossless images, images with transparency, images with layers, and animated images, giving it the potential of being a universal image format.
- Much better quality and compression efficiency than current lossy and lossless image formats (.jpeg, .png, .gif).
- Produces much smaller files for lossless images than AVIF^[2].
- Supports much larger resolutions than AVIF’s 9-megapixel limit (important for lossless images).
- Supports up to 24 bits per channel, much more than AVIF’s 12-bit-per-channel limit (which, to be fair, is probably good enough).
Videos (Codec): AV1
- Free and open format.
- Much more efficient than x264 (used by .mp4) and VP9^[3].
Documents: OpenDocument / ODF / .odt
- @raubarno@lemmy.ml says it best here. .odt is simply a better standard than .docx.
> it’s already a NATO standard for documents
>
> Because the Microsoft Word ones (.doc, .docx) are unusable outside the Microsoft Office ecosystem. I feel outraged every time I need to edit .docx file because it breaks the layout easily. And some older .doc files cannot even work with Microsoft Word.

lloram239@feddit.de · 2 years ago

.tar is pretty bad as it lacks an index, making it impossible to quickly seek around in the file. The compression on top adds another layer of complication. It might still work great as a tape archiver, but for sending files around the Internet it is quite horrible. It’s really just getting dragged around for cargo cult reasons, not because it’s good at the job it is doing.
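Here is a rough Python sketch of the “no index” point. It builds an uncompressed ustar archive in memory and then finds a member the way any tar reader has to: by walking the 512-byte headers from the front. The zip version of the same files is looked up through its central directory instead. With a compressed .tar.zst or .tar.xz it gets worse still, since the stream also has to be decompressed up to that member. The function names are made up for illustration; only `tarfile`, `zipfile` and `io` are real standard-library modules.

```python
import io
import tarfile
import zipfile


def build_tar(names: list) -> bytes:
    """Create an uncompressed ustar archive in memory, one small file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in names:
            payload = f"contents of {name}\n".encode()
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def find_in_tar(raw: bytes, wanted: str) -> tuple:
    """Walk tar headers front to back until `wanted` turns up.

    Returns (headers_read, data_offset). With no index, the cost grows with
    the number of members stored before `wanted`.
    """
    offset, headers_read = 0, 0
    while offset + 512 <= len(raw):
        header = raw[offset:offset + 512]
        if header == bytes(512):  # an all-zero block marks the end of the archive
            break
        headers_read += 1
        name = header[0:100].rstrip(b"\0").decode()
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)  # size is stored as octal text
        if name == wanted:
            return headers_read, offset + 512
        # Skip this header plus its data, which is padded to a 512-byte boundary.
        offset += 512 + (size + 511) // 512 * 512
    raise KeyError(wanted)


def build_zip(names: list) -> bytes:
    """Same files as a zip archive, which ends with a central directory (an index)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, f"contents of {name}\n")
    return buf.getvalue()


def find_in_zip(raw: bytes, wanted: str) -> int:
    """Look the member up in the central directory and return its local header offset."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return zf.getinfo(wanted).header_offset


names = [f"file_{i:04}.txt" for i in range(1000)]
tar_raw = build_tar(names)
print(find_in_tar(tar_raw, "file_0999.txt")[0])  # 1000: every header walked to reach the last member
```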

In general I find the archive situation a little annoying, as archives are largely unnecessary; that’s what we have directories for. But directories don’t exist as far as HTTP is concerned, and only single files can be downloaded easily. So everything has to get packed and unpacked again, for absolutely no reason. It’s a job computers should handle transparently in the background, not an explicit user action.

Many file managers try to add support for .zip and allow you to go into them like it is a folder, but that abstraction is always quite leaky and never as smooth as it should be.

jackpot@lemmy.ml · 2 years ago

> By separating the jobs of archiving (.tar), compressing (.zst), and (if you so choose) encrypting (.gpg), .tar.zst follows the Unix philosophy of “Make each program do one thing well.”.

wait so does it do all of those things?

DigitalJacobin@lemmy.ml · 2 years ago

So there’s a tool called tar that creates an archive (a .tar file). Then there’s a tool called zstd that can be used to compress files, including .tar files, which then become .tar.zst files. And then you can encrypt your .tar.zst file using a tool called gpg, which would leave you with an encrypted, compressed .tar.zst.gpg archive.

Now, most people aren’t doing everything in the terminal, so the process for most people would be pretty much the same as creating a ZIP archive.
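The tar → compressor → gpg chain described above can be sketched in Python with standard-library modules only. Each stage is its own function that takes bytes and returns bytes, which is the “one job per tool” idea. This is a sketch under stated assumptions: `lzma` (the .xz format) stands in for zstd because it ships with Python, the function names are hypothetical, and the optional encryption stage is left out; with the real tools it would be a separate `gpg --symmetric` run on the finished file.

```python
import io
import lzma
import tarfile


def archive(files: dict) -> bytes:
    """Stage 1 (tar's job): bundle name -> bytes pairs into one uncompressed stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compress(stream: bytes) -> bytes:
    """Stage 2 (zstd's job in the comment): compress the whole stream.

    LZMA (.xz) is used here only because it is in the standard library.
    """
    return lzma.compress(stream)


def unpack(blob: bytes) -> dict:
    """Reverse both stages: decompress first, then read the tar members back."""
    with tarfile.open(fileobj=io.BytesIO(lzma.decompress(blob)), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}


files = {"notes.txt": b"hello\n", "data/log.txt": b"x" * 10_000}
blob = compress(archive(files))  # the equivalent of a .tar.xz
print(len(archive(files)), "->", len(blob), "bytes")
```

Because the stages only share a byte stream, swapping the compressor (or adding an encryption step) doesn't touch the archiving code.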

Laser@feddit.de · 2 years ago

> By separating the jobs of archiving (.tar), compressing (.zst), and (if you so choose) encrypting (.gpg), .tar.zst follows the Unix philosophy of “Make each program do one thing well.”.

The problem here being that GnuPG does nothing really well.

> Videos (Codec): AV1
>
> Much more efficient than x264 (used by .mp4) and VP9[3].

AV1 is also much younger than H.264 (AV1 is a specification, x264 is an implementation), and software encoders have only recently become somewhat viable. A more apt comparison would have been AV1 vs. HEVC, which is also somewhat old nowadays but still a competitive codec. Unfortunately, there currently aren’t many ways to use AV1 in a very meaningful way: you can encode your own media with it, but that’s about it; you can stream to YouTube, but YouTube will re-encode to another codec.

DigitalJacobin@lemmy.ml · 2 years ago

> The problem here being that GnuPG does nothing really well.

Could you elaborate? I’ve never had any issues with gpg before and I’m curious what problems people are having.

> Unfortunately currently there aren’t many options to use AV1 in a very meaningful way; you can encode your own media with it, but that’s about it; you can stream to YouTube, but YouTube will recode to another codec.

AV1 has almost full browser support (iirc), and companies like YouTube, Netflix, and Meta have started moving over to AV1 from VP9 (since AV1 is the successor to VP9). But you’re right, adoption is still in progress; this is more my dreamworld than a prediction of future standardization.

Laser@feddit.de · 2 years ago

> Could you elaborate? I’ve never had any issues with gpg before and curious what people are having issues with.

This article and the blog post linked within it summarize it very well.

DigitalJacobin@lemmy.ml · 2 years ago

deleted by creator

piexil@lemmy.world · 2 years ago

I get a better compression ratio with xz than with zstd, both at their highest levels, when building an Ubuntu SquashFS.

Zstd is way faster, though.

ronweasleysl@lemmy.ml · 2 years ago

Damn didn’t realize that JXL was such a big deal. That whole JPEG recompression actually seems pretty damn cool as well. There was some noise about GNOME starting to make use of JXL in their ecosystem too…

jackpot@lemmy.ml · 2 years ago

wait i’m confused, what’s the difference between .tar.zst and .tar.xz?

DigitalJacobin@lemmy.ml · 2 years ago

Different ways of compressing the initial .tar archive: .zst uses Zstandard, while .xz uses LZMA2.

kadu@lemmy.world · 1 year ago

deleted by creator

Gamma@programming.dev · 2 years ago

I get your point. Since a .tar.zst file can be handled natively by tar, using .tzst instead does make sense.

DigitalJacobin@lemmy.ml · 2 years ago

Sounds like a Windows problem

kadu@lemmy.world · 1 year ago

deleted by creator

DigitalJacobin@lemmy.ml · 2 years ago

I get the frustration, but Windows is the one that strayed from convention/standard.

Also, I should’ve asked this earlier, but doesn’t Windows also only look at the characters following the last dot in the filename when determining the file type? If so, then this should be fine for Windows, since there’s only one canonical file extension at a time, right?
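A “last dot wins” rule like the one asked about here is easy to see with Python's `os.path.splitext`, which treats `backup.tar.zst` simply as a `.zst` file, while `pathlib` can list every suffix. Tools that need to know what is really inside usually sniff the leading magic bytes instead. A small sketch of both approaches; the `MAGIC` table and function names are illustrative, not any particular tool's or operating system's implementation.

```python
import gzip
import lzma
import os
from pathlib import Path
from typing import Optional

# Leading "magic" bytes of some common compression/container formats.
MAGIC = {
    b"\x28\xb5\x2f\xfd": "zstd",  # Zstandard frame magic 0xFD2FB528, stored little-endian
    b"\xfd7zXZ\x00": "xz",
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
}


def by_extension(filename: str) -> str:
    """What a 'last dot wins' rule sees: only the final suffix."""
    return os.path.splitext(filename)[1]


def by_content(data: bytes) -> Optional[str]:
    """Identify the format from the first bytes instead of the name."""
    for magic, fmt in MAGIC.items():
        if data.startswith(magic):
            return fmt
    return None


print(by_extension("backup.tar.zst"))       # .zst
print(Path("backup.tar.zst").suffixes)      # ['.tar', '.zst']
print(by_content(lzma.compress(b"hello")))  # xz
print(by_content(gzip.compress(b"hello")))  # gzip
```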

kadu@lemmy.world · 1 year ago

deleted by creator

Spore@lemmy.ml · 2 years ago

There already are conventional abbreviations: see Section 2.1. I doubt they will be better supported by tools though.

kadu@lemmy.world · 1 year ago

deleted by creator

jackpot@lemmy.ml · 2 years ago

is av1 lossy?

DigitalJacobin@lemmy.ml · 2 years ago

AV1 can do lossy video as well as lossless video.

Supermariofan67@programming.dev · 2 years ago

Ogg Opus for all lossy audio compression (mp3 needs to die)

7z or tar.zst for general purpose compression (zip and rar need to die)

dinckel@lemmy.world · 2 years ago

The existence of zip, and especially rar files, actually hurts me. They’re slow, they’re insecure, and the compression is from the Jurassic era. We can do better.

jackpot@lemmy.ml · 2 years ago

why do zip and rar need to die?

Supermariofan67@programming.dev · 2 years ago

Zip has a terrible compression ratio compared to modern formats. It’s also a mess of different, partially incompatible implementations by different software, and it doesn’t enforce UTF-8 (or any standard, for that matter) for filenames, leading to garbled names when extracting old files. Its legacy encryption (ZipCrypto) is vulnerable to a known-plaintext attack, and its key derivation is very easy to brute force.
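On the filename point: the zip spec (PKWARE's APPNOTE) defines a general-purpose flag bit (bit 11) meaning “this name is UTF-8”, but nothing forces writers to set it, and older archivers typically stored names in whatever code page the system used. The sketch below, using only Python's `zipfile`, shows that Python sets the flag for a non-ASCII name, and what the same UTF-8 bytes turn into when a reader falls back to CP437, the historical default. The function names are hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: "filename and comment are UTF-8"


def write_zip(name: str, data: bytes) -> bytes:
    """Write a one-file zip archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()


def names_and_flags(raw: bytes) -> list:
    """Each member's name plus whether the writer marked it as UTF-8."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_FLAG)) for info in zf.infolist()]


def legacy_reader_view(name: str) -> str:
    """What a reader shows if UTF-8 name bytes are decoded with the old CP437 default."""
    return name.encode("utf-8").decode("cp437")


print(names_and_flags(write_zip("café.txt", b"hi")))   # [('café.txt', True)]
print(names_and_flags(write_zip("plain.txt", b"hi")))  # [('plain.txt', False)]
print(legacy_reader_view("café.txt"))                  # caf├⌐.txt
```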

Rar is proprietary. That alone is reason enough not to use it. It’s also very slow.

PlexSheep@feddit.de · 2 years ago

How about tar.gz? How does gzip compare to zstd?

Supermariofan67@programming.dev · 2 years ago

Both slower and worse at compression at all its levels.

jackpot@lemmy.ml · 2 years ago

why does mp3 need to die?

Supermariofan67@programming.dev · 2 years ago

It’s a 30-year-old format, and a lot of research and innovation in lossy audio compression has happened since then. Opus can achieve better quality at like 40% of the bitrate. Also, much like zip, the format was a mess of partially broken implementations in the early days (although now everyone uses LAME, so it’s not as big of a deal). Its container/stream format is very messy too. There’s also no native tag format, so it needs ID3 tags, which don’t enforce any standardized text encoding.
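To put “like 40% of the bitrate” into numbers: audio payload size is just bitrate × duration ÷ 8. The 320 kbps starting point and the 4-minute track are assumptions picked for illustration, and the 40% is the rough figure from the comment above, not a measurement.

```python
def payload_mb(bitrate_kbps: float, seconds: float) -> float:
    """Audio payload size in megabytes (10**6 bytes); ignores container and tag overhead."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


track = 4 * 60                       # a hypothetical 4-minute track, in seconds
mp3 = payload_mb(320, track)         # 9.6 MB
opus = payload_mb(320 * 0.4, track)  # 3.84 MB at 40% of the bitrate
print(f"MP3 {mp3:.2f} MB vs Opus {opus:.2f} MB")
```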

TheAnonymouseJoker@lemmy.ml · 2 years ago

> (mp3 needs to die)

How are you going to recreate the MP3 audio artifacts that give a lot of music its originality when encoding to Opus? Past audio recordings cannot be fiddled with too much.

Also, fuck Zstandard, it’s a problematic format: it only compresses single files, it’s hard to repair, it’s not fully stable, and it lacks too many features compared to 7Z/RAR. Zst is also 15-20% worse at compression ratio. It’s only a good format for temporary, fast data transit applications (webpage/CDN serving, quick temporary database backups).

d_k_bo@feddit.de · 2 years ago

JPEG-XL for rasterized images.

https://jpegxl.info/why-jxl.html

GamingChairModel@lemmy.world · 2 years ago

I agree.

I especially love that it addresses the biggest pitfall of the typical “fancy new format does things better than the one we’re already using” transition, in that it’s specifically engineered to make migration easier, by allowing a lossless conversion from the dominant format.

hikaru755@feddit.de · 2 years ago

Never heard of that, thanks for bringing it to my attention!

kadu@lemmy.world · 1 year ago

deleted by creator

d_k_bo@feddit.de · 2 years ago

GNOME introduced support for it in version 45; AFAIK there isn’t a stable distro release that ships it yet.

DigitalJacobin@lemmy.ml · 2 years ago

Unfortunately, adoption has been slow, and the Alliance for Open Media is pushing back somewhat (especially Google^[1], which leads the group) in favor of its inferior .avif format.

https://www.phoronix.com/news/Chrome-Drops-JPEG-XL

RoyaltyInTraining@lemmy.world · 2 years ago

How does it compare to AVIF?

d_k_bo@feddit.de · 2 years ago

AVIF is slower, has a way smaller maximum resolution, and supports neither progressive decoding nor lossless JPEG recompression.

Comparison chart of different image codecs

RoyaltyInTraining@lemmy.world · 2 years ago

Oh damn, that resolution limit is a total deal breaker. Can’t believe anyone would release a format with those limitations today…

Infernal_pizza@lemmy.world · 2 years ago

Literally any file format except PDF for documents that need to be edited. Fuck Adobe and fuck Acrobat.

rtxn@lemmy.world · 2 years ago

~~XML for machine-readable data because I live to cause chaos~~

Either Markdown or Org for human-readable, text-only documents. MS Office formats and the way they are handled have been a mess since the 2007 “x” versions (.docx, .xlsx, .pptx) were introduced, and both those and the OpenDocument formats are way too bloated for when you only want to share a presentable text file.

While we’re at it, standardize the fucking markdown syntax! I still have nightmares about Reddit’s degenerate four-space-indent code blocks.

seaQueue@lemmy.world · 2 years ago

I’d like an update to the EPUB ebook format that leverages zstd compression and JPEG XL. You’d see much better decompression performance (especially for very large books), smaller file sizes and/or better image quality. I’ve been toying with the idea of implementing this as a .zpub book format and plugin for KOReader but haven’t written any code for it yet.
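Since an EPUB is a zip container underneath, a .zpub-style variant mostly comes down to which compression method each zip entry uses (plus JPEG XL files for the images inside). Python's `zipfile` doesn't offer zstd in most current versions, so the rough sketch below uses `ZIP_LZMA` purely to show that the method is a per-entry choice. Note that the EPUB container spec only allows stored or Deflate entries, so the LZMA variant is not a valid EPUB; it stands in for the hypothetical new format. The function name and chapter layout are made up.

```python
import io
import zipfile


def write_book(chapters: dict, method: int = zipfile.ZIP_DEFLATED) -> bytes:
    """Write a minimal EPUB-style (OCF) zip container in memory.

    `mimetype` must be the first entry and stored uncompressed so the format can
    be recognized from the file's first bytes; every other entry uses `method`.
    A real EPUB also needs META-INF/container.xml and a package (OPF) file,
    which are omitted here.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, xhtml in chapters.items():
            zf.writestr(f"OEBPS/{name}", xhtml, compress_type=method)
    return buf.getvalue()


chapters = {"ch1.xhtml": "<p>It was a dark and stormy night.</p>\n" * 200}
standard = write_book(chapters)                        # Deflate, as EPUB readers expect
experimental = write_book(chapters, zipfile.ZIP_LZMA)  # hypothetical ".zpub"-style variant
print(len(standard), len(experimental))
```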

lloram239@feddit.de · 2 years ago

I’d set up a working group to invent something new. Many of our current formats are stuck in the past; e.g. PDF and ODF still emulate paper, even though everybody reads them on a screen. What I want to see is a standard document format that is built for the modern-day Internet, with editing and publishing in mind. HTML ain’t it, as it can’t handle editing or long-form documents well; EPUB isn’t supported by browsers; Markdown lacks a lot of features; etc. And then you have things like Google Docs, which are Internet-aware, editable and shareable, but also completely proprietary and lock you into the Google ecosystem.

glibg10b@lemmy.ml · 2 years ago

JPEG XL for images because it compresses better than JPEG, PNG and WebP most of the time.

XZ because it theoretically offers the highest compression ratio in most circumstances, and long decompression time isn’t really an issue when the alternative is downloading a larger file over a slow connection.

Config files stored as serialized data structures instead of in plain text. This speeds up read times and removes the possibility of syntax or type errors. Also, fuck JSON.

I wish there were a good format for typesetting. Docx is closed and inflexible. LaTeX is unreadable, inefficient to type and hard to learn due to the inconsistencies that arise from its reliance on third-party packages and its lack of guidelines for their design.

shotgun_crab@lemmy.world · 2 years ago

TOML for configuration files

the_crab_man@lemmy.world · 2 years ago

100% this. Much more readable than JSON, YAML or other custom formats.

AlexWIWA@lemmy.ml · 2 years ago

Markdown for all rich text that doesn’t need super fancy shit like LaTeX

DigitalJacobin@lemmy.ml · 2 years ago

deleted by creator

morrowind@lemmy.ml · 2 years ago

I’d argue AsciiDoc is better, but less well known.

danielfgom@lemmy.world · 2 years ago

Definitely FLAC for audio because it’s lossless, if you record from a high fidelity source…

exFAT for external hard drives and SD cards because both Windows and Mac can read and write to it as well as Linux. And you don’t have the permission pain…

glibg10b@lemmy.ml · 2 years ago

What permission pain?

danielfgom@lemmy.world · 2 years ago

If you format the drive with ext4 and copy something to it from Linux, then try to open it on another Linux machine (e.g. you distro hop after this), it won’t open the file because you aren’t the owner.

Then you have to jump through hoops to make yourself the owner just so you can open your own file.

I learnt this the hard way so I just use exFAT and it all works.
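Roughly what's going on there: ext4 records each file's owner as a numeric UID, and permission checks compare that number with the UID of whoever is accessing the file. exFAT has no Unix ownership at all, so the mount assigns one owner to every file (typically the user who mounted it, or whatever the `uid=` mount option says), which is why the problem goes away. A small sketch of the ownership comparison in Python (POSIX only; the function name is hypothetical):

```python
import os
import tempfile


def owned_by_me(path: str) -> bool:
    """True if the file's numeric owner UID matches this process's effective UID.

    ext4 and other Unix filesystems store the owner as a number, not a username,
    so a file written as UID 1000 on one install belongs to "someone else" on a
    system where your account has a different UID. POSIX only (os.geteuid).
    """
    return os.stat(path).st_uid == os.geteuid()


with tempfile.NamedTemporaryFile() as f:
    print(f.name, os.stat(f.name).st_uid, owned_by_me(f.name))
```

If the UIDs do differ, `sudo chown -R` on your own files is the usual fix.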

jackpot@lemmy.ml · 2 years ago

i’d like there to be a way to standardise MIDI info in plugins for music

Krafting@lemmy.world · 2 years ago

.dontuse for snaps

Kazumara@feddit.de · 2 years ago

OTDR measurement results in, like, XML or whatever open, self-documenting format, just not SOR. Or even just in actual standards-compliant SOR, if that’s all I can get.

jackpot@lemmy.ml · 2 years ago

i don’t understand any of the acronyms

gkpy@feddit.de · 2 years ago

except XML xD

Kazumara@feddit.de · 2 years ago

- OTDR: Optical Time Domain Reflectometry
- SOR: Standard OTDR Record
- XML: Extensible Markup Language

.sor files are a mess, poorly standardized, too restrictive as a format, and every manufacturer makes their own proprietary extensions.

jackpot@lemmy.ml · 2 years ago

what file extension, what category

Kazumara@feddit.de · 2 years ago

Category: OTDR measurement results
File extension: .xml or something entirely new
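Purely as an illustration of what an “open, self-documenting” results file could look like, here is a Python sketch that writes a trace to XML with the standard `xml.etree.ElementTree` module. Every element and attribute name is invented; this is not any existing OTDR schema, and the sample values are made up.

```python
import xml.etree.ElementTree as ET


def trace_to_xml(wavelength_nm: int, group_index: float, samples: list) -> str:
    """Serialize one OTDR trace, with units spelled out in the file itself.

    `samples` is a list of (distance in metres, level in dB) pairs.
    """
    root = ET.Element("otdrTrace", version="0.1")
    ET.SubElement(root, "wavelength", unit="nm").text = str(wavelength_nm)
    ET.SubElement(root, "groupIndex").text = str(group_index)
    trace = ET.SubElement(root, "samples", distanceUnit="m", levelUnit="dB")
    for distance_m, level_db in samples:
        ET.SubElement(trace, "s", d=f"{distance_m:g}", p=f"{level_db:g}")
    return ET.tostring(root, encoding="unicode")


xml_text = trace_to_xml(1550, 1.468, [(0.0, -1.2), (10.0, -1.25), (20.0, -1.3)])
print(xml_text)
```

Any tool with an XML parser can read this back without a vendor-specific decoder, which is the point being made about SOR.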

jackpot@lemmy.ml · 2 years ago

what on earth does that do

Kazumara@feddit.de · 2 years ago

An OTDR sends pulses of laser light into a fiber optic cable and records the minute reflections that occur at every point along the cable over time. The arrival time of each reflection corresponds to the position where it was reflected. This way you can record the attenuation of an entire cable just by shining pulses in from one end. It’s good for checking whether a new cable was properly installed, or for finding the location of faults in existing cables when debugging.
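The time-to-position step works out as d = c·t / (2·n): the pulse travels to the event and back (hence the 2), at the speed of light in the fibre, c/n. A small sketch; the group index of 1.468 is an assumed typical value for standard single-mode fibre, and in practice the real figure comes from the cable's datasheet and is entered into the OTDR.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def event_distance_m(round_trip_s: float, group_index: float = 1.468) -> float:
    """Distance along the fibre to whatever reflected light arriving `round_trip_s` after the pulse."""
    speed_in_fibre = C / group_index
    return speed_in_fibre * round_trip_s / 2  # halve it: the light went out and back


print(round(event_distance_m(10e-6)))  # a reflection 10 µs after the pulse: ~1021 m
```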