• 0 Posts
  • 57 Comments
Joined 1 year ago
cake
Cake day: June 22nd, 2023

help-circle


  • masterspace@lemmy.catoProgramming@programming.devRFC 7493: The I-JSON Message Format
    link
    fedilink
    English
    arrow-up
    16
    arrow-down
    1
    ·
    edit-2
    4 days ago

    A summary:

    An old proposal (2015, not sure why OP posted it now), that basically proposes to put some more standards and limitations around JSON formatting to make it more predictable. Most of it seems pretty reasonable:

    • Must be UTF-8 encoded and properly escape Unicode characters
    • Numbers must respect the JavaScript number Type and it’s limitations (i.e. max magnitude of an int etc.)
    • Objects can’t have duplicate keys
    • The order of keys in objects does not matter
    • A JSON file does not need to have a top level object or array, it can be any JSON value (i.e. just a string or a number is still valid JSON).
    • It proposes that when processing JSON, any unrecognized keys should be ignored rather than errored

    It recommends:

    • Specific formats for date-time data
    • That binary data be stored as a bas64url string

    Honestly, the only part of this I dislike is the order of keys not mattering. I get that in a bunch of languages they use dictionary objects that don’t preserve order, but backend languages have a lot more headroom to adapt and create objects that can, vs making a JavaScript thread loop over an object an extra time to reorder it every time it receives data.


  • Dude, go drink a coffee, and then reflect on what a negative little bitch you’re being.

    The quality on Lemmy is somewhat worse than Reddit 10 years ago, entirely because the user base is a fraction of the size and is more equivalent to when Reddit was first growing 15-20 years ago. Even then it was only a success because they bootstrapped it using fake posts and comments.

    Lemmy is doing great, what it needs to grow is a positive and welcoming community, and then for Reddit to do something stupid again to trigger an exodus.


  • The work is reproduced in full when it’s downloaded to the server used to train the AI model, and the entirety of the reproduced work is used for training. Thus, they are using the entirety of the work.

    That’s objectively false. It’s downloaded to the server, but it should never be redistributed to anyone else in full. As a developer for instance, it’s illegal for me to copy code I find in a medium article and use it in our software. I’m perfectly allowed to read that Medium article, learn from it, and then right my own similar code.

    And that makes it better somehow? Aereo got sued out of existence because their model threatened the retransmission fees that broadcast TV stations were being paid by cable TV subscribers. There wasn’t any devaluation of broadcasters’ previous performances, the entire harm they presented was in terms of lost revenue in the future. But hey, thanks for agreeing with me?

    And Aero should not have lost that suit. That’s an example of the US court system abjectly failing.

    And again, LLM training so egregiously fails two out of the four factors for judging a fair use claim that it would fail the test entirely. The only difference is that OpenAI is failing it worse than other LLMs.

    That’s what we’re debating, not a given.

    It’s even more absurd to claim something that is transformative automatically qualifies for fair use.

    Fair point, but it is objectively transformative.



  • You said open source. Open source is a type of licensure.

    The entire point of licensure is legal pedantry.

    No. Open source is a concept. That concept also has pedantic legal definitions, but the concept itself is not inherently pedantic.

    And as far as your metaphor is concerned, pre-trained models are closer to pre-compiled binaries, which are expressly not considered Open Source according to the OSD.

    No, they’re not. Which is why I didn’t use that metaphor.

    A binary is explicitly a black box. There is nothing to learn from a binary, unless you explicitly decompile it back into source code.

    In this case, literally all the source code is available. Any researcher can read through their model, learn from it, copy it, twist it, and build their own version of it wholesale. Not providing the training data, is more similar to saying that Yuzu or an emulator isn’t open source because it doesn’t provide copyrighted games. It is providing literally all of the parts of it that it can open source, and then letting the user feed it whatever training data they are allowed access to.


  • LLMs use the entirety of a copyrighted work for their training, which fails the “amount and substantiality” factor.

    That factor is relative to what is reproduced, not to what is ingested. A company is allowed to scrape the web all they want as long as they don’t republish it.

    By their very nature, LLMs would significantly devalue the work of every artist, author, journalist, and publishing organization, on an industry-wide scale, which fails the “Effect upon work’s value” factor.

    I would argue that LLMs devalue the author’s potential for future work, not the original work they were trained on.

    Those two alone would be enough for any sane judge to rule that training LLMs would not qualify as fair use, but then you also have OpenAI and other commercial AI companies offering the use of these models for commercial, for-profit purposes, which also fails the “Purpose and character of the use” factor.

    Again, that’s the practice of OpenAI, but not inherent to LLMs.

    You could maybe argue that training LLMs is transformative,

    It’s honestly absurd to try and argue that they’re not transformative.




  • Making a copy is free. Making the original is not.

    Yes, exactly. Do you see how that is different from the world of physical objects and energy? That is not the case for a physical object. Even once you design something and build a factory to produce it, the first item off the line takes the same amount of resources as the last one.

    Capitalism is based on the idea that things are scarce. If I have something, you can’t have it, and if you want it, then I have to give up my thing, so we end up trading. Information does not work that way. We can freely copy a piece of information as much as we want. Which is why monopolies and capitalism are a bad system of rewarding creators. They inherently cause us to impose scarcity where there is no need for it, because in capitalism things that are abundant do not have value. Capitalism fundamentally fails to function when there is abundance of resources, which is why copyright was a dumb system for the digital age. Rather than recognize that we now live in an age of information abundance, we spend billions of dollars trying to impose artificial scarcity.


  • they did NOT predict generative AI, and their graphics cards just HAPPEN to be better situated for SOME reason.

    This is the part that’s flawed. They have actively targeted neural network applications with hardware and driver support since 2012.

    Yes, they got lucky in that generative AI turned out to be massively popular, and required massively parallel computing capabilities, but luck is one part opportunity and one part preparedness. The reason they were able to capitalize is because they had the best graphics cards on the market and then specifically targeted AI applications.



  • Well it is one thing to automate a repetitive task in your job, and quite another to eliminate entire professions.

    No it is not. That is literally how those jobs are eliminated. 30 years ago CAD came out and helped to automate drafting tasks to the point that a team of 20 drafters turned into 1 or 2 drafters and eventually turned into engineers drafting their own drawings.

    What you call “menial bullshit” is the entire livelihood and profession of quite a few people, speaking of taxis for one.

    Congratulations, despite you wanting to look at it with rose coloured glasses, that does not change the fact that it is objectively menial bullshit.

    What are all these people going to do when taxi driving is relegated to robots?

    Find other entry level jobs. If we eliminate *all * entry level jobs through automation, then we will need to implement some form of basic income as there will not be enough useful work for everyone to do. That would be a great problem to have.

    Will the state have enough cash to support them and help them upskill or whatever is needed to survive and prosper?

    Yes, the state has access to literally all of the profits from automation via taxes and redistribution.

    A technological utopia is a promise from the 1950s. Hasn’t been realized yet. Isn’t on the horizon anytime soon. Careful that in dreaming up utopias we don’t build dystopias.

    Oh wow, you’re saying that if human beings can’t create something in 70 years, then that means it’s impossible and we’ll never create it?

    Again, the only way to get to a utopia is to have all of the pieces in place, which necessitates a lot of automation and much more advanced technology than we already have. We’re only barely at the point where we can start to practice biology and medicine in a meaningful way, and that’s only because computers completely eliminated the former profession of computer.

    Be careful that you don’t keep yourself stuck in our current dystopia out of fear of change.


  • Better system for WHOM? Tech-bros that want to steal my content as their own?

    A better system for EVERYONE. One where we all have access to all creative works, rather than spending billions on engineers nad lawyers to create walled gardens and DRM and artificial scarcity. What if literally all the money we spent on all of that instead went to artist royalties?

    But tech-bros that want my work to train their LLMs - they can fuck right off. There are legal thresholds that constitute “fair use” - Is it used for an academic purpose? Is it used for a non-profit use? Is the portion that is being used a small part or the whole thing? LLM software fail all of these tests.

    No. It doesn’t.

    They can literally pass all of those tests.

    You are confusing OpenAI keeping their LLM closed source and charging access to it, with LLMs in general. The open source models that Microsoft and Meta publish for instance, pass literally all of the criteria you just stated.




  • I think that’s a huge risk, but we’ve only ever seen a single, very specific type of intelligence, our own / that of animals that are pretty closely related to us.

    Movies like Ex Machina and Her do a good job of pointing out that there is nothing that inherently means that an AI will be anything like us, even if they can appear that way or pass at tasks.

    It’s entirely possible that we could develop an AI that was so specifically trained that it would provide the best script editing notes but be incapable of anything else for instance, including self reflection or feeling loss.