“Falsehood flies, and truth comes limping after it, so that when men come to be undeceived, it is too late; the jest is over, and the tale hath had its effect: […] like a physician, who hath found out an infallible medicine, after the patient is dead.” —Jonathan Swift

  • 32 Posts
  • 733 Comments
Joined 2 years ago
Cake day: July 25th, 2024



  • Okay, so:

    I tried installing a program called “hardinfo”. My ZorinOS software store didn’t find it through flathub.

    That’s fair. Repo fragmentation is a real thing on Linux, and it seems like Ultimate Systems didn’t put their software on Flathub.

    So I googled it, found a .deb file, which my Zorin store loaded up to install.

    So instead of just using apt – like every introductory tutorial to Ubuntu and its derivatives leads off with – you chose to do it (effectively) the Windows way that you’re familiar with where you hunt and peck around the Internet for an install file. It’s an understandable mistake (that I think most Windows expats make at some point), but the blame from this point on lies squarely on you.

    Then I hit install, and it spits out a message like “Software was not installed. Requires these three dependencies, which will not be installed”. Didn’t tell me why they didn’t install. Just said "Hardinfo needs these programs. Good luck figuring it out asshole."

    You didn’t have the dependencies, and it told you which ones to install. Why does it need to tell you why it needs them? Nice to have, I guess, but if it’s mandatory, it’s mandatory. No amount of explanation is going to get you around the fact that this software will not function without them. Dependencies aren’t a Linux thing; they’re a reality of modern programming. And I imagine apt would’ve automatically resolved this and asked you to also install the deps.
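
    For what it’s worth, here’s a rough sketch of letting apt do that resolution itself – purely an illustration assuming a Debian/Ubuntu-style system with apt-get on the PATH, not anything from the original exchange. “--simulate” only prints the transaction apt would perform:

        # Rough sketch: let apt resolve hardinfo's dependencies instead of
        # installing a hand-downloaded .deb in isolation. '--simulate' only
        # prints what would be installed; nothing on the system changes.
        import subprocess

        package = "hardinfo"  # package name taken from the comment above

        subprocess.run(["apt-get", "install", "--simulate", package], check=True)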



  • I clarified this a bit in a follow-up comment, but my first comment simplified things for the sake of the counterargument:

    [it’s not in the public domain] because the actual human work that went into creating it was done by the owner of the AI Model and whatever they trained on.

    Their claim that the copyright for AI-generated works belongs to the model creator and the authors of the training material – and is never in the public domain – is patent, easily disprovable nonsense.

    Yes, I understand it’s more nuanced than what I said. No, it’s not nuanced in their favor. No, I’m not diving into that with a pathological liar (see their other comments) when it’s immaterial to my rebuttal of their bullshit claim. I guess you just didn’t read the claim I was addressing?



  • The answer is that it’s messy and that I’m not qualified to say where the line is (nor, I think, is anyone yet). The generated parts are not copyrightable, but you can still have a valid copyright by bringing together things that aren’t individually copyrightable. For example, if I make a manga where Snow White fights Steamboat Willie, I’ve taken two public domain elements and used them to create a copyrightable work.

    So it’s not like the usage of AI inherently makes a project uncopyrightable unless the entire thing or most of it was just spat out of a machine. Where’s the line on this? Nobody (definitely not me, but probably nobody) really knows.

    As for courts ever finding out, how this affects trade secret policy… Dunno? I’m sure a Microsoft employee couldn’t release it publicly, because as you said, it’d probably violate an NDA. If there were some civil case, the source may come out during discovery and could maybe be analysed programmatically or by an expert. You would probably subpoena the employee(s) who wrote the software and ask them to testify. This is just spitballing, though, over something that’s probably inconsequential, because the end product is prooooobably still copyrightable.

    This kind of reminds me of the blurry line we have in FOSS, where everyone retains the copyright to their individual work. But if push comes to shove, how much does there need to be for it to be copyrightable? Where does it stop being a boilerplate for loop and start being creative expression?


  • Just as a sanity check: the person you’re responding to is a serial troll and what I can only describe as intellectually dishonest at best or a pathological liar at worst. They make up whatever they want and will never concede that the fucking nonsense they just dreamed up five seconds ago based on nothing is wrong in the face of conclusive proof otherwise.

    You shouldn’t waste your time responding to this cretin.


  • That’s not true.

    Okay, then you’ll need to explain the annual emails I’ve gotten saying “Your application to the Wikipedia Library has been approved” after I apparently tripped and fell and filled out a manual form applying to the library every year.

    It doesn’t seem selective once you meet the four aforementioned criteria, but you do need to manually apply.

    The idea you’re talking about, meanwhile, is nonsensical and doesn’t address basically anything about the massive structural problems blacklisting archive.today imposes. I wholly support expanding out the Wikipedia Library, but even this pie-in-the-sky version of it falls too far short of what archive.today provides – and that’s just going forward in an ideal world where you can snap your fingers and make this fantasyland WPL happen as soon as archive.today is blacklisted.

    The “backcatalogue”, so to speak, is what’s going to be the most catastrophic part of this by far. I spent years where my main focus was just on bringing dead sources back to life; I don’t know the full extent of how bad this is, but I know for damn sure what you’ve suggested (which won’t ever happen) undoes barely a fraction of the damage.


  • I think you have a very severe misunderstanding of the Wikipedia Library, which I have access to and frequently use. The WPL allows active editors in good standing to access paywalled sources.

    • You must have an account which is 6+ months old, has made 500 edits, has 10+ edits in the last month, and is not blocked (an extreme minority of editors, let alone readers).
    • You must first apply to gain access.
    • For publications with limited subscriptions, you must individually apply on top of your WPL access.
    • Critically: the WPL does not host any of these publications. You are taken to them via a portal and given an access token.

    I can’t emphasize enough how absurd this comparison is. “Solar farms exist; building a Dyson sphere would be basically the same thing. Let’s get to work.” And the thing is: I wish you were right.


    Edit: That said, if you ever need copyleft material, we do maintain Wikimedia Commons for media generally and Wikisource, a transcribed digital library of free sources. Much narrower in scope than this, but I highly recommend them!


  • So my suggestion, brainstorm ideas that would make you independent:

    Editors have been doing this for years.

    Make agreements with IA to improve retention,

    The IA already lives on a razor’s edge in terms of copyright and is doing everything it thinks it can to push that. Many websites leave the IA be because having free, independent archives can benefit them, but it doesn’t take a lot for a copyright holder to say: “Hey, you’re hosting my IP verbatim, I sent you a takedown request, you didn’t comply, and I’m taking you to court.”

    You can’t just “make agreements” for the IA to violate copyright law (more than it arguably already does). They’re already doing the best they can, and pushing them to do more would put Wikipedia in even more danger. It’s not an exaggeration to say that the IA dying would be a project-wide apocalypse.

    roll your own archiver,

    I’d bet it could be done if the IA went down, triggering a project-wide crisis, but among other things, I’m sure the Wikimedia Foundation doesn’t want to paint a target on its back. We’re very cautious when it comes to copyrighted material hosted on Wikimedia projects, and this would be dropping a fork into a blender for us.

    make a deal with news orgs to show their articles as citations (this last one I actually like most the more I think about it. A good negotiator can call it advertising for the news org and you’ll at the same time not infringe on copyright like archive[.]today is).

    I don’t think I understand this one. The Wikimedia project gets to host verbatim third-party news articles? This is creative but completely unrealistic; you’d be asking news organizations to place their work under a copyleft license just so it can be cited on Wikipedia (copyleft material is what we host, aside from minimal, explicitly labeled fair-use material with robust justification). It’d be a technical nightmare any way you slice it, and logistically it’d be a clusterfuck.

    Even if you magically overcame those problems, Wikipedia exists to be neutral and independent, and this “wink wink nudge nudge ;)” quasi-advertising deal would look corrupt as fuck – us showing preferential treatment for certain sources not based on their quality but on their willingness to do us favors.

    If you wait until point of no return, the choice has already been made for you whether you like it or not. And worst part is that you’d scramble to find a solution instead of the best solution.

    Here’s the thing: we know. This RfC is full of highly experienced editors deciding if Wikipedia is going to amputate. Option A means immediate, catastrophic, irreversible, mostly unfixable damage to Wikipedia. That is something that needs to be thought through, and your suggestions – which are appreciated for showing you’re giving it real thought – reflect that people who don’t regularly edit can’t really, viscerally understand how completely screwed Wikipedia is by this.



  • I don’t really see it as a complicated issue.

    That makes sense from (what I think is) an “outsider’s” perspective. From an “insider’s” perspective*, here’s the problem:

    • Wikipedia has a strict verifiability policy.
      • This policy states that “Each fact or claim in an article must [correspond to reliable sources]”.
      • This policy is the bedrock of Wikipedia. The project is fundamentally unsustainable without it, and we’re still undoing damage from decades ago when the policy either didn’t exist or was too loosely enforced.
      • I’m making a third bullet point because I cannot emphasize enough how much “just ignore it lol” cannot work and has never worked.
    • Hundreds of thousands of articles have citations sourced to archive.today.
      • This is despite the fact that the Internet Archive is prioritized whenever possible. We even have a prolific Internet Archive bot that automatically recovers citations when it can (a rough sketch of that kind of lookup follows this list).
      • The Internet Archive complies with blanket takedown requests for a domain very readily. Even setting aside citations going forward (both resources become unreliable for those at that point), archive.today has untold millions of webpages archived which the IA does not – many of which are used on Wikipedia.
      • Archive.today will archive material that the Internet Archive simply fails to capture because, on a technical level, it’s just better at taking a static snapshot of an article (which is what we want). That’s especially true for paywalled articles, which the Wayback Machine is often stymied by.
    • This would also make the Internet Archive the only remaining avenue for archiving URLs, meaning Wikipedia effectively collapses if something happens to the IA (granted, losing the IA would already be catastrophic even with archive.today available – much more so than archive.today’s hypothetical removal).
    • Archiving URLs isn’t just some incidental thing.
      • Citations are the backbone of Wikipedia. Casual readers might find them comforting to have. Researchers will rely on them. But editors cannot operate without them. We might actually use them more than readers do, because they help us a) check what’s already there, b) better understand the subject ourselves, and c) expand out the article.
      • Link rot is so much more pervasive than I think people fully grasp. When I’m writing an article, if possible, I archive every single source I use at both the Wayback Machine and archive.today, because relying on the link staying up is objectively a mistake (and relying on just one is negligent).
      • The security that archives offer generally just incalculably reduces the workload and mental load for editors.
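
    A minimal sketch of that kind of archive lookup, as mentioned above – an illustration only, not how the actual bot works, and the cited URL is a placeholder. It asks the Wayback Machine’s public availability endpoint whether a snapshot of a given URL exists:

        # Minimal sketch: query the Wayback Machine's availability API to see
        # whether an archived snapshot exists for a (possibly dead) cited URL.
        import json
        import urllib.parse
        import urllib.request

        cited_url = "https://example.com/some-dead-citation"  # placeholder URL
        api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(cited_url, safe="")

        with urllib.request.urlopen(api) as resp:
            data = json.load(resp)

        snapshot = data.get("archived_snapshots", {}).get("closest")
        if snapshot and snapshot.get("available"):
            print("Archived copy:", snapshot["url"])
        else:
            print("No Wayback snapshot found; this citation is at risk of rotting.")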

    If you’ve ever tried to add a citation on Wikipedia to a sentence that says “citation needed”, you’ve rubbed up against Brandolini’s law. A corollary is that it’s much, much harder to cite an uncited statement than it is to create one. If you remove archive.today, you flood Wikipedia with hundreds of thousands of these. It’s dampened a bit by the fact that the citation metadata is still there and that some URLs will still be live, but I cannot emphasize enough – as an editor of nearly 10 years, with over 25,000 contributions, and who’s authored two featured articles – that you’d introduce a workload that could never be completed, whose repercussions would be felt for decades at a time when Wikipedia is already on shaky footing.

    Even if you somehow poofed away all that work, there are bound to be tens of thousands of statements in articles you have to get rid of because they simply cannot be reasonably sourced anywhere else. For many, many statements, this is not incidental information independent from the rest of the article; many of these removals would require you to fundamentally restructure the surrounding prose or even the entire article.

    It’s hard for me to explain this without resorting to “trust me bro”, but the people voting “Option C” take what archive.today did very seriously and recognize that either option is going to mean major, irreparable damage to the project. Wikipedia is a lot different on the editing side than it is on the reading one; sometimes it’s liberating, sometimes it’s horrifying, and in this case it’s “I could use a hug”.

    * “Outsider” and “insider” used to denote experience editing; most anyone can do anything on Wikipedia from the get-go.