• ell1e@leminal.space
    12 hours ago

    So what’s the quote from the documentation that backs up your claim? The line “perform other product specific crawls” seems extremely vague by design.

    • General_Effort@lemmy.world
      11 hours ago

      I’m not really sure what you are asking here. Did you notice that you can scroll down and see a list of their crawlers?

      • ell1e@leminal.space
        9 hours ago

        Nothing on this page seems to contradict the article. But if I simply missed the part that does, I’d be happy to learn.

        • General_Effort@lemmy.world
          8 hours ago

          You look up what Googlebot does. No AI.

          Want to know which crawlers do AI? Just search for “AI” or “training” or some such, or skim through. It’s not long. Google-Extended collects training data. Note that Google-Extended is explicitly not used to rank pages.

          Did that help?

          • ell1e@leminal.space
            1 hour ago

            “You look up what Googlebot does. No AI.”

            The page seems written to suggest that, but it doesn’t explicitly say that the other bots can’t feed into some other sort of AI training. It would be in Google’s interest to mislead users here.

            Edit: I found a quote that suggests Googlebot does both in one: “Google-Extended doesn’t have a separate HTTP request user agent string. Crawling is done with existing Google user agent […]”. I guess Cloudflare doesn’t trust Google to abide by the access controls. That seems sensible to me.
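
For anyone following along, here is a rough sketch of what that quote seems to mean in practice, using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URL are made up for illustration, and this only shows how a compliant parser reads the rules, not what Google's crawlers actually do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The token is only a label inside robots.txt. Per the quoted documentation,
# the HTTP request itself carries an existing Google user agent string.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/page")
extended_allowed = parser.can_fetch("Google-Extended", "https://example.com/page")

print("Googlebot allowed:", googlebot_allowed)
print("Google-Extended allowed:", extended_allowed)
```

If the quote is read this way, the opt-out is evaluated on the crawler's side, so a server looking only at request headers could not tell the two purposes apart. That would fit the point about having to trust Google to honor the rule.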