
SAG-AFTRA Striking Because Companies Won’t Offer Same AI Protections To Voice And Physical Performers, Not Happy About GTA VI Exemption


"When you compare that to the reality of this work, they're really not covered"

The post SAG-AFTRA Striking Because Companies Won’t Offer Same AI Protections To Voice And Physical Performers, Not Happy About GTA VI Exemption appeared first on Aftermath.



Read the whole story
InShaneee · 7 hours ago · Chicago, IL

Junji Ito’s terrifying Uzumaki hits Adult Swim in September

A still image from Adult Swim’s Uzumaki.
Image: Adult Swim

Adult Swim’s long-awaited adaptation of Uzumaki finally has a premiere date — and an appropriately creepy trailer. The series, based on the classic horror manga from Junji Ito, will start airing on September 28th. Episodes will hit Adult Swim first, and then stream on Max the following day.

Uzumaki follows a cursed town that is — and I promise it’s scarier than it sounds — plagued by spirals. Here’s the full synopsis:

“Let’s leave this town together,” asks Shuichi Saito, a former classmate of Kirie Goshima, a high school girl who was born and grew up in Kurouzu-cho. Everything, from a strange whirlwind to the smoke billowing from the crematorium to the residents themselves, is turning into spirals. People’s eyes spin in whirls, a tongue spirals, and bodies twist unnaturally. In an attempt to escape the curse of the spirals, Kirie decides to flee from Kurouzu-cho, but can she get away from this turmoil?

It’s one of Ito’s most popular and unsettling stories, and based on the new trailer, it looks like the show is really nailing the detailed black-and-white art that’s so intrinsic to his work. Ito’s particular style has proven to be a tricky thing to translate to animation, as seen in the Netflix anthology from last year. That’s likely why Adult Swim’s version has been in the works for so long; it was first announced in 2019.

For another taste, here’s the first released clip of the show from 2023:

Read the whole story
InShaneee · 7 hours ago · Chicago, IL

Video Game Voice Actors Strike Over AI


"Frankly, it’s stunning that these video game studios haven’t learned anything from the lessons of last year"

The post Video Game Voice Actors Strike Over AI appeared first on Aftermath.



Read the whole story
InShaneee · 1 day ago · Chicago, IL

Anthropic AI Scraper Hits iFixit’s Website a Million Times in a Day


The web scraper bot for Anthropic’s AI chatbot Claude hit iFixit’s website nearly a million times in a single day, despite the repair database having terms of service provisions that state “reproducing, copying or distributing any Content, materials or design elements on the Site for any other purpose, including training a machine learning or AI model, is strictly prohibited without the express prior written permission of iFixit.”

iFixit CEO Kyle Wiens tweeted Wednesday “Hey @AnthropicAI: I get you're hungry for data. Claude is really smart! But do you really need to hit our servers a million times in 24 hours? You're not only taking our content without paying, you're tying up our devops resources. Not cool.”

Wiens sent me server logs that showed thousands of requests per minute over a several-hour period. “We're just the largest database of repair information in the world, no big deal if they take it all without asking and swamp our servers in the process,” he told me, adding that iFixit’s website has millions of pages. These include repair guides and their revision histories, blog and news posts, research, forums, community-contributed repair guides, and question-and-answer sections.

This sort of scraping has become incredibly commonplace, and a recent study by the Data Provenance Initiative shows that website owners are increasingly trying to signal to AI companies that they do not want their content scraped to train commercial AI tools. Wiens said that iFixit modified its robots.txt file this week to specifically block Anthropic’s crawler bots.
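Blocking one company's crawler while leaving the site open to everyone else takes only a few lines of robots.txt. As a minimal sketch (the rules and URL below are illustrative, not iFixit's actual file), Python's standard urllib.robotparser shows how a compliant bot would read such a policy:

```python
# Sketch: a robots.txt policy that singles out ClaudeBot while allowing
# all other crawlers. Hypothetical rules, checked with Python's stdlib parser.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The specifically named agent is blocked everywhere...
print(parser.can_fetch("ClaudeBot", "https://www.ifixit.com/Guide"))      # False
# ...while any other agent falls through to the permissive wildcard rule.
print(parser.can_fetch("SomeOtherBot", "https://www.ifixit.com/Guide"))   # True
```

Of course, as the rest of this story shows, robots.txt is advisory: this is what a crawler that chooses to comply would see, not an enforcement mechanism.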

This is particularly notable because, when I asked Anthropic about the fact that its bot hit iFixit a million times in a day, I was sent a blog post by the company that puts the onus on website owners to specifically block Anthropic’s crawler, called ClaudeBot. 

“As per industry standard, Anthropic uses a variety of data sources for model development, such as publicly available data from the internet gathered via a web crawler,” the blog post reads. “Our crawling should not be intrusive or disruptive. We aim for minimal disruption by being thoughtful about how quickly we crawl the same domains and respecting Crawl-delay where appropriate.”


The post adds that “opting out of being crawled by ClaudeBot requires modifying the robots.txt file” to block its crawler, meaning that prohibiting scraping in a site’s terms of service alone does nothing in practice unless the site is prepared to sue the AI company.

Across the board, AI companies almost never respect terms of service, which is notable given that many of them have very long terms of service agreements of their own that restrict what users can do. In a paper published last week that we’ve already written about a few times, researchers at the Data Provenance Initiative found that many websites request in their terms of service that their content not be scraped, but that those requests often have no effect.

This is a shame, lead author Shayne Longpre told me, because terms of service allow website owners to be more nuanced about the types of crawlers they want to allow or block than robots.txt does.

“The tragedy is that terms of service are specific and nuanced, but not machine readable and robots.txt is machine readable, but incredibly coarse and unspecific,” Longpre said. “With terms of service, I suspect the only ones that are being complied with are the very large companies that maybe have filed lawsuits, but they seem to be otherwise ignored.”

Read the whole story
InShaneee · 2 days ago · Chicago, IL

Folks, We Got Another One


Over 500 workers at Activision Blizzard formed a union across departments

The post Folks, We Got Another One appeared first on Aftermath.



Read the whole story
InShaneee · 2 days ago · Chicago, IL

Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal


Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user-generated content exclusive to the internet’s already dominant search engine.

If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn’t rely on Google’s indexing and search Reddit by using “site:reddit.com,” you will not see any results from the last week. DuckDuckGo is currently turning up seven links when searching Reddit, but provides no data on where the links go or why, instead only saying that “We would like to show you a description here but the site won't allow us.” Older results will still show up, but these search engines are no longer able to “crawl” Reddit, meaning that Google is the only search engine that will turn up results from Reddit going forward. Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google.

The news shows how Google’s near monopoly on search is now actively hindering other companies’ ability to compete, at a time when Google is facing increasing criticism over the quality of its search results. And while neither Reddit nor Google responded to a request for comment, it appears that the exclusion of other search engines is the result of a multimillion-dollar deal that gives Google the right to scrape Reddit for data to train its AI products.

“They’re [Reddit] killing everything for search but Google,” Colin Hayhurst, CEO of the search engine Mojeek, told me on a call.

Hayhurst tried contacting Reddit via email when Mojeek noticed it was blocked from crawling the site in early June, but said he has not heard back.

“It's never happened to us before,” he said. “Because this happens to us, we get blocked, usually because of ignorance or stupidity or whatever, and when we contact the site you certainly can get that resolved, but we've never had no reply from anybody before.”

As Jason wrote yesterday, there’s been a huge increase in the number of websites trying to block the bots AI companies use to scrape them for training data by updating their robots.txt files. Robots.txt is a text file that tells bots whether they are allowed to access a website. Googlebot, for example, is the crawler or “spider” that Google uses to index the web for search results. A website’s robots.txt file can make an exception that gives Googlebot access while blocking other bots, so the site can still appear in search results that generate a lot of traffic. Recently, Google also introduced Google-Extended, a bot that crawls the web specifically to improve its Gemini apps, so websites can allow Googlebot to crawl while blocking the crawler Google uses to power its generative AI products.
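That split between a search crawler and an AI-training crawler is straightforward to express in robots.txt. Here is a sketch for a hypothetical site (not any real file), again checked with Python's standard urllib.robotparser:

```python
# Sketch: admit Google's search crawler (Googlebot) while blocking the
# crawler that feeds its generative AI products (Google-Extended).
# Hypothetical site policy, not a real robots.txt.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The search crawler may index the page; the AI-training crawler may not.
print(parser.can_fetch("Googlebot", "https://example.com/article"))        # True
print(parser.can_fetch("Google-Extended", "https://example.com/article"))  # False
```

The two user agents get opposite answers for the same URL, which is exactly the "index me, but don't train on me" signal the paragraph above describes.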

Robots.txt files are just instructions, which crawlers can and have ignored, but according to Hayhurst, Reddit is also actively blocking Mojeek’s crawler.

Reddit has been upset about AI companies scraping the site to train large language models, and has taken public and aggressive steps to stop them. Last year, Reddit broke many third-party apps beloved by the Reddit community when it started charging for access to its API, making those apps too expensive to operate. Earlier this year, Reddit announced that it had signed a $60 million deal with Google, allowing Google to license Reddit content to train its AI products.

Reddit’s robots.txt used to include a bunch of jokes, like forbidding Bender, the robot from Futurama, from scraping it (User-Agent: bender, Disallow: /my_shiny_metal_ass), along with specific pages that search engines were and were not allowed to access: “/r*.rss/” was allowed, while “/login” was not.

Today, Reddit’s robots.txt is much simpler and more strict. In addition to a few links to Reddit’s new “public content policies,” the file simply includes the following instruction:

User-agent: *
Disallow: /

This means that no user-agent (bot) is allowed to crawl any part of the site.
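Run through Python's standard urllib.robotparser, that two-line policy reads the same for every compliant crawler; the bot names and URL below are just examples:

```python
# Sketch: how a compliant crawler reads a blanket "Disallow: /" policy
# like the one Reddit now serves. Example bot names and URL.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# Every user agent, every path: not allowed.
for bot in ("Bingbot", "DuckDuckBot", "MojeekBot", "ClaudeBot"):
    print(bot, parser.can_fetch(bot, "https://www.reddit.com/r/AskReddit/"))  # all False
```

Googlebot gets the same "no" from the public file as everyone else; Google's access evidently flows from its deal with Reddit, not from robots.txt.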

“Reddit believes in an open internet, but not the misuse of public content,” the updated robots.txt file says.

“Unfortunately, we’ve seen an uptick in obviously commercial entities who scrape Reddit and argue that they are not bound by our terms or policies,” Reddit said in June. “Worse, they hide behind robots.txt and say that they can use Reddit content for any use case they want. While we will continue to do what we can to find and proactively block these bad actors, we need to do more to protect Redditors’ contributions. In the next few weeks, we’ll be updating our robots.txt instructions to be as clear as possible: if you are using an automated agent to access Reddit, you need to abide by our terms and policies, and you need to talk to us.”

🤖
Have you seen other interesting crawlers and robots.txt files? I would love to hear from you. Send me an email at emanuel@404media.co.

Reddit appears to have updated its robots.txt file around June 25, after Mojeek’s Hayhurst noticed its crawler was getting blocked. That announcement said that “good faith actors – like researchers and organizations such as the Internet Archive – will continue to have access to Reddit content for non-commercial use,” and that “We are selective about who we work with and trust with large-scale access to Reddit content.” It also links to a guide on accessing Reddit data that plainly states Reddit considers “Search or website ads” a “commercial purpose,” and that no one can use Reddit data without permission or paying a fee.

Google did not respond to a request for comment, but its announcement of the deal with Reddit points out not only how valuable Reddit is for training AI, but also what many of us already know: as Google Search gets worse at turning up relevant results, one of the best ways to still get them is to add “Reddit” to your queries, directing Google to a site where real humans have been writing advice and recommendations for almost two decades. There are a lot of ways to illustrate how useful Reddit can be, but I’m not going to do better than this video:

The fact that Google is the only search engine that leads users to that information now, and that it is apparently the result of a $60 million deal around AI training data, is another example of the unintended consequences of the indiscriminate scraping of the entire internet in order to power generative AI tools. 

“We've always crawled respectfully and we've done it for 20 years. We're verified on Cloudflare, we don't train AI, we're like genuine, traditional genuine searching, we don't do ‘answer engine’ stuff,” Hayhurst said. “Answer engine” is Perplexity’s name for its AI-powered search engine. “The whole point about Mojeek, our proposition is that we don't do any tracking. But people also use us because we provide a completely different set of results.”

Reddit’s deal with Google, Hayhurst said, makes it harder to offer these alternative ways of searching the web. 

“It's part of a wider trend, isn't it?” he said. “It concerns us greatly. The web has been gradually killed and eroded. I don't want to make too much of a generalization, but this didn't help the small guys.”



Read the whole story
InShaneee · 2 days ago · Chicago, IL