Notes from the Nova Mob

Melbourne Fannish Drinks

Melbourne SF pub meetup. Second Wednesday of every month

Next meetup: Wednesday 9 July.

The second-Wednesday pub meeting, open to all Melbourne fans of SF, fantasy, and horror, announces a change of venue. The time is the same: 6pm every second Wednesday.
Until now the venue was the Nixon Hotel in Docklands. From Wednesday 9 July 2025, 6pm, it is the Lion Hotel, Level 3, Melbourne Central, 211 La Trobe St – the big sports bar. Happy Hour runs 4pm to 7pm. The shift is temporary, apparently something to do with AFL and quiz nights.

https://melbournecentrallion.au/

💥 💥 💥

Nova Mob members, friends, and guests borged into Meta’s AI

Roll a die to choose the next word to build a sentence. Keep doing that 50 times to build a paragraph or page. What are the chances that you will accurately reproduce a section of a Harry Potter novel? Better than even, if you are one particular AI model – which means getting each individual word right with a probability of about 98 percent.
But before naming that Artificial Intelligence model, and which novels are uncannily reproduced with no money going back to the writer, how do books get into the AI training set in the first place? If you are Meta, you use a database of pirated books and hoover it all up in its entirety, according to The Atlantic. Just like the Borg on Star Trek. 
It turns out almost all the Nova Mob’s published members, friends, and guests are part of the borged data set that Meta ate for its training.
Did LibGen have permission to reproduce the books of these writers? 
Did Meta have permission to borg them up into its maw, to train its AI with?
Search for yourself:

 Search LibGen, the Pirated-Books Database That Meta Used to Train AI

https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094/

“Millions of books and scientific papers are captured in the LibGen collection’s current iteration.” Including novels, stories, and non-fiction by all these people, I’ve checked:

Eugen Bacon, Max Barry, John Birmingham, Jenny Blackford, Russell Blackford, Sue Bursztynski
James Cambias, Trudi Canavan, Paul Collins
Jack Dann, Chris Flynn
Rob Gerrand, Kerry Greenwood
Lee Harding, Richard Harland, Robert Hood
Van Ikin, George Ivanoff
Paul Kincaid
Vanessa Len, Ken Liu
Sophie Masson, Bren MacDibble, Iain McIntyre, Sean McMullen, Andrew MacRae, Farah Mendlesohn, Meg Mundell
Shelley Parker-Chan, Hoa Pham, Gillian Polack
Jane Routley, Lucy Sussex
Shaun Tan, Keith Taylor
Kaaron Warren, Janeen Webb

AI, plagiarism, and Harry Potter – as reported on Ars Technica

“Study: Meta AI model can reproduce almost half of Harry Potter book 

The research could have big implications for generative AI copyright lawsuits.”

https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first-harry-potter-book/

“In its December 2023 lawsuit against OpenAI, The New York Times Company produced dozens of examples where GPT-4 exactly reproduced significant passages from Times stories. In its response, OpenAI described this as a “fringe behaviour” and a “problem that researchers at OpenAI and elsewhere work hard to address.”

“But is it actually a fringe behaviour? And have leading AI companies addressed it? New research—focusing on books rather than newspaper articles and on different companies—provides surprising insights into this question.”

A May 2025 paper from Cornell, Stanford, and West Virginia University legal scholars and computer scientists investigated whether five AI models could reproduce text from Books3, a repository which is often used to train AI models and includes many works still under copyright.

I found it fascinating how tokens work. Timothy Lee’s article on Ars Technica – from which I’ve quoted here – describes how it’s done, using the example “peanut butter and…”, where the next word could be jelly, sugar, cream, or something else. Each candidate next word has a probability. For a 50-token passage, the chance of reproducing it exactly is the product of 50 such probabilities multiplied together (such as 0.83 × 0.32 × 0.27 × 0.56 and so on). Think of each number as a token, and each probability as the chance of selecting the right word. The equation is 50 numbers long.
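The arithmetic of that example can be checked in a few lines. The four probabilities below are just the illustrative figures quoted above, not real model outputs:

```python
# Illustrative per-token probabilities (the example figures from the
# article's "peanut butter and..." discussion, not real model outputs)
probs = [0.83, 0.32, 0.27, 0.56]

# The chance of reproducing all four tokens in a row is the product
# of the individual probabilities
chance = 1.0
for p in probs:
    chance *= p

print(round(chance, 4))  # about 0.0402
```

Even four fairly confident guesses multiply down to roughly a 4 percent chance, which is why an exact 50-token match demands such high confidence on every single token.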

“The study authors took 36 books and divided each of them into overlapping 100-token passages. Using the first 50 tokens as a prompt, they calculated the probability that the next 50 tokens would be identical to the original passage. They counted a passage as “memorised” if the model had a greater than 50 percent chance of reproducing it word for word.

“This definition is quite strict. For a 50-token sequence to have a probability greater than 50 percent, the average [value for each] token in the passage needs [to be] a probability of at least 98 percent!”
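That strictness is easy to verify: for fifty per-token probabilities to multiply out above 0.5, their average (geometric mean) must be at least the fiftieth root of 0.5. A quick sketch:

```python
tokens = 50

# Minimum average per-token probability for a 50-token passage
# to be reproduced word for word more than half the time
threshold = 0.5 ** (1 / tokens)
print(round(threshold, 4))  # 0.9862

# 98 percent per token is not quite enough; 99 percent clears the bar
print(round(0.98 ** tokens, 3))  # 0.364 -- below 50 percent
print(round(0.99 ** tokens, 3))  # 0.605 -- above 50 percent
```

So “at least 98 percent” per token is genuinely the floor: nudge each token down to 97 percent and the whole-passage probability collapses well below half.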

One of the 36 books tested was Harry Potter and the Sorcerer’s Stone (US title). 

“The chart [see the article] shows how easy it is to get a model to generate 50-token excerpts from various parts of Harry Potter and the Sorcerer’s Stone. The darker a line is, the easier it is to reproduce that portion of the book.

“Llama 3.1 70B—a mid-sized model Meta released in July 2024—is far more likely to reproduce Harry Potter text than any of the other four models.

“Specifically, the paper estimates that Llama 3.1 70B has memorised 42 percent of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time.”

As one commenter said, “If I could be prompted with a paragraph from a book and give the next paragraph verbatim I think you would agree I had effectively memorised large swaths of the thing, why should an LLM be held to a different standard than a human? And again as the article states, if the standard had been relaxed to missing a few tokens (akin to getting a few words or punctuation wrong here and there) it likely would be a lot higher.”

Interestingly, best-sellers were more likely to be predictively reiterated verbatim by the AI – OK, let’s call it what it is: reproduced – than was work by less popular writers.

This is one of the best studies to unpack exactly how much has been stolen by the tech giants. It used to be that illegally downloading an MP3 file could get you a US $70,000 fine. What should Meta expect for stealing copyrighted works on an industrial scale?

This is why the Authors’ Societies have court cases under way.

💥 💥 💥

Hoa Pham – Fantasy Writers’ Victoria Workshop

“Hi Nova Mob
I was wondering if you could put in your nova mob newsletter a series of fantasy fiction workshops I am running for Writers Victoria. The information is here:
https://writersvictoria.org.au/calendars/events/event/?id=1346

Date: Tue 1 July 2025 – 12:00 AM to Thu 9 October 2025 – 12:00 AM

With: Hoa Pham

Summary: Join a supportive community of fantasy writers and hone your craft under the guidance of award-winning author and editor Hoa Pham.”

💥 💥 💥

SF Commentary #120 arrives

This is a really well written and presented issue. The first part is devoted to Race Mathews, and for that alone it is a worthy magazine of record. Includes farewells from Iola Mathews and Gareth Evans, as well as letters of comment from those who attended the State Memorial Service for Race. Recommended.

Available from Bruce Gillespie, physical and email addresses and details at e-fanzines.com

💥 💥 💥

Nova Mob About and Contact Us

Nova Mob on social media:  https://novamob.blog/

We’re on Mastodon. Click the invite to follow.
https://mastodonbooks.net/@NovaMob

https://mastodonbooks.net/invite/YECXVBUk

nova@aussiebb.com.au

Friends, out-of-town guests, and new arrivals – you are always welcome and have an open invitation to the Mob’s face-to-face and Zoom meetings.
First time arrivals – free. Otherwise a $5 donation for expenses please.
Face-to-face meetings are at the Kensington Town Hall:

https://activemelbourne.ymca.org.au/venues/kensington-town-hall

The Kensington Town Hall has ample parking and excellent disability access. Kensington Railway Station is 13 minutes from Flinders St Station on the Craigieburn line.

Murray MacLachlan

Convenor