Every vocab list + audio from book2/50languages/Goethe-Verlag in Anki deck form (2025)

Book2, also known as 50languages or Goethe-Verlag, is quite an underrated website that provides a bunch of quality free resources, among them a comprehensive vocab list and phrasebook complete with audio recordings by native speakers. According to their website, it should be enough to take you to A2 level.

One of the draws of the site is that it offers the same content in 50+ languages, which means you can practice any language from any other language. Too often people have to resort to courses in English or another more 'popular' language, and while that may be a way to refresh one of those languages, it does impede your L3 acquisition.

The only problem is that you have to go through their website or app, which then showers you with ads, prompts you to buy the pro version, and so on. So I wrote a little script that downloads the word list (1904 words) and the audio and wraps everything into an Anki deck. You call it with your source and target languages ('en', 'fr', 'de', 'es', etc.) and it spits out an .apkg file that you can import and study however you want.
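For example, if you save the script below as, say, book2anki.py (here: learning German from English):

Code: Select all

./book2anki.py en de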

Theoretically I could run it for all 50 × 49 = 2450 source/target combinations, but it's probably best not to overload them with that many requests, lol. So I suggest you use the script for your own needs and then publish the resulting deck so people don't have to do it again. (You will need Python 3 with the requests, beautifulsoup4 and genanki libraries installed - huge shoutout to genanki's author for letting me make a deck from scratch without reading a word of the Anki manual.)
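If you don't have those libraries yet, installing them should be as simple as:

Code: Select all

pip install requests beautifulsoup4 genanki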

You will find attached the script along with a couple of example decks (German for French speakers, Georgian for Lithuanian speakers, Portuguese for Japanese speakers). The code is pretty dirty but it does the job: the decks seem to work, there are no issues with special characters, and I'm of course open to improvements and feedback.

Link to script and decks:

https://files.catbox.moe/2t61vj.zip

(updated, somewhat more permanent link)

Code: Select all


#!/usr/bin/env python3

import glob
import os.path
import shutil
import sys

import requests
from bs4 import BeautifulSoup
import genanki

origin_language = sys.argv[1].upper()
target_language = sys.argv[2].upper()

url = "https://www.goethe-verlag.com/book2/_VOCAB"

target_url = f"{url}/{origin_language}/{origin_language}{target_language}/"


def pad_number(n):
    # lesson pages are named 01.HTM ... 42.HTM
    if n < 10:
        return "0" + str(n)
    else:
        return str(n)


my_model = genanki.Model(
    1091735104,
    "Simple Model with Media",
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
        {"name": "MyMedia"},  # holds the [sound:...] tag for the audio
    ],
    templates=[
        {
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": '{{FrontSide}}<hr id="answer">{{Answer}}<br>{{MyMedia}}',
        },
    ],
    css=""".card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;
}

.card1 { background-color: #FFFFFF; }
.card2 { background-color: #FFFFFF; }""",
)

my_deck = genanki.Deck(
    2059400111, f"Book2 {origin_language}-{target_language} (words)"
)

MAX_LESSONS = 42

for i in range(1, MAX_LESSONS + 1):
    r = requests.get(f"{target_url}{pad_number(i)}.HTM")
    soup = BeautifulSoup(r.content, "html.parser")
    # the whole word list sits in the page's last <meta> tag, "| "-separated
    words = str(soup.select("meta")[-1]).split('"')[1].split("| ")
    mp3s = [target_url + str(u).split('"')[1] for u in soup.select("source")]

    for w, m in zip(words, mp3s):
        filename = f"word_{origin_language}{target_language}_" + m.split("/")[-1]

        # everything before the last " - " is the target side, because some
        # entries carry several translations in the target language
        target_w = " - ".join(w.split(" - ")[:-1])
        source_w = w.split(" - ")[-1]

        if not os.path.isfile(filename):
            dl_file = requests.get(m, stream=True)
            print(m)  # progress indicator
            with open(filename, "wb") as out_file:
                shutil.copyfileobj(dl_file.raw, out_file)

        my_note = genanki.Note(
            model=my_model, fields=[source_w, target_w, f"[sound:{filename}]"]
        )
        my_deck.add_note(my_note)

my_package = genanki.Package(my_deck)
# both language codes in the glob, matching the filenames written above
my_package.media_files = glob.glob(f"word_{origin_language}{target_language}_*.mp3")
my_package.write_to_file(f"book2_{origin_language}{target_language}_words.apkg")
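One caveat: the model and deck IDs are hardcoded, so every generated deck shares deck ID 2059400111, while genanki recommends a unique ID per deck. If you generate decks for several language pairs and want to be safe, you could derive the ID from the pair instead. A minimal sketch (the function name is my own):

Code: Select all

import hashlib

def stable_deck_id(origin, target):
    # derive a stable 31-bit ID from the language pair; hashlib is used
    # instead of hash() because the latter changes between interpreter runs
    digest = hashlib.md5(f"book2-{origin}{target}-words".encode()).hexdigest()
    return int(digest[:8], 16) % (1 << 31)

# e.g. genanki.Deck(stable_deck_id(origin_language, target_language), ...)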

Edit 1: improved the deck's appearance with some CSS

Edit 2: someone requested a way to also download the individual sentences. Because the logic is a bit different, here is an additional script:

Code: Select all

#!/usr/bin/env python3

import glob
import os.path
import shutil
import sys

import requests
from bs4 import BeautifulSoup
import genanki

origin_language = sys.argv[1].upper()
target_language = sys.argv[2].upper()

url = "https://www.goethe-verlag.com/book2"

target_url = (
    f"{url}/{origin_language}/{origin_language}{target_language}"
    f"/{origin_language}{target_language}"
)


def pad_number(n):
    # lesson pages are named 003.HTM ... 102.HTM
    if n < 10:
        return "00" + str(n)
    elif n < 100:
        return "0" + str(n)
    else:
        return str(n)


my_model = genanki.Model(
    1091735104,
    "Simple Model with Media",
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
        {"name": "MyMedia"},  # holds the [sound:...] tag for the audio
    ],
    templates=[
        {
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": '{{FrontSide}}<hr id="answer">{{Answer}}<br>{{MyMedia}}',
        },
    ],
    css=""".card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;
}

.card1 { background-color: #FFFFFF; }
.card2 { background-color: #FFFFFF; }""",
)

my_deck = genanki.Deck(
    2059400110, f"Book2 {origin_language}-{target_language} (sentences)"
)

MIN_LESSON = 3  # 2 is the index page
MAX_LESSON = 102  # 103 is the youtube video

for i in range(MIN_LESSON, MAX_LESSON + 1):
    # the lesson number is appended directly to the language-pair prefix
    # (ENDE003.HTM etc.), with no slash unlike in the vocab script
    r = requests.get(f"{target_url}{pad_number(i)}.HTM")
    soup = BeautifulSoup(r.content, "html.parser")

    # header sentences
    header_l1_sentences = [t.text for t in soup.find_all("span", {"class": "Stil36"})]
    header_l2_sentences = [t.text for t in soup.find_all("span", {"class": "Stil46"})]
    l2_audio = [t.find_all("source")[0]["src"] for t in soup.find_all("audio")]

    # body sentences; the last Stil35 element is some text about Alzheimer's
    body_l1_sentences = [
        t.text.strip() for t in soup.find_all("div", {"class": "Stil35"})
    ][:18]
    body_l2_sentences = [
        t.text.strip().split("\r\n\n")[1]
        for t in soup.find_all("div", {"class": "Stil45"})
    ]

    l1_sentences = header_l1_sentences + body_l1_sentences
    l2_sentences = header_l2_sentences + body_l2_sentences

    for l1_s, l2_s, m in zip(l1_sentences, l2_sentences, l2_audio):
        filename = f"sentence_{origin_language}{target_language}_" + m.split("/")[-1]

        if not os.path.isfile(filename):
            dl_file = requests.get(m, stream=True)
            print(m)  # progress indicator
            with open(filename, "wb") as out_file:
                shutil.copyfileobj(dl_file.raw, out_file)

        my_note = genanki.Note(
            model=my_model, fields=[l1_s, l2_s, f"[sound:{filename}]"]
        )
        my_deck.add_note(my_note)

my_package = genanki.Package(my_deck)
# both language codes in the glob, matching the filenames written above
my_package.media_files = glob.glob(f"sentence_{origin_language}{target_language}_*.mp3")
my_package.write_to_file(f"book2_{origin_language}{target_language}_sentences.apkg")

It works very much the same way: you save this into a file called booksentences2anki.py and supply it your source and target languages like so (e.g. learning Modern Greek from Brazilian Portuguese):

Code: Select all

./booksentences2anki.py px el

Edit 3: I changed the way the scripts name the audio files and decks so you can run them consecutively for multiple languages without triggering bugs in Anki caused by identical filenames.

Edit 4: fixed a bug where multiple translations in the target language for a single word in the source language would make the script fail
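For the curious, the fix simply treats everything before the last " - " as the target side of the entry:

Code: Select all

# hypothetical entry with two target-language translations
w = "das Haus - das Heim - the house"
target_w = " - ".join(w.split(" - ")[:-1])  # "das Haus - das Heim"
source_w = w.split(" - ")[-1]               # "the house"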
