Book2, also known as 50languages or Goethe-Verlag, is a rather underrated website that offers a number of quality free resources, among them a comprehensive vocabulary list and phrasebook complete with audio recordings by native speakers. According to the website, it should be enough to take you to A2 level.
One of the draws of the site is that it offers the same content in 50+ languages, which means you can practice any language from any other language. Too often people have to resort to courses in English or another more 'popular' language, and while that may be a way to refresh one of those, it does impede your L3 acquisition.
The only problem is that you have to go through their website or app, which then showers you with ads, prompts you to buy the pro version, and so on. So I wrote a little script that downloads the word list (1904 words) and audio and wraps them into an Anki deck. You call it with your source and target languages (say, 'en', 'fr', 'de', 'es', etc.) and it spits out an .apkg file that you can import and study however you want.
Theoretically I could do it for all 50 × 49 = 2450 combinations, but it's probably best not to overload them with that many requests, lol. So I suggest you use the script for your own needs and then publish the resulting deck so people don't have to do it again. (You will need Python 3 with the beautifulsoup4 and genanki libraries installed; huge shoutout to genanki's author for letting me make a deck from scratch without reading a word of the Anki manual.)
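If you do end up running it for several pairs anyway, it's easy to be polite about it by spacing the requests out. Here's a minimal sketch of a throttling wrapper; the helper name and the one-second default are my own invention, not part of the script:

```python
import time

def throttled(fetch, delay=1.0):
    """Wrap a fetch function so consecutive calls are at least `delay` seconds apart."""
    last = [float("-inf")]  # timestamp of the previous call
    def wrapper(url):
        wait = last[0] + delay - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        last[0] = time.monotonic()
        return fetch(url)
    return wrapper

# usage sketch: swap requests.get for a throttled version
# get = throttled(requests.get, delay=1.0)
# r = get(some_url)
```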
Attached you will find the script along with a couple of example decks (German for French speakers, Georgian for Lithuanian speakers, Portuguese for Japanese speakers). The code is pretty dirty, but it does the job: the decks seem to work, special characters cause no issues, and I'm of course open to improvements and feedback.
Link to script and decks:
https://files.catbox.moe/2t61vj.zip
(updated, somewhat more permanent link)
Code: Select all
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup
import sys
import itertools
import genanki
import glob
import shutil
import os.path

origin_language = sys.argv[1].upper()
target_language = sys.argv[2].upper()

url = "https://www.goethe-verlag.com/book2/_VOCAB"
target_url = f"{url}/{origin_language}/{origin_language}{target_language}/"
def pad_number(n):
    if n < 10:
        return "0" + str(n)
    else:
        return str(n)

my_model = genanki.Model(
    1091735104,
    "Simple Model with Media",
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
        {"name": "MyMedia"},  # holds the [sound:...] tag
    ],
    templates=[
        {
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": '{{FrontSide}}<hr id="answer">{{Answer}}<br>{{MyMedia}}',
        },
    ],
    css=""".card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;
}
.card1 { background-color: #FFFFFF; }
.card2 { background-color: #FFFFFF; }""",
)

my_deck = genanki.Deck(
    2059400111, f"Book2 {origin_language}-{target_language} (words)"
)

MAX_LESSONS = 42
for i in range(1, MAX_LESSONS + 1):
    r = requests.get(f"{target_url}/{pad_number(i)}.HTM")
    soup = BeautifulSoup(r.content, "html.parser")
    # the word list lives in the page's last <meta> tag, pipe-separated
    words = str(soup.select("meta")[-1]).split('"')[1].split("| ")
    mp3s = [target_url + str(u).split('"')[1] for u in soup.select("source")]
    for w, m in zip(words, mp3s):
        filename = f"word_{origin_language}{target_language}_" + m.split("/")[-1]
        # some words have several translations in the target language,
        # so only the last " - "-separated chunk is the source word
        target_w = " - ".join(w.split(" - ")[:-1])
        source_w = w.split(" - ")[-1]
        if not os.path.isfile(filename):
            dl_file = requests.get(m, stream=True)
            print(m)
            with open(filename, "wb") as out_file:
                shutil.copyfileobj(dl_file.raw, out_file)
        my_note = genanki.Note(
            model=my_model, fields=[source_w, target_w, f"[sound:{filename}]"]
        )
        my_deck.add_note(my_note)

my_package = genanki.Package(my_deck)
my_package.media_files = glob.glob(f"word_{origin_language}{target_language}_*.mp3")
my_package.write_to_file(f"book2_{origin_language}{target_language}_words.apkg")
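A small aside on pad_number: it is plain zero-padding, and Python's built-in str.zfill does the same job if you would rather drop the helper:

```python
# zero-pad to two digits, like the word script's pad_number
for n in (3, 10, 42):
    print(str(n).zfill(2))
# prints 03, 10, 42
```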
Edit 1: improved the deck's appearance with some CSS.
Edit 2: someone requested a way to also download the individual sentences. Because the logic is a bit different, here is an additional script:
Code: Select all
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup
import sys
import itertools
import genanki
import glob
import shutil
import os.path

origin_language = sys.argv[1].upper()
target_language = sys.argv[2].upper()

url = "https://www.goethe-verlag.com/book2"
target_url = f"{url}/{origin_language}/{origin_language}{target_language}/{origin_language}{target_language}"

def pad_number(n):
    if n < 10:
        return "00" + str(n)
    elif n < 100:
        return "0" + str(n)
    else:
        return str(n)

my_model = genanki.Model(
    1091735104,
    "Simple Model with Media",
    fields=[
        {"name": "Question"},
        {"name": "Answer"},
        {"name": "MyMedia"},  # holds the [sound:...] tag
    ],
    templates=[
        {
            "name": "Card 1",
            "qfmt": "{{Question}}",
            "afmt": '{{FrontSide}}<hr id="answer">{{Answer}}<br>{{MyMedia}}',
        },
    ],
    css=""".card {
    font-family: arial;
    font-size: 20px;
    text-align: center;
    color: black;
    background-color: white;
}
.card1 { background-color: #FFFFFF; }
.card2 { background-color: #FFFFFF; }""",
)

my_deck = genanki.Deck(
    2059400110, f"Book2 {origin_language}-{target_language} (sentences)"
)

MIN_LESSON = 3  # 2 is the index page
MAX_LESSON = 102  # 103 is the youtube video
for i in range(MIN_LESSON, MAX_LESSON + 1):
    r = requests.get(f"{target_url}{pad_number(i)}.HTM")  # no slash unlike vocab scraping
    soup = BeautifulSoup(r.content, "html.parser")
    # header sentences
    header_l1_sentences = [t.text for t in soup.find_all("span", {"class": "Stil36"})]
    header_l2_sentences = [t.text for t in soup.find_all("span", {"class": "Stil46"})]
    l2_audio = [t.find_all("source")[0]["src"] for t in soup.find_all("audio")]
    # body sentences; the last element is some text about Alzheimer, hence [:18]
    body_l1_sentences = [t.text.strip() for t in soup.find_all("div", {"class": "Stil35"})][:18]
    body_l2_sentences = [t.text.strip().split('\r\n\n')[1] for t in soup.find_all("div", {"class": "Stil45"})]
    l1_sentences = header_l1_sentences + body_l1_sentences
    l2_sentences = header_l2_sentences + body_l2_sentences
    for l1_s, l2_s, m in zip(l1_sentences, l2_sentences, l2_audio):
        filename = f"sentence_{origin_language}{target_language}_" + m.split("/")[-1]
        if not os.path.isfile(filename):
            dl_file = requests.get(m, stream=True)
            print(m)
            with open(filename, "wb") as out_file:
                shutil.copyfileobj(dl_file.raw, out_file)
        my_note = genanki.Note(
            model=my_model, fields=[l1_s, l2_s, f"[sound:{filename}]"]
        )
        my_deck.add_note(my_note)

my_package = genanki.Package(my_deck)
my_package.media_files = glob.glob(f"sentence_{origin_language}{target_language}_*.mp3")
my_package.write_to_file(f"book2_{origin_language}{target_language}_sentences.apkg")
It works in much the same way: save it into a file called booksentences2anki.py and supply your source and target languages like this (e.g. learning Modern Greek from Brazilian Portuguese):
Code: Select all
./booksentences2anki.py px el
Edit 3: I changed the way audio files and decks are named so you can run the script consecutively for multiple languages without triggering bugs in Anki caused by duplicate filenames.
Edit 4: fixed a bug where multiple translations in the target language for a single word in the source language would make the script fail.
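For the curious, the Edit 4 fix boils down to splitting each entry on its last " - " only, so any extra target-language translations stay on the answer side. A standalone sketch of that logic (the helper name and the example entries are made up for illustration):

```python
def split_entry(w):
    # mirrors the word script: the chunk after the last " - " is the
    # source-language word, everything before it is the target side
    parts = w.split(" - ")
    return parts[-1], " - ".join(parts[:-1])

print(split_entry("der Hund - le chien"))
# ('le chien', 'der Hund')
print(split_entry("der Hund - der Köter - le chien"))
# ('le chien', 'der Hund - der Köter')
```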