Question 1

What can I use the decks for?

Accepted Answer

Decks, frequency lists, and other derived statistical data are licensed under CC BY-SA 4.0. You're free to use, share, and adapt them for any purpose — including commercial — as long as you give attribution and share any adaptations under the same license.

This does not cover example sentences or media text, which belong to their respective copyright holders.

Question 2

Can I use your API or scrape the site?

Accepted Answer

The API endpoints are publicly available and you're welcome to use them for personal projects, small-scale research, or tools that complement the learning experience. However, please don't bulk-scrape, crawl, or systematically download the site's data — for example, dumping every deck's vocabulary to mirror the database or build a competing service.

In short:

Personal projects, research, and hobby tools — go for it
Bulk downloading or mirroring significant portions of the database — not allowed
Redistributing scraped data commercially — not allowed

The API may change at any time without notice and is rate-limited. If you're unsure whether your use case is okay, feel free to ask on Discord.
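For a small personal tool, one way to stay within the spirit of the rules above is to space out your requests and back off when the server asks you to slow down. The sketch below is only an illustration: the `PoliteClient` class, its parameters, and the URL are hypothetical, and it assumes the rate limit is signalled with HTTP status 429, which this FAQ does not confirm. The `fetch` callable stands in for whatever HTTP library you use.

```python
import time
from typing import Callable, Optional, Tuple

# A fetch function takes a URL and returns (HTTP status, body).
Fetch = Callable[[str], Tuple[int, str]]


class PoliteClient:
    """Hypothetical helper: spaces out requests and backs off on HTTP 429."""

    def __init__(
        self,
        fetch: Fetch,
        min_interval: float = 1.0,   # assumed minimum gap between requests, in seconds
        max_retries: int = 3,        # assumed number of retries after a 429
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.min_interval = min_interval
        self.max_retries = max_retries
        self.sleep = sleep
        self.clock = clock
        self._last_request: Optional[float] = None

    def _wait_for_slot(self) -> None:
        # Sleep until at least min_interval has passed since the last request.
        if self._last_request is not None:
            remaining = self.min_interval - (self.clock() - self._last_request)
            if remaining > 0:
                self.sleep(remaining)
        self._last_request = self.clock()

    def get(self, url: str) -> Tuple[int, str]:
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            status, body = self.fetch(url)
            if status != 429:  # assumption: 429 means "rate-limited"
                return status, body
            if attempt < self.max_retries:
                # Exponential backoff: 2x, 4x, 8x the base interval.
                self.sleep(self.min_interval * 2 ** (attempt + 1))
        raise RuntimeError(f"still rate-limited after {self.max_retries} retries: {url}")
```

In practice you would pass a `fetch` that wraps your HTTP library of choice and only request the handful of resources your tool actually needs. Since the API may change without notice, treat any response shape as unstable.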

Question 3

Why is X missing?

Accepted Answer

While I aim to make the website as complete as possible, it's an arduous process and I have to prioritise some media over others. If you'd like something added, you can submit a request on the requests page after creating an account. Requests are fulfilled depending on whether the text can be sourced.

Question 4

How accurate are the media decks?

Accepted Answer

For VNs: Extracting text from VNs is a manual process. The decks should be fairly accurate, but they might contain extra text depending on the engine. Please report anything that seems off.
For books: Books are extracted semi-automatically and may still contain extra text such as the table of contents.
For anime/movies/dramas: The decks are based on subtitle files, extracted in a semi-automated way. The subtitles may contain errors, and details such as the episode count or episode numbers may also be inaccurate. If you find any mistakes, please use the "report an issue" button.

Question 5

Why is the word count or character count different from another website?

Accepted Answer

The text extraction process is custom-built and may differ from other websites. The character count should be mostly the same, unless some data is missing or was erroneously included. For example, a game may keep its item text in a separate file that was left out during extraction. The word count can differ more, because there are different ways to split text into words (word segmentation). If you find anything that looks erroneous, please use the "report an issue" button.
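To make the segmentation point concrete, here is a small sketch showing how the same text keeps one character count but yields different word counts depending on how it is split. The three segmentations are hand-written for illustration only; they are not the output of this site's pipeline or of any particular tokenizer.

```python
from typing import List


def count_characters(text: str) -> int:
    """Count non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())


def count_words(tokens: List[str]) -> int:
    """Count tokens produced by a segmenter."""
    return len(tokens)


text = "食べてしまった"

# Hypothetical segmentations of the same text (illustrative, hand-written).
coarse = ["食べてしまった"]
medium = ["食べて", "しまった"]
fine = ["食べ", "て", "しまっ", "た"]

for name, tokens in [("coarse", coarse), ("medium", medium), ("fine", fine)]:
    # Every segmentation covers exactly the same characters...
    assert "".join(tokens) == text
    # ...but the number of "words" depends on where the splits fall.
    print(name, count_characters(text), count_words(tokens))
```

The character count is identical in all three cases, while the word count ranges from one to four, which is why word counts are harder to compare across websites.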

Question 6

Which websites do you source the information from?

Accepted Answer

Metadata comes from the following sources:
Some metadata (descriptions, tags, and ratings) is derived from VNDB, licensed under the Open Database License (ODbL) 1.0.

This product uses the TMDB API but is not endorsed or certified by TMDB.

AniList

IGDB

Google Books

The sources for the data are:
Subtitles from Jimaku
Game scripts from Jo-Mako
Some VN scripts from Wareya
Everything else comes from me or our generous contributors on Discord!

Audio pronunciations are generated using VOICEVOX with voices 四国めたん (Female 1), 九州そら (Female 2, ASMR), 剣崎雌雄 (Male 1), and 青山龍星 (Male 2).

Question 7

Will you support more media types in the future?

Accepted Answer

Yes, I plan to support YouTube. Each new media type comes with its own set of challenges, which is why it will take more time. If you have any other suggestions, please let me know on Discord.

Question 8

Will the website ever be paid?

Accepted Answer

The core features, including access to the decks and the future SRS, will always be free. There might be some extra premium features in the future, reserved for supporters, but I want the most important features to be accessible to all.