m-gues 2021-12-10 18:35:42 +01:00
parent 2ed6382fd8
commit 34da8d0297
18 changed files with 933824 additions and 1631455 deletions

@@ -1,66 +0,0 @@
# Contribution guidelines
Please read the FAQ down below.
## Possible errors / problems in the database
If you find something that, in your opinion, could be the result of incorrectly extracted data, please submit an issue rather than creating a pull request, because the database is created by an automated process.
## Adding your project to the list of projects using this database
In case you have a project that uses this database and you want to add it to the list of projects that are using this database, create a pull request adding it to the table. Do not create an issue asking me or anyone else to add it.
+ You have to be the author/maintainer of the project that you want to add
+ Create a PR in which you add it to the table in the README.md
+ Do not change/alter anything else
+ Your project has to use this database
+ You have to have a link back to this project in the README.md of your project
+ The README.md of your project has to be in English or it must have an English translation
+ Your project has to be hosted either on GitHub or GitLab
+ The table is sorted by project name (ascending). Add your entry accordingly.
+ Project name must match the repository name and link directly to the source code (not a project page such as YOURNAME.github.io)
+ Put your name under _Author/Maintainer_ with a link to your profile.
+ Add a meaningful description in English. The description must not be longer than 150 characters.
# FAQ
## What do you mean by 'meta data provider'?
Websites which provide information about anime such as `myanimelist.net`, `notify.moe`, ...
## Can you please add additional data/properties?
No. The dataset has been created for my own tool. It contains all data/properties that I need and I won't add more data/properties. This is merely an index. The idea is to visit the meta data provider of your choice to get additional information about the anime.
## Can you please add an additional meta data provider?
No. I don't plan to add any additional meta data provider.
## Can you please change the structure of the file?
No. The file has the structure that it needs to have for the purpose it has been built for.
## There are duplicates in the dataset.
If the entry of one meta data provider is not merged with an entry of a different meta data provider, although they are practically the same entry, then this is **not a duplicate**.
They are simply not merged. This can happen and is intentional: because this dataset is created automatically, it is better to leave two entries unmerged than to merge them falsely.
If you query this dataset by titles/synonyms, it might seem that there are duplicates. However, the intended usage is to query by the URL of the meta data provider. This way you will always retrieve the entry that you want. Merged entries are just a nice-to-have.
A duplicate, by the definition of this dataset, is an entry which contains multiple links of the same meta data provider in `sources`.
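The definition of a duplicate can be checked mechanically. The following is a minimal sketch in Python, assuming an entry is a dict shaped like the documented `Anime` object and that a provider can be identified by the host name of each URL in `sources`. The helper name `duplicate_providers` is hypothetical and not part of this project.

```python
from collections import Counter
from urllib.parse import urlparse


def duplicate_providers(entry: dict) -> list:
    """Return the providers (host names) that appear more than once in `sources`."""
    hosts = Counter(urlparse(url).netloc for url in entry["sources"])
    return sorted(host for host, count in hosts.items() if count > 1)


# An entry with one link per provider is not a duplicate.
clean = {"sources": ["https://anidb.net/anime/9466", "https://kitsu.io/anime/7314"]}
# Hypothetical entry with two links of the same provider: this is a duplicate.
broken = {"sources": ["https://anidb.net/anime/9466", "https://anidb.net/anime/16104"]}

print(duplicate_providers(clean))   # []
print(duplicate_providers(broken))  # ['anidb.net']
```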
## Why are there no IDs?
There are. The entries under `sources` are the IDs. Each one of the array's URLs is a key for that specific entry.
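Because every URL in `sources` acts as a key, a lookup table can be built once and then queried by URL. This is only a sketch under the assumption that the file has already been parsed into a dict with the documented `data` array; `build_source_index` is a hypothetical helper name.

```python
def build_source_index(database: dict) -> dict:
    """Map every URL in `sources` to the entry that contains it."""
    index = {}
    for entry in database["data"]:
        for url in entry["sources"]:
            index[url] = entry
    return index


# Tiny stand-in for the parsed anime-offline-database.json.
database = {
    "data": [
        {
            "sources": [
                "https://anidb.net/anime/4563",
                "https://myanimelist.net/anime/1535",
            ],
            "title": "Death Note",
        }
    ]
}

index = build_source_index(database)
print(index["https://myanimelist.net/anime/1535"]["title"])  # Death Note
```

Both URLs of a merged entry resolve to the same object, which is why querying by URL always retrieves the intended entry.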
## Is this dataset created automatically or manually?
It is created automatically and reviewed in a semi-automated process.
## Do you plan to open source the code which creates this dataset?
Yes. Parts of the code are already [available](https://github.com/manami-project?tab=repositories&q=modb&type=source). However, there is still work to do before I can/want to open source the rest, and that doesn't have any priority right now.
## How do you split entries?
Entries are split if one meta data provider lists multiple entries as one and others don't.
**Example:**
* The entry of a meta data provider which lists 3 Movies as one entry is split from three separate entries of another meta data provider
* A series is listed as one entry having 26 episodes on one meta data provider and as two entries having 13 episodes each on the other meta data provider
However, if one entry is listed with 13 episodes whereas the other is listed with 12 because it doesn't count the recap episode, these entries are still merged together.
## Can I somehow contribute?
Currently I can't think of a way. But you can check the [predefined issue templates](https://github.com/manami-project/anime-offline-database/issues/new/choose) in case you want to report one of the available cases.
## Does this dataset contain all anime from the supported meta data providers?
No. MAL and anisearch are the only providers that list adult titles publicly, so this type of anime is missing for the other meta data providers.
Entries which have been created after an update won't appear until the next update.
Apart from that, it should contain all titles from the supported meta data providers.

@@ -1,8 +0,0 @@
blank_issues_enabled: false
contact_links:
  - name: Guide to add your project to the project list.
    url: https://github.com/manami-project/anime-offline-database/blob/master/.github/CONTRIBUTING.md#adding-your-project-to-the-list-of-projects-using-this-database
    about: How to add your project to the list of projects using this database.
  - name: FAQ
    url: https://github.com/manami-project/anime-offline-database/blob/master/.github/CONTRIBUTING.md#faq
    about: Frequently Asked Questions

@@ -1,51 +0,0 @@
---
name: Falsely merged entry
about: Entries have been merged together although they should be separate entries?
title: ''
labels: ''
assignees: manami-project
---
Please read the [FAQ](https://github.com/manami-project/anime-offline-database/blob/master/.github/CONTRIBUTING.md#faq) first.
Especially the sections on [duplicates](https://github.com/manami-project/anime-offline-database/blob/master/.github/CONTRIBUTING.md#there-are-duplicates-in-the-data-set) and [splits](https://github.com/manami-project/anime-offline-database/blob/master/.github/CONTRIBUTING.md#how-do-you-split-entries). Please refrain from creating issues stating that entries should be merged together. This is only for _splitting_ entries which have already been merged together, but should be separated.
**Only one entry per issue**
## Which entry should be split? (original from data set)
**Example:**
```
"https://anidb.net/anime/9466",
"https://anilist.co/anime/15809",
"https://anime-planet.com/anime/the-devil-is-a-part-timer",
"https://kitsu.io/anime/7314",
"https://myanimelist.net/anime/15809",
"https://notify.moe/anime/CGnFpKimR",
"https://anidb.net/anime/16104",
"https://anilist.co/anime/130592",
"https://anime-planet.com/anime/the-devil-is-a-part-timer-2",
"https://kitsu.io/anime/44113",
"https://myanimelist.net/anime/48413",
"https://notify.moe/anime/Zy3-TV8MR"
```
## How should it be split?
**Example:**
```
"https://anidb.net/anime/9466",
"https://anilist.co/anime/15809",
"https://anime-planet.com/anime/the-devil-is-a-part-timer",
"https://kitsu.io/anime/7314",
"https://myanimelist.net/anime/15809",
"https://notify.moe/anime/CGnFpKimR"
```
```
"https://anidb.net/anime/16104",
"https://anilist.co/anime/130592",
"https://anime-planet.com/anime/the-devil-is-a-part-timer-2",
"https://kitsu.io/anime/44113",
"https://myanimelist.net/anime/48413",
"https://notify.moe/anime/Zy3-TV8MR"
```

@@ -1,19 +0,0 @@
---
name: Problem in data extraction
about: Is there a problem in the data extraction?
title: ''
labels: ''
assignees: manami-project
---
Please read the [FAQ](https://github.com/manami-project/anime-offline-database/blob/master/.github/CONTRIBUTING.md#faq) first.
* Which data is not extracted correctly? (e.g. title, episodes...)
* Can you provide an example entry?
* Which value is expected?

@@ -1,10 +0,0 @@
---
name: Question
about: You have a question which was not covered by the FAQ?
title: ''
labels: question
assignees: manami-project
---
Please read the [FAQ](https://github.com/manami-project/anime-offline-database/blob/master/.github/CONTRIBUTING.md#faq) first.

@@ -1,37 +0,0 @@
name: Check JSON files
on:
  push:
    branches:
      - '**'
    paths-ignore:
      - 'README.md'
      - '.gitignore'
      - '.gitattributes'
      - '.github/**/*'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup node environment
        uses: actions/setup-node@v1
        with:
          node-version: '14'
      - name: Install jsonlint
        run: npm install jsonlint -g
      - name: Check anime-offline-database.json
        run: jsonlint -q anime-offline-database.json
      - name: Check anime-offline-database-minified.json
        run: jsonlint -q anime-offline-database-minified.json
      - name: Check dead-entries for anidb
        run: jsonlint -q dead-entries/anidb.json
      - name: Check dead-entries for anilist
        run: jsonlint -q dead-entries/anilist.json
      - name: Check dead-entries for kitsu
        run: jsonlint -q dead-entries/kitsu.json
      - name: Check dead-entries for livechart
        run: jsonlint -q dead-entries/livechart.json
      - name: Check dead-entries for myanimelist
        run: jsonlint -q dead-entries/myanimelist.json

@@ -1,29 +0,0 @@
/*
!.gitignore
!README.md
!anime-offline-database.json
!anime-offline-database-minified.json
!.github/
.github/*
!.github/CONTRIBUTING.md
!.github/workflows/
.github/workflows/*
!.github/workflows/json_lint.yml
!.github/ISSUE_TEMPLATE/
.github/ISSUE_TEMPLATE/*
!.github/ISSUE_TEMPLATE/problem-in-data-extraction.md
!.github/ISSUE_TEMPLATE/question.md
!.github/ISSUE_TEMPLATE/falsely-merged-entries.md
!.github/ISSUE_TEMPLATE/config.yml
!dead-entries/
dead-entries/*
!dead-entries/anidb.json
!dead-entries/anilist.json
!dead-entries/kitsu.json
!dead-entries/myanimelist.json
!dead-entries/livechart.json

@@ -1,229 +0,0 @@
![CI build status](https://github.com/manami-project/anime-offline-database/workflows/Check%20JSON%20files/badge.svg "CI build status: Check JSON files")
# anime-offline-database
The purpose of this repository is to create an offline database containing anime meta data aggregated from different anime meta data providers (such as myanimelist.net, anidb.net, kitsu.io and more) and to allow cross references between those meta data providers. The file is created for and intended to be used by [manami](https://github.com/manami-project/manami).
**The goal is to deliver at least weekly updates.**
## Statistics
Update **week 48 [2021]**
The database consists of **33043** entries composed of:
+ 23233 entries from myanimelist.net
+ 18215 entries from anime-planet.com
+ 17231 entries from kitsu.io
+ 16208 entries from anisearch.com
+ 15526 entries from anilist.co
+ 15175 entries from notify.moe
+ 12127 entries from anidb.net
+ 9562 entries from livechart.me
Missed updates:
+ **2021:** 0 _(so far)_
+ **2020:** 0
+ **2019:** 2
+ **2018:** 1
## Structure
This repository contains various JSON files: the database file itself, as well as one file per meta data provider containing the IDs of dead entries, which supports the automated process.
### anime-offline-database-minified.json
Minified version of `anime-offline-database.json` which contains the same data, but is smaller in size.
### anime-offline-database.json
#### Data types
**Root**
| Field | Type | Nullable |
| --- | --- | --- |
| data | ```Anime[]``` | no |
**Anime**
| Field | Type | Nullable |
| --- | --- | --- |
| sources | ```URL[]``` | no |
| title | ```String``` | no |
| type | ```Enum of [TV, MOVIE, OVA, ONA, SPECIAL, UNKNOWN]``` | no |
| episodes | ```Integer``` | no |
| status | ```Enum of [FINISHED, ONGOING, UPCOMING, UNKNOWN]``` | no |
| animeSeason | ```AnimeSeason``` | no |
| picture | ```URL``` | no |
| thumbnail | ```URL``` | no |
| synonyms | ```String[]``` | no |
| relations | ```URL[]``` | no |
| tags | ```String[]``` | no |
**AnimeSeason**
| Field | Type | Nullable |
| --- | --- | --- |
| season | ```Enum of [SPRING, SUMMER, FALL, WINTER, UNDEFINED]``` | no |
| year | ```Integer``` | yes |
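The tables above can also be written down as types. The following Python sketch mirrors the documented fields with `TypedDict`; it is an illustration only, enum values are kept as plain strings for brevity, and the helper `missing_fields` is a hypothetical name that is not part of this project.

```python
from typing import List, Optional, TypedDict, get_type_hints


class AnimeSeason(TypedDict):
    season: str          # SPRING, SUMMER, FALL, WINTER or UNDEFINED
    year: Optional[int]  # the only nullable field


class Anime(TypedDict):
    sources: List[str]
    title: str
    type: str            # TV, MOVIE, OVA, ONA, SPECIAL or UNKNOWN
    episodes: int
    status: str          # FINISHED, ONGOING, UPCOMING or UNKNOWN
    animeSeason: AnimeSeason
    picture: str
    thumbnail: str
    synonyms: List[str]
    relations: List[str]
    tags: List[str]


def missing_fields(entry: dict) -> list:
    """Return the documented Anime fields that are absent from `entry`."""
    return sorted(name for name in get_type_hints(Anime) if name not in entry)


# Deliberately incomplete entry to show what the check reports.
partial = {
    "sources": ["https://myanimelist.net/anime/1535"],
    "title": "Death Note",
    "type": "TV",
    "episodes": 37,
    "status": "FINISHED",
    "animeSeason": {"season": "FALL", "year": 2006},
}

print(missing_fields(partial))
# ['picture', 'relations', 'synonyms', 'tags', 'thumbnail']
```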
#### Example:
```json
{
"data": [
{
"sources": [
"https://anidb.net/anime/4563",
"https://anilist.co/anime/1535",
"https://anime-planet.com/anime/death-note",
"https://anisearch.com/anime/3633",
"https://kitsu.io/anime/1376",
"https://livechart.me/anime/3437",
"https://myanimelist.net/anime/1535",
"https://notify.moe/anime/0-A-5Fimg"
],
"title": "Death Note",
"type": "TV",
"episodes": 37,
"status": "FINISHED",
"animeSeason": {
"season": "FALL",
"year": 2006
},
"picture": "https://cdn.myanimelist.net/images/anime/9/9453.jpg",
"thumbnail": "https://cdn.myanimelist.net/images/anime/9/9453t.jpg",
"synonyms": [
"Bilježnica smrti",
"Caderno da Morte",
"Carnet de la Mort",
"DEATH NOTE",
"DN",
"Death Note - A halállista",
"Death Note - Carnetul morţii",
"Death Note - Zápisník smrti",
"Mirties Užrašai",
"Notatnik śmierci",
"Notes Śmierci",
"Quaderno della Morte",
"Sveska Smrti",
"Ölüm Defteri",
"Τετράδιο Θανάτου",
"Бележник на Смъртта",
"Записник Смерті",
"Свеска Смрти",
"Тетрадка на Смъртта",
"Тетрадь cмерти",
"Үхлийн Тэмдэглэл",
"מחברת המוות",
"دفترچه مرگ",
"دفترچه یادداشت مرگ",
"كـتـاب الـموت",
"مدونة الموت",
"مذكرة الموت",
"موت نوٹ",
"डेथ नोट",
"ですのーと",
"デスノート",
"死亡笔记",
"데스노트"
],
"relations": [
"https://anidb.net/anime/8146",
"https://anidb.net/anime/8147",
"https://anilist.co/anime/2994",
"https://anime-planet.com/anime/death-note-rewrite-1-visions-of-a-god",
"https://anime-planet.com/anime/death-note-rewrite-2-ls-successors",
"https://anisearch.com/anime/4441",
"https://anisearch.com/anime/5194",
"https://kitsu.io/anime/2707",
"https://livechart.me/anime/3808",
"https://myanimelist.net/anime/2994",
"https://notify.moe/anime/DBBU5Kimg"
],
"tags": [
"alternative present",
"amnesia",
"anti-hero",
"asexual",
"asia",
"based on a manga",
"contemporary fantasy",
"cops",
"crime",
"crime fiction",
"criminals",
"detective",
"detectives",
"drama",
"earth",
"espionage",
"fantasy",
"genius",
"gods",
"hero of strong character",
"horror",
"japan",
"kamis",
"kuudere",
"male protagonist",
"manga",
"mind games",
"mystery",
"overpowered main characters",
"philosophy",
"plot continuity",
"police",
"policeman",
"present",
"primarily adult cast",
"primarily male cast",
"psychological",
"psychological drama",
"psychopaths",
"revenge",
"rivalries",
"secret identity",
"serial killers",
"shinigami",
"shounen",
"supernatural",
"supernatural drama",
"thriller",
"time skip",
"tragedy",
"twisted story",
"university",
"urban",
"urban fantasy",
"vigilantes"
]
}
]
}
```
### dead-entries
Contains IDs which have been removed from the database of the corresponding meta data provider.
#### Data types
| Field | Type | Nullable |
| --- | --- | --- |
| deadEntries | ```String[]``` | no |
#### Example
```json
{
"deadEntries": [
"38492",
"38518",
"38522",
"38531"
]
}
```
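A consumer could use such a file to skip URLs that no longer exist at the provider. The sketch below assumes that the ID of an entry is the last path segment of the provider URL (as in `https://myanimelist.net/anime/1535`); that mapping is an assumption for illustration, and `is_dead` is a hypothetical helper name.

```python
def is_dead(url: str, dead_entries: dict) -> bool:
    """Check whether the ID at the end of `url` is listed in `deadEntries`."""
    # Assumption: the provider ID is the last path segment of the URL.
    anime_id = url.rstrip("/").rsplit("/", 1)[-1]
    return anime_id in set(dead_entries["deadEntries"])


# Stand-in for a parsed file such as dead-entries/myanimelist.json.
dead_entries = {"deadEntries": ["38492", "38518", "38522", "38531"]}

print(is_dead("https://myanimelist.net/anime/38492", dead_entries))  # True
print(is_dead("https://myanimelist.net/anime/1535", dead_entries))   # False
```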
## Other projects using this database
If you have a project that uses this database and you want to add it to this list, please read the [contribution guidelines](./.github/CONTRIBUTING.md) first.
|Project|Author/Maintainer|Short description|
|----|----|----|
|[adb-zeppelin-statistics](https://github.com/manami-project/adb-zeppelin-statistics)|[manami-project](https://github.com/manami-project)|A set of statistics and insights about anime on MAL.|
|[animanga-wordlist](https://github.com/ryuuganime/animanga-wordlist)|[ryuuganime](https://github.com/ryuuganime)|Japanese Anime, Manga, Characters, and Studio Word List/Dictionary|
|[arm-server](https://github.com/BeeeQueue/arm-server)|[BeeeQueue](https://github.com/BeeeQueue)|A REST API for querying this database.|
|[manami](https://github.com/manami-project/manami)|[manami-project](https://github.com/manami-project)|A tool to catalog anime on your hard drive and discover new anime to watch.|

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

933823
db_animes/db_animes.json Normal file

File diff suppressed because it is too large

2
main.R

@@ -3,7 +3,7 @@ library("tidyverse")
-data<-fromJSON("C:\\Users\\Marianne\\Desktop\\projet-analyse-exploratoire\\anime-offline-database-master\\anime-offline-database.json")
+data<-fromJSON("C:\\Users\\Marianne\\Desktop\\projet-analyse-exploratoire\\db_animes\\db_animes.json")
dfAnimes <- as.data.frame(data)
# Clean up unused columns

Binary file not shown.