Shakerato
YouTube 8M dataset download
* What is the YouTube 8M dataset?
https://research.google.com/youtube8m/download.html
* How to download the dataset?
* Download the dataset on Linux, not on Windows, because some file names differ only in letter case.
(Windows does not allow files such as 'Ab.txt' and 'aB.txt' in the same folder.)
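To illustrate the file name problem, here is a small sketch that finds names which would collide on a case-insensitive file system. The function name find_case_collisions and the sample names are only assumptions for illustration; they are not part of the YouTube 8M download script.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only in letter case.

    Hypothetical helper for illustration; not part of download.py.
    """
    groups = defaultdict(list)
    for name in names:
        # Names that lower-case to the same string would overwrite
        # each other in a case-insensitive folder.
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(find_case_collisions(["Ab.txt", "aB.txt", "cd.txt"]))
# -> [['Ab.txt', 'aB.txt']]
```

On Linux both files can exist side by side, so nothing is lost during the download.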
1. Download 'download_fix.py' file from this URL:
https://storage.googleapis.com/data.yt8m.org/download_fix.py
2. Modify the code to download the dataset (video-level features)
This step is optional. You can also download the dataset with this command:
'curl data.yt8m.org/download.py | partition=2/video/train mirror=us python'
I modified the code because the download is sometimes interrupted due to network latency.
2.1. Set os.environ['partition'] to '2/video/train', '2/video/validate', or '2/video/test'
2.2. Set os.environ['mirror'] to 'us', 'eu', or 'asia'
2.3. Comment out all the code that checks whether the 'os.environ' values (partition, mirror, shard) are set
2.4. Comment out the code that checks whether 'curl' is installed
2.5. Replace the download code with the urllib-based code at the link below - this step is important.
(https://github.com/entymos/crawling/blob/master/python3_file_download)
2.5.1. os.system('curl %s > %s' % (plan_url, plan_filename))
-> fileDownload(plan_url, plan_filename)
2.5.2. os.system('curl %s > %s' % (download_url, f))
-> fileDownload(download_url, f)
3. Run the code once for each partition (train, validate, test), changing the value set in step 2.1 each time.
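Step 2.5 swaps the curl calls for a fileDownload function. The linked code is not reproduced here, so the following is only a minimal sketch of what such a helper could look like with urllib. The retries and wait_seconds parameters, the '.part' temporary file, and the True/False return value are my own assumptions, added because the goal was to survive interrupted downloads.

```python
import http.client
import os
import shutil
import time
import urllib.request


def fileDownload(url, filename, retries=3, wait_seconds=5):
    """Download url to filename with urllib, retrying on failure.

    Sketch only: retries, wait_seconds and the '.part' suffix are
    assumptions, not taken from the original script.
    Returns True on success, False if every attempt failed.
    """
    tmp_name = filename + ".part"
    for attempt in range(1, retries + 1):
        try:
            # Stream the response to a temporary file first, so an
            # interrupted download never looks like a finished file.
            with urllib.request.urlopen(url) as response, \
                    open(tmp_name, "wb") as out:
                shutil.copyfileobj(response, out)
            os.replace(tmp_name, filename)
            return True
        except (OSError, http.client.HTTPException) as err:
            # urllib.error.URLError is a subclass of OSError.
            print("attempt %d/%d failed for %s: %s"
                  % (attempt, retries, url, err))
            if attempt < retries:
                time.sleep(wait_seconds)
    return False


# Used in place of the curl calls from steps 2.5.1 and 2.5.2:
#   fileDownload(plan_url, plan_filename)
#   fileDownload(download_url, f)
```

Writing to a temporary file and renaming it at the end means that rerunning the script after a crash does not mistake a half-downloaded shard for a complete one.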