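慧星云 has compiled a set of commonly used public datasets for convenient use in your instances.
Usage
1. Logging in
Log in to the workspace with the ssh command, for example: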
ssh -p 14917 root@hz1.dc.houdeyun.cn
2. Location of the public data
The public disk is mounted at the /root/public/ directory. You can run the df command to check whether the directory is mounted correctly.
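As a rough sketch of the df check described above, the snippet below prints the mount point that df reports for a directory and compares it with /root/public. The helper name mount_point_of is made up for illustration, and the comparison assumes the disk is mounted exactly at /root/public rather than at a parent or child directory.

```shell
# Sketch: report which mount point a directory belongs to, using df.
# mount_point_of is an illustrative helper name, not a platform tool.
mount_point_of() {
    # -P forces POSIX single-line output; field 6 is the "Mounted on" column.
    # Assumes the mount point itself contains no spaces.
    df -P "$1" 2>/dev/null | awk 'NR==2 {print $6}'
}

public_dir=/root/public   # mount location stated in this document
if [ "$(mount_point_of "$public_dir")" = "$public_dir" ]; then
    echo "$public_dir is mounted"
else
    # Assumption: a different (or empty) mount point means the disk is missing.
    echo "$public_dir is not reported as its own mount point" >&2
fi
```

Running plain `df -h /root/public/` by hand gives the same information; the helper only extracts the last column so that a script can test it.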
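3. Using the data
Because users have read-only access to the public disk, extract compressed files to your own local directory before using them. An example command: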
unzip /root/public/ModelNet/ModelNet10.zip -d /root/ModelNet10
Public data
| Name | Path in instance | Size | Type | Publisher | Description |
| --- | --- | --- | --- | --- | --- |
| argoverse2.0 perception dataset | /root/public/datasets/argoverse2.0-sensor | 739.02 GiB | Dataset | https://argoverse.github.io | https://argoverse.github.io/user-guide/ |
| Vimeo-90k | /root/public/datasets/Vimeo-90k | 81.89 GiB | Dataset | toflow.csail.mit.edu | Vimeo-90k video super-resolution dataset |
| CULane | /root/public/datasets/CULane | 42.45 GiB | Dataset | https://xingangpan.github.io/projects/CULane.html | CULane is a large-scale, challenging dataset for academic research on traffic lane detection |
| TT100K | /root/public/datasets/TT100K | 106.77 GiB | Dataset | https://cg.cs.tsinghua.edu.cn/traffic-sign/ | Traffic sign detection and recognition dataset |
| cifar-100 | /root/public/datasets/cifar-100 | 161.17 MiB | Dataset | https://www.cs.toronto.edu/~kriz/cifar.html | CIFAR-100 image classification dataset |
| CUB200-2011 | /root/public/datasets/CUB200-2011 | 1.11 GiB | Dataset | http://www.vision.caltech.edu/datasets/cub_200_2011/ | Fine-grained bird classification dataset |
| ModelNet | /root/public/datasets/ModelNet | 2.34 GiB | Dataset | https://modelnet.cs.princeton.edu/ | The goal of the Princeton ModelNet project is to provide researchers in computer vision, computer graphics, robotics and cognitive science with a comprehensive, clean collection of 3D CAD models for objects. |
| S3DIS | /root/public/datasets/S3DIS | 14.26 GiB | Dataset | http://buildingparser.stanford.edu/dataset.html | Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) |
| Aishell | /root/public/datasets/Aishell | 14.51 GiB | Dataset | http://openslr.org/33/ | 400 people from different accent areas in China were invited to participate in the recording, which was conducted in a quiet indoor environment using high-fidelity microphones and downsampled to 16 kHz. |
| CrowdHuman | /root/public/datasets/CrowdHuman | 13.25 GiB | Dataset | https://www.crowdhuman.org/ | The CrowdHuman dataset is large, richly annotated and highly diverse. It contains 15000, 4370 and 5000 images for training, validation, and testing, respectively. |
| MsCelebV1 | /root/public/datasets/MS-Celeb-1M | 154.4 GiB | Dataset | http://research.microsoft.com/en-US/projects/irc/acmmm2016.aspx | Microsoft celebrity dataset |
| DIV2K | /root/public/datasets/DIV2K | 8.45 GiB | Dataset | https://data.vision.ee.ethz.ch/cvl/DIV2K/ | DIVerse 2K resolution high quality images as used for the challenges @ NTIRE (CVPR 2017 and CVPR 2018) and @ PIRM (ECCV 2018) |
| nuScenes | /root/public/datasets/nuScenes | 548.35 GiB | Dataset | https://www.nuscenes.org/ | nuScenes is a public large-scale dataset for autonomous driving. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car. |
| CelebA | /root/public/datasets/CelebA | 21.69 GiB | Dataset | http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html | CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. |
| KITTI | /root/public/datasets/KITTI | 132.99 GiB | Dataset | https://www.cvlibs.net/datasets/kitti | KITTI dataset |
| KITTI_Depth_Completion | /root/public/datasets/KITTI/kitti_depth_completion | 92.7 GiB | Dataset | http://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_completion | KITTI depth completion dataset |
| SemanticKITTI | /root/public/datasets/SemanticKITTI | 82.83 GiB | Dataset | http://www.semantic-kitti.org/dataset.html#download | SemanticKITTI dataset |
| MPII Human Pose | /root/public/datasets/mpii_human_pose | 11.27 GiB | Dataset | http://human-pose.mpi-inf.mpg.de/#download | MPII Human Pose dataset |
| MVTec AD | /root/public/datasets/mvtec-ad | 4.9 GiB | Dataset | https://www.mvtec.com/company/research/datasets/mvtec-ad | Industrial anomaly detection dataset |
| ImageNet100 | /root/public/datasets/ImageNet100 | 13.41 GiB | Dataset | image-net.org | ImageNet 100-class dataset. See: https://github.com/HobbitLong/CMC/blob/master/imagenet100.txt |
| ImageNet | /root/public/datasets/imagenet-1k | 157.56 GiB | Dataset | image-net.org | ImageNet 1000-class classification dataset |
| SAIL-VOS | /root/public/datasets/SAIL-VOS | 173.18 GiB | Dataset | https://sailvos.web.illinois.edu/_site/dataset_readme.html | Semantic amodal instance-level video object segmentation dataset (available in Inner Mongolia Zone A) |
| MOT17 | /root/public/datasets/mot17 | 5.46 GiB | Dataset | https://motchallenge.net/data/MOT17/ | MOT17 Challenge |
| Cityscapes | /root/public/datasets/cityscapes | 11.03 GiB | Dataset | www.cityscapes-dataset.net | Urban street scene instance/semantic segmentation |
| GOT10k | /root/public/datasets/GOT10k | 71.11 GiB | Dataset | got-10k.aitestunion.com | Large-scale object tracking dataset |
| MOT20 | /root/public/datasets/mot20 | 4.7 GiB | Dataset | motchallenge.net/data/MOT20/ | Pedestrian tracking in dense crowds (multi-object tracking) |
| CASIAWebFace | /root/public/datasets/CASIAWebFace | 4.1 GiB | Dataset | www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html | Large-scale face dataset, mainly used for identity verification and face recognition, containing 10,575 subjects and 494,414 images |
| DOTA v1 | /root/public/datasets/DOTA | 18.83 GiB | Dataset | captain-whu.github.io/DOTA | Object detection dataset for aerial images |
| ADEChallengeData2016 | /root/public/datasets/ADEChallengeData2016 | 1.1 GiB | Dataset | sceneparsing.csail.mit.edu | ADE20K scene semantic segmentation dataset |
| COCO 2017 | /root/public/datasets/coco2017 | 25.19 GiB | Dataset | Microsoft | COCO 2017 detection dataset |
| CIFAR10 | /root/public/datasets/cifar-10 | 163 MB | Dataset | www.cs.toronto.edu | CIFAR10 classification dataset |
| PASCAL VOC2012 | /root/public/datasets/voc2012 | 1.8 GiB | Dataset | host.robots.ox.ac.uk | VOC 2012 detection and semantic segmentation dataset |
| PASCAL VOC2007 | /root/public/datasets/voc2007 | 837 MB | Dataset | host.robots.ox.ac.uk | VOC 2007 detection and semantic segmentation dataset |
| RoBERTa pre-trained model (Torch) | /root/public/models/RoBERTa-Pretrain-Model | 1.06 GiB | Model | See: https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?tab=BB08J2 | RoBERTa pre-trained model |
| Open-source Chinese-English bilingual dialogue model | /root/public/models/chatglm2-6b | 11.63 GiB | Model | https://huggingface.co/THUDM/chatglm2-6b | ChatGLM2-6B is the second-generation version of the open-source Chinese-English bilingual dialogue model ChatGLM-6B |
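
Since the public disk is read-only, any archive listed under /root/public has to be extracted somewhere writable. The helper below is a minimal sketch of that step, not a platform-provided tool: the function name extract_dataset, its return codes, and its messages are all hypothetical. It refuses a destination under /root/public and otherwise calls unzip the same way as the example command in the usage section.

```shell
# Sketch: extract a zip archive from the read-only public disk into a
# writable local directory. Name, messages and return codes are illustrative.
extract_dataset() {
    archive=$1   # source archive on the public disk
    dest=$2      # writable target directory outside /root/public

    if [ ! -r "$archive" ]; then
        echo "archive not readable: $archive" >&2
        return 1
    fi
    case "$dest" in
        /root/public|/root/public/*)
            # The public disk is read-only, so extraction there cannot work.
            echo "destination must be outside /root/public: $dest" >&2
            return 2
            ;;
    esac
    mkdir -p "$dest" || return 3
    # -q: quiet output; -n: never overwrite files already present in dest
    unzip -q -n "$archive" -d "$dest"
}
```

With the paths from the usage section, a call would look like `extract_dataset /root/public/ModelNet/ModelNet10.zip /root/ModelNet10`. For the larger datasets in the table, it may be worth checking free space on the target with `df -h` first.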