Indonesian News Corpus

Main Author: RAHUTOMO, FAISAL
Other Authors: MIQDAD MUADZ MUZAD, AAD
Format: Dataset
Published: Mendeley, 2018
Subjects:
Online Access: https://data.mendeley.com/datasets/2zpbjs22k3
Table of Contents:
  • This corpus contains 150,466 news articles derived from several freely accessible Indonesian news websites. The corpus is designated for research purposes only. The news websites are:
      • kompas.com is a registered trademark of PT. Kompas Cyber Media. https://inside.kompas.com/about-us
      • tempo.co is a registered trademark of PT INFO MEDIA DIGITAL. https://www.tempo.co/about
      • merdeka.com is a registered trademark of PT KAPAN LAGI DOT COM NETWORKS. https://www.merdeka.com/company/tentang-kami.html
      • republika.co.id is a registered trademark of PT Republika Media Mandiri. https://www.republika.co.id/page/about
      • viva.co.id is a registered trademark of PT. Viva Media Baru. https://www.viva.co.id/tentang-kami
      • tribunnews.com is a registered trademark of PT Tribun Digital Online. http://www.tribunnews.com/about-us
    The corpus is part of the bachelor thesis work of Aad Miqdad Muadz Muzad under the supervision of Faisal Rahutomo. We crawled several categories of the websites for 6 months, from July 2015 until December 2015.