Automated Data Extraction Using Web Scraping Techniques and AI for Creating News and Events Database in Thailand's Southern Border Provinces
Abstract
This research aims to develop and apply an automated data extraction system utilizing Web Scraping techniques and Artificial Intelligence (AI) to establish a comprehensive database of news and events in Thailand's Southern Border Provinces. Furthermore, the study evaluates the system's technical and quantitative performance to enhance information management efficiency at the John F. Kennedy Library, Prince of Songkla University, Pattani Campus.
The implementation process consists of four main stages: (1) data collection via Web Scraping from over 15 targeted national and local news sources; (2) data processing and classification using the Naive Bayes Classifier algorithm; (3) data integration and redundancy reduction; and (4) database design and storage.
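The four stages above can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not the system described in the study: the `<h2>` headline markup, the sample HTML, the training sentences, the category labels, and the `news` table schema are all hypothetical, and the hand-written multinomial Naive Bayes with add-one smoothing stands in for whatever implementation the authors used. The tokenizer handles lowercase English only; Thai text would need a proper word segmenter.

```python
import hashlib
import math
import re
import sqlite3
from collections import Counter, defaultdict
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Stage 1: collect the text of every <h2> element (assumed headline markup)."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headlines[-1] += data


def tokenize(text):
    # English-only toy tokenizer (assumption); not suitable for Thai.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Stage 2: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # log likelihood
            scores[label] = score
        return max(scores, key=scores.get)


def fingerprint(title):
    """Stage 3: hash of the normalized title, used as the duplicate key."""
    normalized = " ".join(tokenize(title))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def run_pipeline(html, model, conn):
    parser = HeadlineParser()
    parser.feed(html)
    # Stage 4: the primary key makes the database reject repeated fingerprints.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS news "
        "(fingerprint TEXT PRIMARY KEY, title TEXT, category TEXT)"
    )
    for title in parser.headlines:
        title = title.strip()
        conn.execute(
            "INSERT OR IGNORE INTO news VALUES (?, ?, ?)",
            (fingerprint(title), title, model.predict(title)),
        )
    conn.commit()
    return conn.execute("SELECT title, category FROM news ORDER BY rowid").fetchall()


# Hypothetical training data and page content, for illustration only.
TRAIN = [
    ("flood damages homes in pattani", "disaster"),
    ("heavy rain causes flood in yala", "disaster"),
    ("university opens new library", "education"),
    ("students join school science fair", "education"),
]
SAMPLE_HTML = """
<html><body>
<h2>Flood hits Narathiwat homes</h2>
<h2>New school library opens</h2>
<h2>Flood hits  Narathiwat homes!</h2>
</body></html>
"""
model = NaiveBayes().fit(*zip(*TRAIN))
rows = run_pipeline(SAMPLE_HTML, model, sqlite3.connect(":memory:"))
print(rows)
```

The third headline differs from the first only in spacing and punctuation, so it normalizes to the same fingerprint and is skipped on insert. A production system would fetch live pages, respect each site's terms of use, and likely need fuzzier duplicate matching than an exact hash.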
The findings indicate that the system significantly reduced data archiving time from 1–2 hours to merely 10–15 minutes per day, representing a reduction of approximately 85%. The system demonstrated a high extraction accuracy of 97% and a duplicate detection rate of 98%. Additionally, the classification accuracy reached 92%, while the processing error rate was reduced to only 2%. The system successfully expanded coverage to include both national and local levels. Expert evaluation rated the system’s overall efficiency as "very good" (mean = 4.85, SD = 0.18), particularly highlighting its precision and processing speed. These results significantly contribute to supporting policy planning, situational monitoring, and strategic decision-making in the region, while establishing a reliable database foundation for sustainable future application.
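The abstract does not state how the 85% figure was derived. As a rough check, assuming it compares the reported daily time ranges, the sketch below computes the reduction at the range midpoints and at the two extreme pairings; the function name `reduction` is illustrative.

```python
def reduction(before_min, after_min):
    """Fraction of time saved when a task drops from before_min to after_min."""
    return 1 - after_min / before_min


before = (60, 120)  # manual archiving: 1-2 hours per day, in minutes
after = (10, 15)    # automated archiving: 10-15 minutes per day

midpoint = reduction(sum(before) / 2, sum(after) / 2)
low = reduction(before[0], after[1])   # shortest manual run vs. slowest automated run
high = reduction(before[1], after[0])  # longest manual run vs. fastest automated run
print(f"midpoint {midpoint:.1%}, range {low:.1%} to {high:.1%}")
```

Under this reading the midpoint reduction is about 86%, which is consistent with the reported figure of approximately 85%.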
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.