Exploring entrepreneurial phases with machine learning models: Evidence from Hungary

Abstract
Objective: The article explores potential differences between two phases of entrepreneurship defined by the Global Entrepreneurship Monitor (GEM): total early-stage entrepreneurial activity and established business. It also aims to classify entrepreneurs using various machine learning models and to compare the models' classification performance.
Research Design & Methods: Using the Hungarian GEM datasets from 2021 to 2023, we analysed a subsample of 964 entrepreneurs. Due to inconsistent results from traditional analyses (e.g., correlations, regressions, principal component analyses), we employed machine learning approaches (supervised learning classification methods) to uncover latent relationships between variables.
Findings: The study used seven machine learning classification methods to examine whether companies in the sample can be grouped using Hungarian GEM data. The findings indicate that machine learning techniques are particularly effective for classifying businesses, although performance varies significantly between methods.
Implications & Recommendations: These results provide valuable insights for researchers in selecting methodologies to identify various business phases. Moreover, they offer practical benefits for market research professionals, suggesting that machine learning techniques can enhance the classification and understanding of entrepreneurial phases.
Contribution & Value Added: The study adds to the existing body of knowledge by demonstrating the effectiveness of machine learning methods in classifying business phases. It highlights the variability in performance across different machine learning techniques, thereby guiding future research and practical applications in market research and entrepreneurship studies.
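As an illustration of what comparing classification performance can look like in practice, the following minimal sketch computes common metrics for two hypothetical classifiers on a binary phase label. The labels "TEA" and "EB", the model names, the toy predictions, and the choice of accuracy, precision, recall, and F1 are all assumptions made for illustration; the abstract does not state which seven methods or which metrics the study used.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="TEA"):
    """Return accuracy, precision, recall and F1 for one set of predictions."""
    # Count (is_truly_positive, is_predicted_positive) pairs.
    counts = Counter(
        (truth == positive, guess == positive)
        for truth, guess in zip(y_true, y_pred)
    )
    tp = counts[(True, True)]
    fp = counts[(False, True)]
    fn = counts[(True, False)]
    tn = counts[(False, False)]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data, invented for illustration only (not taken from the GEM sample).
y_true = ["TEA", "TEA", "TEA", "EB", "EB", "EB", "EB", "TEA"]
predictions = {
    "model_a": ["TEA", "TEA", "EB", "EB", "EB", "TEA", "EB", "TEA"],
    "model_b": ["TEA", "EB", "EB", "EB", "EB", "EB", "EB", "TEA"],
}

# Evaluate every hypothetical model on the same labels and rank by F1.
results = {name: binary_metrics(y_true, y_pred) for name, y_pred in predictions.items()}
for name, metrics in sorted(results.items(), key=lambda kv: kv[1]["f1"], reverse=True):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this toy case both models reach the same accuracy but differ in precision and recall, which is one reason a comparative evaluation usually reports more than a single metric.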
Keywords
entrepreneurship, responsibility, Global Entrepreneurship Monitor, GEM, machine learning
Author Biography
Áron Szennay
Senior Research Fellow at the Budapest LAB Office for Entrepreneurship Development, Budapest University of Economics and Business. He holds a PhD in regional sciences and is the author of publications on entrepreneurship and related sustainability concerns. His research interests include entrepreneurship, digitalisation, and sustainability.
Judit Csákné Filep
Senior Research Fellow at the Budapest LAB Office for Entrepreneurship Development, Budapest University of Economics and Business. She holds a PhD in Management and Business Administration. She is the author of publications on family business and entrepreneurship. As the National Team Leader for Hungary in the Global Entrepreneurship Monitor, she is a major contributor to national and international entrepreneurship research. She also heads the Family Business Research Programme at the Budapest Business University. She is committed to the development of Hungarian family businesses and is a frequent contributor to conferences, podcasts, and other media engagements focused on the sector. Her research interests include family business and entrepreneurship.
Melinda Krankovits
Assistant Professor at the Department of Mathematics and Computer Science, Széchenyi István University. She holds a PhD in regional sciences, is the author of publications on distance learning in higher education, and is a quality assurance expert. She also contributes to papers in other fields as an expert in AI and machine learning. Her research interests include machine learning, data processing, and data mining.