Category: Power and Electrical Engineering >> Engineering Thermophysics; Computer Science >> Computer Application Technology — Submitted: 2025-04-11
Abstract: In the current era of artificial intelligence, the advancement of high-performance computing based on electronic devices is hindered by thermal contact resistance. To predict this resistance accurately, we established a comprehensive database derived from extensive experimental work documented in previous studies. Using machine learning algorithms, we developed a thermal contact resistance prediction model trained on this dataset. The model can predict the thermal contact resistance between any materials represented in the training data, demonstrating a significant degree of general applicability. It performs strongly on the test set (coefficient of determination of 0.982), reflecting a high level of predictive accuracy. Additionally, interpretability analyses conducted on the machine learning model are consistent with established theories of thermal contact resistance, further confirming the model's accuracy. We anticipate that this database will support the development of thermal contact resistance prediction models and that our model will enhance the precision of thermal contact resistance predictions.
Category: Computer Science >> Natural Language Understanding and Machine Translation — Submitted: 2024-01-03
Abstract: League of Legends (LoL) is a highly popular multiplayer online competitive game, featuring intricate game mechanics and team cooperation, which makes match-outcome prediction a challenging task. This study uses a Kaggle dataset of 9,879 ranked matches, spanning Diamond I to Master tier, to build a machine learning model that predicts the ultimate winner, either the blue or red team, from features of the first 10 minutes of gameplay. Through data loading, preprocessing, and feature engineering, we provided effective inputs for the model. For model selection, we opted for the Logistic Regression algorithm, achieving an accuracy of 0.7277 after data splitting and training. This accuracy provides solid support for predicting the winning side. To further improve performance, however, we recommend exploring additional feature engineering methods, investigating alternative machine learning algorithms, and fine-tuning hyperparameters. Introducing deep learning models is also a promising avenue to better capture the complex relationships within the game. Through these improvements, we anticipate increasing the model's predictive accuracy for future matches, offering valuable insights for game development and enhancement.
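The pipeline this abstract outlines — first-10-minute features into a logistic regression classifier, evaluated on a held-out split — can be sketched roughly as below. The synthetic features (gold, experience, and kill differences between the blue and red teams) are assumed stand-ins for the Kaggle dataset's actual columns, and the generated labels are not real match outcomes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 2000
gold_diff = rng.normal(0, 1500, n)    # blue minus red gold at 10 minutes
exp_diff = rng.normal(0, 1200, n)     # experience difference
kill_diff = rng.integers(-5, 6, n)    # kill difference
# Toy label: blue wins more often when ahead, with logistic noise
logit = 0.0008 * gold_diff + 0.0005 * exp_diff + 0.3 * kill_diff
blue_wins = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([gold_diff, exp_diff, kill_diff])
X_tr, X_te, y_tr, y_te = train_test_split(X, blue_wins, random_state=0)

# Scaling before logistic regression keeps features on comparable ranges
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"test accuracy = {acc:.4f}")
```

The held-out accuracy here corresponds to the 0.7277 figure reported in the study; swapping in the real Kaggle features would follow the same `fit`/`predict`/`accuracy_score` steps.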
Category: Computer Science >> Natural Language Understanding and Machine Translation — Submitted: 2023-11-01
Abstract: Most research and applications on natural language still concentrate on its superficial features and structures. However, natural language is essentially a way of encoding information and knowledge; thus, the focus should be on what is encoded and how it is encoded. In line with this, we suggest a database-based approach to natural language processing that emulates the encoding of information and knowledge to build models. Based on these models, 1) generating sentences becomes akin to reading data from the models (or databases) and encoding it following certain rules; 2) understanding sentences involves decoding rules and a series of Boolean operations on the databases; 3) learning can be accomplished by writing to the databases. Our method closely mirrors how the human brain processes information, offering excellent interpretability and expandability.
Category: Computer Science >> Integration Theory of Computer Science — Submitted: 2022-11-28 — Partner journal: Data Intelligence
Abstract: Artificial intelligence and machine learning applications are of significant importance in almost every field of human life, whether solving problems directly or supporting human experts. However, choosing the machine learning model that will achieve a superior result for a particular problem, across the wide range of real-life application areas, remains a challenging task for researchers. A model's success can be affected by several factors, such as dataset characteristics, training strategy, and model responses. Therefore, a comprehensive analysis is required to determine model ability and the efficiency of the considered strategies. This study implemented ten benchmark machine learning models on seventeen varied datasets. Experiments were performed using four different training strategies: 60:40, 70:30, and 80:20 hold-out splits and five-fold cross-validation. We used three evaluation metrics: mean squared error, mean absolute error, and coefficient of determination (R2 score). The considered models are analyzed, and each model's advantages, disadvantages, and data dependencies are indicated. Across this large number of experiments, the deep Long Short-Term Memory (LSTM) neural network outperformed the other considered models, namely decision tree, linear regression, support vector regression with linear and radial basis function kernels, random forest, gradient boosting, extreme gradient boosting, shallow neural network, and deep neural network. It has also been shown that cross-validation has a tremendous impact on the results of the experiments and should be considered for model evaluation in regression studies where data mining or selection is not performed.
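The evaluation protocol this abstract describes — comparing a hold-out split against five-fold cross-validation using MSE, MAE, and R² — can be sketched in a few lines. The dataset, the random-forest model, and the 80:20 split chosen here are illustrative stand-ins, not the study's actual benchmark setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

X, y = make_regression(n_samples=400, n_features=8, n_informative=5,
                       noise=10.0, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)

# Strategy 1: an 80:20 hold-out split, scored with all three metrics
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("hold-out MSE/MAE/R^2:",
      mean_squared_error(y_te, pred),
      mean_absolute_error(y_te, pred),
      r2_score(y_te, pred))

# Strategy 2: five-fold cross-validation on the same metrics
scores = cross_validate(model, X, y, cv=5,
                        scoring=("neg_mean_squared_error",
                                 "neg_mean_absolute_error", "r2"))
print("5-fold mean R^2:", scores["test_r2"].mean())
```

Comparing the single-split R² with the cross-validated mean (and its spread across folds) illustrates the abstract's point that cross-validation can materially change the conclusions drawn about a model.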
Category: Computer Science >> Integration Theory of Computer Science — Submitted: 2022-11-28 — Partner journal: Data Intelligence
Abstract: Machine learning (ML) applications in weather and climate are gaining momentum as big data and the immense increase in high-performance computing (HPC) power pave the way. Ensuring FAIR data and reproducible ML practices are significant challenges for Earth system researchers. Even though the FAIR principles are well known to many scientists, research communities are slow to adopt them. The Canonical Workflow Framework for Research (CWFR) provides a platform to ensure the FAIRness and reproducibility of these practices without overwhelming researchers. This conceptual paper envisions a holistic CWFR approach towards ML applications in weather and climate, focusing on HPC and big data. Specifically, we discuss the FAIR Digital Object (FDO) and Research Object (RO) in the DeepRain project to achieve granular reproducibility. DeepRain is a project that aims to improve precipitation forecasting in Germany by using ML. Our concept envisages the raster datacube to provide data harmonization and fast, scalable data access. We suggest the Jupyter notebook as a single reproducible experiment. In addition, we envision JupyterHub as a scalable and distributed central platform that connects all these elements and the HPC resources to researchers via an easy-to-use graphical interface.
Category: Computer Science >> Integration Theory of Computer Science — Submitted: 2022-11-28 — Partner journal: Data Intelligence
Abstract: There is a huge gap between (1) the state of workflow technology and the practices in the many labs working with data-driven methods, and (2) the awareness of the FAIR principles and the lack of corresponding changes in practice over the last five years. The CWFR concept was defined to address both intentions: increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper, we indicate how this could be applied to machine learning, which is now used by almost all research disciplines, with the well-known effect of a huge lack of repeatability and reproducibility. Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections, immediately creating comprehensive and FAIR-compliant documentation. The researcher is guided by such a framework, and information once entered can easily be shared and reused. The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods. Libraries of components that can be easily orchestrated, using FAIR Digital Objects as a common entity to document all actions and to exchange information between steps without the researcher needing to understand anything about PIDs and FDO details, are probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. Unlike the Galaxy framework, however, it would be necessary to include all steps required for a machine learning task, including those that involve human interaction, and to document all phases with the help of structured FDOs.
Category: Computer Science >> Integration Theory of Computer Science — Submitted: 2022-11-27 — Partner journal: Data Intelligence
Abstract: Computational prediction of in-hospital mortality in the setting of an intensive care unit can help clinical practitioners guide care and make early decisions about interventions. As clinical data are complex and varied in their structure and components, continued innovation in modelling strategies is required to identify architectures that can best model outcomes. In this work, we trained a Heterogeneous Graph Model (HGM) on electronic health record (EHR) data and used the resulting embedding vector as additional information for a Convolutional Neural Network (CNN) model predicting in-hospital mortality. We show that the additional information provided by including time as a vector in the embedding captured the relationships between medical concepts, lab tests, and diagnoses, which enhanced predictive performance. We found that adding the HGM to a CNN model increased mortality prediction accuracy by up to 4%. This framework serves as a foundation for future experiments involving different EHR data types on important healthcare prediction tasks.
Category: Computer Science >> Integration Theory of Computer Science — Submitted: 2022-11-18 — Partner journal: Data Intelligence
Abstract: Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements and looked at trends over time. We found expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open-data challenge now is to use what we have learned to present researchers with relevant, easy options that help them share and make an impact with new research data.
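One common way to categorize free-text statements without labels, as the abstract above describes, is to cluster them. This sketch uses TF-IDF features and k-means; the example statements are invented and far smaller than the study's 124,000-statement corpus, and the abstract does not specify which unsupervised method the authors actually used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented examples of typical data availability statement phrasings
statements = [
    "Data are available from the corresponding author on reasonable request.",
    "All data are included in the article and its supplementary files.",
    "Data are deposited in the Dryad repository.",
    "The datasets are available upon request from the authors.",
    "Raw data are provided in the supplementary information.",
    "Sequence data are available in a public repository (GenBank).",
]

# Turn each statement into a sparse TF-IDF vector, then group similar ones
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(statements)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for text, label in zip(statements, labels):
    print(label, text)
```

At scale, the resulting clusters (e.g. "on request", "in supplementary files", "in a repository") become the categories whose frequencies can then be tracked over time, as the study did.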