Predicting crop yield using data fusion by matrix factorization algorithm
Main Authors: | Milica Brkić, Sanja Brdar, Vladimir Crnojević |
---|---|
Format: | info Proceeding eJournal |
Language: | eng |
Published: | 2019 |
Subjects: | |
Online Access: |
https://zenodo.org/record/3997846 |
Table of Contents:
- How can one choose the best hybrid of a particular crop for a given location when thousands of varieties are on the market? Yield is one of the best indicators for deciding which seed varieties are suitable. To choose the best hybrid for a given location, we need to be able to predict the yield of all existing hybrids at that location, since not all varieties are suitable for all fields. This task can be seen as a recommendation system in which we want to recommend the best hybrid, the one that will give the highest yield, for the chosen farm. Predicting yield is a hard task: many parameters, such as weather, soil and genetics, influence yield. The biggest challenge in improving prediction accuracy is to jointly analyze the complex interactions of all those parameters. For this task we used the Data Fusion by Matrix Factorization (DFMF) algorithm, which allows us to infer those complex interactions. DFMF uses a penalized matrix tri-factorization model that collectively tri-factorizes many data matrices such that each data matrix is decomposed into a product of three latent matrices. The data analyzed in the paper come from the Syngenta Crop Challenge and contain information about soil, weather and the performance of various hybrids. We created a matrix in which the rows were hybrids, the columns were the fields present in the chosen year, and the entries represented yield. Only ~10% of the matrix was known, and the task was to complete the rest of the matrix, that is, to estimate the yield of all hybrids at all locations. Other data sources should help with this. We wanted to enrich the historical dataset because it is impossible to plant every seed variety on all fields. A new, enriched dataset would help us make predictions for the next season, identify the behavior of hybrids in different settings, and decide whether a hybrid is tolerant to stresses...
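
The completion step described above can be illustrated with a minimal sketch. This is not the DFMF algorithm itself, which fuses many data matrices jointly. It is a simplified, single-matrix tri-factorization, R ≈ G1 · S · G2ᵀ, fitted by plain gradient descent on the observed entries only. The function name, the latent ranks, the learning rate, the L2 penalty and the synthetic 8 × 6 yield matrix are all assumptions made for illustration; none of them comes from the paper or from the Syngenta data.

```python
import numpy as np


def masked_tri_factorization(R, mask, k1=2, k2=2, lr=0.01, reg=0.01,
                             n_iter=3000, seed=0):
    """Fit R ~ G1 @ S @ G2.T using only entries where mask == 1.

    Hypothetical helper for illustration; a simplified stand-in,
    not the DFMF update rules.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    G1 = rng.uniform(0.0, 1.0, size=(n_rows, k1))  # latent factors of rows (hybrids)
    S = rng.uniform(0.0, 1.0, size=(k1, k2))       # interaction between the two latent spaces
    G2 = rng.uniform(0.0, 1.0, size=(n_cols, k2))  # latent factors of columns (fields)

    for _ in range(n_iter):
        # Residual on observed entries; unknown entries contribute nothing.
        E = mask * (G1 @ S @ G2.T - R)
        # Gradients of 0.5 * ||E||^2 plus an L2 penalty on each factor.
        grad_G1 = E @ G2 @ S.T + reg * G1
        grad_S = G1.T @ E @ G2 + reg * S
        grad_G2 = E.T @ G1 @ S + reg * G2
        G1 -= lr * grad_G1
        S -= lr * grad_S
        G2 -= lr * grad_G2
    return G1, S, G2


# Synthetic toy data (assumed): 8 hybrids x 6 fields with a low-rank yield pattern.
rng = np.random.default_rng(42)
R = rng.uniform(0.0, 1.0, size=(8, 2)) @ rng.uniform(0.0, 1.0, size=(6, 2)).T

# Hide half of the entries. The paper reports only ~10% known, but a matrix
# this small needs more observations to be fitted by a single factorization.
mask = (rng.uniform(size=R.shape) < 0.5).astype(float)

G1, S, G2 = masked_tri_factorization(R, mask)
R_hat = G1 @ S @ G2.T  # completed matrix: a predicted yield for every hybrid-field pair

# Recommend, for each field, the hybrid with the highest predicted yield.
best_hybrid_per_field = R_hat.argmax(axis=0)
```

With real data, the very low share of known entries is the reason the abstract turns to additional sources such as soil and weather: a single masked factorization like this one has too little signal at ~10% coverage, whereas a collective factorization shares latent factors across all the matrices.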