Friday, 27 October 2017

Discoveries in the Great Solar Data Mountain

One of the keys to the discovery of the Higgs boson by the Large Hadron Collider at CERN was the recognition of the quantity of data required to reveal a signature of the Higgs boson. This needed a novel approach to the analysis and storage of the mountain of data that would be generated by the Large Hadron Collider: big data had arrived. Just as CERN was first with the World Wide Web, it was also first to recognise the big data problem. The story started with the discovery of the weak neutral current (the interaction mediated by the Z boson) in 1973 using the Gargamelle bubble chamber. This was remarkable because three events were observed in over 1.4 million bubble chamber photographs taken over a two-year period, and we didn't have digital image processing back then! Now research is massively dependent on digital image processing, and it is fascinating to follow the enormous variety of research problems in the area of solar physics. These researchers make use of toolkits such as the IDL SolarSoft library, the more recent SunPy project or even Matlab. As in many disciplines, there are two massive problems in this research field:
  1. Solar physics has a big data problem: how do researchers collaboratively analyse the mountain of data, both historical and newly generated by satellites studying our nearest star?
  2. How do we ensure that the software we use continues to be fit for purpose?
One of the solutions to the first problem was addressed by two excellent talks to the solar physics group. The first talk, a couple of weeks ago, was about deep learning for solar flare forecasting. Yudong Ye gave today's talk, which was an introduction to machine learning and its application to space physics. Ye said that machine learning is more and more useful in this data explosion era and could be a powerful tool to reveal hidden connections and pave the way to new discoveries. Two data volumes noted were the Sloan Digital Sky Survey, which holds 140 TB of data from optical telescope sky surveys between 2000 and 2010, and the NASA NCCS, which holds 32 PB of climatological data covering the years up to 2013. Ye provided a clear introduction to machine learning covering its concepts, identified the different categories and gave recent applications in space physics. His worked example was deciding whether there is a strong geomagnetic storm (namely, the Dst index is less than -100 nT) from an ICME's plasma and magnetic field parameters using a support vector machine. He explained step by step how a machine learning method was applied to this specific problem. Further details are in references 11-15 below.
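To make the storm example concrete, here is a minimal sketch in plain Python. It is not the method shown in the talk: it swaps in a linear soft-margin SVM trained by batch subgradient descent, and the two features (minimum Bz and peak solar wind speed), their distributions and every function name are assumptions invented for illustration. The label +1 stands for "Dst below -100 nT".

```python
import random

def make_synthetic_events(n, seed):
    """Hypothetical ICME events: [min Bz (nT), peak speed (km/s)] plus a storm label.
    The distributions are invented for illustration, NOT fitted to real data."""
    rng = random.Random(seed)
    features, labels = [], []
    for k in range(n):
        storm = (k % 2 == 0)  # alternate classes so the set is balanced
        if storm:             # label +1 stands for "Dst < -100 nT"
            bz, speed = rng.gauss(-20.0, 3.0), rng.gauss(700.0, 50.0)
        else:
            bz, speed = rng.gauss(-5.0, 3.0), rng.gauss(450.0, 50.0)
        features.append([bz, speed])
        labels.append(1 if storm else -1)
    return features, labels

def fit_scaler(features):
    """Per-feature mean and standard deviation, computed on training data only."""
    n = len(features)
    dims = range(len(features[0]))
    means = [sum(row[j] for row in features) / n for j in dims]
    stds = [(sum((row[j] - means[j]) ** 2 for row in features) / n) ** 0.5
            for j in dims]
    return means, stds

def scale(features, means, stds):
    """Standardise each feature so both contribute on a similar scale."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in features]

def decision(w, b, x):
    """Signed score of the linear classifier; >= 0 is read as 'storm'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(features, labels, lam=0.01, lr=0.1, steps=300):
    """Minimise lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(features, labels):
            if y * decision(w, b, x) < 1.0:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def accuracy(w, b, features, labels):
    """Fraction of events whose predicted sign matches the label."""
    hits = sum(1 for x, y in zip(features, labels)
               if (1 if decision(w, b, x) >= 0 else -1) == y)
    return hits / len(labels)

train_x, train_y = make_synthetic_events(200, seed=1)
test_x, test_y = make_synthetic_events(100, seed=2)
means, stds = fit_scaler(train_x)
w, b = train_linear_svm(scale(train_x, means, stds), train_y)
test_accuracy = accuracy(w, b, scale(test_x, means, stds), test_y)
print(f"weights={w}, bias={b:.3f}, held-out accuracy={test_accuracy:.2f}")
```

In practice one would more likely reach for a library SVM with a non-linear kernel and real ICME catalogues. The point here is only the shape of the workflow: build labelled feature vectors, scale them, fit the classifier and check it on held-out events.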
The previous talk, given by Xin Huang, discussed a model for deep learning based solar flare forecasting. Solar flares originate from the release of the energy stored in the non-potential magnetic field of active regions; the triggering mechanism for these flares, however, is still unknown. For this reason, conventional solar flare forecasting is probabilistic and based on the statistical relationship between the characteristic parameters of active regions and solar flares. In the deep learning method, forecasting patterns can be learned from the line-of-sight magnetograms of solar active regions. It is necessary to obtain a sufficiently large set of observational data to train the forecasting model and test its performance. Huang described how a dataset was created from the line-of-sight magnetograms of active regions observed by SOHO/MDI and SDO/HMI from April 1996 to October 2015, along with the corresponding soft X-ray solar flares observed by GOES. The MDI data were taken as the training set and the HMI data as the testing set. The experimental results indicated that (1) the forecasting patterns can be obtained automatically from the training set and can also be applied to the testing set, which is reduced to MDI proxy data; (2) the performance of the deep learning forecasting model is not sensitive to the given forecasting period (6, 12, 24 or 48 hours); (3) a reasonable forecasting model is achieved for solar flares of higher importance. Huang used a deep learning package called CAFFE on a single NVIDIA GPU (see references 6-9 below). He described how a cascade of layers in a convolutional neural network was used for feature extraction. The trick with deep learning is to exploit already trained networks and to make use of supervised learning.
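The idea of a cascade of convolutional layers extracting features can be illustrated with a toy sketch in plain Python. This is not Huang's network: the 12x12 "magnetogram", the two hand-set kernels and all function names are assumptions made up for illustration, and in a real network the kernels are learned from the training set rather than written by hand.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (what deep learning packages call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def relu(fmap):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)] for r in range(out_h)]

def extract_features(image, kernels):
    """Apply conv -> ReLU -> pool once per kernel, then flatten to a feature vector."""
    fmap = image
    for kernel in kernels:
        fmap = max_pool(relu(conv2d(fmap, kernel)))
    return [v for row in fmap for v in row]

# Hypothetical kernels: a horizontal-gradient detector followed by a 2x2 average.
KERNELS = [
    [[1.0, 0.0, -1.0],
     [1.0, 0.0, -1.0],
     [1.0, 0.0, -1.0]],
    [[0.25, 0.25],
     [0.25, 0.25]],
]

# Toy line-of-sight field: positive polarity on the left, negative on the right,
# so a polarity inversion line runs down the middle of the patch.
bipolar = [[1.0] * 6 + [-1.0] * 6 for _ in range(12)]
# A patch with a single polarity and no inversion line.
unipolar = [[1.0] * 12 for _ in range(12)]

bipolar_features = extract_features(bipolar, KERNELS)
unipolar_features = extract_features(unipolar, KERNELS)
print(bipolar_features)   # -> [3.0, 3.0, 3.0, 3.0]
print(unipolar_features)  # -> [0.0, 0.0, 0.0, 0.0]
```

The bipolar patch produces a non-zero feature vector while the unipolar patch produces zeros, which is the kind of compact description a final classification layer would then map to "flare" or "no flare".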

This talk was rather inspirational. I've known for a long time that the Matlab package provides a machine learning toolbox. At the risk of a little knowledge being dangerous, I decided to try one of the Matlab deep learning demos with a GPU: a demonstration of image category classification using deep learning (ref 2). This was very easy to run, and I attempted a simple image classification on a set of photographs. Clearly this is very powerful. It is also open to all our users: the Matlab deep learning demos can be run on the central HPC system at The University of Sheffield. ShARC features a range of deep learning and machine learning software which has been well used and tested by the RSE and machine learning groups at The University of Sheffield.

A further possibility for researchers is to use the new deep learning cluster, JADE, based at Oxford (see reference 10). Fortunately, The University of Sheffield is a partner in this project, which makes it much easier for researchers to access this powerful and increasingly used technique and meet the challenge of the big data problem (see reference 19). We can look forward to some excellent adventures exploring the great solar data mountain!
  1.  The discovery of the weak neutral currents
  2. Demonstration of Image category classification using deep learning with Matlab
  3. Neural network toolbox for Alexnet Network with Matlab  
  4. Neural network importer for CAFFE models
  5. Mathworks neural networks toolbox team
  6. CAFFE
  8. TORCH
  10. JADE
  11. Predicting Coronal Mass Ejections Using Machine Learning Methods
  12. Solar Flare Prediction Model with Three Machine-learning Algorithms using Ultraviolet Brightening and Vector Magnetograms 
  13. Space weather research group (Bradford)
  14. Automated Prediction of CMEs Using Machine Learning of CME – Flare Associations 
  16. Studying imagery from solar dynamics 
  17. Application of Convolution Neural Network to the forecasts of flare classification and occurrence using SOHO MDI data
  18. Application of a deep-learning method to the forecast of daily solar flare occurrence using Convolution Neural Network
  19. GPU Computing Sheffield 
