
Delegates are invited to meet and discuss with the poster presenters in this topic directly after the session 'Advanced operation & maintenance', taking place on Thursday, 13 March 2014 at 11:15-12:45. The meet-the-authors session will take place in the poster area.

David Ferguson University of Strathclyde, United Kingdom
David Ferguson (1), Victoria Catterson (1)
(1) University of Strathclyde, Glasgow, United Kingdom


Presenter's biography

Biographies are supplied directly by presenters at EWEA 2014 and are published here unedited

David Ferguson received the M.Eng. degree in Electrical and Electronic Engineering from Heriot-Watt University, Edinburgh, U.K., in 2010. He is currently working towards the Ph.D. degree within the Centre for Doctoral Training in Wind Energy Systems at the University of Strathclyde, Glasgow, with a focus on wind turbine condition monitoring.


Big data techniques for wind turbine condition monitoring


The continual development of sensor and storage technology has led to a dramatic increase in volumes of data being captured for condition monitoring and machine health assessment. Beyond wind energy, many sectors are dealing with the same issue, and these large, complex data sets have been termed ‘Big Data’. Big Data may be defined as having three dimensions: volume, velocity, and variety. This paper discusses the application of Big Data practices for use in wind turbine condition monitoring, with reference to a deployed system capturing 2 TB of data per month.


A comprehensive wind turbine condition monitoring system (CMS), described in [1] and [2], has been installed in an operational Vestas V47 wind turbine for the purpose of developing algorithms to detect machine deterioration. This system measures a range of parameters including vibration, voltage, and current at a sampling rate of up to 20 kHz. The system captures approximately 2 TB of data each month in the form of MySQL MyISAM tables and saves this data on 2 TB external hard drives, which are swapped as required. The data is then transferred in smaller batches to a computer where detailed analysis can be carried out.
The size of the data set had two key impacts on its analysis. First, due to the volume of data and the lack of infrastructure at the remote site, it was not possible to transmit the data via the internet, so manual collection was required. Second, a number of difficulties arise when trying to work with this volume of data on a standard desktop computer. Programs such as MS Excel are not suited to dealing with data files of this size, and so the data was handled using a MySQL server and database and imported into Matlab for analysis.
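Because a full month of CMS data cannot be held in memory by desktop tools, a common pattern for this kind of batch workflow (sketched here in Python rather than the authors' Matlab setup, purely for illustration) is to stream the data in chunks and update summary statistics incrementally, so that only one batch is in memory at a time:

```python
from typing import Iterable, Sequence

def chunked_mean(chunks: Iterable[Sequence[float]]) -> float:
    """Compute the mean of a signal streamed in chunks, so that no more
    than one chunk is ever held in memory at once."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data supplied")
    return total / count

# Hypothetical usage: each chunk stands in for one batch read from the
# MySQL store (e.g. one windowed SELECT against a MyISAM table).
batches = ([1.0, 2.0, 3.0], [4.0, 5.0], [6.0])
print(chunked_mean(batches))  # 3.5
```

The same accumulate-and-combine idea extends to variance and other running statistics, which is what makes it suitable for data sets far larger than available memory.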
Through the collection and analysis of data from this system, the paper investigates and addresses the issues of working with Big Data for the purpose of wind turbine condition monitoring. Comparisons will be made between different approaches to handling Big Data, and recommendations will be given on which are appropriate to given volumes of data.

Main body of abstract

Big Data Concept
Big Data is a term which is becoming increasingly important to companies that manage huge volumes of data [3]. Until recently, the majority of these companies have been in the marketing or financial sectors, dealing with the behaviours of their customers [4]. Other companies involved in Big Data include delivery companies such as UPS, who track millions of packages worldwide [5]. As technology improves and it becomes easier to store large volumes of data, the definition of Big Data moves from the terabyte scale to the petabyte scale. Big Data is often described in terms of the 3Vs model [6]: velocity (the speed at which the data can be processed), volume (the volume of data that is being stored or analysed), and variety (the different types of data that are being stored or analysed). Big Data differs from standard data due to the complexity introduced by these three parameters.
When implementing Big Data practices within a company there are a number of considerations to take into account. At a high level these can be split into hardware and software considerations. Hardware considerations may include the method for storing data and what volume of data will have to be stored locally (as opposed to being stored remotely, or “in the cloud”). The processing speed should also be considered and whether there is a requirement for redundancy of the hardware or the data itself. Software considerations may include what type of platform will be used to handle the data, such as Big Data-specific tools like Hadoop. Other considerations may include the type of database structure or the requirements for security of the data.
Big Data Applied to Wind Farm Operation and Maintenance
Wind turbine condition monitoring has the potential to reduce operation and maintenance costs through reduced downtime and optimised maintenance scheduling. Current systems, however, may not be able to detect all levels of deterioration and faults. One reason for this is the lack of high frequency data containing enough information to detect machine degradation far enough in advance to allow remedial actions to be planned effectively. At present the majority of wind turbines are monitored by SCADA (supervisory control and data acquisition) systems, which provide only 10-minute averaged data, whereas standard machinery diagnostics practices require high frequency vibration monitoring [7].
The system described above can produce approximately 2 TB of data per month from one turbine. As technology improves, 2 TB of data may not be considered Big Data; however, 2 TB of data from each wind turbine across a wind farm generates data in the petabyte range. Implementing Big Data practices within a wind farm provides the infrastructure to handle this volume and unlock the information within the data. This may benefit different parties such as technicians, operators, or manufacturers, and allow real time decision making for maintenance action.
Results: Standard data versus Big Data
A comparison has been made between CMS data and SCADA data to illustrate the necessity for improved data handling techniques. One hour of data from the in-service turbine was read into MS Excel, and the mean of 14 variables (including wind speed, rotor speed, and generator temperature) was calculated, to illustrate the difference in the number of rows and the processing time.

For one hour of data the difference in processing time may seem insignificant. However, a day's worth of CMS data produces 4,320,000 rows, whereas MS Excel can handle only 1,048,576 rows at a time. In comparison, a day's worth of SCADA data is 144 rows, which is many orders of magnitude smaller, and therefore almost any tool can be used for calculation. There are also significant performance implications for the CMS data as the operation performed becomes more complex than calculation of the mean.
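The row counts above follow from simple rate arithmetic. The short sketch below reproduces them, assuming (an inference from the quoted 4,320,000 rows per day, not a figure stated in the abstract) that the CMS logs 50 rows per second while SCADA logs one row every 10 minutes:

```python
# Back-of-the-envelope daily row counts for CMS versus SCADA data.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 s

cms_rate_hz = 50                          # assumed CMS logging rate (rows/s)
scada_period_s = 10 * 60                  # one SCADA row every 10 minutes

cms_rows_per_day = cms_rate_hz * SECONDS_PER_DAY        # 4,320,000 rows
scada_rows_per_day = SECONDS_PER_DAY // scada_period_s  # 144 rows

excel_row_limit = 1_048_576               # maximum rows per Excel worksheet
days_excel_can_hold = excel_row_limit / cms_rows_per_day

print(cms_rows_per_day, scada_rows_per_day, round(days_excel_can_hold, 3))
```

In other words, a single Excel worksheet holds less than a quarter of one day of CMS data, while it could hold nearly twenty years of 10-minute SCADA rows.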


The drive to reduce the cost of wind energy is resulting in increased use of condition monitoring systems that can detect faults at an earlier stage, allowing more economical maintenance scheduling and reduced downtime. For these systems to detect faults effectively, well in advance of any failure, they must capture a sizeable volume of data containing enough information to allow deterioration to be detected. As the number of condition monitoring systems increases, along with the number of parameters being measured, so too does the volume of data. To use this data effectively, systems and platforms must be put in place that are able to handle it.
The full paper will show, through the application of a wind turbine condition monitoring system, that large volumes of data require improved data handling techniques compared with those used for conventional SCADA data. Investigations have highlighted that programs such as MS Excel are unable to handle the large files, which are in excess of 1,000,000 rows of data. One way to improve data handling is through the use of platforms such as Hadoop, which can manage and process large volumes of data more efficiently. Through the use of improved data handling techniques it may be possible to collect high frequency data from an entire wind farm. This data can then be used to monitor wind turbine deterioration and developing faults effectively, leading to improved maintenance scheduling and reduced downtime, thus bringing down the overall cost of wind energy.
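Platforms such as Hadoop rest on the map-reduce pattern: each node maps its shard of raw data to a small partial result, and a reduce step combines the partials into the final answer. A minimal pure-Python sketch of that pattern for a fleet-wide mean (the turbine shard names are invented for illustration, and this is the general pattern rather than the paper's specific implementation):

```python
from functools import reduce

def map_shard(values):
    """Map step: condense one shard of raw samples to a (sum, count) pair."""
    return (sum(values), len(values))

def combine(a, b):
    """Reduce step: merge two partial (sum, count) results."""
    return (a[0] + b[0], a[1] + b[1])

# Hypothetical shards, standing in for per-turbine data blocks on a cluster.
shards = {
    "turbine_01": [3.0, 4.0, 5.0],
    "turbine_02": [6.0, 7.0],
}

partials = [map_shard(v) for v in shards.values()]
total, count = reduce(combine, partials)
fleet_mean = total / count
print(fleet_mean)  # 5.0
```

The key property is that `combine` is associative, so the partials can be merged in any order and on any node, which is what lets the computation scale across a cluster as the number of turbines grows.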

Learning objectives
The work discussed in this paper will provide a deeper understanding of the technical challenges associated with the use of Big Data for wind farm condition monitoring. It will highlight which methods are suited to dealing with the high volumes of data captured by wind turbine condition monitoring systems.

1. A. Zaher et al., "Database Management for High Resolution Condition Monitoring of Wind Turbines", 44th International Universities Power Engineering Conference (UPEC), 2009.
2. D. Ferguson et al., "Designing Wind Turbine Condition Monitoring Systems Suitable for Harsh Environments", IET Renewable Power Generation Conference, Beijing, 2013.
3. J. Hurwitz, A. Nugent, F. Halper and M. Kaufman, Big Data For Dummies, John Wiley & Sons, Inc., 2013.
4. Intel and IBM, "Combat Credit Card Fraud with Big Data", Intel Corporation, 2013.
5. T. H. Davenport and J. Dyche, "Big Data in Big Companies", SAS Institute Inc., 2013.
6. M. Beyer, "Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data", Gartner, 2011 [cited 7 October 2013].
7. P. Tavner et al., Condition Monitoring of Rotating Electrical Machines, Institution of Engineering and Technology, 2008.