Writing, storing and viewing time series data is a core part of an energy monitoring application like emoncms. Over the years the way time series data has been handled in emoncms has gone through several significant revisions both out of necessity as emoncms.org use has grown and through increased understanding of how disks and filesystems work and how to reduce disk IO through in-memory storage. In order to best describe the current solution it helps to begin with a brief history of time series database design within emoncms.
Time series data is a series of timestamped measurements. For example a temperature time series may look like this:
The first solution used in emoncms was a simple mysql table with two fields one for the time-stamp and the other for the value and one table per time series feed. Data was requested using a mysql query that picked out a table row 'every nth row'.
After around 5 months query times became increasingly slow nearing 12s for a historical data request which involved selecting 1000 data points out of 2.5 million. A temporary solution was found that initially gave an improvement of 10x. The solution involved adding an index on the time column and making repeated requests from php pulling out a set of data points at given time intervals.
While it worked sufficiently well on a server with a light load as emoncms.org grew this approach became again unusable. Whenever the mysql query queue became more than a few tens of lines the delay added as queue items waited to be processed slowed down the overall historical data request significantly. A series of improvements to emoncms that reduced the number of mysql queries performed in input processing gave a little bit more time of 'ok' performance but it never took long for emoncms to grow and usability to deteriorate.
In May 2013 Mike Stirling who wrote the timestore time series database got in contact, suggesting I took a look at his work. Timestore is a dedicated time series database written in c, its data request performance and disk use blew the results I had go so far with mysql away: Over halving disk use and reducing data request times by another 10x on a non loaded server and around 398x to 196ms on a busy server.
Timestore became the engine of choice within emoncms in late July 2013 providing a huge boost in performance but converting the large amounts of mysql time series data to timestore on an already stretched server was very time consuming. I started to experiment with writing time series engines with direct file access from scratch in php and realised that it wasn’t that hard to write an engine that could take the GB's of mysql time series data amassed on emoncms.org move it to another folder and read and write to it directly obtaining significantly improved performance over mysql without requiring timely conversion. This engine became the PHPTimeSeries engine which is a variable interval time series engine documented in full here:
While timestore solved the problem of reading historical data its write performance was slower than the mysql solution due to the addition of all the averaged layers with associated increase in disk IO. As I worked through converting emoncms.org feeds to it I noticed the server load climb significantly.
Moving input and feed meta data (last updated time and value) to redis an in-memory database rather than using mysql which resulted in disk IO load caused another significant improvement in performance.
By this time most of the meta data that did not need to be persistent and most of the feed data had been removed from mysql and either put in timestore, phptimeseries or redis.
The next big capacity improvement came from moving the server to Solid State Drives (SSD's) which have much shorter seek times as there's no hard drive head that needs to move across a physical disk. Thanks to Ynyr Edwards for encouraging me to try redis and for helping with adding redis to emoncms as well as recommending the use of a dedicated server with SSD drives.
At this point with much reduced mysql load, very fast graph load times thanks to timestore it seemed that from an emoncms.org load and user experience point of view it seemed we had a winning solution, but a few months later another instability began to occur. Timestore would freeze up for seconds at a time. I managed to replicate the issue on my local machine. I poured through the timestore code (thanks to Mike for making timestore open source) and started to understand how timestore worked, Im not a good c programmer so to try and check if I understood it I started writing a port in php. I then did some performance testing on both versions. It turned out that the php port didn’t suffer from the stability issue and continued to be stable up to its maximum post rate which was over 10x the current emoncms.org load and so I decided to move over to using the php port with the added benefit that I now knew how every line of code worked and so could adapt as needed in future.
This brings us to the present day with the 3 main emoncms feed engines:
These 3 engines provide between them an implementation that can fit most applications.
Time Series feed engine development is not yet complete, the current area of research is how best to add write buffering in order to make the writing of the time series data much more efficient.
Due to the way filesystems work writing 9 bytes at a time to each data file in this way is not particularly write efficient. File systems usually have a minimum IO size that is much larger than 9 bytes (i.e 512 bytes), we can improve the engine write implementation by buffering and writing in large blocks that are closer to this minimum IO size. This is the current area of research in emoncms feed engines.