
Guide on Deep Learning for Time-Series Multi-Label Classification

Some observations and a few tips for tackling some of the major problems in multi-label classification of time-series data.


Illustration of ROC Curve

Hello all,  

Over the past few days, I have been working on a multi-label classification problem involving time-series data, and I came across many new things (including one big red flag). I wanted to share a few pointers so that anyone working on a similar problem statement can get a jump start.

There are a few TensorFlow references here and there, but overall the advice is framework-agnostic 😊

1. It is very important to understand the scale of the values in your data; get that sorted out right at the beginning  
2. I used tf.data for loading, handling, and processing data from a CSV file, and its function calls are straightforward and easy to use; you can also pipe multiple transformations together, which is an added advantage (see the first sketch after this list)  
3. I have seen many developers start big and slowly reduce the model complexity, and I'm also guilty of doing this in the past. Lesson learned – start small and increase complexity only if training appears to saturate after a certain number of epochs. A smaller model (if it performs on par with a bigger one) is always advantageous because of its lower latency when deployed  
4. Always use a custom training loop or override model.fit() to get better control over training than the highly abstracted default offers (a sketch of an overridden train_step follows this list)  
5. When it comes to multi-label classification, tracking or depending on just accuracy and loss is a big NO. Use precision, recall, and F1-score, as they are better indicators. This is the biggest red flag!  
6. Write code snippets around these metrics and configure custom callbacks correctly for efficient training  
7. Use the ROC and precision-recall curves to understand your model's outputs, given that you can set a threshold on each sigmoid output to decide the 1/0 label for that neuron (see the thresholding sketch after this list)  
8. Actively track your training runs, using either one of the open-source tools or your own script. It is better to use tools that can do the heavy lifting so you can focus on the data and the model rather than everything else (there are plenty of tools out there). In short, start adopting MLOps practices.  
9. One final thing – use Docker and write the code in such a way that you can drop your container into a multi-GPU or TPU environment and it still runs flawlessly for a scaled-up training exercise (a minimal tf.distribute sketch follows this list)
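To make point 2 concrete, here is a minimal sketch of a tf.data pipeline that reads a CSV file and pipes a few transformations together. The file name sensor_readings.csv and the feature/label column names are hypothetical placeholders for your own data.

```python
import tensorflow as tf

# Hypothetical CSV with three feature columns and three binary label columns.
FEATURES = ["sensor_a", "sensor_b", "sensor_c"]
LABELS = ["fault_1", "fault_2", "fault_3"]

def split_features_labels(row):
    # Stack the feature columns into a float tensor and the labels into a 0/1 vector.
    x = tf.stack([tf.cast(row[c], tf.float32) for c in FEATURES], axis=-1)
    y = tf.stack([tf.cast(row[c], tf.float32) for c in LABELS], axis=-1)
    return x, y

dataset = (
    tf.data.experimental.make_csv_dataset(
        "sensor_readings.csv",   # placeholder path to your CSV file
        batch_size=32,
        num_epochs=1,
        shuffle=True,
    )
    .map(split_features_labels, num_parallel_calls=tf.data.AUTOTUNE)  # piped transformation
    .prefetch(tf.data.AUTOTUNE)                                       # overlap I/O with training
)
```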
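For points 4–6, the sketch below subclasses tf.keras.Model and overrides train_step (the documented way to customize what model.fit() does in TF 2.x Keras) while tracking precision and recall instead of relying on accuracy alone. The layer sizes and the three output units are assumptions made purely for illustration.

```python
import tensorflow as tf

class MultiLabelModel(tf.keras.Model):
    """Keras model with a custom train_step, so we keep full control of training."""

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        # Update the compiled metrics (precision, recall, ...) so they show up in the logs.
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# Small illustrative architecture: 3 independent sigmoid outputs for 3 labels.
inputs = tf.keras.Input(shape=(3,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(3, activation="sigmoid")(hidden)
model = MultiLabelModel(inputs, outputs)

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",  # per-label binary loss for multi-label targets
    metrics=[tf.keras.metrics.Precision(name="precision"),
             tf.keras.metrics.Recall(name="recall")],
)
# model.fit(train_dataset, epochs=10)  # train_dataset: a tf.data.Dataset of (features, labels)
```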
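For point 7, here is a sketch that uses scikit-learn to compute per-label ROC AUC and precision-recall curves from the sigmoid scores and then pick a decision threshold per label. Maximizing F1 on the validation scores is just one possible rule, chosen here for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, auc

def per_label_thresholds(y_true, y_prob):
    """y_true, y_prob: arrays of shape (num_samples, num_labels); y_prob holds sigmoid scores."""
    thresholds = []
    for k in range(y_true.shape[1]):
        fpr, tpr, _ = roc_curve(y_true[:, k], y_prob[:, k])
        print(f"label {k}: ROC AUC = {auc(fpr, tpr):.3f}")

        prec, rec, thr = precision_recall_curve(y_true[:, k], y_prob[:, k])
        f1 = 2 * prec * rec / np.clip(prec + rec, 1e-8, None)
        # precision_recall_curve returns one more (prec, rec) point than thresholds,
        # so drop the last point before taking the argmax.
        best = np.argmax(f1[:-1])
        thresholds.append(thr[best])
    return np.array(thresholds)

# Example: turn sigmoid scores into hard 0/1 labels using the per-label thresholds.
# thresholds = per_label_thresholds(y_true_val, y_prob_val)
# y_pred = (y_prob_val >= thresholds).astype(int)
```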
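And for point 9, a minimal sketch of wrapping model creation in a tf.distribute strategy so the same containerized script runs unchanged on one GPU or several; build_model here is a stand-in for your own model-building code, and a TPU environment would use tf.distribute.TPUStrategy instead.

```python
import tensorflow as tf

def build_model():
    # Tiny illustrative multi-label model with 3 sigmoid outputs.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(3,)),
        tf.keras.layers.Dense(3, activation="sigmoid"),
    ])

# MirroredStrategy uses every GPU visible to the container; with a single GPU or
# only a CPU it still works, so the same script scales up without code changes.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas in sync:", strategy.num_replicas_in_sync)

# Variables (the model and optimizer state) must be created inside the strategy scope.
with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# model.fit(train_dataset, epochs=10)  # Keras distributes the tf.data batches across replicas
```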

I will keep sharing my views and a few more tips as I progress through solving my problem statement. Until then, I hope the tips above will aid you when working on a similar problem.
