Rough Bitcoin Prediction with Time Series Forecasting

Time Series Forecasting: what exactly does it mean?

Time series forecasting is a method that analyzes time series data using statistics and modeling in order to make predictions. The prediction is not always exact; it can be approximate and should be read as a probability, since “the likelihood of forecasts can vary wildly”. Data quality matters a great deal: “the more comprehensive data we have, the more accurate the forecasts can be”.

It can be applied in many domains, such as weather forecasting, climate forecasting, and economic forecasting (our case): anything that produces data over time. Because the prediction can be approximate, the goal can differ too. We are not always looking for the exact future point, but rather for an approximate idea of where or when it will be.

It isn’t infallible, and time series forecasting doesn’t apply in all situations. To decide whether the method is relevant, we should first analyze and correctly understand the problem we’re facing. There will be situations where the predictions are too inaccurate because the likelihood varies too much.

Example graph of sales time series forecasting between February 2017 and April 2018

How did we build the dataset? Which preprocessing method did we use?

Probably the most important step is to build the dataset before training any model. First, we need to obtain raw data. There are many ways of doing this; in our case we had CSV files containing raw Bitcoin data from 2012 to 2019.

We could have worked on the whole time series, or on just one year. In our case we took the last 24 hours into account and aimed to predict the following hour (one-hour forecasting), treating one hour as one time window. Since our raw data came in one-minute time windows, we had to reformat it before building the dataset.

The raw datasets are formatted such that every row represents a 60-second time window containing:

  • The start time of the time window in Unix time
  • The open price in USD at the start of the time window
  • The high price in USD within the time window
  • The low price in USD within the time window
  • The close price in USD at end of the time window
  • The amount of BTC transacted in the time window
  • The amount of Currency (USD) transacted in the time window
  • The volume-weighted average price in USD for the time window
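To make the minute-to-hour conversion concrete, here is a minimal sketch using only the Python standard library. The column names and sample rows are hypothetical (the original CSV headers are not shown here), and keeping the last close price of each hour is an assumption about how the hourly value is chosen.

```python
import csv
import io

# Hypothetical sample in the layout described above; the column names are
# assumptions, not taken from the original CSV files.
SAMPLE_CSV = """Timestamp,Open,High,Low,Close,Volume_BTC,Volume_USD,Weighted_Price
1546300800,3700.0,3701.0,3699.0,3700.5,1.2,4440.6,3700.4
1546300860,3700.5,3702.0,3700.0,3701.0,0.8,2960.8,3701.0
1546304340,3701.0,3705.0,3700.0,3704.0,2.0,7408.0,3703.5
1546304400,3704.0,3706.0,3703.0,3705.5,1.1,4076.1,3705.0
"""


def hourly_close(rows):
    """Return {hour_start_unix: last close price seen in that hour}."""
    closes = {}
    for row in sorted(rows, key=lambda r: int(r["Timestamp"])):
        # Round the Unix timestamp down to the start of its hour.
        hour_start = int(row["Timestamp"]) // 3600 * 3600
        # Rows are processed in time order, so later minutes overwrite earlier ones.
        closes[hour_start] = float(row["Close"])
    return closes


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
hourly = hourly_close(rows)
```

With the four sample rows, the first three fall into the same hour and only the close of the latest one is kept; the fourth row opens the next hour.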

We proceeded in the following steps:

  • We first cleaned the raw data. We only wanted the close price in USD at the end of each time window, so we extracted only this column.
  • Next came the curation step, which consists of converting the format of our data: we switched from minutes to hours in order to build the correct time windows.
  • The third step was to divide the dataset into a training dataset and a testing dataset: 80 % of the data went to training and 20 % to testing.
  • With the training and testing datasets in hand, we needed to preprocess them before giving them to our model. We applied standardization (Z-score normalization), which uses the mean μ and standard deviation σ of the data to rescale each value x as z = (x − μ) / σ, so that the data has a mean of 0 and a standard deviation of 1.
  • After that we built our time windows to keep only the last 24 hours, and we called tf.keras.preprocessing.timeseries_dataset_from_array(), which creates a dataset of sliding windows over the provided time series.
  • We now have a functional training dataset and testing dataset that can be used with our models!
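The split and standardization steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not our exact code: the prices are made up, the split is assumed to be chronological (no shuffling), and the mean and standard deviation are assumed to be computed on the training part only and then reused for the testing part.

```python
from statistics import mean, pstdev


def split_train_test(series, train_ratio=0.8):
    """Chronological split: the first part trains, the rest tests."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def standardize(values, mu, sigma):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    return [(v - mu) / sigma for v in values]


# Hypothetical hourly close prices in USD.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0, 108.0, 111.0, 115.0, 120.0]

train, test = split_train_test(prices)

# Assumption: statistics come from the training data only, so that nothing
# about the testing period leaks into the preprocessing.
mu, sigma = mean(train), pstdev(train)
train_z = standardize(train, mu, sigma)
test_z = standardize(test, mu, sigma)
```

After this step the training values have a mean of 0 and a standard deviation of 1. The testing values are on the same scale but are not bounded, so they can fall well outside the range seen in training.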

How to set up the dataset as a TensorFlow dataset?

When using TensorFlow to build your dataset, you can call the tf.data API. It “enables you to build complex input pipelines from simple, reusable pieces” and is used here to create a dataset that our architectures can consume.

In our case we could have used another function that is quite simple to use and builds a TensorFlow dataset with the number of time windows you specify. But we decided to use tf.keras.preprocessing.timeseries_dataset_from_array() instead (as mentioned earlier).


This function “creates a dataset of sliding windows over a timeseries provided as array”. It returns a tf.data.Dataset instance, like the function above. We used it because it exposes parameters such as batch_size and shuffle that we can experiment with, which makes it a more flexible way of building our time series TensorFlow dataset.
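To show what "sliding windows" means for our setup, here is a library-free sketch of the idea. It is not the TensorFlow implementation and does no batching or shuffling; the helper name sliding_windows is hypothetical. Each input is a run of 24 consecutive hourly values and the target is the value of the hour that follows.

```python
def sliding_windows(series, window=24):
    """Pair every run of `window` consecutive values with the next value."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # the last `window` hours
        targets.append(series[start + window])       # the hour to predict
    return inputs, targets


# Hypothetical series of 30 hourly values (here simply 0..29).
series = list(range(30))
X, y = sliding_windows(series, window=24)
```

A series of 30 hourly values yields 6 input/target pairs: the first window covers hours 0 to 23 and targets hour 24, and the last one covers hours 5 to 28 and targets hour 29.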

Chosen architecture: the Long Short-Term Memory (LSTM) architecture

Once our dataset was functional, it was time to build the neural architecture that performs the training and gives us the predictions. For this time series we worked with a Recurrent Neural Network (RNN), and we chose the LSTM architecture because it is known to be well suited to time series forecasting.

We first built a simple LSTM architecture to get a first glimpse of the predictions, then tried to improve it by building deeper versions. Here are our architectures, built with tensorflow-keras.

Simple LSTM version

In our first version, we built a simple architecture composed of a single LSTM layer with 32 units and an output layer with 1 unit.

Deep LSTM version

The second version is quite similar, but we add a second LSTM layer of 64 units just after the first LSTM layer, and we keep the output layer with 1 unit.

Deeper LSTM version

The last version is the deep LSTM with additional tf.keras.layers.LSTM() parameters. We still have 2 LSTM layers, the first one with 128 units this time and the second with 256 units. We set the recurrent dropout rate to 0.03 and added the ReLU activation function on the first LSTM layer.
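To get a feel for how much the three versions differ in size, the sketch below counts trainable weights in plain Python. It relies on the standard LSTM layout (four weight blocks: input, forget and output gates plus the cell candidate, each with input weights, recurrent weights and a bias) followed by a dense output layer. It assumes a single input feature (the close price) and that the first layer of the deep version keeps 32 units; the helper names are hypothetical. Dropout and the choice of activation add no weights.

```python
def lstm_params(input_dim, units):
    """Weights of a standard LSTM layer: 4 blocks of (input + recurrent + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights of a fully connected layer: one per input per unit, plus biases."""
    return input_dim * units + units


def count_params(n_features, lstm_units, output_units=1):
    """Total weights of stacked LSTM layers followed by a dense output layer."""
    total, input_dim = 0, n_features
    for units in lstm_units:
        total += lstm_params(input_dim, units)
        input_dim = units  # the next layer reads this layer's output
    return total + dense_params(input_dim, output_units)


# One input feature (the close price). The 32 units of the first layer in the
# deep version are an assumption based on "quite similar" in the text.
simple = count_params(1, [32])
deep = count_params(1, [32, 64])
deeper = count_params(1, [128, 256])
```

Under these assumptions the counts are 4,385 for the simple version, 29,249 for the deep one and 461,057 for the deeper one, so the last model is roughly a hundred times larger than the first.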

Model performance and results

With our models ready, we trained each of them for 5 epochs. Here is the graph of the raw data that we used, showing the training dataset and the testing dataset.

Raw Data Graph

Simple LSTM version

Simple LSTM Architecture Loss Performance
Simple LSTM Architecture Predictions

Here we have the loss performance of our simple LSTM architecture and the predictions it gives after the 5 training epochs. We can see that predictions #1 and #2 are better, as the distance between the red dot and the green dot is smaller than in predictions #3 and #4. The closer the points are, the more accurate the prediction.

Deep LSTM version
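The distance between the two dots can also be measured instead of judged by eye. The sketch below computes the absolute error of each prediction and the mean absolute error over all of them. The numbers are hypothetical standardized values chosen for illustration, not our actual results.

```python
def absolute_errors(predicted, actual):
    """Distance between each predicted value and the value actually observed."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [abs(p - a) for p, a in zip(predicted, actual)]


def mean_absolute_error(predicted, actual):
    """Average distance over all predictions: lower means more accurate."""
    errors = absolute_errors(predicted, actual)
    return sum(errors) / len(errors)


# Hypothetical standardized values for four predictions (not the article's results).
predicted = [0.10, 0.25, 0.40, 0.05]
actual = [0.12, 0.20, 0.70, 0.45]

errors = absolute_errors(predicted, actual)
mae = mean_absolute_error(predicted, actual)
best = min(range(len(errors)), key=errors.__getitem__)  # index of the closest pair
```

In this made-up example the first two predictions are much closer to their targets than the last two, which is the same pattern as on the graph. Multiplying an error by the standard deviation used for standardization would express it in USD.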

Deep LSTM Architecture Loss Performance
Deep LSTM Architecture Predictions

We did the same with the deep LSTM architecture, training for 5 epochs. The loss tends to decrease a little more than with the simple architecture, but the interesting part is the predictions: some are still not accurate enough, but the visual representation is more logical and the distance between the dots is smaller.

Deeper LSTM version

Deeper LSTM Architecture Loss Performance
Deeper LSTM Architecture Predictions

In the last training session, with our deeper LSTM architecture, we again trained for 5 epochs. The training and validation losses already behave better and follow a similar pattern. The predictions are more accurate, and some of them had already appeared with the previous architecture. Based on the predictions, the deeper LSTM architecture seems to perform better than the other two.


We are satisfied with our results, even if we still have many questions about how to improve and properly verify the predictions. This project was interesting and challenging, changing our habits and pushing us into a topic that is not the easiest. Time series forecasting is a useful analysis method that can be used in many fields, not only for Bitcoin prediction. We still need to learn more about it and improve our understanding of how to use LSTM architectures, how to optimize them, and how to optimize our dataset to perform better.

We had some ideas for seeing how the models would perform if we changed the time series: taking more than the last 24 hours, or trying to predict more than just the following hour. We would also like to replace our Z-score normalization with batch normalization and tune our hyperparameters, in order to check whether we can get better results and whether our current results are really accurate.

We thought about changing the graphical representation to take into account the precision metrics of the training session, and to show the predictions on the original graph (raw data graph) rather than separately. Our goal now is to keep improving the project by managing the data with a database rather than with a CSV, and by building a scraper that retrieves current Bitcoin data rather than depending on a specific CSV that was given to us.

Co-authored by

Adrien Millot (Github, Medium)

Nathan Lapeyre (Github, Medium)


