python – Wrong range of values in time-series forecasting with TensorFlow

I have a dataset of prices with 36619 samples and 6 features. The distributions of the features vary significantly, so I wrote the following function to adapt a normalization layer and create a TensorFlow (TF) dataset. Each step and each parameter is described inside the function. I am using tensorflow.__version__ = 2.8.2.

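import tensorflow as tf

WINDOW_JUMP = 24                # forecast horizon: time steps ahead of the window
WINDOW_SIZE = 24*7              # time steps in each input window
SHIFT = 1                       # step between consecutive windows
TARGET = 1                      # number of target values per window
BATCH_SIZE = 128
SHUFFLE_BUFFER = 10000
SERIES_SHAPE = [WINDOW_SIZE,6]  # (time steps, features)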

With these settings, each sample in the dataset has shape 168 x 6 (WINDOW_SIZE x number of features), and each target lies 24 time steps (WINDOW_JUMP) after the end of its input window.
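
The window geometry described above can be checked on a toy series without TensorFlow. This is a minimal sketch, assuming a 10-step series with 2 features and small illustrative sizes (`window_size=4`, `window_jump=3`); the helper `make_windows` is hypothetical and only mirrors the slicing `window[:-window_jump]` and `window[-target:][:,dim_target]` used in the function below.

```python
# Toy stand-in for the real series: 10 time steps, 2 features.
# All sizes here are illustrative assumptions, not the values from the post.
series = [[t, 100 + t] for t in range(10)]

def make_windows(series, window_size, window_jump, target=1, dim_target=0, shift=1):
    """Build (features, labels) pairs the way the window + map steps do.

    Each span covers window_size + window_jump rows; features are the first
    window_size rows and labels are column dim_target of the last `target` rows.
    """
    total = window_size + window_jump
    pairs = []
    # Only complete spans are kept, like drop_remainder=True.
    for start in range(0, len(series) - total + 1, shift):
        window = series[start:start + total]
        features = window[:-window_jump]
        labels = [row[dim_target] for row in window[-target:]]
        pairs.append((features, labels))
    return pairs

pairs = make_windows(series, window_size=4, window_jump=3)
first_x, first_y = pairs[0]
print(len(pairs))      # number of complete windows
print(first_x[-1][0])  # time index of the last feature row
print(first_y)         # label taken window_jump steps after that row
```

In this toy case the last feature row of the first window has time index 3 and its label comes from time index 6, i.e. `window_jump` steps later, which matches the layout described above.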

def windowed_dataset(series,
    window_size, 
    window_jump, 
    target, 
    batch_size,
    shuffle_buffer,
    dim_target = 0,
    shift=1,
    processing = None,
    verbose=[0,0,0,0,0,0]):
    """Generates dataset windows

    Args:
      series (array of float) - contains the values of the time series
      window_size (int) - the number of time steps to include in the feature
      window_jump (int) - number of time steps ahead to predict
      target (int) - number of trailing time steps of each window to use as targets
      batch_size (int)
      shuffle_buffer (int) - buffer size to use for the shuffle method
      dim_target (int) - for a multivariate dataset, the index of the column from which to extract the targets
      shift (int) - jump between windows
      processing (Keras preprocessing layer) - if passed, the preprocessing layer to adapt
      verbose (sequence of bool) - print a preview after each step of the dataset creation process;
                                    each preview iterates only once.

    Returns:
      dataset (TF Dataset) - TF Dataset containing time windows and targets
    """
    print('--> Generate a TF Dataset from the series values')
    dataset = tf.data.Dataset.from_tensor_slices(series)
    if verbose[0]:
      # Preview the result
      for val in dataset:
        print(f'An entry sample would be: {val.numpy()}')
        print()
        break
    if processing is not None:
      print('\t --> Adapting the preprocessing layer to the data')
      processed = dataset.window(window_size, shift=shift, drop_remainder=True)
      processed = processed.flat_map(lambda window: window.batch(window_size))
      processing.adapt(processed)
    print('--> Window the data but only take those with the specified size')
    dataset = dataset.window(window_size + window_jump, shift=shift, drop_remainder=True)
    if verbose[1]:
      for window_dataset in dataset:
        print(f'After windowing: {[item.numpy() for item in window_dataset]}')
        print()
        break
    print('--> Flatten the windows by putting its elements in a single batch')
    dataset = dataset.flat_map(lambda window: window.batch(window_size + window_jump))
    if verbose[2]:
      for window in dataset:
        print(f'Flattened: {window.numpy()}')
        print()
        break
    print('--> Create tuples with features and labels')
    dataset = dataset.map(lambda window: (window[:-window_jump], window[-target:][:,dim_target]))
    if verbose[3]:
      for x,y in dataset:
        print('Tuple of features and labels:')
        print("\nx = ", x.numpy())
        print("\ny = ", y.numpy())
        print()
        break
    print('--> Shuffle the windows')
    dataset = dataset.shuffle(shuffle_buffer)
    if verbose[4]:
      for x,y in dataset:
        print('Shuffled:')
        print("\nx = ", x.numpy())
        print("\ny = ", y.numpy())
        print()
        break
    
    print('--> Create batches of windows')
    dataset = dataset.batch(batch_size).prefetch(1)
    if verbose[5]:
      for x,y in dataset:
        print('Batches:')
        print("\nx = ", x.numpy())
        print("\ny = ", y.numpy())
        print()
        break
    if processing is not None:
      print('Returning dataset and adapted layer')
      return dataset,processing
    else:
      print('Returning dataset')
      return dataset
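
To make the adapt step in the function above concrete, here is a minimal sketch of per-feature standardization in plain Python. It assumes the layer learns one mean and one population variance per feature and then computes `(x - mean) / sqrt(variance)`; the helpers `adapt_stats` and `normalize` and the `eps` guard against zero variance are illustrative names, not the Keras implementation.

```python
import math

def adapt_stats(rows):
    """Per-feature mean and population variance over all rows (assumed behaviour)."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    variances = [
        sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(n_features)
    ]
    return means, variances

def normalize(row, means, variances, eps=1e-7):
    """Standardize one row; eps is an assumed guard against zero variance."""
    return [
        (x - m) / max(math.sqrt(v), eps)
        for x, m, v in zip(row, means, variances)
    ]

# Two features on very different scales (toy values, not the real prices).
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
means, variances = adapt_stats(rows)
scaled = [normalize(r, means, variances) for r in rows]
for r in scaled:
    print(r)
```

After scaling, both toy columns have zero mean and the same spread. Note that in the function above only the inputs pass through the adapted layer; the labels are sliced from the raw windows, so they stay in the original units.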

The dataset was split into Train (90%), Validation (5%), and Test (5%) sets. I built the Train and Validation sets used to fit the model with the function above. At this step, I verified that the target values contained in each set are in the desired range.
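
The kind of range check mentioned above could look like the following sketch. The helper `target_range` is hypothetical: it scans any iterable of `(x, y)` batches, where each `y` is a batch of label rows, and returns the smallest and largest label seen. The toy batches are made-up values used only to exercise it.

```python
def target_range(batches):
    """Return (min, max) over every label in an iterable of (x, y) batches."""
    lo, hi = float("inf"), float("-inf")
    for _, y in batches:
        for row in y:          # one row of labels per sample
            for value in row:  # one value per predicted step
                value = float(value)
                lo = min(lo, value)
                hi = max(hi, value)
    return lo, hi

# Toy batches: features are omitted (None) because only labels matter here.
toy_batches = [
    (None, [[10.0], [12.5]]),
    (None, [[9.0]]),
]
print(target_range(toy_batches))
```

With the real pipeline, something like `target_range(train_series.as_numpy_iterator())` should work the same way, assuming the labels have shape (batch, target) as produced by the function above.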

# Train set
train_series, normalizer_layer = windowed_dataset(train_set,
                                WINDOW_SIZE,WINDOW_JUMP,TARGET,BATCH_SIZE,SHUFFLE_BUFFER,
                                processing= tf.keras.layers.Normalization(input_shape=SERIES_SHAPE))

Finally, I created, compiled, and fitted the model.

# Train the model
def compile_fit_model(model,epochs,train_series,validation_series,lr_schedule=None,callbacks=None):
  # Initialize the optimizer
  if lr_schedule is None:
    print('\nCreating optimizer')
    optimizer = tf.keras.optimizers.Adam()
  else:
    print('\nCreating optimizer with scheduler')
    optimizer = tf.keras.optimizers.Adam(lr_schedule)
  # Set the training parameters
  model.compile(loss="mse", optimizer=optimizer, metrics=['mae'])
  with tf.device(device_name):
    if callbacks is None:
      print('\nFitting the model')
      history = model.fit(train_series, epochs=epochs)#,validation_data=validation_series)
    else:
      print('\nFitting the model with callbacks: {0}'.format(callbacks))
      history = model.fit(train_series, epochs=epochs,callbacks=callbacks)#,validation_data=validation_series)
  return history

def get_uncompile_model(input_shape,norm=None):
    model = tf.keras.models.Sequential()
    if norm is not None:
        model.add(norm)
    model.add(tf.keras.layers.SimpleRNN(1))
    model.add(tf.keras.layers.Dense(1))
    model.summary()
    return model
model = get_uncompile_model(SERIES_SHAPE,normalizer_layer)
history = compile_fit_model(model,3,train_series,val_series,lr_schedule=1e-8)

I am getting the following prediction on the validation data. I don't expect it to be accurate; after all, I haven't tuned the model. However, it should at least land in the range of the correct values, right? I have tried a linear activation function in the last layer and various configurations of the neural network, but nothing worked. I have followed the TF beginner tutorial on regression, but I can't find anything wrong.

Can someone please help me identify what I am doing wrong? I am relatively new to time series and TensorFlow. If you have any remarks beyond the question itself, feel free to tell me; I will be grateful.
