
AI Study Notes -- TensorFlow Customization

Some practical notes on customizing losses, metrics, training loops, and layers in the TensorFlow framework.

Custom Loss Functions

1. Custom loss function with extra arguments

Keras loss functions follow the standard signature loss(y_true, y_pred), which covers the most common use case: comparing predictions against targets (or labels). I wanted the loss computation to also use the model's input as an argument, so a custom loss function was needed.

The official TensorFlow docs give two ways to define a custom loss: write a plain loss function, or subclass tf.keras.losses.Loss. The second approach allows arguments beyond (y_true, y_pred).
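For completeness, the first approach is just a plain function of (y_true, y_pred); a minimal sketch of that pattern (not the method used below):

import tensorflow as tf

def simple_mse(y_true, y_pred):
    # a function-style loss only sees the targets and the predictions
    return tf.math.reduce_mean(tf.square(y_true - y_pred))

# passed directly at compile time: model.compile(optimizer='adam', loss=simple_mse)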

import tensorflow as tf
from tensorflow import keras

# custom loss function with extra arguments
class LocLoss(keras.losses.Loss):
    def __init__(self, kg_raw, model_size, nobs, batch_size, name="custom_loss"):
        super().__init__(name=name)
        # these are some extra arguments:
        self.kg_raw = kg_raw
        self.model_size = model_size
        self.nobs = nobs
        self.batch_size = batch_size

    def call(self, kg_true, loc_f):
        # broadcast loc_f across the observation dimension, then scale kg_raw by it
        k = tf.constant([1, 1, self.nobs, 1], tf.int32)
        kg_pred = tf.math.multiply(
            self.kg_raw,
            tf.tile(tf.reshape(loc_f, [self.batch_size, self.model_size, 1, 1]), k))
        mse = tf.math.reduce_mean(tf.square(kg_true - kg_pred))
        return mse

# create a model
# specify model structure...
model = keras.Model(inputs=inputs, outputs=outputs)

# compile the model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
    loss=LocLoss(inputs, model_size, nobs, batch_size),  # model inputs as extra args
    experimental_run_tf_function=False,  # necessary!
)

# train the model
history = model.fit(train_dataset, epochs=10, validation_data=test_dataset)

Here the model's input is kg_raw, its output is a function loc_f, and the target is kg_true; the loss is $MSE(kg_{raw} * loc_f,\ kg_{true})$. Note that for TensorFlow's automatic differentiation to work, every quantity inside the loss must be computed as a TensorFlow tensor.

In the custom LocLoss class, the __init__ method accepts the extra arguments (kg_raw, model_size, nobs, batch_size); the call method computes the loss, and its signature is still the classic (y_true, y_pred), except that y_pred here is the model's output loc_f. The extra arguments needed in the computation are accessed via self.*.

Passing the model's inputs as an argument when compiling the model is what makes a loss function with extra arguments work. There is one pitfall: if experimental_run_tf_function=False is not set, compilation fails with the following error (check this for more information):

tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor '...' shape=(...) dtype=float32>]

2. Weighted loss

To be filled in later.

3. Load pre-trained model and predict

Suppose we want to save the model trained above, whose custom loss takes the model inputs as arguments, and use it later for prediction. Saving the whole model and then loading it raises an error:

model.save('/save_dir')

# this raises an error, even with custom_objects:
trained_model = tf.keras.models.load_model('/save_dir', custom_objects={'loss': LocLoss(inputs)})

The workaround is to save only the model's weights, and later load only the weights:

model.save_weights('/save_dir')

model.load_weights('/save_dir')

Before loading, the model must be defined and compiled beforehand.
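Putting it together, prediction with the saved weights looks roughly like this (a sketch; it assumes the same model-building code and LocLoss arguments as at training time):

# rebuild exactly the same architecture as during training
model = keras.Model(inputs=inputs, outputs=outputs)

# compile with the same custom loss before loading the weights
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
    loss=LocLoss(inputs, model_size, nobs, batch_size),
    experimental_run_tf_function=False,
)

model.load_weights('/save_dir')
predictions = model.predict(test_dataset)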

Custom Metrics

The official TensorFlow docs also describe how to define custom metrics. As with custom losses, you can subclass keras.metrics.Metric and then pass the metric to model.compile().

My motivation for a custom metric was to record how the gradient norm changes during training, in order to check whether the optimization is getting stuck. Since TensorFlow 2.0, gradients are easy to obtain from GradientTape(), so we have to go a level lower and look at what happens during training. Concretely, we override the train_step method of the keras.Model class; this method defines how gradients, the loss, and metrics are computed, and how gradient descent is performed, during model.fit(). The official TensorFlow docs explain this in detail.
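As a quick reminder of the GradientTape mechanics, a minimal, self-contained sketch:

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                    # operations on trainable variables are recorded
dy_dx = tape.gradient(y, x)      # -> tf.Tensor(6.0)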

# custom metric
class GradientNorm(tf.keras.metrics.Metric):
    def __init__(self, name='gradient_norm', **kwargs):
        super().__init__(name=name, **kwargs)
        # variables must be created with the add_weight method
        self.gd_norm = self.add_weight(name='gdnorm', initializer='zeros')

    def update_state(self, y, gradients, sample_weight=None):
        # global L2 norm over the list of per-variable gradients,
        # scaled by the total number of parameters
        n_params = tf.reduce_sum([tf.size(g) for g in gradients])
        norm = tf.linalg.global_norm(gradients) / tf.cast(n_params, tf.float32)
        self.gd_norm.assign(norm)  # variables must be updated with assign

    def result(self):
        return self.gd_norm

    def reset_states(self):
        self.gd_norm.assign(0)

# customize what happens during training by overriding train_step()
class CustomModel(keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update metrics (the metrics are configured in `compile()`)
        self.compiled_metrics.update_state(y, gradients)

        # Return a dict mapping metric names to current value
        return_metrics = {m.name: m.result() for m in self.metrics}
        return return_metrics

# create a model
# specify model structure...
model = CustomModel(inputs=inputs, outputs=outputs)

# compile the model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
    loss=LocLoss(inputs, model_size, nobs, batch_size),  # model inputs as extra args
    experimental_run_tf_function=False,  # necessary!
    metrics=[GradientNorm()],  # custom metric
)

This approach computes and updates everything through compiled_loss and compiled_metrics. If you want to add several metrics that take different input arguments (in the approach above, compiled_metrics fixes the inputs of every metric to (y, gradients)), you need the approach below instead, which does not specify the loss or metrics in model.compile().

gd_norm = GradientNorm()
# PSNR is another custom metric, which takes (y_true, y_pred) as input
psnr_metric = PSNR()
loss_tracker = keras.metrics.Mean(name="loss")

class CustomModel(keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            loss = keras.losses.mean_squared_error(y, y_pred)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update metrics (includes the metric that tracks the loss)
        gd_norm.update_state(y, gradients)
        psnr_metric.update_state(y, y_pred)
        loss_tracker.update_state(loss)

        # Return a dict mapping metric names to current value
        return_metrics = {m.name: m.result() for m in self.metrics}
        return return_metrics

    @property
    def metrics(self):
        # We list our `Metric` objects here so that `reset_states()` can be
        # called automatically at the start of each epoch
        # or at the start of `evaluate()`.
        # If you don't implement this property, you have to call
        # `reset_states()` yourself at the time of your choosing.
        return [loss_tracker, gd_norm, psnr_metric]
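With this pattern the loss and metrics live entirely in train_step, so compile() only needs an optimizer. A usage sketch (model-building code assumed as before):

model = CustomModel(inputs=inputs, outputs=outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001))  # no loss or metrics here
history = model.fit(train_dataset, epochs=10)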

Note, however, that this approach is only supported on TensorFlow 2.2 and above; on earlier versions it raises the following error (check this for more info):

ValueError: The model cannot be compiled because it has no loss to optimize.


Custom Training Loop


Custom Layers

For custom layers in TensorFlow there are two options: subclass the Layer class, or use the Lambda layer. Both live in tensorflow.keras.layers.

TensorFlow officially recommends subclassing Layer over using Lambda; see the official documentation and this post for the reasoning. In general, Lambda is only recommended for very simple operations, for two reasons: first, saving and loading a model that contains Lambda layers requires an identical environment, which makes the model less portable; second, models containing Lambda layers are awkward to debug.
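For contrast, here is the kind of one-liner that a Lambda layer is still reasonable for (a sketch):

from tensorflow.keras import layers

# fine as a Lambda: trivial, stateless, nothing to serialize
doubled = layers.Lambda(lambda x: x * 2.0)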

Below is an example of a custom layer built by subclassing Layer.

from tensorflow.keras import layers

class Loc_by_1D(layers.Layer):
    def __init__(self, obs_dens, model_size, nobs, batch_size, **kwargs):
        super(Loc_by_1D, self).__init__(**kwargs)
        self.obs_dens = obs_dens
        self.model_size = model_size
        self.nobs = nobs
        self.batch_size = batch_size

    def rotate(self, matrix, shifts):
        """here is some code"""

    def call(self, inputs):
        kg_f = tf.squeeze(inputs[0])   # multiple inputs passed as a list
        loc_f = tf.squeeze(inputs[1])  # multiple inputs passed as a list

        """some operations on inputs"""
        return kg_pred

    def compute_output_shape(self, input_shape):
        shape = list(input_shape)
        assert len(shape) == 2
        return tuple([shape[0], self.model_size, self.nobs])

    def get_config(self):
        config = {'obs_dens': self.obs_dens, 'model_size': self.model_size,
                  'nobs': self.nobs, 'batch_size': self.batch_size}
        base_config = super(Loc_by_1D, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

# the last layer of my keras model
outputs = Loc_by_1D(obs_dens=obs_dens, model_size=model_size, nobs=nobs,
                    batch_size=kwargs['batch_size'])([inputs, x5])  # the input here is the list [inputs, x5]

Here I defined a layer called Loc_by_1D, which contains these main methods:

  • __init__ initializes the layer and stores its parameters;
  • call implements the layer's core computation. When call receives more than one input, it is best to pass them as a list (as in the example above); see this post;
  • compute_output_shape computes the shape of this layer's output tensor. It is not required: except when the output shape is dynamic, it is inferred automatically;
  • get_config reports which parameters this layer was configured with when you inspect or serialize the model (see the sketch after this list).
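A sketch of what get_config enables, assuming the Loc_by_1D layer above:

layer = Loc_by_1D(obs_dens=obs_dens, model_size=model_size, nobs=nobs, batch_size=batch_size)
config = layer.get_config()              # includes the four custom parameters
rebuilt = Loc_by_1D.from_config(config)  # reconstruct an equivalent layer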

Inside a custom layer, all operations must likewise use TensorFlow's function library; for example, tf.squeeze() corresponds to np.squeeze() in NumPy. In general, the simple, common NumPy operations all have TensorFlow counterparts; the operands are just tensors instead of np.array, the tensor being TensorFlow's standard array type.
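A few of those NumPy-to-TensorFlow correspondences (a non-exhaustive sketch):

import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0]])  # a tensor, not an np.array
tf.squeeze(a)                       # np.squeeze
tf.reshape(a, [3, 1])               # np.reshape
tf.tile(a, [2, 1])                  # np.tile
tf.math.reduce_mean(a)              # np.mean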