Exponential Moving Average (EMA)

When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.

shadow_variable = decay * shadow_variable + (1 - decay) * variable

ema = tf.train.ExponentialMovingAverage(decay=0.99)  
update_losses = ema.apply([discrim_loss, gen_loss_GAN, gen_loss_L1])  

Training supervisor

A training helper that checkpoints models and computes summaries.

tf.train.Supervisor (newer version tf.train.MonitoredTrainingSession)

The Supervisor is a small wrapper around a Coordinator, a Saver, and a SessionManager that takes care of common needs of TensorFlow training programs.

Exporting and Serving Models with TensorFlow

Saver adds operations that allow us to save and restore the model’s parameters by using binary files called checkpoint files, mapping the tensor values to the names of the variables. (Here take note that the the saving use sess as the handle, the saved parameter values to have to exactly match the current graph when restored.)

saver = tf.train.Saver(max_to_keep=7, keep_checkpoint_every_n_hours=0.5)

checkpoints and summary

You save the model to checkpoints because the Variables in the model, including neural network weights and biases and the global_step counter, keep changing during the training process. The structure of the model doesn’t change. The saved checkpoints allow you to load the trained model for serving and to resume training later.

Supervisor帮助我们处理一些事情 (1)自动去checkpoint加载数据或初始化数据
(3)不需要创建summary writer



tf.control_dependencies(self, control_inputs)  

Arguments:control_inputs: A list of ‘Operation’ or ‘Tensor’ objects which must be executed or computed before running the operations defined in the context. (note control_inputs is a list)
Return: A context manager that specifies control dependencies for all operations constructed within the context.


 with tf.control_dependencies([a, b, c]):
       # 'd' and 'e' will only run after 'a', 'b', and 'c' have executed.  
       d = ...  
       e = ...  

Variable Scope

name_scope, variable_scope目的:

  1. 减少训练参数的个数
  2. 区别同名变量



tf.get_variable(<name>, <shape>, <initializer>)  

tf.get_variable 和tf.Variable的区别
tf.get_variable 拥有一个变量检查机制,会检测已经存在的变量是否设置为共享变量,如果已经存在的变量没有设置为共享变量,TensorFlow 运行到第二个拥有相同名字的变量的时候,就会报错。

def my_image_filter(input_images):
    conv1_weights = tf.Variable(tf.random_normal([5, 5, 32, 32]),
    conv1_biases = tf.Variable(tf.zeros([32]), name="conv1_biases")
    conv1 = tf.nn.conv2d(input_images, conv1_weights,
        strides=[1, 1, 1, 1], padding='SAME')
    return  tf.nn.relu(conv1 + conv1_biases)

有两个变量(Variables) conv1_weighs, conv1_biases和一个操作(Op) conv1,如果你直接调用两次,不会出什么问题,但是会生成两套变量

# First call creates one set of 2 variables.
result1 = my_image_filter(image1)
# Another set of 2 variables is created in the second call.
result2 = my_image_filter(image2)

如果把 tf.Variable 改成 tf.get_variable,直接调用两次,就会出问题了:

result1 = my_image_filter(image1)
result2 = my_image_filter(image2)
# Raises ValueError(... conv1/weights already exists ...)

为了解决这个问题,TensorFlow 又提出了 tf.variable_scope 函数:它的主要作用是,在一个作用域 scope 内共享一些变量, 可以有如下几种用法

with tf.variable_scope("image_filters") as scope:  
    result1 = my_image_filter(image1)  
    scope.reuse_variables() # or   
    result2 = my_image_filter(image2)  

需要注意的是:最好不要设置 reuse 标识为 False,只在需要的时候设置 reuse 标识为 True。

with tf.variable_scope("image_filters1") as scope1:  
    result1 = my_image_filter(image1)  
with tf.variable_scope(scope1, reuse = True)  
    result2 = my_image_filter(image2)  

通常情况下,tf.variable_scope 和 tf.name_scope 配合,能画出非常漂亮的流程图,但是他们两个之间又有着细微的差别,那就是 name_scope 只能管住操作 Ops 的名字,而管不住变量 Variables 的名字

with tf.variable_scope("foo"):  
    with tf.name_scope("bar"):  
        v = tf.get_variable("v", [1])  
        x = 1.0 + v  
assert v.name == "foo/v:0"  
assert x.op.name == "foo/bar/add"