
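National Natural Science Foundation of China (61273140)

Works: 5 | Citations: 30 | H-index: 3
Related authors: 刘德荣, 穆朝絮, 王鼎
Related institutions: University of Science and Technology Beijing; Tianjin University; Institute of Automation, Chinese Academy of Sciences
Funding sources: National Natural Science Foundation of China; Beijing Natural Science Foundation; Tianjin Natural Science Foundation
Related fields: Science; Electrical Engineering; Automation and Computer Technology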

Document type

  • Chinese journal articles (4)

Field

  • Science (2)
  • Electrical engineering (1)
  • Automation and computing... (1)

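Topic

  • Iteration (2)
  • Iterative algorithm (1)
  • Dynamic programming (1)
  • Dynamic programming method (1)
  • Neural net (1)
  • Neural network (1)
  • Data-driven (1)
  • Data-driven control (1)
  • Driven control (1)
  • Adaptive dynamic ... (1)
  • Optimal tracking control (1)
  • Discrete chaotic systems (1)
  • Chaotic systems (1)
  • Tracking control (1)
  • OPTIMA... (1)
  • ADP (1)
  • BATCH (1)
  • CHAOTI... (1)
  • CONSTR... (1)
  • FEATUR... (1)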

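Institution

  • University of Science and Technology Beijing (1)
  • Tianjin University (1)
  • Chinese Academy of Sciences ... (1)

Author

  • 王鼎 (1)
  • 穆朝絮 (1)
  • 刘德荣 (1)

Publication venue

  • 自动化学报 (Acta Automatica Sinica) (1)
  • Chines... (1)
  • Intern... (1)
  • IEEE/C... (1)

Year

  • 2017 (2)
  • 2015 (2)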
5 records; showing 1-4
Policy iteration optimal tracking control for chaotic systems by using an adaptive dynamic programming approach (cited by: 1)
2015
A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control problem for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation one. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and that the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks, the developed optimal tracking control scheme for chaotic systems is verified by a simulation.
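魏庆来, 刘德荣, 徐延才
Keywords: discrete chaotic systems; optimal tracking control; dynamic programming method; policy iteration; iterative algorithm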
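The policy iteration loop described in the abstract above (evaluate the current control law, then improve it) can be illustrated with a minimal sketch. This is not the paper's method: it assumes a scalar linear system with a quadratic cost as a stand-in for the transformed chaotic system, and it uses closed-form updates instead of neural networks. The function names and the parameters `a`, `b`, `q`, `r`, `k0` are hypothetical.

```python
# Illustrative sketch only: a scalar linear system x_{k+1} = a*x_k + b*u_k
# stands in for the transformed (regulation) system. No neural networks are used.


def policy_iteration(a, b, q, r, k0, iters=50):
    """Policy iteration for the cost sum_k (q*x_k^2 + r*u_k^2) with u_k = -k*x_k.

    k0 must be admissible, i.e. |a - b*k0| < 1, so that its cost is finite.
    Returns a list of (k, p) pairs, where V(x) = p*x^2 is the cost of gain k.
    """
    k = k0
    history = []
    for _ in range(iters):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("control law is not admissible (closed loop unstable)")
        # Policy evaluation: solve p = q + r*k^2 + closed_loop^2 * p for p.
        p = (q + r * k * k) / (1.0 - closed_loop * closed_loop)
        history.append((k, p))
        # Policy improvement: minimise q*x^2 + r*u^2 + p*(a*x + b*u)^2 over u.
        k = a * b * p / (r + b * b * p)
    return history


def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time Riccati equation at p (0 at the optimum)."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


if __name__ == "__main__":
    for gain, cost in policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)[:5]:
        print(f"gain={gain:.6f}  cost coefficient={cost:.6f}")
```

In this toy setting the cost coefficient never increases from one iteration to the next and every intermediate gain keeps the closed loop stable, which mirrors the convergence and admissibility properties the abstract claims for the general algorithm.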
Data-driven nonlinear near-optimal regulation based on iterative neural dynamic programming (cited by: 10)
2017
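Using the idea of data-driven control, an iterative neural dynamic programming method is established for designing near-optimal regulators of discrete-time nonlinear systems. An iterative adaptive dynamic programming algorithm for general discrete-time nonlinear systems is proposed, and its convergence and optimality are proved. By constructing three neural networks, the globalized dual heuristic programming technique and its detailed implementation are given, where the action network is trained within the framework of neural dynamic programming. This novel architecture can approximate the cost function and its derivative while adaptively learning the near-optimal control law without relying on the system dynamics. Notably, by relaxing the requirement for the control matrix or its neural network representation, this clearly improves on existing results for iterative adaptive dynamic programming algorithms and can promote the development of data-based optimization and control design for complex nonlinear systems. Two simulation experiments verify the effectiveness of the proposed data-driven optimal regulation method.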
王鼎, 穆朝絮, 刘德荣
Keywords: adaptive dynamic programming; data-driven control; neural network
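The abstract above does not spell out the iteration. Iterative adaptive dynamic programming for discrete-time systems is commonly written as the value-iteration recursion below, and the paper's exact formulation may differ. The symbols are assumptions introduced here for illustration: $F$ is the system map $x_{k+1} = F(x_k, u_k)$, $U$ the utility (stage cost), $V_i$ the iterative cost function and $u_i$ the iterative control law.

```latex
\begin{align}
  V_0(x_k) &= 0, \\
  u_i(x_k) &= \arg\min_{u} \bigl\{ U(x_k, u) + V_i\bigl(F(x_k, u)\bigr) \bigr\}, \\
  V_{i+1}(x_k) &= U\bigl(x_k, u_i(x_k)\bigr) + V_i\bigl(F(x_k, u_i(x_k))\bigr),
  \qquad i = 0, 1, 2, \dots
\end{align}
```

In a data-driven implementation of the kind the abstract describes, the map $F$ is not assumed known and the quantities above are approximated by neural networks trained on measured data.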
Feature Selection and Feature Learning for High-dimensional Batch Reinforcement Learning: A Survey (cited by: 2)
2015
A tremendous amount of data is generated and saved in many complex engineering and social systems every day. It is significant and feasible to use this big data to make better decisions with machine learning techniques. In this paper, we focus on batch reinforcement learning (RL) algorithms for discounted Markov decision processes (MDPs) with large discrete or continuous state spaces, aiming to learn the best possible policy given a fixed amount of training data. Batch RL algorithms with handcrafted feature representations work well for low-dimensional MDPs. However, for many real-world RL tasks, which often involve high-dimensional state spaces, it is difficult and even infeasible to use feature engineering methods to design features for value function approximation. To cope with high-dimensional RL problems, the desire to obtain data-driven features has led to much work on incorporating feature selection and feature learning into traditional batch RL algorithms. In this paper, we provide a comprehensive survey of automatic feature selection and unsupervised feature learning for high-dimensional batch RL. Moreover, we present recent theoretical developments on applying statistical learning to establish finite-sample error bounds for batch RL algorithms based on weighted L_p norms. Finally, we outline some future directions in the research of RL algorithms, theories and applications.
De-Rong Liu, Hong-Liang Li, Ding Wang
Keywords: REINFORCEMENT LEARNING; FEATURE; FEATURE LEARNING
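To make "batch RL with handcrafted feature representations" concrete, here is a minimal sketch of fitted Q-iteration with a linear value function on a fixed batch of transitions. It is one common batch RL scheme chosen for illustration, not an algorithm taken from the survey. The five-state chain MDP, the one-hot feature map `one_hot`, and all function names are hypothetical, and the tiny ridge term is only there to keep the least-squares problem well posed.

```python
# Illustrative sketch only: fitted Q-iteration with linear features on a toy
# chain MDP. The MDP, the features and all names are assumptions for this example.


def solve_linear(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / M[i][i]
    return w


def q_value(w, phi, s, a):
    """Linear action value: Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))


def fitted_q_iteration(batch, phi, n_actions, gamma=0.9, n_iters=50, ridge=1e-9):
    """Repeatedly regress Bellman targets onto the features using a fixed batch."""
    dim = len(phi(batch[0][0], batch[0][1]))
    w = [0.0] * dim
    for _ in range(n_iters):
        A = [[ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        b = [0.0] * dim
        for s, a, r, s_next, done in batch:
            target = r
            if not done:
                target += gamma * max(q_value(w, phi, s_next, a2) for a2 in range(n_actions))
            f = phi(s, a)
            for i in range(dim):
                b[i] += f[i] * target
                for j in range(dim):
                    A[i][j] += f[i] * f[j]
        w = solve_linear(A, b)  # ridge-regularised least squares
    return w


def chain_batch():
    """One transition per (state, action) in a 5-state chain; state 4 is terminal."""
    batch = []
    for s in range(4):
        for a in (0, 1):  # 0 = left, 1 = right
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == 4
            batch.append((s, a, 1.0 if done else 0.0, s_next, done))
    return batch


def one_hot(s, a):
    """Handcrafted feature map: one indicator per non-terminal (state, action) pair."""
    f = [0.0] * 8
    f[2 * s + a] = 1.0
    return f
```

A call such as `fitted_q_iteration(chain_batch(), one_hot, n_actions=2)` returns the weight vector. One-hot features are only workable because this toy problem is tiny, which is the limitation that motivates the feature selection and feature learning methods the survey covers.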
Optimal Constrained Self-learning Battery Sequential Management in Microgrid Via Adaptive Dynamic Programming (cited by: 13)
2017
This paper concerns a novel optimal self-learning battery sequential control scheme for smart home energy systems. The main idea is to use the adaptive dynamic programming (ADP) technique to obtain the optimal battery sequential control iteratively. First, the battery energy management system model is established, where the power efficiency of the battery is considered. Next, considering the power constraints of the battery, a new non-quadratic form performance index function is established, which guarantees that the value of the iterative control law cannot exceed the maximum charging/discharging power of the battery, to extend the service life of the battery. Then, the convergence properties of the iterative ADP algorithm are analyzed, which guarantees that the iterative value function and the iterative control law both reach the optimums. Finally, simulation and comparison results are given to illustrate the performance of the presented method.
Qinglai Wei, Derong Liu, Yu Liu, Ruizhuo Song