<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reinforcement Learning | Personal Homepage of 占俊飞 (Junfei Zhan)</title><link>https://junfei-z.github.io/zh/tags/reinforcement-learning/</link><atom:link href="https://junfei-z.github.io/zh/tags/reinforcement-learning/index.xml" rel="self" type="application/rss+xml"/><description>Reinforcement Learning</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>zh-Hans</language><lastBuildDate>Wed, 07 May 2025 00:00:00 +0000</lastBuildDate><image><url>https://junfei-z.github.io/media/icon_hu70bcee51a3cd7a7338014254a2e0c844_1401285_512x512_fill_lanczos_center_3.png</url><title>Reinforcement Learning</title><link>https://junfei-z.github.io/zh/tags/reinforcement-learning/</link></image><item><title>RL-Enhanced Disturbance-Aware MPC for Robust UAV Trajectory Tracking</title><link>https://junfei-z.github.io/zh/research/rl-enhanced-disturbance-aware-mpc-for-robust-uav-trajectory-tracking/</link><pubDate>Wed, 07 May 2025 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/zh/research/rl-enhanced-disturbance-aware-mpc-for-robust-uav-trajectory-tracking/</guid><description>&lt;a href="https://junfei-z.github.io/uav_control.pdf" target="_blank">
&lt;img src="https://img.shields.io/badge/View%20Full%20Paper-PDF-red?logo=adobeacrobatreader&amp;logoColor=white" alt="PDF">
&lt;/a>
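&lt;p>[Accepted to IEEE SMC 2025] (to appear)&lt;/p>
&lt;p>This work proposes &lt;strong>ROAM&lt;/strong>, a novel RL-enhanced, disturbance-aware MPC framework for &lt;strong>precise UAV trajectory tracking&lt;/strong> in uncertain and dynamic environments. The method combines the predictive capability of MPC, the fast response of reinforcement learning (RL), and the robustness of an adaptive sliding mode observer (SMO).&lt;/p>
&lt;h2 id="问题与动机">Problem and Motivation&lt;/h2>
&lt;p>Conventional MPC-based UAV controllers perform poorly under &lt;strong>model mismatch&lt;/strong>, &lt;strong>wind disturbances&lt;/strong>, and &lt;strong>computational delays&lt;/strong>, which leads to residual tracking errors and slow convergence. This work addresses these challenges with two innovations:&lt;/p>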
&lt;ul>
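&lt;li>An &lt;strong>offline-trained RL warm-start policy&lt;/strong> that accelerates MPC convergence&lt;/li>
&lt;li>An &lt;strong>Adaptive Super-Twisting Sliding Mode Observer (AST-SMO)&lt;/strong> that estimates and compensates for disturbances in real time&lt;/li>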
&lt;/ul>
&lt;h2 id="技术贡献">技术贡献&lt;/h2>
&lt;h3 id="1-基于-rl-的热启动">1. 基于 RL 的热启动&lt;/h3>
&lt;ul>
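&lt;li>A &lt;strong>direction-conditioned policy&lt;/strong> is trained by imitation learning on expert MPC trajectories.&lt;/li>
&lt;li>During real-time control, the policy supplies the MPC solver with a &lt;strong>trajectory-consistent initial guess&lt;/strong>, reducing early-stage tracking error by &lt;strong>16.9%&lt;/strong> and computation time by &lt;strong>38.7%&lt;/strong>.&lt;/li>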
&lt;/ul>
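&lt;p>A minimal sketch of how an initial guess enters an iterative MPC solver. It assumes a hypothetical scalar integrator &lt;code>x_{k+1} = x_k + u_k&lt;/code>, a quadratic tracking cost with input bounds, and plain projected gradient descent as the solver. &lt;code>hypothetical_policy_guess&lt;/code> is a saturated proportional rollout standing in for the learned direction-conditioned policy; it is not the paper's network or solver. Both starts converge to the same optimum, and how many iterations a warm start saves depends on how close its guess is to that optimum.&lt;/p>

```python
"""Warm-starting a box-constrained MPC solve (illustrative sketch, not ROAM's solver)."""


def solve_mpc_pg(x0, r, u_init, lam=0.1, u_max=1.0, tol=1e-9, max_iter=100_000):
    """Projected gradient descent on
    J(u) = sum_k (x_{k+1} - r)^2 + lam * sum_k u_k^2,  x_{k+1} = x_k + u_k,  |u_k| bounded by u_max.
    Returns (u, iterations used)."""
    n = len(u_init)
    # Step 1/L, using the Frobenius bound on the Hessian norm: L = 2 * (n(n+1)/2 + lam).
    step = 1.0 / (2.0 * (n * (n + 1) / 2 + lam))
    u = [min(max(v, -u_max), u_max) for v in u_init]
    for it in range(1, max_iter + 1):
        # Forward rollout of the predicted states x_1 .. x_n.
        xs, x = [], x0
        for v in u:
            x += v
            xs.append(x)
        # Backward pass: dJ/du_j = 2*lam*u_j + 2 * sum over k from j to n-1 of (xs[k] - r).
        grad, acc = [0.0] * n, 0.0
        for j in range(n - 1, -1, -1):
            acc += xs[j] - r
            grad[j] = 2.0 * lam * u[j] + 2.0 * acc
        # Gradient step, then projection onto the input box.
        u_new = [min(max(v - step * g, -u_max), u_max) for v, g in zip(u, grad)]
        delta = max(abs(a - b) for a, b in zip(u_new, u))
        u = u_new
        if tol > delta:
            return u, it
    return u, max_iter


def hypothetical_policy_guess(x0, r, n, gain=0.5, u_max=1.0):
    """Stand-in for a learned warm-start policy: roll out a saturated P-controller."""
    u, x = [], x0
    for _ in range(n):
        v = min(max(gain * (r - x), -u_max), u_max)
        u.append(v)
        x += v
    return u


if __name__ == "__main__":
    horizon = 10
    u_cold, it_cold = solve_mpc_pg(0.0, 3.0, [0.0] * horizon)
    u_warm, it_warm = solve_mpc_pg(0.0, 3.0, hypothetical_policy_guess(0.0, 3.0, horizon))
    print("cold-start iterations:", it_cold, "warm-start iterations:", it_warm)
```

&lt;p>Projected gradient descent is used only because it keeps the sketch dependency-free; a real MPC would typically use a QP or SQP solver, where the warm start plays the same role of shortening the iteration count.&lt;/p>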
&lt;h3 id="2-用于扰动估计的-ast-smo">2. 用于扰动估计的 AST-SMO&lt;/h3>
&lt;ul>
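&lt;li>The SMO estimates external disturbances in real time and uses a smooth hyperbolic function to avoid chattering.&lt;/li>
&lt;li>An adaptive gain-tuning mechanism dynamically adjusts the observer's sensitivity to improve convergence.&lt;/li>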
&lt;/ul>
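&lt;p>A minimal scalar sketch of a super-twisting disturbance observer with the two ingredients listed above: &lt;code>tanh(e/eps)&lt;/code> as a smooth substitute for the discontinuous sign function, and a gain that adapts to the estimation error. It assumes a hypothetical plant &lt;code>x_dot = u + d&lt;/code> with a constant disturbance &lt;code>d&lt;/code> unknown to the observer, explicit Euler integration, and a simple bounded adaptation law for &lt;code>k1&lt;/code> chosen for illustration. The paper's actual AST-SMO equations and adaptation law may differ.&lt;/p>

```python
import math


def simulate_ast_smo(d_true=2.0, dt=1e-3, t_end=10.0, k2=10.0, k1_init=3.0,
                     k1_min=1.0, k1_max=5.0, gamma=2.0, mu=0.01, eps=0.01):
    """Euler simulation of a scalar adaptive super-twisting disturbance observer.

    Plant (hypothetical):  x_dot = u + d, with d unknown to the observer.
    Observer:  x_hat_dot = u + k1*sqrt(|e|)*tanh(e/eps) + z,   z_dot = k2*tanh(e/eps),
    where e = x - x_hat; z tracks d and serves as the disturbance estimate d_hat.
    Returns (d_hat, final k1).
    """
    x, x_hat, z, k1 = 0.0, 0.0, 0.0, k1_init
    for step in range(round(t_end / dt)):
        u = math.sin(step * dt)              # known control input (arbitrary here)
        e = x - x_hat
        s = math.tanh(e / eps)               # smooth stand-in for sign(e), limits chattering
        # Illustrative adaptation: raise k1 while |e| is outside the band mu, lower it inside.
        k1 += dt * (gamma if abs(e) > mu else -gamma)
        k1 = min(max(k1, k1_min), k1_max)
        x_hat_dot = u + k1 * math.sqrt(abs(e)) * s + z
        z_dot = k2 * s
        x += dt * (u + d_true)               # true plant, driven by the unknown d
        x_hat += dt * x_hat_dot
        z += dt * z_dot
    return z, k1


if __name__ == "__main__":
    d_hat, k1 = simulate_ast_smo()
    print(f"estimated disturbance: {d_hat:.3f} (true 2.0), final k1: {k1:.2f}")
```

&lt;p>Because the known input &lt;code>u&lt;/code> appears in both the plant and the observer, it cancels in the error dynamics, leaving &lt;code>z&lt;/code> to absorb the disturbance alone.&lt;/p>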
&lt;h3 id="3-扰动感知-mpc">3. 扰动感知 MPC&lt;/h3>
&lt;ul>
&lt;li>The MPC prediction model is reformulated to incorporate the real-time disturbance estimate \(\hat{d}_k\) from the AST-SMO:
\[
x_{k+1} = Ax_k + Bu_k + E(\hat{d}_k)
\]&lt;/li>
&lt;li>Objective: minimize tracking error and control effort while satisfying system constraints.&lt;/li>
&lt;/ul>
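&lt;p>A minimal illustration of what the extra term in the prediction model buys. It assumes a hypothetical scalar instance of the model (&lt;code>A = B = E = 1&lt;/code>, so &lt;code>E(d_hat)&lt;/code> is simply &lt;code>d_hat&lt;/code>), a one-step horizon with a penalty on input changes so that the box-constrained optimum has a closed form, and an idealized observer that already reports the true disturbance. None of these choices come from the paper.&lt;/p>

```python
def simulate_one_step_mpc(d_true, use_estimate, r=1.0, lam=1.0, u_min=-2.0, u_max=2.0,
                          a=1.0, b=1.0, e=1.0, steps=100):
    """Closed loop of a one-step MPC on x_{k+1} = a*x_k + b*u_k + e*d_k.

    At each step it minimizes (x_{k+1} - r)^2 + lam*(u_k - u_{k-1})^2 over u_min..u_max,
    predicting x_{k+1} with the disturbance estimate d_hat (0 for nominal MPC).
    Returns the final state.
    """
    x, u_prev = 0.0, 0.0
    for _ in range(steps):
        d_hat = d_true if use_estimate else 0.0     # idealized observer output
        # Unconstrained minimizer of the scalar quadratic in u ...
        u = (b * (r - a * x - e * d_hat) + lam * u_prev) / (b * b + lam)
        # ... clipped to the box, which is exact for a 1-D convex quadratic.
        u = min(max(u, u_min), u_max)
        x = a * x + b * u + e * d_true              # true plant sees the real disturbance
        u_prev = u
    return x


if __name__ == "__main__":
    print("with d_hat:", round(simulate_one_step_mpc(0.5, True), 4))
    print("nominal   :", round(simulate_one_step_mpc(0.5, False), 4))
```

&lt;p>At steady state the input stops changing, and the plant equation then gives &lt;code>x = r + e(d - d_hat)&lt;/code>: with an accurate estimate the offset vanishes, while the nominal controller settles &lt;code>e*d&lt;/code> away from the reference.&lt;/p>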
&lt;h2 id="仿真结果">仿真结果&lt;/h2>
&lt;ul>
&lt;li>Evaluated on a 12-state quadrotor model under sinusoidal and noisy disturbances.&lt;/li>
&lt;li>ROAM achieves:
&lt;ul>
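&lt;li>16.9% improvement in early-stage tracking accuracy&lt;/li>
&lt;li>38.7% reduction in computation time&lt;/li>
&lt;li>Better trajectory following than classical MPC under strong external disturbances&lt;/li>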
&lt;/ul>
&lt;/li>
&lt;/ul>
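&lt;p>For reference, a common 12-state rigid-body quadrotor model (world-frame position and velocity, ZYX Euler angles, body angular rates) with an additive external force as the disturbance channel can be sketched as follows. The mass, inertia, and the way the disturbance enters are illustrative assumptions; the paper's exact model and its disturbance profiles are not reproduced here.&lt;/p>

```python
import math


def quadrotor_dynamics(state, control, disturbance=(0.0, 0.0, 0.0),
                       mass=1.0, inertia=(0.01, 0.01, 0.02), g=9.81):
    """Time derivative of a 12-state quadrotor model (illustrative parameters).

    state       = [x, y, z, vx, vy, vz, phi, theta, psi, p, q, r]
    control     = [thrust, tau_x, tau_y, tau_z]
    disturbance = external force in the world frame, e.g. wind (hypothetical channel)
    """
    _, _, _, vx, vy, vz, phi, theta, psi, p, q, r = state
    thrust, tau_x, tau_y, tau_z = control
    ix, iy, iz = inertia
    cphi, sphi = math.cos(phi), math.sin(phi)
    cth, sth = math.cos(theta), math.sin(theta)
    cpsi, spsi = math.cos(psi), math.sin(psi)
    # Translational dynamics: thrust along the body z-axis rotated into the world frame.
    ax = (thrust * (cphi * sth * cpsi + sphi * spsi) + disturbance[0]) / mass
    ay = (thrust * (cphi * sth * spsi - sphi * cpsi) + disturbance[1]) / mass
    az = (thrust * cphi * cth + disturbance[2]) / mass - g
    # Euler-angle kinematics (singular at theta = +/- 90 degrees).
    phi_dot = p + (q * sphi + r * cphi) * math.tan(theta)
    theta_dot = q * cphi - r * sphi
    psi_dot = (q * sphi + r * cphi) / cth
    # Rigid-body rotational dynamics (Euler's equations) with a diagonal inertia matrix.
    p_dot = ((iy - iz) * q * r + tau_x) / ix
    q_dot = ((iz - ix) * p * r + tau_y) / iy
    r_dot = ((ix - iy) * p * q + tau_z) / iz
    return [vx, vy, vz, ax, ay, az, phi_dot, theta_dot, psi_dot, p_dot, q_dot, r_dot]
```

&lt;p>A quick sanity check is the hover condition: level attitude with thrust equal to &lt;code>mass * g&lt;/code> and zero torques gives an all-zero derivative, and any nonzero disturbance force then shows up directly as an acceleration.&lt;/p>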
&lt;h2 id="结论">结论&lt;/h2>
&lt;p>ROAM 表明，&lt;strong>RL、观测器与 MPC 的深度集成&lt;/strong>可产生具有更快收敛速度、更好稳定性和更高韧性的控制系统。其轻量化和模块化设计使其非常适合在嵌入式 UAV 平台上进行&lt;strong>实时部署&lt;/strong>。&lt;/p>
</description></item><item><title>Reinforcement Learning-Based Stochastic Vaccine Allocation Strategies on Contact Networks</title><link>https://junfei-z.github.io/zh/project/2_stock/</link><pubDate>Mon, 17 Mar 2025 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/zh/project/2_stock/</guid><description>&lt;p>Combined deterministic optimal control with reinforcement learning to develop stochastic vaccine allocation policies on individual-level contact networks, enabling robust epidemic-response modeling.&lt;/p>
&lt;h2 id="项目亮点">项目亮点&lt;/h2>
&lt;ul>
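&lt;li>Modeled epidemic spread on contact graphs as a high-dimensional continuous-time Markov process (CTMP).&lt;/li>
&lt;li>Designed a Policy Gradient-based RL vaccination policy, warm-started from the Mean-Field ODE solution.&lt;/li>
&lt;li>Evaluated the policies on synthetic and real-world network topologies using metrics such as mortality and hospitalization rates.&lt;/li>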
&lt;/ul>
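&lt;p>A minimal sketch of how an individual-level epidemic on a contact graph can be simulated exactly as a continuous-time Markov process using the Gillespie algorithm. It assumes SIR dynamics with per-contact infection rate &lt;code>beta&lt;/code> and recovery rate &lt;code>gamma&lt;/code>, and models vaccination as moving a node directly to the removed state. These modeling choices and the function name are illustrative assumptions, not the project's exact model or policy interface.&lt;/p>

```python
import random


def gillespie_sir(adj, beta, gamma, initial_infected, vaccinated=(), seed=0, t_max=100.0):
    """Exact stochastic simulation of SIR spread on a contact graph (a CTMP).

    adj: dict mapping each node to a list of neighbours.
    Vaccinated nodes start in 'R' (hypothetical vaccination model).
    Returns (final time, dict node -> 'S' | 'I' | 'R').
    """
    rng = random.Random(seed)
    state = {v: "S" for v in adj}
    for v in vaccinated:
        state[v] = "R"
    for v in initial_infected:
        state[v] = "I"
    t = 0.0
    while True:
        # Node-level rates: S -> I at beta * (number of infected neighbours), I -> R at gamma.
        rates = {}
        for v, s in state.items():
            if s == "I":
                rates[v] = gamma
            elif s == "S":
                k = sum(1 for w in adj[v] if state[w] == "I")
                if k:
                    rates[v] = beta * k
        total = sum(rates.values())
        if total == 0:
            break                              # absorbing state: no infected nodes remain
        dt = rng.expovariate(total)            # exponential waiting time to the next event
        if t + dt > t_max:
            break
        t += dt
        # Pick the transitioning node with probability proportional to its rate.
        pick, acc = rng.uniform(0.0, total), 0.0
        for v, rate in rates.items():
            acc += rate
            if acc >= pick:
                break
        state[v] = "R" if state[v] == "I" else "I"
    return t, state
```

&lt;p>As a sanity check, on a path graph 0-1-2-3 with node 0 infected and node 1 vaccinated, nodes 2 and 3 can never be reached. An allocation policy, learned or otherwise, would decide which nodes go into &lt;code>vaccinated&lt;/code>.&lt;/p>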
&lt;h2 id="工具">工具&lt;/h2>
&lt;p>Python, PyTorch, NetworkX, OpenAI Gym&lt;/p></description></item></channel></rss>