<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Privacy | Junfei Zhan's Website</title><link>https://junfei-z.github.io/tags/privacy/</link><atom:link href="https://junfei-z.github.io/tags/privacy/index.xml" rel="self" type="application/rss+xml"/><description>Privacy</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 20 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://junfei-z.github.io/media/icon_hu70bcee51a3cd7a7338014254a2e0c844_1401285_512x512_fill_lanczos_center_3.png</url><title>Privacy</title><link>https://junfei-z.github.io/tags/privacy/</link></image><item><title>Slide - PhD Interview Talk: Research Interests in Cloud-Edge AI</title><link>https://junfei-z.github.io/samples/2_ic_interview/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/samples/2_ic_interview/</guid><description>&lt;p>PhD interview presentation for Imperial College Computing, covering research interests in cloud-edge collaborative AI inference.&lt;br>
Topics include privacy-aware inference routing, distributed LLM deployment on heterogeneous edge devices, and system-level optimization for resource-constrained environments.&lt;/p></description></item><item><title>PRISM: Privacy-Aware Routing for Adaptive Cloud–Edge LLM Inference with Semantic Sketch Collaboration</title><link>https://junfei-z.github.io/research/prism/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/research/prism/</guid><description>&lt;a href="https://junfei-z.github.io/prism_full.pdf" target="_blank">
&lt;img src="https://img.shields.io/badge/View%20Full%20Paper-PDF-red?logo=adobeacrobatreader&amp;logoColor=white" alt="PDF">
&lt;/a>
&lt;p>📄 Accepted at the 2026 AAAI Conference on Artificial Intelligence (to appear)&lt;/p>
&lt;p>This project introduces &lt;strong>PRISM&lt;/strong>, a context-aware cloud–edge inference framework that balances privacy, utility, and efficiency for &lt;strong>Large Language Model (LLM)&lt;/strong> services. It addresses the key limitations of uniform privacy mechanisms by adapting protection to the &lt;strong>semantic sensitivity&lt;/strong> of user inputs.&lt;/p>
&lt;h2 id="objectives">Objectives&lt;/h2>
&lt;p>The primary goal is to enable &lt;strong>privacy-preserving LLM inference&lt;/strong> in real-world deployments, where sensitive user prompts are routed intelligently between edge devices and the cloud. PRISM is designed to:&lt;/p>
&lt;ul>
&lt;li>Avoid unnecessary noise for benign inputs&lt;/li>
&lt;li>Preserve semantic coherence in sensitive prompts&lt;/li>
&lt;li>Reduce latency and energy consumption without compromising utility&lt;/li>
&lt;/ul>
&lt;h2 id="key-contributions">Key Contributions&lt;/h2>
&lt;h3 id="semantic-sensitive-execution-routing">Semantic-Sensitive Execution Routing&lt;/h3>
&lt;ul>
&lt;li>A &lt;strong>soft gating controller&lt;/strong> on the edge scores entity-level risk using contextual features (e.g., named entities, first-person references)&lt;/li>
&lt;li>Routes prompts to one of three execution paths:
&lt;ul>
&lt;li>&lt;strong>Edge-only&lt;/strong> for high-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud-only&lt;/strong> for low-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud–Edge Collaboration&lt;/strong> for mid-sensitivity prompts&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
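&lt;p>As an illustrative sketch of the three-way routing rule (not PRISM's actual controller), the gating score can be thresholded into the three paths; the function name and threshold values below are hypothetical placeholders:&lt;/p>

```python
def route_prompt(risk_score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a soft gating score in [0, 1] to one of three execution paths.

    Illustrative only: `low`/`high` are placeholder thresholds, not
    calibrated values from the paper.
    """
    if risk_score >= high:
        return "edge-only"    # high-risk: the prompt never leaves the device
    if risk_score <= low:
        return "cloud-only"   # low-risk: full cloud model quality
    return "cloud-edge"       # mid-sensitivity: sketch collaboration path
```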
&lt;h3 id="adaptive-two-layer-local-differential-privacy-ldp">Adaptive Two-Layer Local Differential Privacy (LDP)&lt;/h3>
&lt;ul>
&lt;li>Each sensitive entity is obfuscated through:
&lt;ul>
&lt;li>Category-level perturbation (e.g., masking &amp;ldquo;Diagnosis&amp;rdquo;)&lt;/li>
&lt;li>Value-level perturbation (e.g., replacing &amp;ldquo;HIV&amp;rdquo; with &amp;ldquo;Flu&amp;rdquo;)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Privacy budget allocation is guided by a sensitivity weight model ensuring &lt;strong>fine-grained protection without semantic collapse&lt;/strong>&lt;/li>
&lt;/ul>
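&lt;p>To make the two-layer idea concrete, here is a minimal sketch using k-ary randomized response as a stand-in LDP mechanism, with the per-entity budget split between the category and value layers by a sensitivity weight. The function names, domains, and the specific mechanism are assumptions for illustration, not the paper's exact design:&lt;/p>

```python
import math
import random

def randomized_response(value: str, domain: list[str], eps: float,
                        rng: random.Random) -> str:
    """k-ary randomized response: keep the true value with
    probability e^eps / (e^eps + k - 1), else report a uniform other value."""
    k = len(domain)
    p_keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])

def perturb_entity(category: str, value: str,
                   cat_domain: list[str], val_domain: list[str],
                   eps: float, w: float, seed: int = 0) -> tuple[str, str]:
    """Two-layer perturbation sketch: the weight w in (0, 1) splits the
    entity's budget eps between the category and value layers (illustrative;
    PRISM's sensitivity weight model is more involved)."""
    rng = random.Random(seed)
    noisy_cat = randomized_response(category, cat_domain, w * eps, rng)
    noisy_val = randomized_response(value, val_domain, (1 - w) * eps, rng)
    return noisy_cat, noisy_val
```

By sequential composition, spending w·eps on the category layer and (1−w)·eps on the value layer keeps the total per-entity budget at eps.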
&lt;h3 id="semantic-sketch-collaboration-protocol">Semantic Sketch Collaboration Protocol&lt;/h3>
&lt;ul>
&lt;li>Noisy prompts are processed in the cloud to generate &lt;strong>semantic sketches&lt;/strong> (e.g., high-level abstract responses)&lt;/li>
&lt;li>The edge-side &lt;strong>Small Language Model (SLM)&lt;/strong> refines these sketches using the original context&lt;/li>
&lt;li>Enables &lt;strong>high-utility responses under strong privacy constraints&lt;/strong>&lt;/li>
&lt;/ul>
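&lt;p>The collaboration path above can be sketched as a two-step pipeline; &lt;code>cloud_llm&lt;/code> and &lt;code>edge_slm&lt;/code> below are stub placeholders for the real model calls, shown only to make the message flow explicit:&lt;/p>

```python
def cloud_llm(noisy_prompt: str) -> str:
    # Stub: the cloud model sees only the perturbed prompt and
    # returns a high-level semantic sketch.
    return f"[sketch for: {noisy_prompt}]"

def edge_slm(sketch: str, original_prompt: str) -> str:
    # Stub: the on-device SLM grounds the sketch in the
    # unperturbed local context.
    return f"refined({sketch}, context={original_prompt})"

def collaborate(original_prompt: str, noisy_prompt: str) -> str:
    """Sketch-then-refine: only the noisy prompt leaves the device."""
    sketch = cloud_llm(noisy_prompt)          # step 1: sketch in the cloud
    return edge_slm(sketch, original_prompt)  # step 2: refine on the edge
```

Note the privacy boundary: `cloud_llm` receives only the perturbed prompt, while the original text is used exclusively by the edge-side refinement step.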
&lt;h2 id="results--insights">Results &amp;amp; Insights&lt;/h2>
&lt;ul>
&lt;li>PRISM achieves &lt;strong>up to 3× lower latency&lt;/strong> and &lt;strong>2.5× lower energy consumption&lt;/strong> than baselines like Uniform and Selective LDP&lt;/li>
&lt;li>Delivers &lt;strong>higher LLM-Judge scores (up to 7.2)&lt;/strong> under strong privacy budgets&lt;/li>
&lt;li>Outperforms state-of-the-art methods (e.g., Split-and-Denoise, DP-Forward) in terms of both utility and efficiency&lt;/li>
&lt;li>Robust across &lt;strong>8 different model combinations&lt;/strong> (e.g., GPT-4o + StableLM)&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Completion time (s)&lt;/th>
&lt;th>Energy consumption (J)&lt;/th>
&lt;th>Inference quality (LLM-Judge)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>PRISM&lt;/td>
&lt;td>7.92&lt;/td>
&lt;td>687.2&lt;/td>
&lt;td>6.88&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Uniform LDP&lt;/td>
&lt;td>20.56&lt;/td>
&lt;td>1707.6&lt;/td>
&lt;td>5.72&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Selective LDP&lt;/td>
&lt;td>21.22&lt;/td>
&lt;td>1770.8&lt;/td>
&lt;td>5.94&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Edge-Only&lt;/td>
&lt;td>17.84&lt;/td>
&lt;td>1573.9&lt;/td>
&lt;td>5.09&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Cloud-Only&lt;/td>
&lt;td>&lt;strong>5.13&lt;/strong>&lt;/td>
&lt;td>&lt;strong>296.3&lt;/strong>&lt;/td>
&lt;td>&lt;strong>8.14&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="broader-impact">Broader Impact&lt;/h2>
&lt;p>PRISM enables &lt;strong>selective privacy-preserving inference&lt;/strong> for sensitive domains such as &lt;strong>medical, financial, and personal assistants&lt;/strong>, paving the way for:&lt;/p>
&lt;ul>
&lt;li>Deploying LLMs responsibly in &lt;strong>privacy-critical environments&lt;/strong>&lt;/li>
&lt;li>Reducing energy costs in &lt;strong>cloud-edge infrastructure&lt;/strong>&lt;/li>
&lt;li>Bridging the tradeoff between &lt;strong>privacy and inference quality&lt;/strong>&lt;/li>
&lt;/ul></description></item><item><title>PRISM: Privacy-Aware Routing for Adaptive Cloud–Edge LLM Inference with Semantic Sketch Collaboration</title><link>https://junfei-z.github.io/zh/research/prism/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/zh/research/prism/</guid><description>&lt;a href="https://junfei-z.github.io/prism_full.pdf" target="_blank">
&lt;img src="https://img.shields.io/badge/View%20Full%20Paper-PDF-red?logo=adobeacrobatreader&amp;logoColor=white" alt="PDF">
&lt;/a>
&lt;p>Accepted at the 2026 AAAI Conference on Artificial Intelligence (to appear)&lt;/p>
&lt;p>This project introduces &lt;strong>PRISM&lt;/strong>, a context-aware cloud–edge inference framework that balances privacy, utility, and efficiency for &lt;strong>Large Language Model (LLM)&lt;/strong> services. It addresses the key limitations of uniform privacy mechanisms by adapting protection to the &lt;strong>semantic sensitivity&lt;/strong> of user inputs.&lt;/p>
&lt;h2 id="目标">Objectives&lt;/h2>
&lt;p>The primary goal is to enable &lt;strong>privacy-preserving LLM inference&lt;/strong> in real-world deployments, where sensitive user prompts are routed intelligently between edge devices and the cloud. PRISM is designed to:&lt;/p>
&lt;ul>
&lt;li>Avoid unnecessary noise for benign inputs&lt;/li>
&lt;li>Preserve semantic coherence in sensitive prompts&lt;/li>
&lt;li>Reduce latency and energy consumption without compromising utility&lt;/li>
&lt;/ul>
&lt;h2 id="主要贡献">Key Contributions&lt;/h2>
&lt;h3 id="语义敏感的执行路由">Semantic-Sensitive Execution Routing&lt;/h3>
&lt;ul>
&lt;li>A &lt;strong>soft gating controller&lt;/strong> on the edge scores entity-level risk using contextual features (e.g., named entities, first-person references)&lt;/li>
&lt;li>Routes prompts to one of three execution paths:
&lt;ul>
&lt;li>&lt;strong>Edge-only&lt;/strong> for high-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud-only&lt;/strong> for low-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud–Edge Collaboration&lt;/strong> for mid-sensitivity prompts&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="自适应两层-local-differential-privacy-ldp">Adaptive Two-Layer Local Differential Privacy (LDP)&lt;/h3>
&lt;ul>
&lt;li>Each sensitive entity is obfuscated through:
&lt;ul>
&lt;li>Category-level perturbation (e.g., masking &amp;ldquo;Diagnosis&amp;rdquo;)&lt;/li>
&lt;li>Value-level perturbation (e.g., replacing &amp;ldquo;HIV&amp;rdquo; with &amp;ldquo;Flu&amp;rdquo;)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Privacy budget allocation is guided by a sensitivity weight model, ensuring &lt;strong>fine-grained protection without semantic collapse&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h3 id="语义草图协作协议">Semantic Sketch Collaboration Protocol&lt;/h3>
&lt;ul>
&lt;li>Noisy prompts are processed in the cloud to generate &lt;strong>semantic sketches&lt;/strong> (e.g., high-level abstract responses)&lt;/li>
&lt;li>The edge-side &lt;strong>Small Language Model (SLM)&lt;/strong> refines these sketches using the original context&lt;/li>
&lt;li>Enables &lt;strong>high-utility responses under strong privacy constraints&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h2 id="结果与洞察">Results &amp;amp; Insights&lt;/h2>
&lt;ul>
&lt;li>PRISM achieves &lt;strong>up to 3× lower latency&lt;/strong> and &lt;strong>2.5× lower energy consumption&lt;/strong> than baselines such as Uniform and Selective LDP&lt;/li>
&lt;li>Delivers &lt;strong>higher LLM-Judge scores (up to 7.2)&lt;/strong> under strong privacy budgets&lt;/li>
&lt;li>Outperforms state-of-the-art methods (e.g., Split-and-Denoise, DP-Forward) in both utility and efficiency&lt;/li>
&lt;li>Robust across &lt;strong>8 different model combinations&lt;/strong> (e.g., GPT-4o + StableLM)&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Completion time (s)&lt;/th>
&lt;th>Energy consumption (J)&lt;/th>
&lt;th>Inference quality (LLM-Judge)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>PRISM&lt;/td>
&lt;td>7.92&lt;/td>
&lt;td>687.2&lt;/td>
&lt;td>6.88&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Uniform LDP&lt;/td>
&lt;td>20.56&lt;/td>
&lt;td>1707.6&lt;/td>
&lt;td>5.72&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Selective LDP&lt;/td>
&lt;td>21.22&lt;/td>
&lt;td>1770.8&lt;/td>
&lt;td>5.94&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Edge-Only&lt;/td>
&lt;td>17.84&lt;/td>
&lt;td>1573.9&lt;/td>
&lt;td>5.09&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Cloud-Only&lt;/td>
&lt;td>&lt;strong>5.13&lt;/strong>&lt;/td>
&lt;td>&lt;strong>296.3&lt;/strong>&lt;/td>
&lt;td>&lt;strong>8.14&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="更广泛的影响">更广泛的影响&lt;/h2>
&lt;p>PRISM 为&lt;strong>医疗、金融和个人助理&lt;/strong>等敏感领域提供了&lt;strong>选择性隐私保护推理&lt;/strong>，为以下方向铺平了道路：&lt;/p>
&lt;ul>
&lt;li>在&lt;strong>隐私关键环境&lt;/strong>中负责任地部署 LLM&lt;/li>
&lt;li>降低&lt;strong>云-边基础设施&lt;/strong>的能耗成本&lt;/li>
&lt;li>弥合&lt;strong>隐私与推理质量&lt;/strong>之间的权衡&lt;/li>
&lt;/ul></description></item></channel></rss>