<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Privacy | Junfei Zhan's Website</title><link>https://junfei-z.github.io/tags/privacy/</link><atom:link href="https://junfei-z.github.io/tags/privacy/index.xml" rel="self" type="application/rss+xml"/><description>Privacy</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Tue, 20 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://junfei-z.github.io/media/icon_hu70bcee51a3cd7a7338014254a2e0c844_1401285_512x512_fill_lanczos_center_3.png</url><title>Privacy</title><link>https://junfei-z.github.io/tags/privacy/</link></image><item><title>Slide - PhD Interview Talk: Research Interests in Cloud-Edge AI</title><link>https://junfei-z.github.io/samples/2_ic_interview/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/samples/2_ic_interview/</guid><description>&lt;p>PhD interview presentation for Imperial College Computing, covering research interests in cloud-edge collaborative AI inference.&lt;br>
Topics include privacy-aware inference routing, distributed LLM deployment on heterogeneous edge devices, and system-level optimization for resource-constrained environments.&lt;/p></description></item><item><title>PRISM: Privacy-Aware Routing for Adaptive Cloud–Edge LLM Inference with Semantic Sketch Collaboration</title><link>https://junfei-z.github.io/research/prism/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/research/prism/</guid><description>&lt;a href="https://junfei-z.github.io/prism_full.pdf" target="_blank">
&lt;img src="https://img.shields.io/badge/View%20Full%20Paper-PDF-red?logo=adobeacrobatreader&amp;logoColor=white" alt="PDF">
&lt;/a>
&lt;p>📄 Accepted at the 2026 AAAI Conference on Artificial Intelligence (to appear)&lt;/p>
&lt;p>This project introduces &lt;strong>PRISM&lt;/strong>, a context-aware cloud–edge inference framework that balances privacy, utility, and efficiency for &lt;strong>Large Language Model (LLM)&lt;/strong> services. It addresses the key limitations of uniform privacy mechanisms by adapting protection to the &lt;strong>semantic sensitivity&lt;/strong> of user inputs.&lt;/p>
&lt;h2 id="objectives">Objectives&lt;/h2>
&lt;p>The primary goal is to enable &lt;strong>privacy-preserving LLM inference&lt;/strong> in real-world deployments, where sensitive user prompts are routed intelligently between edge devices and the cloud. PRISM is designed to:&lt;/p>
&lt;ul>
&lt;li>Avoid unnecessary noise for benign inputs&lt;/li>
&lt;li>Preserve semantic coherence in sensitive prompts&lt;/li>
&lt;li>Reduce latency and energy consumption without compromising utility&lt;/li>
&lt;/ul>
&lt;h2 id="key-contributions">Key Contributions&lt;/h2>
&lt;h3 id="semantic-sensitive-execution-routing">Semantic-Sensitive Execution Routing&lt;/h3>
&lt;ul>
&lt;li>A &lt;strong>soft gating controller&lt;/strong> on the edge scores entity-level risk using contextual features (e.g., named entities, first-person references)&lt;/li>
&lt;li>Routes prompts to one of three execution paths:
&lt;ul>
&lt;li>&lt;strong>Edge-only&lt;/strong> for high-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud-only&lt;/strong> for low-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud–Edge Collaboration&lt;/strong> for mid-sensitivity prompts&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
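&lt;p>As an illustrative sketch of the three-way routing rule (not PRISM's actual controller), the gating score can be thresholded into the three paths; the function name and threshold values below are hypothetical placeholders:&lt;/p>

```python
def route_prompt(risk_score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a soft gating score in [0, 1] to one of three execution paths.

    Illustrative only: `low`/`high` are placeholder thresholds, not
    calibrated values from the paper.
    """
    if risk_score >= high:
        return "edge-only"    # high-risk: the prompt never leaves the device
    if risk_score <= low:
        return "cloud-only"   # low-risk: full cloud model quality
    return "cloud-edge"       # mid-sensitivity: sketch collaboration path
```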
&lt;h3 id="adaptive-two-layer-local-differential-privacy-ldp">Adaptive Two-Layer Local Differential Privacy (LDP)&lt;/h3>
&lt;ul>
&lt;li>Each sensitive entity is obfuscated through:
&lt;ul>
&lt;li>Category-level perturbation (e.g., masking &amp;ldquo;Diagnosis&amp;rdquo;)&lt;/li>
&lt;li>Value-level perturbation (e.g., replacing &amp;ldquo;HIV&amp;rdquo; with &amp;ldquo;Flu&amp;rdquo;)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Privacy budget allocation is guided by a sensitivity weight model ensuring &lt;strong>fine-grained protection without semantic collapse&lt;/strong>&lt;/li>
&lt;/ul>
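&lt;p>To make the two-layer idea concrete, here is a minimal sketch using k-ary randomized response as a stand-in LDP mechanism, with the per-entity budget split between the category and value layers by a sensitivity weight. The function names, domains, and the specific mechanism are assumptions for illustration, not the paper's exact design:&lt;/p>

```python
import math
import random

def randomized_response(value: str, domain: list[str], eps: float,
                        rng: random.Random) -> str:
    """k-ary randomized response: keep the true value with
    probability e^eps / (e^eps + k - 1), else report a uniform other value."""
    k = len(domain)
    p_keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])

def perturb_entity(category: str, value: str,
                   cat_domain: list[str], val_domain: list[str],
                   eps: float, w: float, seed: int = 0) -> tuple[str, str]:
    """Two-layer perturbation sketch: the weight w in (0, 1) splits the
    entity's budget eps between the category and value layers (illustrative;
    PRISM's sensitivity weight model is more involved)."""
    rng = random.Random(seed)
    noisy_cat = randomized_response(category, cat_domain, w * eps, rng)
    noisy_val = randomized_response(value, val_domain, (1 - w) * eps, rng)
    return noisy_cat, noisy_val
```

By sequential composition, spending w·eps on the category layer and (1−w)·eps on the value layer keeps the total per-entity budget at eps.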
&lt;h3 id="semantic-sketch-collaboration-protocol">Semantic Sketch Collaboration Protocol&lt;/h3>
&lt;ul>
&lt;li>Noisy prompts are processed in the cloud to generate &lt;strong>semantic sketches&lt;/strong> (e.g., high-level abstract responses)&lt;/li>
&lt;li>The edge-side &lt;strong>Small Language Model (SLM)&lt;/strong> refines these sketches using the original context&lt;/li>
&lt;li>Enables &lt;strong>high-utility responses under strong privacy constraints&lt;/strong>&lt;/li>
&lt;/ul>
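&lt;p>The collaboration path above can be sketched as a two-step pipeline; &lt;code>cloud_llm&lt;/code> and &lt;code>edge_slm&lt;/code> below are stub placeholders for the real model calls, shown only to make the message flow explicit:&lt;/p>

```python
def cloud_llm(noisy_prompt: str) -> str:
    # Stub: the cloud model sees only the perturbed prompt and
    # returns a high-level semantic sketch.
    return f"[sketch for: {noisy_prompt}]"

def edge_slm(sketch: str, original_prompt: str) -> str:
    # Stub: the on-device SLM grounds the sketch in the
    # unperturbed local context.
    return f"refined({sketch}, context={original_prompt})"

def collaborate(original_prompt: str, noisy_prompt: str) -> str:
    """Sketch-then-refine: only the noisy prompt leaves the device."""
    sketch = cloud_llm(noisy_prompt)          # step 1: sketch in the cloud
    return edge_slm(sketch, original_prompt)  # step 2: refine on the edge
```

Note the privacy boundary: `cloud_llm` receives only the perturbed prompt, while the original text is used exclusively by the edge-side refinement step.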
&lt;h2 id="results--insights">Results &amp;amp; Insights&lt;/h2>
&lt;ul>
&lt;li>PRISM achieves &lt;strong>up to 3× lower latency&lt;/strong> and &lt;strong>2.5× lower energy consumption&lt;/strong> than baselines like Uniform and Selective LDP&lt;/li>
&lt;li>Delivers &lt;strong>higher LLM-Judge scores (up to 7.2)&lt;/strong> under strong privacy budgets&lt;/li>
&lt;li>Outperforms state-of-the-art methods (e.g., Split-and-Denoise, DP-Forward) in terms of both utility and efficiency&lt;/li>
&lt;li>Robust across &lt;strong>8 different model combinations&lt;/strong> (e.g., GPT-4o + StableLM)&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Completion time (s)&lt;/th>
&lt;th>Energy consumption (J)&lt;/th>
&lt;th>Inference quality (LLM-Judge)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>PRISM&lt;/td>
&lt;td>7.92&lt;/td>
&lt;td>687.2&lt;/td>
&lt;td>6.88&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Uniform LDP&lt;/td>
&lt;td>20.56&lt;/td>
&lt;td>1707.6&lt;/td>
&lt;td>5.72&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Selective LDP&lt;/td>
&lt;td>21.22&lt;/td>
&lt;td>1770.8&lt;/td>
&lt;td>5.94&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Edge-Only&lt;/td>
&lt;td>17.84&lt;/td>
&lt;td>1573.9&lt;/td>
&lt;td>5.09&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Cloud-Only&lt;/td>
&lt;td>&lt;strong>5.13&lt;/strong>&lt;/td>
&lt;td>&lt;strong>296.3&lt;/strong>&lt;/td>
&lt;td>&lt;strong>8.14&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="broader-impact">Broader Impact&lt;/h2>
&lt;p>PRISM enables &lt;strong>selective privacy-preserving inference&lt;/strong> for sensitive domains such as &lt;strong>medical, financial, and personal assistants&lt;/strong>, paving the way for:&lt;/p>
&lt;ul>
&lt;li>Deploying LLMs responsibly in &lt;strong>privacy-critical environments&lt;/strong>&lt;/li>
&lt;li>Reducing energy costs in &lt;strong>cloud-edge infrastructure&lt;/strong>&lt;/li>
&lt;li>Bridging the tradeoff between &lt;strong>privacy and inference quality&lt;/strong>&lt;/li>
&lt;/ul></description></item><item><title>PRISM: Privacy-Aware Routing for Adaptive Cloud–Edge LLM Inference with Semantic Sketch Collaboration</title><link>https://junfei-z.github.io/zh/research/prism/</link><pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate><guid>https://junfei-z.github.io/zh/research/prism/</guid><description>&lt;a href="https://junfei-z.github.io/prism_full.pdf" target="_blank">
&lt;img src="https://img.shields.io/badge/View%20Full%20Paper-PDF-red?logo=adobeacrobatreader&amp;logoColor=white" alt="PDF">
&lt;/a>
&lt;p>Accepted at the 2026 AAAI Conference on Artificial Intelligence (to appear)&lt;/p>
&lt;p>This project introduces &lt;strong>PRISM&lt;/strong>, a context-aware cloud–edge inference framework that balances privacy, utility, and efficiency for &lt;strong>Large Language Model (LLM)&lt;/strong> services. It addresses the key limitations of uniform privacy mechanisms by adapting protection to the &lt;strong>semantic sensitivity&lt;/strong> of user inputs.&lt;/p>
&lt;h2 id="目标">Objectives&lt;/h2>
&lt;p>The primary goal is to enable &lt;strong>privacy-preserving LLM inference&lt;/strong> in real-world deployments, where sensitive user prompts are routed intelligently between edge devices and the cloud. PRISM is designed to:&lt;/p>
&lt;ul>
&lt;li>Avoid unnecessary noise for benign inputs&lt;/li>
&lt;li>Preserve semantic coherence in sensitive prompts&lt;/li>
&lt;li>Reduce latency and energy consumption without compromising utility&lt;/li>
&lt;/ul>
&lt;h2 id="主要贡献">Key Contributions&lt;/h2>
&lt;h3 id="语义敏感的执行路由">Semantic-Sensitive Execution Routing&lt;/h3>
&lt;ul>
&lt;li>A &lt;strong>soft gating controller&lt;/strong> on the edge scores entity-level risk using contextual features (e.g., named entities, first-person references)&lt;/li>
&lt;li>Routes prompts to one of three execution paths:
&lt;ul>
&lt;li>&lt;strong>Edge-only&lt;/strong> for high-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud-only&lt;/strong> for low-risk prompts&lt;/li>
&lt;li>&lt;strong>Cloud–Edge Collaboration&lt;/strong> for mid-sensitivity prompts&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="自适应两层-local-differential-privacy-ldp">Adaptive Two-Layer Local Differential Privacy (LDP)&lt;/h3>
&lt;ul>
&lt;li>Each sensitive entity is obfuscated through:
&lt;ul>
&lt;li>Category-level perturbation (e.g., masking &amp;ldquo;Diagnosis&amp;rdquo;)&lt;/li>
&lt;li>Value-level perturbation (e.g., replacing &amp;ldquo;HIV&amp;rdquo; with &amp;ldquo;Flu&amp;rdquo;)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Privacy budget allocation is guided by a sensitivity weight model, ensuring &lt;strong>fine-grained protection without semantic collapse&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h3 id="语义草图协作协议">Semantic Sketch Collaboration Protocol&lt;/h3>
&lt;ul>
&lt;li>Noisy prompts are processed in the cloud to generate &lt;strong>semantic sketches&lt;/strong> (e.g., high-level abstract responses)&lt;/li>
&lt;li>The edge-side &lt;strong>Small Language Model (SLM)&lt;/strong> refines these sketches using the original context&lt;/li>
&lt;li>Enables &lt;strong>high-utility responses under strong privacy constraints&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h2 id="结果与洞察">Results &amp;amp; Insights&lt;/h2>
&lt;ul>
&lt;li>PRISM achieves &lt;strong>up to 3× lower latency&lt;/strong> and &lt;strong>2.5× lower energy consumption&lt;/strong> than baselines such as Uniform and Selective LDP&lt;/li>
&lt;li>Delivers &lt;strong>higher LLM-Judge scores (up to 7.2)&lt;/strong> under strong privacy budgets&lt;/li>
&lt;li>Outperforms state-of-the-art methods (e.g., Split-and-Denoise, DP-Forward) in both utility and efficiency&lt;/li>
&lt;li>Robust across &lt;strong>8 different model combinations&lt;/strong> (e.g., GPT-4o + StableLM)&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Method&lt;/th>
&lt;th>Completion time (s)&lt;/th>
&lt;th>Energy consumption (J)&lt;/th>
&lt;th>Inference quality (LLM-Judge)&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>PRISM&lt;/td>
&lt;td>7.92&lt;/td>
&lt;td>687.2&lt;/td>
&lt;td>6.88&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Uniform LDP&lt;/td>
&lt;td>20.56&lt;/td>
&lt;td>1707.6&lt;/td>
&lt;td>5.72&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Selective LDP&lt;/td>
&lt;td>21.22&lt;/td>
&lt;td>1770.8&lt;/td>
&lt;td>5.94&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Edge-Only&lt;/td>
&lt;td>17.84&lt;/td>
&lt;td>1573.9&lt;/td>
&lt;td>5.09&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Cloud-Only&lt;/td>
&lt;td>&lt;strong>5.13&lt;/strong>&lt;/td>
&lt;td>&lt;strong>296.3&lt;/strong>&lt;/td>
&lt;td>&lt;strong>8.14&lt;/strong>&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="更广泛的影响">更广泛的影响&lt;/h2>
&lt;p>PRISM 为&lt;strong>医疗、金融和个人助理&lt;/strong>等敏感领域提供了&lt;strong>选择性隐私保护推理&lt;/strong>，为以下方向铺平了道路：&lt;/p>
&lt;ul>
&lt;li>在&lt;strong>隐私关键环境&lt;/strong>中负责任地部署 LLM&lt;/li>
&lt;li>降低&lt;strong>云-边基础设施&lt;/strong>的能耗成本&lt;/li>
&lt;li>弥合&lt;strong>隐私与推理质量&lt;/strong>之间的权衡&lt;/li>
&lt;/ul></description></item></channel></rss>