# iCoLoc

iCoLoc is an interactive spatial co-location pattern mining system powered by a local large language model (LLM). It supports natural-language query understanding, pattern mining, feedback-driven preference learning, and multi-round iterative optimization.

## Features

- Natural-language query to mining-parameter translation
- Co-location pattern mining and association-rule generation
- Stage0: intent understanding and cold-start preference initialization
- Stage3: feedback-based preference learning and re-ranking
- Stage4: multi-round interactive refinement
- Both CLI and Web interfaces

## Tech Stack

- Python 3.9+
- FastAPI + Uvicorn (Web service)
- PyTorch / Transformers (main LLM inference)
- Sentence Transformers and NumPy (Stage0/Stage3 sentence embeddings; separate from the main LLM; by default a MiniLM-style model pulled from Hugging Face)

## Project Structure

The top-level layout below matches **what is tracked in version control**. Local data, caches, logs, and untracked scripts are usually listed in `.gitignore` and are not shown in the tree.

```text
iCoLoc/
├── main.py
├── run_experiment.py
├── run_plotter.py
├── README.md
├── LICENSE
├── WEB_README.md
├── requirements.txt
├── config/
├── data/
├── document/
├── results/
├── memory/
├── models/
├── logs/
└── src/
    ├── llm/
    ├── preference/
    ├── core/
    ├── controller/
    ├── learning/
    ├── embedding/
    ├── memory/
    ├── web/
    ├── experiment/
    └── download/
```

If you only maintain the Web guide under `document/`, copy or symlink it to `WEB_README.md` at the repo root to match the layout above.

## Quick Start

### 1) Install dependencies

```bash
cd /home/ubuntu/codebase/yexijia/保研/iCoLoc
pip install -r requirements.txt
```

### 2) Configure model and data

Edit `config/config.yaml`:

```yaml
model:
  model_name_or_path: "/path/to/your/model"
  adapter_name_or_path: null
  template: "qwen"
data:
  data_path: "./data/beijing_poi.json"
```

#### Main LLM weights: Ollama vs `config.yaml`

iCoLoc loads the main model from a **local directory** via `transformers` (optionally LlamaFactory). That directory must be in **Hugging Face layout** (`config.json`, tokenizer files, etc.). Set `model_name_or_path` to that folder.

If you use **[Ollama](https://ollama.com/)** to pull models locally, install Ollama and run (names follow [Ollama Library](https://ollama.com/library)):

```bash
# Example; tags must match what you use on the HF side
ollama pull qwen2.5:7b
```

**Note:** Ollama stores files under its own tree (e.g. `~/.ollama/models` on Linux) for `ollama run`. **Do not** point `model_name_or_path` at that path.

For iCoLoc, prepare a separate **Hugging Face-format** checkpoint, for example:

```bash
pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir /path/to/qwen2.5-7b-instruct
```

You can also download from ModelScope or other mirrors. Set `model.model_name_or_path` to that directory (absolute path recommended) and match `template` to the family (e.g. `qwen` for Qwen). Ollama and HF copies can coexist: Ollama for CLI chats, HF folder for this project.

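Before launching iCoLoc, it can help to confirm that the folder set in `model_name_or_path` looks like a Hugging Face checkpoint rather than an Ollama store. The sketch below is a hypothetical pre-flight helper and is not part of iCoLoc. The function name `check_hf_layout` and the list of tokenizer file names are assumptions based on files commonly found in Hugging Face checkpoints. It only checks that files exist; it does not verify that the weights load.

```python
from pathlib import Path

# Assumed tokenizer file names; real checkpoints usually ship at least one of these.
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "tokenizer.model",
    "vocab.json",
)


def check_hf_layout(model_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the folder looks usable."""
    root = Path(model_dir).expanduser()
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    # config.json is the file the README names explicitly.
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    # Accept any one of the assumed tokenizer files.
    if not any((root / name).is_file() for name in TOKENIZER_FILES):
        problems.append("no tokenizer file found")
    return problems


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    issues = check_hf_layout(target)
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("layout looks like a Hugging Face checkpoint")
```

Saved under any file name and run with the model directory as its only argument, the script reports the problems it finds. Pointing it at an Ollama model directory is expected to report a missing `config.json`.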
The main LLM and the **Sentence Transformer** embedding model are separate sets of weights. The embedding model produces the Stage0 intent vectors and the Stage3 pattern vectors; `src/learning/embedder.py` uses `sentence-transformers/all-MiniLM-L6-v2` by default. The first run needs network access, and the cache usually lives under `~/.cache/huggingface/hub`. On a restricted network, use a mirror (for example via `HF_ENDPOINT`), or download the model yourself and change the model name in `embedder.py` to the local directory path.

### 3) Run

```bash
# CLI interactive mode
python main.py

# One-shot CLI query
python main.py --query "Find 3-order patterns with high confidence"

# Web mode (default port 8000; default bind 0.0.0.0, reachable from the LAN via the host IP)
python main.py --web
```

### 4) Dataset Download and Conversion (Optional)

If you want to generate your own POI dataset, use the scripts in `src/download`:

```bash
# Go to project root
cd /home/ubuntu/codebase/yexijia/保研/iCoLoc

# 1) Install dependency for downloading data (skip if already installed)
pip install osmnx

# 2) Download Beijing POI GeoJSON (example script)
python src/download/1.py

# 3) Convert GeoJSON to iCoLoc JSON format (id/type/x/y)
python src/download/convert_geojson.py src/download/beijing_poi.geojson data/beijing_poi.json
```

Notes:

- `src/download/1.py` generates `src/download/beijing_poi.geojson`
- `convert_geojson.py` converts GeoJSON into the point-based JSON used by iCoLoc
- You can also provide your own input GeoJSON and output JSON paths

## Common Commands

```bash
# Use custom config
python main.py --config config/config.yaml

# Trigger training manually (Stage3)
python main.py --train

# Set iteration rounds for Stage4
python main.py --query "Recommend locations for breakfast stores" --iter 3

# Customize Web host and port
python main.py --web --host 127.0.0.1 --port 8080
```

## Input Data Format

Input should be a JSON array. Each record must include at least:

- `id`: instance ID
- `type`: POI type
- `x`, `y`: spatial coordinates

```json
[
  {"id": 1, "type": "A", "x": 24, "y": 14},
  {"id": 2, "type": "B", "x": 13, "y": 3}
]
```

## Pipeline Overview

1. User submits a natural-language query
2. LLM parses intent and mining parameters (Stage0 when needed)
3. Co-location mining is executed
4. Stage3/Stage4 updates ranking with user feedback and iterations
5. Results and explanations are returned, with continuous feedback collection

## Experiments and Plots

```bash
# Run experiments
python run_experiment.py

# Run experiments with preference-weighted config
python run_experiment.py --preference-weighted-config config/config_preference_weighted.yaml

# Draw plots
python run_plotter.py

# Draw plots from a specific metrics file
python run_plotter.py --preference-weighted results/metrics_preference_weighted.json --output results/learning_curve.png
```

Outputs are saved to `results/` by default.

## Documentation

- Web guide: `WEB_README.md` (may mirror `document/WEB_README.md`)
- Module docs: `src/*/README.md`
- Overview: `document/项目说明.md`
- More: `document/执行流程文档.md`, `document/对比算法说明.md`, etc.

## Troubleshooting

- Main LLM load fails: verify `model_name_or_path`, GPU memory, and the dependencies in `requirements.txt`
- Sentence Transformer download or SSL errors: check the network, use a mirror, or switch to a local path in `embedder.py` (see the embedding-model paragraph above)
- Optional LlamaFactory: if `datasets` version warnings appear, adjust per the logs or set `DISABLE_VERSION_CHECK=1`
- Web startup fails: verify `fastapi`, `uvicorn`, and port availability
- Unexpected results: verify `data_path` and the JSON format
- Logs: `logs/mvp.log`

## License

Released under the [MIT License](LICENSE). You may use, modify, and redistribute this software, including for commercial purposes, provided you retain the copyright notice and license text. Replace `iCoLoc authors` in `LICENSE` with your name or organization if you wish.