Monitoring Shanghai Second-Hand Housing Transaction Data

2025-03-19 · 3 min read · python prometheus linux

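Over the past year or two I have been following transaction volume on Shanghai's second-hand housing market. Like trading volume in the stock market, it helps gauge where prices are heading.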

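When I was younger, all I cared about was going to work, learning tech skills and making money. Once, chatting with a colleague, I learned that the home he had just traded up to was worth 8 million yuan (800w). For an ordinary salaried worker that is genuinely shocking 😲; I figured I would never get there in my lifetime. A chance to ride a wave like that comes only by luck. Looking back, when I really wanted to buy a home, the purchase restrictions on single non-locals kept me out; by the time I finally got Shanghai hukou and became eligible, prices had been sliding steadily. So all I can do is watch from the sidelines, since my wallet cannot cough up 8 million. Watching costs nothing, though. As the saying goes, the duck is the first to know when the spring river warms: if you don't stay alert and seize opportunities, you will be a hard-luck wage earner in your next life too. Besides, last year I made more from stocks than from my job, which felt very strange. Still, keep your day job unless you are truly wealthy.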

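Enough digression. Every day, Shanghai publishes second-hand housing transaction figures on schedule at https://www.fangdi.com.cn/old_house/old_house.html. The page is ugly but authoritative; if you doubt it, the government can always make you pay more tax. I used to open it at noon every day for a quick look. Over time it bothered me that the site offers no way to view historical data, and opening it every day is a hassle. So I decided to collect the data automatically with a program, store it in Prometheus for monitoring, and maybe even set alert conditions, such as alerting when daily volume exceeds 1,500 units.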
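For the alert idea above (daily sales over 1,500 units), one option is a small script that asks Prometheus's HTTP API for the latest value. This is a minimal sketch, assuming Prometheus listens on localhost:9090 and the metric name used by the collector program; `PROMETHEUS_URL`, `THRESHOLD`, `ALERT_EXPR` and the helper functions are hypothetical. Because samples arrive only once an hour and a plain instant query looks back just 5 minutes by default, the query wraps the metric in `last_over_time(...[2h])`.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical defaults: adjust to your own setup.
PROMETHEUS_URL = "http://localhost:9090"
THRESHOLD = 1500
# last_over_time() looks back 2h, so the hourly sample is always found.
ALERT_EXPR = "last_over_time(shanghai_second_hand_house_sell_count[2h])"


def query_prometheus(expr, base_url=PROMETHEUS_URL, timeout=5):
    """Run an instant query via the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def values_over_threshold(api_response, threshold=THRESHOLD):
    """Return every sample value in an instant-vector response above threshold."""
    if api_response.get("status") != "success":
        raise ValueError(f"query failed: {api_response.get('error')}")
    hits = []
    for series in api_response["data"]["result"]:
        _ts, value = series["value"]  # each sample is [unix_timestamp, "number as string"]
        if float(value) > threshold:
            hits.append(float(value))
    return hits


def check_once():
    """Query Prometheus once and print an alert line for each value over the threshold."""
    for value in values_over_threshold(query_prometheus(ALERT_EXPR)):
        print(f"ALERT: daily sales {value:.0f} > {THRESHOLD}")
```

Calling `check_once()` from cron shortly after the daily figure is published would be enough for a personal setup; Prometheus's own route for this is an alerting rule plus Alertmanager.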

Below is a record of the steps I took:

1. Automated data collector

The code:

```python
import requests
from flask import Flask
from prometheus_client import Gauge, generate_latest, CollectorRegistry

app = Flask(__name__)
registry = CollectorRegistry()
# Gauge that holds the daily second-hand housing sales figure
sell_count_gauge = Gauge('shanghai_second_hand_house_sell_count',
                         'Shanghai Second Hand House Sell Count', registry=registry)

# Per-request timeout in seconds, so a slow upstream site cannot hang the scrape
# (Prometheus's default scrape_timeout is 10s)
TIMEOUT = 5


@app.route('/metrics')
def metrics():
    try:
        # Use a session object so cookies are kept between the two requests
        with requests.Session() as session:
            # Visit the main page first to obtain the initial cookies
            main_url = 'https://www.fangdi.com.cn/old_house/old_house.html'
            main_headers = {
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36'
            }
            session.get(main_url, headers=main_headers, timeout=TIMEOUT)

            # API endpoint that returns yesterday's sales data
            api_url = 'https://www.fangdi.com.cn/oldhouse/getSHYesterdaySell.action'

            # Request headers copied from the browser
            headers = {
                'Accept': 'application/json, text/javascript, */*; q=0.01',
                'Accept-Language': 'en-US,en;q=0.9',
                'Connection': 'keep-alive',
                'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8',
                'Origin': 'https://www.fangdi.com.cn',
                'Referer': 'https://www.fangdi.com.cn/old_house/old_house.html',
                'Sec-Fetch-Dest': 'empty',
                'Sec-Fetch-Mode': 'cors',
                'Sec-Fetch-Site': 'same-origin',
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36',
                'X-Requested-With': 'XMLHttpRequest',
                'sec-ch-ua': '"Not:A-Brand";v="24", "Chromium";v="134"',
                'sec-ch-ua-mobile': '?0',
                'sec-ch-ua-platform': '"Linux"'
            }

            # Send the POST request with the same session
            response = session.post(api_url, headers=headers, data="", timeout=TIMEOUT)
            response.raise_for_status()

        # Parse the JSON response
        data = response.json()
        sell_count = data.get('sellcount')

        if sell_count is not None:
            # Set the Gauge value
            sell_count_gauge.set(sell_count)
            # Render the metrics in the Prometheus text format
            metrics_data = generate_latest(registry).decode('utf-8')
            return metrics_data, 200, {'Content-Type': 'text/plain; version=0.0.4; charset=utf-8'}
        else:
            return "Sales count not found in the response", 500
    # Catch ValueError first: in recent versions of requests, the JSON decode
    # error subclasses both ValueError and RequestException.
    except ValueError:
        return "Could not parse the JSON response", 500
    except requests.RequestException as e:
        return f"Request failed: {e}", 500


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
```

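The program is very simple: it first visits the page to obtain cookies, then mimics a browser to fetch the target data shown on the page. Note the API URL, https://www.fangdi.com.cn/oldhouse/getSHYesterdaySell.action; it may change some day, because the site could be redesigned. If that happens, find the new one with your browser's developer tools and update the code. Beginners may ask how I knew all those headers. In the browser's developer tools, use "Copy as cURL" on the request, then have an AI convert it into the form the Requests library expects. AI is well suited to this kind of job: it rarely makes mistakes and it's fast.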
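The header conversion can also be done locally. This is a minimal sketch, not the author's method, that pulls the `-H`/`--header` values out of a "Copy as cURL" (bash) command with the standard-library `shlex` module; `headers_from_curl` is a hypothetical name. Cookies (`-b`) and the request body are deliberately ignored, since the `requests.Session` in the program above manages cookies itself.

```python
import shlex


def headers_from_curl(curl_command):
    """Collect -H/--header values from a 'Copy as cURL' command into a dict."""
    tokens = shlex.split(curl_command)  # handles the single quotes Chrome emits
    headers = {}
    i = 0
    while i < len(tokens):
        if tokens[i] in ("-H", "--header") and i + 1 < len(tokens):
            # Split on the first colon only; values such as URLs contain colons too
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
        else:
            i += 1  # skip the URL, other flags and their arguments
    return headers
```

For example, `headers_from_curl(open("curl.txt").read())` turns a pasted command saved to a file into a dict ready for `session.post(..., headers=...)`.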

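If any part of the program is unclear, you can also ask an AI to explain it. In short, get good at using tools to solve problems; making money is what counts.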

2. Deploying Prometheus to collect the data automatically

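The code above already produces the metrics. There are plenty of tutorials online for installing Prometheus on your own machine, so I won't repeat them here.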

The program above, i.e. the collector client, runs on the same machine. I manage it with Supervisor (supervisord):

```
➜  ershoufang sudo supervisorctl status shanghai
[sudo] password for mephisto:
shanghai                         RUNNING   pid 781, uptime 2:12:17
➜  ershoufang cat /etc/supervisor.d/shanghai.ini
[program:shanghai]
; replace with the actual path to your Python interpreter and Flask program
command=/home/mephisto/github/ershoufang/.venv/bin/python shanghai.py
; replace with the directory that contains your Flask program
directory=/home/mephisto/github/ershoufang
autostart=true
autorestart=true
stderr_logfile=/var/log/app.err.log
stdout_logfile=/var/log/app.out.log
```

Confirm that both the client (the collector) and the server (Prometheus) are running:

```
➜  ershoufang sudo ss -lntp | grep -E  "python|prometheus"
LISTEN 0      128          0.0.0.0:8000       0.0.0.0:*    users:(("python",pid=781,fd=3))
LISTEN 0      4096               *:9090             *:*    users:(("prometheus",pid=796,fd=6))
```

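All good: both ports are listening. You can also open the corresponding addresses (http://localhost:8000/metrics and http://localhost:9090) in a browser to check.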
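The exporter can also be checked from Python instead of a browser. This is a minimal standard-library sketch; `read_metrics` and `gauge_value` are hypothetical helper names, and the parser only handles simple unlabeled samples like the single gauge this exporter emits, not the full exposition format.

```python
import urllib.request


def read_metrics(url="http://localhost:8000/metrics", timeout=15):
    """Fetch the exporter's /metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def gauge_value(metrics_text, name):
    """Return the value of an unlabeled sample `name` from exposition text, or None."""
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        parts = line.split()
        if parts[0] == name:
            return float(parts[1])  # second field is the value; an optional timestamp may follow
    return None
```

Usage: `gauge_value(read_metrics(), "shanghai_second_hand_house_sell_count")`. Keep in mind that every call to `read_metrics()` makes the exporter query fangdi.com.cn again.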

3. Setting the scrape interval

This step is just editing prometheus.yml:

```
➜  ershoufang sudo cat /etc/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 30m # Set the default scrape interval to every 30 minutes. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: 'shanghai_second_hand_house'
    scrape_interval: 1h
    static_configs:
      - targets: ['localhost:8000']
```

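The last job, shanghai_second_hand_house, is the one to add. Its `scrape_interval: 1h` overrides the global 30m default, so data is fetched once an hour. Don't scrape too often: being charged with disturbing public order and detained for five days would not be fun. Don't waste public resources and be a law-abiding citizen; if the site gets hammered and is shut down one day, that causes trouble for everyone. You know what I mean.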
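The 1h interval only limits Prometheus itself; every manual visit to /metrics still reaches fangdi.com.cn. One way to enforce the same politeness inside the exporter is to cache the fetched figure. This is a minimal sketch with a hypothetical `CachedValue` helper; it assumes the fetch-and-parse part of `metrics()` has been moved into its own function, a refactor that is not in the original program.

```python
import threading
import time


class CachedValue:
    """Call `fetch` at most once per `ttl` seconds and reuse the result in between."""

    def __init__(self, fetch, ttl=3600, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()  # Flask may serve requests on several threads
        self._value = None
        self._expires = float("-inf")  # forces a fetch on the first call

    def get(self):
        with self._lock:
            now = self._clock()
            if now >= self._expires:
                # Only this branch contacts the upstream site; if fetch raises,
                # the previous value and expiry are left untouched.
                self._value = self._fetch()
                self._expires = now + self._ttl
            return self._value
```

With a hypothetical `fetch_sell_count()` that returns `data.get('sellcount')`, the route would create `cache = CachedValue(fetch_sell_count, ttl=3600)` once at startup and call `sell_count_gauge.set(cache.get())` on each request.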

4. Checking the results

Open http://localhost:9090 in a browser; the screenshot is below:

Make sure to switch to the Graph tab and view the data in Stacked chart mode; for any other settings, refer to the screenshot above.

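The results may contain several identical samples for the same day. That is not a problem: since the data is scraped once an hour, each day's figure is simply recorded several times, and the daily volume is still easy to read.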
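When pulling the data back out of Prometheus (say, into a spreadsheet), the hourly duplicates can be collapsed to one value per day. This is a minimal sketch, assuming the range-query endpoint `/api/v1/query_range` on localhost:9090 and bucketing by Shanghai time (a fixed UTC+8 offset); `query_range` and `daily_last_values` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

SHANGHAI = timezone(timedelta(hours=8))  # fixed UTC+8 offset, no tzdata needed


def query_range(expr, start, end, step="1h", base_url="http://localhost:9090", timeout=10):
    """Call the Prometheus range-query endpoint (/api/v1/query_range)."""
    params = urllib.parse.urlencode({"query": expr, "start": start, "end": end, "step": step})
    with urllib.request.urlopen(f"{base_url}/api/v1/query_range?{params}", timeout=timeout) as resp:
        return json.load(resp)


def daily_last_values(values):
    """Collapse [[unix_ts, "value"], ...] samples into {date: last value seen that day}."""
    daily = {}
    for ts, value in sorted(values, key=lambda v: float(v[0])):
        day = datetime.fromtimestamp(float(ts), tz=SHANGHAI).date()
        daily[day] = float(value)  # later samples of the same day overwrite earlier ones
    return daily
```

For example, query `last_over_time(shanghai_second_hand_house_sell_count[1h])` (so every 1h step finds the hourly sample), then pass `resp["data"]["result"][0]["values"]` to `daily_last_values`. Since the API is named getSHYesterdaySell, each date's value presumably describes the previous day's sales.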

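The data shows that recently about 800 units a day change hands on weekdays, while weekends rise significantly to as many as 1,400. Last year, daily volume was usually around 400 to 600, but I can't remember exactly, which is exactly why I wanted to keep the data.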

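I keep three years of data (the Status page shows "Storage retention 3y"). This is set with Prometheus's `--storage.tsdb.retention.time=3y` startup flag; search for details if needed.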
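To confirm the retention setting took effect, the running server can be asked for its startup flags through `/api/v1/status/flags`. This is a minimal sketch assuming Prometheus on localhost:9090; `fetch_flags` and `retention_time` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_flags(base_url="http://localhost:9090", timeout=5):
    """Fetch Prometheus's command-line flags from /api/v1/status/flags."""
    with urllib.request.urlopen(f"{base_url}/api/v1/status/flags", timeout=timeout) as resp:
        return json.load(resp)


def retention_time(flags_response):
    """Return the storage.tsdb.retention.time flag value, e.g. "3y"."""
    if flags_response.get("status") != "success":
        raise ValueError("flags query failed")
    # Flag names appear in the response without the leading "--".
    return flags_response["data"].get("storage.tsdb.retention.time")
```

Usage: `retention_time(fetch_flags())` should return `"3y"` with the setting above.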

Of course, you could also push the figure straight to your WeChat or e-mail instead of storing it in Prometheus; this setup just happens to suit me.
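For the e-mail route, the standard library is enough. This is a minimal sketch; the subject format, `build_report`/`send_report`, and the SMTP host, port 465 and credentials are placeholders to replace with your mail provider's settings.

```python
import smtplib
from email.message import EmailMessage


def build_report(sell_count, sender, recipient, day_label):
    """Build a plain-text e-mail carrying the daily sales figure."""
    msg = EmailMessage()
    msg["Subject"] = f"Shanghai second-hand housing: {sell_count} units ({day_label})"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Units sold: {sell_count}\n"
        "Source: https://www.fangdi.com.cn/old_house/old_house.html\n"
    )
    return msg


def send_report(msg, host, port=465, user=None, password=None):
    """Send the message over SMTP with implicit TLS (SMTP_SSL)."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        if user:
            smtp.login(user, password)
        smtp.send_message(msg)
```

A daily cron job could fetch the count the same way the collector does and call `send_report(build_report(...), "smtp.example.com", user=..., password=...)`.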

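If you have any questions, follow my WeChat official account and leave a message. Also, the API response above includes the transaction area and total price as well, so the average price can be calculated; you could modify the program to export several metrics.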
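A minimal sketch of that extension: turn the JSON dict into several metric values and derive an average price per unit area. Only `sellcount` is confirmed by the program above; the `sellarea` and `sellmoney` keys are hypothetical guesses, so check the real response in the browser's developer tools and pass the actual keys. The units (e.g. square metres, and whatever currency unit the site reports) are likewise unverified, and the average simply inherits them.

```python
def derive_metrics(data, count_key="sellcount", area_key="sellarea", price_key="sellmoney"):
    """Map the API's JSON dict to metric values, adding an average price per unit area.

    area_key and price_key are HYPOTHETICAL field names; replace them with the real ones.
    """
    count = float(data[count_key])
    area = float(data[area_key])
    total = float(data[price_key])
    metrics = {
        "shanghai_second_hand_house_sell_count": count,
        "shanghai_second_hand_house_sell_area": area,
        "shanghai_second_hand_house_sell_total_price": total,
    }
    if area > 0:  # avoid dividing by zero on days with no data
        metrics["shanghai_second_hand_house_avg_price_per_area"] = total / area
    return metrics
```

In the Flask program, one `Gauge(name, description, registry=registry)` per metric name would be created once at startup and updated with `.set(value)` inside `metrics()`; creating a Gauge with the same name twice on one registry raises an error.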

Last modified: Wednesday, March 19, 2025

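Copyright notice:

Unless a source is given, all content is original; please do not republish without permission (republished copies often have broken formatting, content that is out of the author's control, and cannot be kept up to date).
For non-profit adaptations of any content on this blog, please credit this site with a link to the relevant page as the "original source" or a "reference link" (for readers' convenience).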