A free HTTP proxy built with nginx and Heroku

Use nginx as a forward proxy and, with the help of a free Heroku app that supports SSL, you get an HTTP proxy.

First, sign up for a free Heroku app to test with. Heroku assigns it a domain such as xxx.herokuapp.com and also serves it over SSL, which is the key point.

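A Heroku app cannot be used as a proxy directly, because a request to a Heroku app roughly takes this path:
xxx.herokuapp.com resolves to Heroku's front-end nginx cluster, which then reverse-proxies the request to your own app. That nginx layer checks whether the Host header belongs to a Heroku app; if it does not, it returns 404 Object Not Found.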

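The idea:

Embed the site you want to visit in the URL, for example http://xxx.herokuapp.com/p/www.google.com. The app then requests www.google.com and returns the result (including the response headers), so visiting http://xxx.herokuapp.com/p/www.google.com returns Google's content! The app could be beefed up to handle the Referer, URL rewriting and so on, at which point it is nothing but a Heroku app shell wrapped around another site's content.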
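
The URL-embedding idea can be sketched in a few lines of Python. The helper names `wrap_url` and `unwrap_path` are hypothetical and are not part of the app shown later; the sketch only handles plain `http` targets, and `PROXY_BASE` is the placeholder domain from this post.

```python
from urllib.parse import urlsplit

PROXY_BASE = "https://xxx.herokuapp.com"  # placeholder app domain used in this post
PREFIX = "/p/"                            # route prefix that marks a proxied URL


def wrap_url(target):
    """Embed a plain-http target URL in the proxy app's URL (hypothetical helper)."""
    parts = urlsplit(target)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// targets")
    wrapped = PROXY_BASE + PREFIX + parts.netloc + parts.path
    if parts.query:
        wrapped += "?" + parts.query
    return wrapped


def unwrap_path(path):
    """Recover the upstream URL from the path part of a wrapped URL."""
    if not path.startswith(PREFIX):
        raise ValueError("not a proxied path")
    return "http://" + path[len(PREFIX):]
```

With these definitions, `wrap_url("http://www.google.com/search?q=foo")` gives the Heroku URL to request, and `unwrap_path("/p/www.google.com/search")` is what the app would fetch on the other side.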

To be lazy and keep this part simple, run nginx locally as a forward proxy that rewrites the Host header into the URL. (I started out with a Flask program for this, then found that nginx is simpler.)

Finally, point the browser's HTTP proxy setting at the local nginx and you are done.
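
The same proxy setting can be exercised from a script instead of a browser. This is a minimal sketch using only the Python 3 standard library, assuming the local nginx from this post listens on 127.0.0.1:8080; `make_opener` and `fetch` are hypothetical helper names. Only the `http` scheme is mapped, because the nginx configuration in this post forwards plain HTTP requests.

```python
import urllib.request

LOCAL_PROXY = "http://127.0.0.1:8080"  # assumed address of the local nginx forward proxy


def make_opener(proxy=LOCAL_PROXY):
    """Build an opener that sends http:// requests through the local proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def fetch(url, proxy=LOCAL_PROXY):
    """Fetch a URL through the proxy; needs nginx and the Heroku app to be running."""
    with make_opener(proxy).open(url, timeout=10) as resp:
        return resp.status, resp.read()
```

Calling `fetch("http://www.google.com/")` should then go browser-style through nginx and on to the Heroku app, provided both are up.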

PS:

The annoying thing about Heroku's free apps is that they are shut down automatically after a period of inactivity.

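Finally, here are the nginx configuration and the sample code:

The nginx server block is as follows (the rest of the configuration is omitted):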

server {
    listen 8080;
    location / {
        resolver 223.5.5.5;
        proxy_pass https://xxx.herokuapp.com/p/$http_host$uri$is_args$args;
    }
}
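
To make the variable expansion in `proxy_pass` concrete, here is a small Python model of how the upstream URL is assembled from an incoming request. It is only an illustration under simplifying assumptions (nginx also normalizes `$uri`, which is ignored here), and `upstream_url` is a hypothetical name.

```python
def upstream_url(http_host, uri, args=""):
    """Model of /p/$http_host$uri$is_args$args for the server block above."""
    # nginx sets $is_args to "?" only when the request has a query string.
    is_args = "?" if args else ""
    return "https://xxx.herokuapp.com/p/" + http_host + uri + is_args + args


# A browser using 127.0.0.1:8080 as its proxy sends
#   GET http://www.google.com/search?q=foo   with   Host: www.google.com
print(upstream_url("www.google.com", "/search", "q=foo"))
# prints https://xxx.herokuapp.com/p/www.google.com/search?q=foo
```

Because the request goes to xxx.herokuapp.com, Heroku's front end sees a Host it recognizes, while the real destination travels in the path.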

app.py uses Flask (adapted with small changes from an example found online; any other framework would work just as well).

cat Procfile

web: python app.py

cat requirements.txt (a newer version of Flask has since been released)

Flask==0.9
Jinja2==2.6
Werkzeug==0.8.3
wsgiref==0.1.2
requests==2.3.0

cat app.py

"""
A simple proxy server. Usage:
http://hostname:port/p/(URL to be proxied, minus protocol)
For example:
http://localhost:8080/p/www.google.com
"""
import logging
import os

import requests
from flask import Flask, request, abort, Response
from werkzeug.serving import WSGIRequestHandler

app = Flask(__name__.split('.')[0])
logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("main.py")


@app.route('/<path:url>')
def root(url):
    LOG.info("Root route, path: %s", url)
    # If referred from a proxy request, serve the URL as if it carried the proxy prefix.
    # This allows server-relative and protocol-relative URLs to work.
    proxy_ref = proxy_ref_info(request)
    if proxy_ref:
        # The query string is not appended here because get_source_rsp()
        # already forwards request.args; adding it twice would duplicate it.
        redirect_url = "%s/%s" % (proxy_ref[0], url)
        LOG.info("Redirecting referred URL to: %s", redirect_url)
        return proxy(redirect_url)
    abort(404)


@app.route('/p/<path:url>')
def proxy(url):
    """Fetches the specified URL and returns it to the client.
    If the request was referred by the proxy itself (e.g. this is an image fetch for
    a previously proxied HTML page), then the original Referer is passed."""
    r = get_source_rsp(url)
    LOG.info("Got %s response from %s", r.status_code, url)
    # requests has already de-chunked and decompressed r.content, so drop the
    # headers that describe the upstream encoding. Names are compared
    # case-insensitively so this works whatever case the headers arrive in.
    headers = dict((k, v) for k, v in r.headers.items()
                   if k.lower() not in ('transfer-encoding', 'content-encoding'))
    # Pass the upstream status through instead of always answering 200.
    return Response(r.content, status=r.status_code, headers=headers)


def get_source_rsp(url):
    url = 'http://%s' % url
    LOG.info("Fetching %s", url)
    # Pass original Referer for subsequent resource requests
    proxy_ref = proxy_ref_info(request)
    headers = {"Referer": "http://%s/%s" % (proxy_ref[0], proxy_ref[1])} if proxy_ref else {}
    # Fetch the URL; stream=False reads the whole body before returning
    LOG.info("Fetching with headers: %s, %s", url, headers)
    return requests.get(url, stream=False, params=request.args, headers=headers)


def split_url(url):
    """Splits the given URL into a tuple of (protocol, host, uri)"""
    proto, rest = url.split(':', 1)
    rest = rest[2:].split('/', 1)
    host, uri = (rest[0], rest[1]) if len(rest) == 2 else (rest[0], "")
    return (proto, host, uri)


def proxy_ref_info(request):
    """Parses out Referer info indicating the request is from a previously proxied page.
    For example, if:
        Referer: http://localhost:8080/p/google.com/search?q=foo
    then the result is:
        ("google.com", "search?q=foo")
    """
    ref = request.headers.get('referer')
    if ref:
        _, _, uri = split_url(ref)
        if uri.find("/") < 0:
            return None
        first, rest = uri.split("/", 1)
        # Exact match on the prefix; testing against the string "pd" would be
        # a substring check that also accepts "" and "pd".
        if first in ("p", "d"):
            parts = rest.split("/", 1)
            r = (parts[0], parts[1]) if len(parts) == 2 else (parts[0], "")
            LOG.info("Referred by proxy host, uri: %s, %s", r[0], r[1])
            return r
    return None


@app.route('/')
def hello():
    return 'Hello World!'


if __name__ == '__main__':
    # Bind to PORT if defined, otherwise default to 5000.
    port = int(os.environ.get('PORT', 5000))
    WSGIRequestHandler.protocol_version = "HTTP/1.1"
    app.run(host='0.0.0.0', port=port, threaded=True)
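
The Referer handling is the least obvious part of the app, so here is a standalone sketch of the same idea that runs without Flask. It uses `urllib.parse` from the Python 3 standard library instead of the hand-written `split_url`, only recognizes the `/p/` prefix, and `referred_target` is a hypothetical name; treat it as an illustration of the technique rather than a drop-in replacement.

```python
from urllib.parse import urlsplit


def referred_target(referer):
    """Return (host, uri) of the proxied page named in a Referer, or None."""
    if not referer:
        return None
    parts = urlsplit(referer)
    # "/p/google.com/search" -> ["p", "google.com", "search"]
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) < 2 or segments[0] != "p" or not segments[1]:
        return None
    host = segments[1]
    uri = segments[2] if len(segments) == 3 else ""
    if parts.query:
        uri += "?" + parts.query
    return host, uri


print(referred_target("http://localhost:8080/p/google.com/search?q=foo"))
# prints ('google.com', 'search?q=foo')
print(referred_target("http://localhost:8080/about"))
# prints None
```

A request for `/logo.png` that arrives with the first Referer can therefore be mapped to google.com/logo.png, which is the job of the `root` route in app.py.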