User:Dan Bron/WgetAndProxies

From J Wiki

using wget at work

J scripts often use wget for Windows to fetch Internet resources (JAL, for example). Unfortunately, at work you may not have a direct connection to the Internet: your connections may go through a proxy (for auditing or censoring purposes).

problem

Because of that, a simple call to wget (e.g. wget http://www.google.com) will fail without a little additional work.

solution

To get wget working, you have to tell it about your HTTP proxy and, if the proxy requires it, give it authentication credentials. The simplest way to do this is to set the environment variable HTTP_PROXY to the URI of your proxy (wget looks for http_proxy, but Windows environment variable names are not case-sensitive, so HTTP_PROXY works). If this is done properly, wget will use it and function normally.

Another option is to use wget's configuration file, .wgetrc. This guide describes the environment variable method, but for details on using .wgetrc, please see the wget manual sections dedicated to the startup file and using proxies.

specifics

In short, to get wget to work transparently, you can set your user environment variable HTTP_PROXY to a URI in the format http://__user__:__password__@__host__:__port__.

There are two steps to deriving this URI:

A. Discovering what your proxy is (i.e. __host__ and __port__).
B. Authenticating yourself (or wget running as you) to the proxy (i.e. __user__ and __password__).

In most setups, solving (A) is easy: simply check Internet Explorer's connection settings (see the appendix). In my case, corporate policy has removed the Connections tab from the Internet Options dialog, so I had to use a different method.

As a sub-problem of (A): your proxy may not be fixed; it may be chosen by a proxy auto-config (PAC) file. For example, instead of a URI like http://corporate_proxy:8080/, you may see a URI like http://ip.add.re.ss:8080/path/filename.pac. If this is your situation, you can still work out the proxy by reading the PAC file, as described in the appendix.

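With the HTTP proxy in hand, all that's left is step (B), authentication. Internet Explorer uses NTLM authentication, which means it uses your LAN credentials. So simply substitute your Windows username and password for __user__ and __password__ in the URI (i.e. in its userinfo component). Note that wget sends proxy credentials using Basic authentication, so this works only if your proxy accepts Basic authentication as well as NTLM.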
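One detail the URI format hides: if your user name or password contains characters such as @, :, / or a backslash (e.g. a domain-qualified DOMAIN\user, if your proxy expects one), they must be percent-encoded, or they will be read as part of the URI's structure. The sketch below builds the HTTP_PROXY value that way; the function name and the example host are hypothetical, and it assumes wget decodes percent-escapes in the proxy URI's userinfo as it does for ordinary URLs, so test a single request before relying on it.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_uri(host: str, port: int, user: Optional[str] = None,
                    password: Optional[str] = None) -> str:
    """Return http://user:password@host:port, or http://host:port without credentials.

    The user name and password are percent-encoded, because characters such
    as '@', ':', '/' or a DOMAIN\\user backslash would otherwise be read as
    part of the URI's structure.
    """
    userinfo = ""
    if user is not None:
        # safe="" encodes every reserved character, including '/' and ':'
        userinfo = quote(user, safe="")
        if password is not None:
            userinfo += ":" + quote(password, safe="")
        userinfo += "@"
    return f"http://{userinfo}{host}:{port}"
```

For example, build_proxy_uri("proxy.example.com", 8080, "dan", "s3cr@t") gives http://dan:s3cr%40t@proxy.example.com:8080, which is the value to store in HTTP_PROXY.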

caveats

Other software may rely upon these environment variables. That can be good, because the same configuration can let other tools connect to the Internet: for example, see svn and proxies. But take care: changing or mistyping the variable affects every program that reads it, not just wget.

Further, if your proxy or credentials change, you must remember to update your HTTP_PROXY environment variable. This can occur, for example, when you change your Windows password, or if your administrator changes your network configuration. If wget stops working, this environment variable is the first thing to check.
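When checking the variable, it helps to look at it without printing your password on screen. A minimal sketch, assuming Python is available; the helper names are hypothetical. It reads HTTP_PROXY (on Windows, os.environ lookups are case-insensitive) and masks the password part of the URI.

```python
import os
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def mask_password(uri: str) -> str:
    """Replace the password in a proxy URI with '****' so it is safe to display."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri
    # netloc looks like "user:password@host:port"; keep the user and host:port
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}:****@{hostport}"))


def show_proxy_setting(name: str = "HTTP_PROXY") -> Optional[str]:
    """Return the variable's current value with the password masked, or None if unset."""
    value = os.environ.get(name)
    return None if value is None else mask_password(value)
```

Run it from a freshly opened console: a console that was already open when you changed the variable still sees the old value.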

appendix

how to discover the URI of your HTTP proxy

If it works, the simplest method is to check Internet Explorer's LAN Settings. To do this, open Internet Explorer, and select Tools>Internet Options:

[Screenshot: Tools>Internet Options in Internet Explorer]

then select the Connections tab (if you don't see this tab, you'll have to do a little extra work):

[Screenshot: the Connections tab of the Internet Options dialog]

then press the LAN Settings button:

[Screenshot: the LAN Settings dialog]

If the Use a proxy server for your LAN box is checked, your proxy's __host__ and __port__ are specified, respectively, in the Address and Port fields of the Proxy Server section. You can move on.

Otherwise, if the Use automatic configuration script box is checked, you must do a little more investigating. Download the PAC file specified in the Address field of the Automatic configuration section; you can do this by simply entering its URI into Internet Explorer's address bar. Then read through the script (it is short JavaScript, so its syntax and logic should be relatively simple) to derive the URI of the proxy host you should be using. You can now move on.
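A PAC file defines a JavaScript function, FindProxyForURL(url, host), whose branches return strings such as "PROXY proxy1:8080; PROXY proxy2:8080; DIRECT": a list of proxies to try in order, where DIRECT means connect without a proxy. Once you've traced which branch applies to the sites you fetch, a helper like the hypothetical one below turns that string into candidate HTTP_PROXY values. It is a sketch that only handles PROXY and DIRECT entries; the host names are made up.

```python
from typing import List, Optional


def proxies_from_pac_result(result: str) -> List[Optional[str]]:
    """Convert a FindProxyForURL return string into candidate HTTP_PROXY values.

    Entries are tried in order by the browser. "PROXY host:port" becomes
    "http://host:port"; "DIRECT" becomes None (no proxy). Other entry types,
    such as SOCKS, are skipped because this guide only covers HTTP proxies.
    """
    candidates: List[Optional[str]] = []
    for entry in result.split(";"):
        words = entry.split()
        if not words:
            continue  # tolerate empty entries, e.g. a trailing ';'
        kind = words[0].upper()
        if kind == "DIRECT":
            candidates.append(None)
        elif kind == "PROXY" and len(words) == 2:
            candidates.append("http://" + words[1])
    return candidates
```

The first entry is the one the browser would use when it is reachable, so it is normally the value to put in HTTP_PROXY.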

However, this means that if your LAN administrator ever changes the PAC file (or your path through the PAC file's logic changes), you'll have to go through this exercise again and update your HTTP_PROXY user environment variable.

If you couldn't even get this far because you don't see a Connections tab (as I didn't), your company has removed it by policy. One recourse left to you is to get this information from the same place Internet Explorer does: the registry.

Specifically, you need the value named AutoConfigURL under the registry key HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings (or possibly HKEY_LOCAL_MACHINE instead of HKEY_CURRENT_USER).

There are many ways to get this. The easiest (if company policy hasn't prevented it) is to go to Start>Run, enter regedit and press the OK button. A window like this should appear:

[Screenshot: the Registry Editor window]

Using the tree control in the left panel, navigate to the "folder" (key) HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings and check the value named AutoConfigURL in the right panel (if you can't get to this key, or the value is missing or empty, try starting at HKEY_LOCAL_MACHINE instead of HKEY_CURRENT_USER).

Once again, this is the URI of a PAC file, so you'll have to do a little more investigating, as described above. Once you have, you can move on.

If you can't start regedit, try using this script instead; it will attempt to interrogate the registry to get the value of this key.
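As another alternative, if Python for Windows happens to be installed (an assumption; this is not the script linked above), its standard winreg module can read the same values. Besides AutoConfigURL, the Internet Settings key also holds ProxyEnable and ProxyServer when a fixed proxy is configured; ProxyServer is either "host:port" or a per-protocol list like "http=host:port;https=host:port". The function names below are hypothetical, and the sketch simply does nothing useful on other platforms.

```python
import sys
from typing import Dict, Optional

# Registry key that Internet Explorer reads its proxy settings from
KEY_PATH = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings"
VALUE_NAMES = ("AutoConfigURL", "ProxyEnable", "ProxyServer")


def read_internet_settings() -> Dict[str, object]:
    """Read IE's proxy-related values, trying HKCU first and then HKLM.

    Returns an empty dict when not running on Windows or nothing is found.
    """
    if sys.platform != "win32":
        return {}
    import winreg  # only exists on Windows

    found: Dict[str, object] = {}
    for hive in (winreg.HKEY_CURRENT_USER, winreg.HKEY_LOCAL_MACHINE):
        try:
            key = winreg.OpenKey(hive, KEY_PATH)
        except OSError:
            continue  # key missing or access denied in this hive
        with key:
            for name in VALUE_NAMES:
                if name in found:
                    continue  # keep the first value found (HKCU is checked first)
                try:
                    found[name] = winreg.QueryValueEx(key, name)[0]
                except OSError:
                    pass  # value not present here
    return found


def parse_proxy_server(value: str) -> Optional[str]:
    """Extract the HTTP proxy's host:port from a ProxyServer string.

    The value is either "host:port" (one proxy for every protocol) or a
    per-protocol list such as "http=host:port;https=host:port;ftp=host:port".
    """
    value = value.strip()
    if "=" not in value:
        return value or None
    for part in value.split(";"):
        scheme, _, hostport = part.partition("=")
        if scheme.strip().lower() == "http" and hostport.strip():
            return hostport.strip()
    return None


if __name__ == "__main__":
    settings = read_internet_settings()
    print("AutoConfigURL:", settings.get("AutoConfigURL"))
    if settings.get("ProxyEnable") and settings.get("ProxyServer"):
        print("HTTP proxy:", parse_proxy_server(str(settings["ProxyServer"])))
```

If AutoConfigURL is printed, it is the PAC file to investigate; if a ProxyServer value is printed instead, it already gives you __host__:__port__.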

After all this research, you should now have the __host__ and __port__ for your proxy server (the __port__ is often 8080). Your HTTP proxy's base URI will then be http://__host__:__port__. If your proxy doesn't require authentication, this is all you need, and you can move on.

If your proxy does require authentication, you will need to extend this URI by adding the userinfo component, so that in the end it looks like http://__user__:__password__@__host__:__port__ (where __user__ and __password__ are probably your LAN credentials).

This means you'll have to remember to update your HTTP_PROXY environment variable every time you change your password (which is often, in my case, because it expires by policy). However, I don't think storing your password in cleartext in a user environment variable has security implications. This is because you will set a USER, not a SYSTEM, environment variable, and anyone who has access to your user environment already has, or can obtain, your credentials anyway.

Now that you have your proxy URI, you must set your HTTP_PROXY user environment variable to it. See the next section.

how to set environment variables

In order to set a user environment variable so that it persists (after a reboot), you must go to Start>Settings>Control Panel>System and select the Advanced tab:

[Screenshot: the Advanced tab of System Properties]

Then press the Environment Variables button:

[Screenshot: the Environment Variables dialog]

and add a new variable to the User variables for __user__ section by pressing the New button in the __top__ portion:

[Screenshot: the New User Variable dialog]

(the System variables section would work, too, but that would be insecure if you're going to store your password in the URI)
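If you'd rather script this step, Windows includes a setx command (built in from Windows Vista onward) that stores a persistent variable in the user environment unless you pass /M. The sketch below calls it from Python; it assumes setx.exe is on your PATH, and the helper names are hypothetical.

```python
import subprocess
import sys
from typing import List


def setx_command(name: str, value: str) -> List[str]:
    """Build the setx command line for a persistent *user* variable.

    Without the /M switch, setx writes to the user environment rather than
    the system one, in line with the advice above about the password.
    """
    return ["setx", name, value]


def set_user_variable(name: str, value: str) -> None:
    if sys.platform != "win32":
        raise OSError("setx is only available on Windows")
    # A list (no shell=True) bypasses cmd.exe, so characters such as % or &
    # in the value are not reinterpreted by the shell.
    subprocess.run(setx_command(name, value), check=True,
                   stdout=subprocess.DEVNULL)
```

For example, set_user_variable("HTTP_PROXY", "http://proxy.example.com:8080"). As with the dialog, only programs started afterwards see the new value; already-open consoles keep the old one.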

see also