Many schools, workplaces, and colleges restrict the websites and online services that they make available. This is done either with a specialized proxy, called a content filter (both commercial and free products are available), or with a cache-extension protocol such as ICAP, which allows plug-in extensions to an open caching architecture.
Requests made to the open internet must first pass through an outbound proxy filter. The web filtering company provides a database of URL patterns (regular expressions) with associated content attributes. This database is updated weekly through a site-wide subscription, much like a virus filter subscription. The administrator instructs the web filter to ban broad categories of content (such as sports, pornography, online shopping, gambling, or social networking). Requests that match a banned URL pattern are rejected immediately.
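The URL-matching step described above can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: the pattern list, the `.example` host names, the category labels, and the function names `categories_for` and `is_request_allowed` are all assumptions made for the example.

```python
import re

# Hypothetical pattern database: (regular expression, content category).
# A real vendor database would be far larger and updated by subscription.
URL_PATTERNS = [
    (re.compile(r"^https?://([^/]+\.)?sports\.example(/|$)"), "sports"),
    (re.compile(r"^https?://([^/]+\.)?casino\.example(/|$)"), "gambling"),
    (re.compile(r"^https?://([^/]+\.)?shop\.example(/|$)"), "online shopping"),
]

# Categories the administrator has chosen to ban (an assumption for this sketch).
BANNED_CATEGORIES = {"sports", "gambling"}


def categories_for(url):
    """Return the set of categories whose pattern matches the URL."""
    return {category for pattern, category in URL_PATTERNS if pattern.search(url)}


def is_request_allowed(url):
    """Reject the request if any matching category is banned."""
    return not (categories_for(url) & BANNED_CATEGORIES)
```

With these assumed settings, a request for `http://www.casino.example/poker` would be rejected, while `http://shop.example/cart` would pass because online shopping is not in the banned set.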
Assuming the requested URL is acceptable, the content is then fetched by the proxy. At this point a dynamic filter may be applied on the return path. For example, JPEG files may be blocked based on flesh-tone matches, or language filters may dynamically detect unwanted language. If the content is rejected, an HTTP error is returned to the requester and nothing is cached.
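The return-path behavior can be illustrated with a small sketch. It assumes the content has already been fetched, uses a toy word list in place of a real language filter, and picks status 403 arbitrarily, since the text does not say which HTTP error is returned. The names `UNWANTED_WORDS`, `content_is_acceptable`, and `handle_response` are hypothetical.

```python
# Hypothetical word list; a real language filter would be far more elaborate.
UNWANTED_WORDS = {"badword"}

cache = {}  # URL -> body, standing in for the proxy's cache


def content_is_acceptable(content_type, body):
    """Toy dynamic filter: scan text responses for unwanted words."""
    if content_type.startswith("text/"):
        words = body.decode("utf-8", errors="replace").lower().split()
        return not any(word in UNWANTED_WORDS for word in words)
    return True  # other content types are not inspected in this sketch


def handle_response(url, content_type, body):
    """Filter fetched content on the return path; cache only what passes."""
    if not content_is_acceptable(content_type, body):
        # 403 is an assumed status code; the text only says "an HTTP error".
        return 403, b"Blocked by content filter"
    cache[url] = body
    return 200, body
```

The key design point is the ordering: the filter runs before the cache is written, so rejected content never becomes available to later requests.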
Most web filtering companies use an internet-wide crawling robot that assesses the likelihood that content is of a certain type (for example, “This content has a 70% chance of being porn, a 40% chance of being sports, and a 30% chance of being news” could be the result for one web page). The resulting database is then corrected by manual labor based on complaints or known flaws in the content-matching algorithms.
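One way such likelihood scores might be turned into database entries is a simple threshold, with manual corrections applied afterwards. The sketch below is only an assumption about how this could work: the 0.5 threshold, the `manual_overrides` table, and the function names are invented for illustration. Note that the scores are independent likelihoods, so they need not sum to 100%.

```python
def categories_from_scores(scores, threshold=0.5):
    """Keep every category whose likelihood meets the threshold."""
    return sorted(name for name, p in scores.items() if p >= threshold)


def corrected_categories(url, scores, manual_overrides, threshold=0.5):
    """Prefer a human correction for this URL, if one has been recorded."""
    if url in manual_overrides:
        return sorted(manual_overrides[url])
    return categories_from_scores(scores, threshold)


# Scores from the example in the text, expressed as fractions.
page_scores = {"pornography": 0.70, "sports": 0.40, "news": 0.30}

# Hypothetical correction entered after a complaint.
manual_overrides = {"http://anatomy.example/": {"education"}}
```

Here `categories_from_scores(page_scores)` yields only `pornography`, while a lower threshold of 0.35 would also include `sports`.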
Web filtering proxies are not able to look inside secure sockets HTTP transactions, assuming the SSL/TLS chain of trust has not been tampered with. As a result, users who wish to bypass web filtering will typically search the internet for an open and anonymous HTTPS transparent proxy. They then configure their browser to proxy all requests through the web filter to this anonymous proxy. These requests are encrypted with HTTPS. The web filter cannot distinguish these transactions from, say, legitimate access to a personal finance website. Thus, content filters are only effective against unsophisticated users.
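To make the limitation concrete, the sketch below compares what a non-intercepting proxy can read from a plain HTTP request line with what it can read from an HTTPS tunnel opened with the HTTP `CONNECT` method. The function name `visible_to_filter` and the host names are assumptions for the example; the point is that the tunnel exposes only a host and port, never the path or content.

```python
from urllib.parse import urlsplit


def visible_to_filter(request_line):
    """Return what a non-intercepting proxy can learn from a request line."""
    method, target, _version = request_line.split()
    if method == "CONNECT":
        # HTTPS tunnel: only "host:port" is sent in the clear.
        host, _, port = target.rpartition(":")
        return {"host": host, "port": int(port), "path": None}
    # Plain HTTP through a proxy uses an absolute URL, so the path is visible.
    parts = urlsplit(target)
    return {"host": parts.hostname, "port": parts.port or 80, "path": parts.path}
```

A tunnel to an anonymous proxy and a tunnel to a finance site would both appear as a bare `CONNECT host:443` line, which is why the filter cannot tell them apart by content.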
As mentioned above, the SSL/TLS chain of trust relies on trusted root certificate authorities. In a workplace setting where the client is managed by the organization, trust may be granted to a root certificate whose private key is known to the proxy. Concretely, a root certificate generated by the proxy is installed into the browser's CA list by IT staff. In such scenarios, proxy analysis of the contents of an SSL/TLS transaction becomes possible. The proxy is effectively operating a man-in-the-middle attack, permitted by the client's trust of a root certificate that the proxy owns.
A special case of web proxies is the “CGI proxy.” These are websites that allow a user to access another site through them. They generally use PHP or CGI to implement the proxy functionality. These types of proxies are frequently used to gain access to websites blocked by corporate or school proxies. Since they also hide the user’s own IP address from the websites accessed through the proxy, they are sometimes also used to gain a degree of anonymity, called “proxy avoidance.”
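A CGI proxy typically receives the destination as part of its own URL. The sketch below shows one plausible encoding, in which the target URL is carried in a query parameter. The endpoint `https://cgiproxy.example/browse.php` and the parameter name `u` are hypothetical; real CGI proxies differ in how they encode the target.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical CGI proxy endpoint and parameter name.
PROXY_ENDPOINT = "https://cgiproxy.example/browse.php"


def wrap_url(target_url):
    """Build the URL the user's browser requests from the CGI proxy."""
    return PROXY_ENDPOINT + "?" + urlencode({"u": target_url})


def unwrap_url(proxied_url):
    """Recover the target URL on the proxy side before fetching it."""
    query = urlsplit(proxied_url).query
    return parse_qs(query)["u"][0]
```

From the point of view of a URL-pattern filter, the browser is only talking to the proxy's host name; the blocked site appears solely as an encoded parameter, and the destination site sees the proxy's IP address rather than the user's.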