4th Mar 2003 [SBWID-6042]
COMMAND
Log corruption via specially crafted reverse DNS data
SYSTEMS AFFECTED
WebExpert, LoganPro, Iplanet Log Analyzer, WebTrends, SurfStats,
WebLogExpert
PROBLEM
From the whitepaper by Hugo Vázquez Caramés & Toni Cortés Martínez of
Infohacking Research 2001-2003:
ILLC - Inverse Lookup Log Corruption
We are using a technique that we have called “ILLC” (Inverse Lookup Log
Corruption) that allows us to corrupt the logs generated by many web
servers that are doing inverse address resolution.
Impact of this technique:
- “IP spoofing” on the logs
- Code execution (XSS) on boxes that are running log analyzers (web servers that have built-in report analysis tools, etc.)
In some specific scenarios, we have been able to hide the entire HTTP
request from the log viewer.
Most of the actions were possible because of the lack of filtering when
parsing host names between different applications.
Related RFCs on Internet host name conventions:
RFC 952:
“(…)
1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign
(-), and period (.). Note that periods are only allowed when they serve to
delimit components of "domain style names". (See RFC-921, "Domain Name
System Implementation Schedule", for background). No blank or space
characters are permitted as part of a name. No distinction is made between
upper and lower case. The first character must be an alpha character. The
last character must not be a minus sign or period.... Single character
names or nicknames are not allowed....
(…)”
RFC 1034:
“(…)
3.5. Preferred name syntax
... The labels must follow the rules for ARPANET host names. They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen. There are also some
restrictions on the length. Labels must be 63 characters or less. (...)”
RFC 1123:
“(…)
2.1 Host Names and Numbers
The syntax of a legal Internet host name was specified in RFC-952 [DNS:4].
One aspect of host name syntax is hereby changed: the restriction on the
first character is relaxed to allow either a letter or a digit. Host
software MUST support this more liberal syntax. (…) Note that under BIND 8,
you may need to add "check-names master ignore" to the zone definition
when defining these names.(…)”
RFC 2181:
“(…)
11. Name syntax
Occasionally it is assumed that the Domain Name System serves only the
purpose of mapping Internet host names to data, and mapping Internet
addresses to host names. This is not correct, the DNS is a general (if
somewhat limited) hierarchical database, and can store almost any kind of
data, for almost any purpose.
The DNS itself places only one restriction on the particular labels that
can be used to identify resource records. That one restriction relates to
the length of the label and the full name. The length of any one label is
limited to between 1 and 63 octets. A full domain name is limited to 255
octets (including the separators).(…)”
Regardless of what the legal host name syntax should be, it seems that
operating systems allow host names with arbitrary characters.
To successfully attack a server with the “ILLC” technique, the web server,
log analyzer, etc. must be performing inverse address resolution, and the
attacker must be able to control, in some way, the responses to those
inverse lookup requests.
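The host name rules quoted from the RFCs above can be checked mechanically
before a resolved name is trusted. The following is a minimal sketch in
Python, not part of the original whitepaper; the function name
is_legal_hostname is our own, and the 255 limit is taken as a loose upper
bound from the RFC 2181 text quoted above.

```python
import re

# One label: letters, digits and hyphens only, starting and ending with a
# letter or digit (RFC 952 as relaxed by RFC 1123), at most 63 characters.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_legal_hostname(name: str) -> bool:
    """Return True if `name` follows the host name syntax quoted above.

    Hypothetical helper for illustration; single-character labels are
    accepted here even though RFC 952 discouraged them.
    """
    if name.endswith("."):
        name = name[:-1]  # tolerate one trailing root dot
    if not name or len(name) > 255:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

Note that a name such as "123.123.123.123" passes this purely syntactic
check, so syntax validation alone does not stop a host name from posing as
an IP address; it only rejects names carrying characters such as <, > or =.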
---------------------------------------------------------
Exploiting web server/log analyzers through “ILLC”
---------------------------------------------------------
Examples of attacks:
-Log “IP Spoofing”
(Exploited successfully on Apache 2.0.44 on Windows/Linux, and Iplanet 6
on Windows)
Scenario: a machine whose host name is "123.123.123.123" makes a request
to an Apache server. If the request doesn't generate any error, the
access log will show a request from a client called "123.123.123.123",
which looks like a valid request from a client whose address the server
was unable to resolve to a host name. The real IP does not appear in the
access log file.
access.log
123.123.123.123 - - [28/Feb/2003:10:39:01 +0100] "GET / HTTP/1.1" 200
1786 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
123.123.123.123 - - [28/Feb/2003:10:39:46 +0100] "GET /badrequest.html
HTTP/1.1" 404 294 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
If the request produces an error, you will see an entry in the error
log file where the real IP is visible, even though the web server has
inverse lookup enabled.
error.log
[Fri Feb 28 10:39:46 2003] [error] [client 172.26.50.45] File does not
exist: C:/Archivos de programa/Apache Group/Apache2/htdocs/badrequest.html
So, as long as there are no errors, the real IP is not shown. This can
give a client completely anonymous HTTP access during ordinary web
surfing, that is, as long as it hits no broken links, etc.
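One possible countermeasure on the logging side is to refuse any resolved
name that reads like an IP address and to fall back to the peer address.
The sketch below is an assumption about how this could be done in Python;
looks_like_ip and host_for_log are hypothetical names, and none of this
reflects how Apache or Iplanet actually build their log entries.

```python
import ipaddress
from typing import Optional


def looks_like_ip(name: str) -> bool:
    """True if a resolved name would read as an IP address in a log."""
    candidate = name.rstrip(".")
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        pass
    # Also catch strings such as "999.1.1.1": not a valid address, but
    # still easy to mistake for one when reading a log.
    labels = candidate.split(".")
    return len(labels) == 4 and all(label.isdigit() for label in labels)


def host_for_log(peer_ip: str, resolved_name: Optional[str]) -> str:
    """Pick the value to write in the host column of an access log entry."""
    if not resolved_name or looks_like_ip(resolved_name):
        return peer_ip  # never let a PTR answer replace the real address
    return resolved_name
```

With the values from the example above, host_for_log("172.26.50.45",
"123.123.123.123") yields "172.26.50.45". Logging both the peer address
and the resolved name would be a more conservative design.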
In the case of Iplanet 6, the real IP wouldn’t appear in the “access”
preview (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-access-log1.gif
Nor in the “errors” preview (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-errors.gif
-CODE INJECTION
(Successfully exploited on Apache 2.0.44 on Windows/Linux, on IIS 6.0
and Iplanet 6 on Windows)
Scenario: a machine with a host name such as
“<script>alert(‘a’)</script>” makes an HTTP request and leaves
JavaScript code in the log. When a report is generated with certain log
analyzers (those that show results in HTML), the script will be executed.
*Note: in the IIS 6.0 case we needed to restrict access to the web server
by domain name in order to force inverse lookup resolution.
*Note2: in the Iplanet case we needed to simulate a FQDN client host name
like this:
“<script>alert(‘a’)</script>.infohacking.com”.
You can also set a host name where the script is only part of the entire
string label:
“nop<script>alert(‘a’)</script>.infohacking.com”
so that, when rendered as HTML, it appears as a valid domain name:
“nop.infohacking.com.”
Meanwhile the script will be executed…
Some log analyzers proved to be vulnerable to "ILLC": WebTrends (see
link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-webtrends-illc.gif
SurfStats (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-surfstats_loganalizer.gif
WebLogExpert (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/weblogexpert_illc.gif
And probably many more…
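On the analyzer side, the script only runs because the host name is
copied into the HTML report unescaped. A minimal sketch, assuming a report
generator written in Python (host_cell is a hypothetical helper, not code
from any of the products named above), shows the kind of output escaping
that would neutralize it:

```python
import html


def host_cell(hostname: str) -> str:
    """Render one host name as an HTML table cell, escaping any markup."""
    return "<td>" + html.escape(hostname, quote=True) + "</td>"


# The host name from the example above is shown as text, not executed.
cell = host_cell("nop<script>alert('a')</script>.infohacking.com")
```

After escaping, the browser displays the full string, including the
injected part, so the entry no longer passes for “nop.infohacking.com”.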
Iplanet comes with a built-in tool to generate HTML reports based on
access and error logs. This tool is part of the administration web
interface. Moreover, the Iplanet log analyzer always uses a web browser
to show the results of the report, even if the user selects “Only text
output”, so it will always be exploitable.
Iplanet Log Analyzer (HTML report, see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-report-html.gif
Iplanet Log Analyzer (“txt” report, see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-report-text.gif
On the other hand, note that the access log previewer (“View Access
Log”) in the Iplanet web interface does some kind of filtering on
certain characters (for example <>).
Iplanet “View Access Log” (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-iplanet-filtra.gif
-HIDING REQUESTS (Iplanet 6 on Windows)
In the specific case of Iplanet 6, we found that there is a way to
trick the server into not showing a request in the log preview (“View
Access Log” and “View Error Log”). Requests from boxes whose host name
begins with “format=” will not be shown. That is, those requests are
still visible in the access and error log files, but they are
“invisible” in the built-in access and error log viewer of the
administration web interface (“Last 25 accesses to…”). As an example,
we made requests from a box with this host name:
“format=.infohacking.com”, and we can see the request in the access
log:
format=%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%]
"%Req->reqpb.clf-request%" %Req->srvhdrs.clf-status%
%Req->srvhdrs.content-length% "%Req->headers.referer%"
"%Req->headers.user-agent%" %Req->reqpb.method% %Req->reqpb.uri%
%Req->reqpb.query% "%Req->reqpb.protocol%" %vsid%
format=.winmat.com - - [28/Feb/2003:10:22:25 +0100] "GET /evilrequest.html
HTTP/1.1" 404 292 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
GET /evilrequest.html - "HTTP/1.1" https-script.winmat.com
But in the “View Access Log” nothing is shown (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-NOlog-iplanet.gif
We suppose that the server processes the first part of the host name
string, “format=”, as if it were the directive that sets the log
format, and that the rest of the string is not recognized as a valid
format, so nothing is shown.
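The supposed behaviour can be illustrated with a small sketch in Python.
This is purely our guess at the kind of logic that would produce the
effect, not Iplanet's actual code; naive_preview and stricter_preview are
hypothetical names, and the two log lines are abbreviated from the sample
above.

```python
def naive_preview(lines):
    """Hypothetical viewer: treats every line starting with 'format='
    as a format directive and hides it from the preview."""
    return [line for line in lines if not line.startswith("format=")]


def stricter_preview(lines):
    """Treat only the first line as the format header; show the rest."""
    if lines and lines[0].startswith("format="):
        return list(lines[1:])
    return list(lines)


# Abbreviated from the access log sample above.
header = "format=%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%]"
entry = ('format=.winmat.com - - [28/Feb/2003:10:22:25 +0100] '
         '"GET /evilrequest.html HTTP/1.1" 404 292')
log = [header, entry]

hidden = naive_preview(log)      # the attacker's request disappears
shown = stricter_preview(log)    # the attacker's request stays visible
```

Under this assumption, a viewer that recognizes the directive only in the
position where the server itself writes it would no longer hide entries
whose host column happens to begin with the same prefix.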
By combining the possibility of hiding a request with the Cross Site
Scripting technique, we could execute scripts on the machine that runs
the Report Generator of the Iplanet Web Server in a “stealth” way. We
did this by setting a host name like this:
“format=<script>alert(document.cookie)</script>.infohacking.com”
(See link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-iplanet-cookie.gif
Many more evil actions can be done... it only depends on the attacker's
imagination.
We haven’t checked “ILLC” on other daemons such as ftp or smtp, or on
firewalls, IDSs, etc. We think this technique could probably be used in
the same way in other scenarios.
---------------------------------------------------------
Exploiting http headers for log corruption
---------------------------------------------------------
Controlling inverse lookup responses is not always possible for the
attacker. We tried to figure out another, more generic attack to
corrupt web logs. The first idea that came to us was to use forged HTTP
headers in order to achieve the same result: execution of scripts by
log analyzers. There are a lot of HTTP headers that can be used to
inject code into a log file. We are not going to discuss all of them in
this paper, only outline some generic ways to do it.
The main objective here is to choose the right header to inject code in
the HTTP request. For example, the “RequestResource” is always shown
in web logs, but it will probably be filtered by many application
firewalls or detected by IDSs. On the other hand, the “UserAgent”
header is usually not checked for suspicious sequences of characters,
and web masters usually like to have this info in their log files.
As an example of how to trick a log analyzer into executing a script,
we set the “UserAgent” header of our client to:
“<script>alert(‘UserAgent’)</script>”.
The requests from our client with this forged “UserAgent” will inject
code into the web server log. Log analyzers that read these logs and
generate HTML formatted reports without filtering the output will
execute the script.
-Examples of vulnerable log analyzers-
WebExpert (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/log_anal_XSS.gif
LoganPro (see link below):
http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-loganpro-agent.gif
To solve this kind of problem, more aggressive filtering of DNS
responses and of all headers in HTTP requests would be desirable.
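One way to read “more aggressive filtering” is to escape every
attacker-controlled field before it is written to the log. The sketch
below assumes a Python logging layer; sanitize_log_field is a
hypothetical helper, and the \xNN escape style is our own choice, not a
description of what any of the servers named here do.

```python
def sanitize_log_field(value: str) -> str:
    """Escape bytes that a log parser or a browser could misinterpret.

    Markup characters, quotes, backslashes, control characters and
    non-ASCII bytes are written as \\xNN; everything else is kept.
    """
    unsafe = set(b'<>"&\'\\')
    out = []
    for byte in value.encode("utf-8", errors="replace"):
        if byte in unsafe or byte < 0x20 or byte >= 0x7F:
            out.append("\\x%02x" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


# The forged header from the example above becomes inert text.
safe_agent = sanitize_log_field("<script>alert('UserAgent')</script>")
```

An ordinary value such as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT
5.0)" passes through unchanged. Escaping at write time does not remove
the need for analyzers to escape their own output, since they may read
logs produced by other software.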
To finish this short analysis we would like to ask some questions:
Are log analyzers trusting log files too much?
Or should web servers be the ones to filter what they write to log
files?
Is the operating system the one that has to filter the values returned
by DNS servers?
Are the current rules for legal host names too liberal?
SOLUTION
See above