4th Mar 2003 [SBWID-6042]
COMMAND
	Log corruption via specially crafted reverse DNS data
SYSTEMS AFFECTED
	WebExpert,  LoganPro,  Iplanet  Log  Analyzer,   WebTrends,   SurfStats,
	WebLogExpert
PROBLEM
	From the whitepaper by Hugo Vázquez Caramés & Toni Cortés Martínez of
	Infohacking Research, 2001-2003:
	                    ILLC - Inverse Lookup Log Corruption
	We use a technique that we call “ILLC” (Inverse Lookup Log Corruption),
	which allows us to corrupt the logs generated by many web servers that
	perform inverse address resolution.
	Impact of this technique:
	 
	-	“IP spoofing” in the logs
	-	Code execution (XSS) on boxes that run log analyzers (web servers that have built-in report analysis tools, etc.)
	
	In some specific scenarios, we have been able to hide the entire HTTP
	request from the log viewer.
	Most of these actions were possible because host names are not filtered
	when they are passed between different applications.
	Related RFCs on Internet host name conventions:
	
	RFC 952: 
	“(…)
	1.	A "name" (Net, Host, Gateway, or Domain name) is a text string up 
	to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-
	), and period (.). Note that periods are only allowed when they serve to 
	delimit components of "domain style names". (See RFC-921, "Domain Name 
	System Implementation Schedule", for background). No blank or space 
	characters are permitted as part of a name. No distinction is made between 
	upper and lower case. The first character must be an alpha character. The 
	last character must not be a minus sign or period.... Single character 
	names or nicknames are not allowed....
	(…)”
	RFC 1034:
	“(…)
	3.5. Preferred name syntax 
	... The labels must follow the rules for ARPANET host names. They must 
	start with a letter, end with a letter or digit, and have as interior 
	characters only letters, digits, and hyphen. There are also some 
	restrictions on the length. Labels must be 63 characters or less. (...)”
	RFC 1123: 
	“(…)
	2.1 Host Names and Numbers 
	The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. 
	One aspect of host name syntax is hereby changed: the restriction on the 
	first character is relaxed to allow either a letter or a digit. Host 
	software MUST support this more liberal syntax. (…)”
	Note (not part of the RFC text): under BIND 8, you may need to add
	"check-names master ignore" to the zone definition when defining these names.
	RFC 2181: 
	“(…)
	11. Name syntax 
	Occasionally it is assumed that the Domain Name System serves only the 
	purpose of mapping Internet host names to data, and mapping Internet 
	addresses to host names. This is not correct, the DNS is a general (if 
	somewhat limited) hierarchical database, and can store almost any kind of 
	data, for almost any purpose. 
	The DNS itself places only one restriction on the particular labels that 
	can be used to identify resource records. That one restriction relates to 
	the length of the label and the full name. The length of any one label is 
	limited to between 1 and 63 octets. A full domain name is limited to 255 
	octets (including the separators).(…)”
	Regardless of what the legal host name syntax should be, it seems that
	operating systems allow host names with arbitrary characters.
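	The syntax rules quoted above can be checked mechanically. The following is
	a minimal sketch, not code from the whitepaper: the function name
	is_rfc_hostname is hypothetical, and it assumes the letter-digit-hyphen
	rules of RFC 952/1034 with the relaxed first character of RFC 1123 (255
	octets on the wire corresponds to 253 characters in dotted text form).

```python
import re

# One label: letters, digits and hyphens only, starting and ending with a
# letter or digit, 1 to 63 characters long.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc_hostname(name: str) -> bool:
    """Return True if name follows the letter-digit-hyphen host name syntax."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


for candidate in (
    "nop.infohacking.com",
    "<script>alert('a')</script>.infohacking.com",
    "format=.infohacking.com",
    "123.123.123.123",
):
    print(candidate, is_rfc_hostname(candidate))
```

	Note that "123.123.123.123" passes this check, because RFC 1123 allows
	labels made only of digits. A syntax filter alone therefore would not stop
	the log “IP spoofing” case described in this paper.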
	To attack a server successfully with the “ILLC” technique, two conditions
	must hold: the web server, log analyzer, etc. must perform inverse address
	resolution, and the attacker must be able to control, in some way, the
	responses to those inverse lookup requests.
	
	        ---------------------------------------------------------
	             Exploiting web server/log analyzers through “ILLC”
	        ---------------------------------------------------------
	Examples of attacks:
	 
	-Log “IP Spoofing”
	
	(Exploited successfully on Apache 2.0.44 on Windows/Linux, and Iplanet 6
	on Windows.) Scenario: a machine whose host name is "123.123.123.123"
	makes a request to an Apache server. If the server doesn't generate any
	error, the access log shows a request from a client called
	"123.123.123.123", which looks like a valid request from a client whose
	address the server was unable to resolve to a host name. The real IP
	therefore does not appear in the access log file.
	
	access.log
	123.123.123.123 - - [28/Feb/2003:10:39:01 +0100] "GET / HTTP/1.1" 200 1786 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
	123.123.123.123 - - [28/Feb/2003:10:39:46 +0100] "GET /badrequest.html HTTP/1.1" 404 294 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
	
	If the request produces an error, there is an entry in the error log
	file where the real IP is visible, even though the web server has
	inverse lookup enabled.
	
	error.log
	[Fri Feb 28 10:39:46 2003] [error] [client 172.26.50.45] File does not exist: C:/Archivos de programa/Apache Group/Apache2/htdocs/badrequest.html
	
	So, as long as there are no errors, the real IP is not shown. This can
	give a client completely anonymous HTTP access during ordinary web
	surfing, that is, as long as there are no broken links, etc.
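	One way a server could avoid losing the real address is to distrust a
	resolved name that itself parses as an IP address. The following is a
	minimal sketch under that assumption; host_field_for_log and looks_like_ip
	are hypothetical names, and the "name [address]" layout is an illustration,
	not the format of any server mentioned here.

```python
import ipaddress
from typing import Optional


def looks_like_ip(text: str) -> bool:
    """True if text parses as an IPv4 or IPv6 address literal."""
    try:
        ipaddress.ip_address(text.rstrip("."))
        return True
    except ValueError:
        return False


def host_field_for_log(peer_ip: str, ptr_name: Optional[str]) -> str:
    """Choose what to write in the client column of an access log entry."""
    if not ptr_name or looks_like_ip(ptr_name):
        # No name, or a name that impersonates an address:
        # log the address of the socket peer instead.
        return peer_ip
    # Otherwise keep both, so the real address is never lost.
    return f"{ptr_name} [{peer_ip}]"


print(host_field_for_log("172.26.50.45", "123.123.123.123"))
print(host_field_for_log("172.26.50.45", "client.example.org"))
```

	With the values from the example above, the first call returns the real
	address 172.26.50.45 rather than the spoofed "123.123.123.123".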
	In the case of Iplanet 6, the real IP doesn't appear in the “access”
	preview (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-access-log1.gif 
	
	Nor does it appear in the “errors” preview (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-errors.gif 
	
	
	-CODE INJECTION
	
	(Successfully exploited on Apache 2.0.44 on Windows/Linux, on IIS 6.0
	and Iplanet 6 on Windows.)
	Scenario: a machine with a host name such as
	“<script>alert(‘a’)</script>” that makes an HTTP request leaves
	JavaScript code in the log. When a report is generated with certain log
	analyzers (those that show results in HTML), the script is executed.
	 *Note: in the IIS 6.0 case we needed to restrict access to the web server
	 by domain name in order to force inverse lookup resolution.
	 *Note2: in the Iplanet case we needed to simulate an FQDN client host name
	 like this:
	
	“<script>alert(‘a’)</script>.infohacking.com”.
	You can also set a host name where the script is only part of the
	label string:
	“nop<script>alert(‘a’)</script>.infohacking.com”
	
	so that, when rendered as HTML, it appears as a valid domain name:
	
	“nop.infohacking.com.”
	
	Meanwhile the script is executed…
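	The script runs because the host name is copied into the HTML report
	verbatim. The following is a minimal sketch of a report generator that
	escapes the field first; report_row is a hypothetical helper, not part of
	any analyzer named in this paper, and it uses html.escape from the Python
	standard library.

```python
import html


def report_row(host: str, hits: int) -> str:
    """Build one HTML table row of a report, escaping the untrusted host name."""
    return f"<tr><td>{html.escape(host)}</td><td>{hits}</td></tr>"


row = report_row("nop<script>alert('a')</script>.infohacking.com", 3)
print(row)
```

	After escaping, the browser displays the whole host name as text, script
	tags included, instead of executing it and showing only
	“nop.infohacking.com”.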
	Some log analyzers proved to be vulnerable  to  "ILLC":  WebTrends  (see
	link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-webtrends-illc.gif 
	
	SurfStats (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-surfstats_loganalizer.gif 
	
	WebLogExpert (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/weblogexpert_illc.gif 
	
	And probably many more…
	Iplanet comes with a built-in tool that generates HTML reports based on
	the access and error logs. This tool is part of the administration web
	interface. Moreover, the Iplanet log analyzer always uses a web browser
	to show the results of the report, even if the user selects “Only text
	output”, so it is always exploitable.
	Iplanet Log Analyzer (HTML report, see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-report-html.gif 
	
	Iplanet Log Analyzer (“txt” report, see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-report-text.gif 
	
	On the other hand, note that the access log previewer (“View Access
	Log”) in the Iplanet web interface does some kind of filtering on some
	characters (for example <>).
	Iplanet “View Access Log” (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-iplanet-filtra.gif 
	
	
	-HIDING REQUESTS (Iplanet 6 on Windows)
	
	In the specific case of Iplanet 6, we found a way to trick the server
	into not showing a request in the log preview (“View Access Log” and
	“View Error Log”). Requests from boxes whose host name begins with
	“format=” are not shown: they are still present in the access and error
	log files, but they are “invisible” in the built-in access and error log
	viewer of the administration web interface (“Last 25 accesses to…”). As
	an example, we made requests from a box with this host name:
	“format=.infohacking.com”, and we can see the request in the access log:
	
	format=%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%] "%Req->reqpb.clf-request%" %Req->srvhdrs.clf-status% %Req->srvhdrs.content-length% "%Req->headers.referer%" "%Req->headers.user-agent%" %Req->reqpb.method% %Req->reqpb.uri% %Req->reqpb.query% "%Req->reqpb.protocol%" %vsid%
	format=.winmat.com - - [28/Feb/2003:10:22:25 +0100] "GET /evilrequest.html HTTP/1.1" 404 292 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" GET /evilrequest.html - "HTTP/1.1" https-script.winmat.com
	
	But “View Access Log” shows nothing (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-NOlog-iplanet.gif 
	
	We suppose that the server processes the first part of the host name
	string, “format=”, as if it were the directive that sets the log format;
	the rest of the string is not recognized as a valid format, so nothing
	is shown.
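	The supposition above can be illustrated with a toy previewer. This is a
	hypothetical sketch of the suspected behavior, not Iplanet's actual code:
	preview_entries is an invented name, the format line is shortened, and the
	second log entry is a made-up ordinary request added for contrast.

```python
from typing import Iterable, List


def preview_entries(lines: Iterable[str], last: int = 25) -> List[str]:
    """Hypothetical previewer: drops every line it takes for a format directive."""
    entries = []
    for line in lines:
        if line.startswith("format="):
            # Meant to skip the header that declares the log format, but it
            # also skips requests whose host name starts with "format=".
            continue
        entries.append(line.rstrip("\n"))
    return entries[-last:]


log = [
    # Shortened version of the format header shown above.
    'format=%Ses->client.ip% - %Req->vars.auth-user% [%SYSDATE%]\n',
    # Made-up ordinary entry, for contrast only.
    '172.26.50.45 - - [28/Feb/2003:10:20:00 +0100] "GET / HTTP/1.1"\n',
    'format=.winmat.com - - [28/Feb/2003:10:22:25 +0100] "GET /evilrequest.html HTTP/1.1"\n',
]
print(preview_entries(log))
```

	Under this assumption only the ordinary entry is displayed, while the
	request from the “format=” host stays in the file but never reaches the
	viewer.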
	By combining the ability to hide a request with the Cross Site Scripting
	technique, we could execute scripts on the machine that runs the Report
	Generator of the Iplanet Web Server in a “stealth” way. We did this by
	setting a host name like this:
	“format=<script>alert(document.cookie)</script>.infohacking.com”
	(see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-iplanet-cookie.gif
	
	Many more evil actions are possible; it only depends on the attacker's
	imagination.
	We haven't checked “ILLC” against other daemons such as FTP or SMTP
	servers, or against firewalls, IDSs, etc. We think this technique could
	probably be used in the same way in other scenarios.
	           ---------------------------------------------------------
	                     Exploiting http headers for log corruption
	           ---------------------------------------------------------
	Controlling inverse lookup responses is not always possible for the
	attacker, so we tried to figure out another, more generic attack to
	corrupt web logs. The first idea that came to us was to use faked HTTP
	headers to achieve the same result: execution of scripts by log
	analyzers. Many HTTP headers can be used to inject code into a log
	file. We are not going to discuss all of them in this paper, only
	outline some generic ways to do it.
	The main objective here is to choose the right header for injecting code
	into the HTTP request. For example, the “RequestResource” is always
	shown in web logs, but it will probably be filtered by many application
	firewalls or detected by IDSs. On the other hand, the “UserAgent” header
	is usually not checked for suspicious sequences of characters, and web
	masters usually like to have this information in their log files.
	As an example of how to trick a log analyzer into executing a script, we
	set the “UserAgent” header of our client to:
	
	“<script>alert(‘UserAgent’)</script>”.
	
	Requests from our client with this faked “UserAgent” inject code into
	the web server log. Log analyzers that read these logs and generate
	HTML-formatted reports without filtering the output will execute the
	script.
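	Before feeding a log to an analyzer, an administrator could scan it for
	fields that carry markup. The following is a minimal sketch, assuming the
	Apache "combined" layout of the access.log entries shown earlier; the
	regular expression and the name suspicious_fields are illustrative, and a
	real log may need a more tolerant parser.

```python
import re
from typing import List

# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] "(?P<request>.*?)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>.*?)" "(?P<agent>.*)"$'
)
SUSPECT = re.compile(r"[<>]")  # characters needed to open or close an HTML tag


def suspicious_fields(line: str) -> List[str]:
    """Return the names of the log fields that contain '<' or '>'."""
    match = COMBINED.match(line)
    if match is None:
        # Line does not fit the assumed layout: judge it as a whole.
        return ["unparsed"] if SUSPECT.search(line) else []
    return [name for name, value in match.groupdict().items()
            if SUSPECT.search(value)]


clean = ('123.123.123.123 - - [28/Feb/2003:10:39:01 +0100] "GET / HTTP/1.1" '
         '200 1786 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"')
faked = ('172.26.50.45 - - [28/Feb/2003:10:39:01 +0100] "GET / HTTP/1.1" '
         '200 1786 "-" "<script>alert(\'UserAgent\')</script>"')
print(suspicious_fields(clean))
print(suspicious_fields(faked))
```

	The same check flags a host column that contains a script tag, so it
	covers the “ILLC” case as well as the faked header.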
	 -Examples of vulnerable log analyzers-
	WebExpert (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/log_anal_XSS.gif 
	
	LoganPro (see link below):
	
	http://www.infohacking.com/INFOHACKING_RESEARCH/Our_Advisories/ILLC/cap-loganpro-agent.gif 
	
	To solve this kind of problem, more aggressive filtering of DNS
	responses and of all the headers of HTTP requests would be desirable.
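	One possible form of such filtering is to escape untrusted values at the
	moment they are written to the log. The following is a minimal sketch;
	escape_log_field is a hypothetical helper, and the set of characters
	treated as unsafe (anything outside printable ASCII, plus the angle
	brackets, the double quote and the backslash) is an assumption, not a
	rule taken from any server.

```python
def escape_log_field(value: str) -> str:
    """Replace characters outside a conservative safe set with \\xhh escapes."""
    out = []
    for byte in value.encode("utf-8", errors="replace"):
        ch = chr(byte)
        if 0x20 <= byte < 0x7F and ch not in '<>"\\':
            out.append(ch)  # printable ASCII that cannot open a tag or a quote
        else:
            out.append(f"\\x{byte:02x}")
    return "".join(out)


print(escape_log_field("<script>alert('UserAgent')</script>"))
print(escape_log_field("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"))
```

	An ordinary “UserAgent” string passes through unchanged, while the
	injected one reaches the log with its angle brackets encoded, so an
	analyzer that copies it into HTML has nothing to execute.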
	To finish this short analysis, we would like to ask some questions:
	Are log analyzers trusting log files too much?
	Or should web servers be the ones that filter what they write to log
	files?
	Is the operating system the one that has to filter the values returned
	by DNS servers?
	Are the current rules for legal host names too liberal?
SOLUTION
	See above