Which words have the letter P in it and what’s the percentage?

November 21st, 2009

Another code snippet from the useless dept, this time for PHP. If you’ve ever written a post and wondered how many words use the letter P (or any character), here’s something for you in PHP:

function find_words_with_letters($letter, $str) {

	$a = preg_split("#[ \n\r]+#", $str);

	$letterupper = strtoupper($letter);
	$letterlower = strtolower($letter);

	$result = array();
	$result['word_count'] = count($a);
	$result['words_with_letter'] = array();

	foreach($a as $i) {
		if (strpos($i, $letterlower) !== false || strpos($i, $letterupper) !== false) {
			$result['words_with_letter'][] = $i;
		}
	}
	$result['words_with_letter_ratio'] = count($result['words_with_letter']) / count($a);

	return $result;
}

// use it
print_r(find_words_with_letters('p', 'This message has one P'));

By the way, 17.3% of this post have P’s in it!

Javascript snippet to convert raw UTF8 to unicode

October 3rd, 2009

For the I-don’t-a-sane-use-for-this department comes this piece of code which takes a stream of raw UTF-8 bytes, decodes it and fromCharCode it, rendering it in a unicode supported browser. A possible use would be if the web page character set is not UTF-8 and you want to display UTF-8. To use it, just put it in a script tag and call utf8decode(myrawutf8string). But seriously, all web pages should be UTF-8 by default nowadays. Here it is, in case anyone wants it:

function TryGetCharUTF8(c, intc, b, i, count)
		{
			/*
			 * 10000000 80
			 * 11000000 C0
			 * 11100000 E0
			 * 11110000 F0
			 * 11111000 F8
			 * 11111100 FC
			 *
			 * FEFF = 65279 = BOM
			 *
			 * string musicalbassclef = "" + (char)0xD834 + (char)0xDD1E; 119070 0x1D11E
			 */

			if ((b.charCodeAt(i) & 0x80) == 0)
			{
				intc = b.charCodeAt(i);
			}
			else
			{
				if ((b.charCodeAt(i) & 0xE0) == 0xC0)
				{
					//if (i+1 >= count) return false;
					intc = ((b.charCodeAt(i) & 0x1F) << 6) | ((b.charCodeAt(i + 1) & 0x3F));

					i += 1;
				}
				else if ((b.charCodeAt(i) & 0xF0) == 0xE0)
				{
					// 3 bytes Covers the rest of the BMP
					//if (i+2 >= count) return false;
					intc = ((b.charCodeAt(i) & 0xF) << 12) | ((b.charCodeAt(i + 1) & 0x3F) << 6) | ((b.charCodeAt(i + 2) & 0x3F));
					alert(b.charCodeAt(i) + ' '+b.charCodeAt(i + 1) +' '+b.charCodeAt(i + 2));
					i += 2;
				}
				else if ((b.charCodeAt(i) & 0xF8) == 0xF0)
				{
					intc = ((b.charCodeAt(i) & 0x7) << 18) | ((b.charCodeAt(i + 1) & 0x3F) << 12) | ((b.charCodeAt(i + 2) & 0x3F) << 6) | ((b.charCodeAt(i + 3) & 0x3F));

					i += 1;
				}
				else
					return false;
			}
window.utf8_out_intc = intc;
window.utf8_out_i = i;
			return true;
		}

function utf8decode(s) {
	var ss = "";
	for(utf8_out_i = 0; utf8_out_i < s.length; utf8_out_i++) {
		TryGetCharUTF8(window.utf8_out_c, window.utf8_out_intc, s, window.utf8_out_i, s.length);
		ss += String.fromCharCode(window.utf8_out_intc);
	}
	return ss;
}

NTLM authentication in PHP – Now with NTLMv2 hash checking

September 20th, 2009

A few years ago, I investigated NTLM and PHP and managed to write a simple PHP script that can retrieve the current windows username. However, it was only partly finished as it did not authenticate the user. Inspired by a recent comment, I’ve decided to revisit this problem and solve the authentication issue. For the better part of the afternoon, I wrestled with the NTLMv2 hash checking as detailed here.

It turns out that NT passwords are stored as a MD4 hash of the UTF16LE password string. So once we have this hash, we can begin verifying passwords. Obtaining this hash for a user is relatively easy with Samba, but seems to pose a challenge on Windows. In a later blog post I will detail how to integrate PHP authentication with Samba. For now, the code will assume you have already obtained a database of your users MD4 hashed passwords.

The crux of the NTLMv2 authentication involves using HMAC-MD5 on challenges and nonces using the MD4 hashed password as the key. The result is a 150 line source code that perform authentication on clients supporting NTLMv2. On the support NTLMv2, Internet Explorer supports it fine. Firefox on the other hand only has limited support for NTLMv2. In Firefox on Windows, if you have whitelisted your server with network.automatic-ntlm-auth.trusted-uris, Firefox will attempt to use Windows’ SSPI support (sys-ntlm) to perform single sign on. The SSPI module supports NTLMv2 fine. However, if you are using Firefox’s own cross platform NTLM module, you’re out of luck, it only supports the legacy NTLM and LM hashes. Perhaps it will support NTLMv2 in the future.

For Internet Explorer 8, intranet settings are now off by default, which means single sign on won’t automatically activate. To fix this, you should see a yellow bar prompting you whether to apply intranet settings.

The php ntlm authentication library is available here: PHP NTLM Github

To use it just put

include('ntlm.php');

function get_ntlm_user_hash($user) {
	$userdb = array('loune'=>'test', 'you'=>'gg', 'a'=> 'a');

	if (!isset($userdb[strtolower($user)]))
		return false;
	return mhash(MHASH_MD4, ntlm_utf8_to_utf16le($userdb[strtolower($user)]));
}

session_start();
$auth = ntlm_prompt("testwebsite", "testdomain", "mycomputer", "testdomain.local", "mycomputer.local", "get_ntlm_user_hash");

if ($auth['authenticated']) {
	print "You are authenticated as $auth[username] from $auth[domain]/$auth[workstation]";
}

You need to provide your own implementation of the callback function get_ntlm_user_hash($user) which should return the MD4/Unicode hashed password of the requested $user. You can get that by doing mhash(MHASH_MD4, ntlm_utf8_to_utf16le("password")). You also need session_start() as the script needs to persist challenge information across http requests.

Next time, I will blog about the best way to integrate it with Samba on Linux.

Quick Tip: The platform “Windows Mobile 6 Professional SDK (ARMV4I)” is not defined within Visual Studio.

September 6th, 2009

If you are setting up Qt Visual Studio Addin in Vista and get this message when trying to add the Qt folder (in Qt Options > Qt Versions):
The platform “Windows Mobile 6 Professional SDK (ARMV4I)” is not defined within Visual Studio. Make sure you have installed the required SDK.
-or-
The platform “Windows Mobile 5.0 Pocket PC SDK (ARMV4I)” is not defined within Visual Studio. Make sure you have installed the required SDK.

You need to make sure you have Visual Studio running in Administrator mode. You also need to be in Administrator mode when creating your WinCE project.

You can also type in the command line: checksdk -list to check that you have the correct Windows CE SDK installed.

Getting the intersection points of two [path] geometries in WPF

August 28th, 2009

While working on an app that utilises geometries in WPF, I needed a way to get the intersection points of the lines of two arbitrary geometries. A google search didn’t yield any useful hints except a post suggesting to use mathematics. That would be ideal for very simple geometries like lines, but with complex geometries, it becomes tiresome real quick. The framework doesn’t seem to have any built in functions that calculate that, so it’s time for some hack and slash.

After a few days of thinking, I came up with a simple, yet effective (but not the most efficient solution). For those who just want the intersection between two geometries, there is the CombinedGeometry geometry class which takes input in the form of two geometries. Setting the GeometryCombineMode to Intersect gives a geometry which is the intersection of the two. At the otherside of the WPF realm, we have Geometry.GetWidenedPathGeometry(). This method basically converts/strokes path lines to an approximate geometry. Combining these two concepts, we can produce an intersection from two widened path geometries (the two paths which we want the intersection of).

public static Point[] GetIntersectionPoints(Geometry g1, Geometry g2)
{
Geometry og1 = g1.GetWidenedPathGeometry(new Pen(Brushes.Black, 1.0));
Geometry og2 = g2.GetWidenedPathGeometry(new Pen(Brushes.Black, 1.0));

CombinedGeometry cg = new CombinedGeometry(GeometryCombineMode.Intersect, og1, og2);

PathGeometry pg = cg.GetFlattenedPathGeometry();
Point[] result = new Point[pg.Figures.Count];
for (int i = 0; i < pg.Figures.Count; i++)
{
Rect fig = new PathGeometry(new PathFigure[] { pg.Figures[i] }).Bounds;
result[i] = new Point(fig.Left + fig.Width / 2.0, fig.Top + fig.Height / 2.0);
}
return result;
}

The function will return an array of zero or more points of intersection. To test it

sg1 = StreamGeometry.Parse("M0,0 L100,100");
sg2 = StreamGeometry.Parse("M0,100 L100,0");
Point[] pts = GetIntersectionPoints(sg1, sg2);
// pts[0] is {50,50}

Hope this helps someone.

The curious case of WindowsFormsParkingWindow

July 27th, 2009

I was debugging a problem the other day involving WebKit.NET, a webkit wrapper control for winforms. When the webkit control was hosted inside winforms, it had a strange problem of always thinking that it is out of focus. This had the effect of drawing all selections as grayed out.

Everything I threw at it (WM_SETFOCUS, WM_ACTIVATE) seemed to be going into a void, so it was time to break into the source and figure out what exactly was wrong. After a few breakpoints and step overs, I found that the WebKit control searches for the parent window and listens to its messages by subclassing the window. It listens for WM_NCACTIVATE event and then determines whether it has focus or not.

This was working exactly as expected on a normal non-winforms window, so what was different? Debugging the code, I found that it got the root parent window as WindowsFormsParkingWindow. Why is it returning that as the parent window and not the actual parent window I’m not entirely sure. Maybe it’s some special super root window like the Application window in Delphi? It wasn’t shown in Spy++ either. Then again what’s a WindowsFormsParkingWindow? It’s something that I ought to google. Unfortunately google wasn’t very forth coming. Puzzled, I devised a hack by simply temporarily unassigning the immediate parent window of the webkit control WS_CHILD, simply because the webkit find ancestry detection algorithm stops when it encounters a non-child. After the setHostWindow() call, I make the parent window WS_CHILD again. That seemed to work but something lingered in the back of my mind telling that there is a better solution.

The next day as I was about to post a solution on the mailing list describing my solution, just when I was about to talk about this weird WindowsFormsParkingWindow, something in my mind clicked. Suddenly it all seemed to make sense. There was a reason WindowsFormsParkingWindow wasn’t in the tree ancestry when I looked at Spy++. There’s also a reason for the name. It turns out that on creation, all controls get put into WindowsFormsParkingWindow. It just so happens the setHostWindow call was made in the constructor when the parent control was still parked. The solution seemed obvious now. The setHostWindow should be called after the parent is correctly set, ie on the Load event. So, another case closed.

Compiling WebKit/Cairo on Windows with Visual C++ Express

July 26th, 2009

Just for interest, I decided to build webkit on Windows. This supposed to be a painless task but unfortunately, anything that involves build scripts creates drama. The instructions on compiling are a bit scant, giving the impression that it’s super easy. Maybe it is and I’m just a bit dumb, but anyways after a few hours battling with it I finally got it working. Don’t bother trying with Visual Studio 2008 as you might as well jump off a cliff from compilation errors – Get the Visual C++ Express 2005 edition and SP1 patch. Here are the steps from scratch:

Go to http://nightly.webkit.org/ and get the source tar.bz2.

Extract it somewhere (WinRAR is a good tool for this). This somewhere will now be referred to as {EXTRACTED}.

For compiling on windows, you need windows Download the WebKitSupportLibrary.zip then copy this zip file on the root of your extracted webkit folder. Don’t extract it. This zip gives you unicode/uchar.h or pthread.h which would otherwise be reported missing.

So what’s the difference between webkit/cairo and regular webkit? The regular webkit depends on proprietary Apple libraries such as CoreGraphics. The webkit/cairo build substitutes that dependency with the free Cairo 2D graphics library. The library has a clean C API and contains very compehensive drawing routines. It also sports multiple output backends allowing you to create PDFs or SVGs.

Webkit/cairo has more dependencies on open source libraries such as curl. You either build them or get the pre-built ones some nice folks have put online – Get it here and extract it to a place like {EXTRACTED}\requirements.

Now you have to install Cygwin as noted here on point number 3. The important point I want to highlight is due to some assumptions made in the VS project files, the directory HAS to be: at c:\cygwin (or $(SYSTEMROOT)\cygwin). For Vista, you need to perform additional minor gymnastics which I’m not going to outline here. Read it on the previous link.

You also need to get the Quick Time SDK. Annoyingly, to download quicktime SDK you need to register a Apple Developer Connection (ADC) account. After entering details of your daily life, you’re grant access to the file. Good thing was that you don’t need quick time or itunes installed. You should probably keep the default install directory as well.

Next is setting two environment variables. This involves going into the System Properties in your Control Panel. You can just put add it under the User variables:

  • WEBKITLIBRARIESDIR – point it to your {EXTRACTED}\WebKit\WebKitLibraries\win
  • WEBKITOUTPUTDIR – point it somewhere like c:\webkitdist (create this directory as well)

You will now need to add some VC++ paths. Open C++ Express and go to Tools > Options > Projects > VC++ Directories

On the top drop down, select show directories for include files and add

  • {EXTRACTED}\requirements\include
  • {EXTRACTED}\requirements\include\cairo
  • {EXTRACTED}\requirements\include\curl
  • {EXTRACTED}\requirements\include\libpng13

then select show directories for library files and add

  • {EXTRACTED}\requirements\lib

Lastly in the Executables files section add

  • C:\cygwin\bin – Put this just above $(PATH) but after all the other directories. This is important as if you put it on top, cygwin’s link.exe will conflict with MSVC’s, and you need to put it above $(PATH) or the PATH directories that have similarly named binaries to cygwin’s will conflict.

You will also need to get the Platform SDK as noted in the build instructions if you don’t have it already. Any recent version of the SDK should be alright. Add the executable, include and lib directories to the VC++ directories like the above, described here. The instructions about vcprops editing aren’t really important.

For the include path, you also need to add the mfc includes or the compiler will complain about missing winres.h

Due to some bootstrapping scripts, the first time you compile you can’t do it through the Visual C++ IDE. You will need to compile it via the command line in cygwin. Open the cygwin shell and change directory to {EXTRACTED} ie cd c:/webkit/

Then run:

WebkitTools/Scripts/update-webkit

After it completes, you can start building

WebkitTools/Scripts/build-webkit --wincairo --debug

Remove the –debug flag if you want to compile for Release. Note: Previously the wincairo flag was cairo-win32.

After the first successful build, you can now open {EXTRACTED}\WebKit\win\WebKit.vcproj in Visual Studio. Make sure the build profile is set to Debug_Cairo, set it. You can choose Release_Cairo as well if you so wish. Finally, winlauncher is the test browser you use to try out your new build, so set that as your active project. Don’t forget to copy all the requirements dlls (ie cairo.dll) into the dist\bin folder where winlauncher.exe is or else it’ll complain about missing DLLs.

Detecting the back (or refresh) button click

July 11th, 2009

While developing a web app, I came across an interesting problem: I had a page which had a button to perform an action. If the button is clicked, the action request is sent to the server side script and redirected back to the same page but with a message displayed on the top of the page (ie Your post has been submitted).

If you then navigate to another page but click back, you would see the same page with the same message popping up. I want to detect that we’re clicking back so we will hide the message. There are plenty of solutions in google, but a lot of them involved setting a cookie (what if cookies are disabled), or a server side script detecting referer (what if page is still cached?), or using time by detecting if the server page load time and the current time differs by a large amount (what if client time is wrong?). Without an ideal solution, I set about finding a new solution. Surely it can’t be hard to detect that we’ve already been in that same page. If only there was a way to save a flag just for that page and for the duration of the page session. I tried modifying the DOM, but that gets reverted when you click back. The onload event also get called again, so you can’t use that to differentiate.

I then remembered that at least on recent browsers, there exists a functionality in forms that retained form field information if you clicked back – very handy if you’re submitting a post and the connection died, you can just click back and your long winded post would be intact.

Solution – Use a hidden form field to detect that we’ve been on this page before

Building on this idea, it’s possible to temporarily store a flag on a hidden form field that says, yep I’ve been on this page before. Here is a code snippet:

<html>
<body>
Try
<a href="http://www.google.com/">jumping to another page</a>
</body>

<script>

document.write("<form style='display: none'><input name='__detectback' id='__detectback' value=''></form>");

function checkPageBackOrRefresh(load_id) {
if (document.getElementById('__detectback').value == load_id) {
return true;
} else {
document.getElementById('__detectback').value = load_id;
return false;
}
}

window.onload = function() {
if (checkPageBackOrRefresh('tt'))
alert('You clicked back or refreshed the page');
}

</script>

</html>

Unfortunately, this solution does not work in some browsers where “fast back” (ie, fbcache in firefox) is enabled, as the fast back stores the scripting state so a onload does not trigger again.

The script should work fine with IE7 and IE8. With Firefox, it only works on certain pages. These pages seems to be pages that link to heavy javascripts (ie jquery?).

With fbcache enabled browsers, a possible solution would be to hide the message at the event onbeforeunload so it will not appear even when clicking back.

Pondering per user accounting in Linux

July 11th, 2009

I’ve been researching for the better part of the day on what the best method to account for bandwidth (and cpu/memory) used by a particular user is. This is useful if you run a hosting business and give out shell access. At first I was looking for a way to meter SSH. There seems to be an old patch for it, but as I continued reading, a old mailing from a mailing list pointed out that there are heaps of ways to generate traffic when you have a shell account (ie wget). In fact you don’t even need shell access – any scripting language that could download will consume bandwidth that may not be accounted for.

So this began my quest to find the best solution to per user accounting in linux. The basic concept is that since the bandwidth consumption is triggered by a process, and owned by a specific user, we should be able to trace traffic to a user and record as such. The advantage is even greater if you run peruser mpm apache or suexec’d php.

I began looking at netfilter/iptables, which had a match -m owner uid. This works only on the OUTPUT chain and will tell you who sent the packet, but unfortunately doesn’t tell you who a packet was destined for.

iptables has a connection tracking feature, that tracks active connections, allowing for stateful packet inspection. If you have the kernel feature enabled, it will also count the traffic numbers, which you can then view in /proc/net/ip_conntrack (or /proc/net/nf_conntrack for newer installations). Using that, and cross referencing it with the netstat -anp and process table will give you an idea of which user owns the connection. This is assuming of course that the process doesn’t setuid to change users.

But then, how are we going to collect all the data? Polling would be extremely slow and tedious and you might miss short lived connections. It seems that using libnetfilter_conntrack, you can subscribe to an event that notifies when connection states have changed (CONFIG_IP_NF_CONNTRACK_EVENTS). Using that, you can record when connections are opened and when they are closed as they happen.

What about processes? Processing accounting can be easily taken care of by the unix acct tools, which monitors processes as they are created and destroyed, provided you have the correct kernel options enabled. But what if you don’t have this option, ie on a VPS – Is there an alternative? The answer is yes, but ugly. You might remember that process information can be access via /proc. What if I set inotify, the file system change mechanism to tell me when /proc has changed? Somebody already thought of this and found it didn’t work quite as expected. The reason for this was mentioned in the linked thread, but the responders did give a good alternative – using ptrace ().

The ptrace command is a powerful unix system call that can manipulate processes it has attached to. It is what the debugger gdb uses to debug running applications. Using the ptrace function, you can set an option to notify the controlling process via SIGTRAP that the ptrace’d process has terminated, or forked/execed. Using this, you can potentially hook into every process and closely monitor their lifecycle. The downside is that you cannot have two ptrace active on the same process, which means application like gdb will fail if your monitoring system is active. Since ptrace is primarily used for debugging, it may also degrade performance of application it has been attached to. So the bottom line is that it looks like it is too extravagant and thus the wrong way to go for implementing a process accounting/monitoring system.

Looks like my quest to find a viable way of accounting per user accounting has so far eluded me. Perhaps the old ways of individual accounting in every application service – apache/ftp/imap/smtp is here to stay.

Why is my {mysql, htpassswd, getpass} prompts not showing?

June 21st, 2009

I had problem on a VM running linux of password prompts not showing. What would happen is that the mysql command would just say wrong password without prompting even if I added the -p switch. Furthermore, the less command and man pages were broken as well! As a programmer I set out to narrow down the scope of the problem via looking a source code. I traced down the offending function as getpass(). Having searched every where on the web, I finally came across a reference telling the tty might be broken and to run:

mknod /dev/tty c 5 0

That immediately fixed it! device nodes are a bit of mystery to me, tty’s more so. One day I’ll understand it thoroughly.