I have a feeling that this is a futile plea, but if anyone out there is familiar with the Quake 2 source code and willing to help me, please drop me a line. I'm using the Quake 2 graphics/physics engine for my PhD dissertation, and I'm having some problems with one of John Carmack's collision detection functions.
Specifically, in the SV_AreaEdicts_r() function in sv_world.c, there is a loop as so:
for (l=start->next ; l != start ; l = next)The problem is that sometimes, somehow, the variables l, start, start->next, and l->next are NULL. I can't figure out why. The pointers should create a circular linked-list. Note: I did not write this code, or the code that builds the linked list. I think John Carmack wrote it.
{
next = l->next;
check = EDICT_FROM_AREA(l);if (check->solid == SOLID_NOT)
continue; // deactivated
if (check->absmin[0] > area_maxs[0]
|| check->absmin[1] > area_maxs[1]
|| check->absmin[2] > area_maxs[2]
|| check->absmax[0] < area_mins[0]
|| check->absmax[1] < area_mins[1]
|| check->absmax[2] < area_mins[2])
continue; // not touchingif (area_count == area_maxcount)
{
Com_Printf ("SV_AreaEdicts: MAXCOUNT\n");
return;
}area_list[area_count] = check;
area_count++;
}
It appears that the problem stems from some of my entities colliding with the worldspawn entity and with each other, but I don't have any idea why these collisions would create these bad pointer problems.
I have reverted back to the baseline Q2 engine source code; whatever it causing the problem is in my gamex86.dll code (probably?!)... but I have no idea where to look.
In my dream world, this plea for help spreads across the internet, and John Carmack emails me with a one-line code fix.









There are two conventions which are used for terminating a linked list. One is for the last element's "next" pointer to be null, which is what most people use. The other is for the last element to point back to the list header, or back to the first element, and from the construction of the for-loop's termination condition, this one is using the first element. That's why the for-loop's termination condition is for the pointer which is tracing the loop to have a value equal to the value in the list header. That means you've once again returned to the beginning of the list, which means you've run all the way around it once.
That's an archaic approach, and I think the reason it may have been adopted back in the day was that the machines didn't have much memory by modern standards (as little as 4 kilowords in many cases), and this didn't waste the zero address as a potential location for a list element.
It also means that a runaway loop which doesn't terminate properly doesn't escape into the wide world and potentially trash random memory.
But it's also confusing and makes the code opaque (since different linked lists have different values as terminators, making all the loops look different from one another) and it isn't done much any longer.
Whichever is used, it has to be consistent. (And it bloody well ought to be documented in the comments describing the linked list structure.) Have you added code anywhere which tries to terminate this list with NULL?
If so, the fact of both 'l' and 'next' being 0 isn't all that surprising. Given the overwhelming importance of NULL and the fact that it's usually arbitrarily defined as '(void *)0', it is not uncommon to reserve space at that address sufficient to hold a pointer and to make it contain a NULL itself (i.e. to make it point to itself if one ignores the fact that it is a NULL). Sometimes that's done explicitly by the coders, and some linkers do it automatically. This is done as a defensive measure.
With that in place, any attempt to treat NULL as a pointer to another entry would again point to it, instead of merrily rampaging through memory randomly choosing locations as if they were parts of the linked list. It doesn't fix the bug in the code handling the linked list which incorrectly follows the NULL pointer, but it does tend to limit the havoc such a bug would create, which otherwise could destroy so much as to leave no evidence behind of exactly what was responsible for the problem in the first place.
By the way, another problem with the approach of using a link back to the beginning as a terminator is that if you insert a new entry at the beginning of the list, you have to trace to the last element of the list and update its "next" pointer. Otherwise it would point to what had become the second element of the list rather than the first.
Also, if you remove the first element, you'd have to update the last element. Otherwise it would point to the now-removed element on the free list or whatever else it might end up being after being allocated again later. (Yeesh.)
Of course, not all linked lists are subject to that kind of editing. But it's still a lousy way to handle it, which is why few people use that convention any longer and most instead use NULL as the terminator.
It is an odd way of doing things, and it has made it rather confusing for me. I haven't quite been able to figure out how this linked list is actually built.
I didn't write this code. The problem isn't in this code itself, but in the code that builds this linked circuit (elsewhere). The list should be built as a closed loop, but under the error conditions I'm seeing the loop isn't being built properly, and I can't figure out why.
I didn't write the code that builds the loop, either. I didn't write any of the related code, but somehow the code that I did write is causing this OTS code (written by John Carmack, I presume) to fail. I have struggled with it for a while, and I'm hoping that someone familiar with the entity parsing routines will be able to tell me "oh yes, if that list doesn't build properly it's probably because you XYZ."
Have you emailed John? Sometimes he replies. Of course with impending Doom3 now it may be a bad time, but he surprised the ever living crap out of me one day when he replied. I went and saw him speak at QuakeCon in like 99 or so and was writing before hand to ask him something and he sent me a short, concise answer. Very likable guy. Wish I was brilliant.. :P
cheers..
I found the error myself: a buggardly structure overflow.