
Re: NT Crash Thread

Posted: Wed Jan 14, 2015 5:37 pm
by zippyzee
In any case knowing that we are tiptoeing on the ledge will force us (me) into some very careful programming safeguards, which I'm not sure ever existed in the live Vanguard game. Crashing on minimizing happened to me for years; I could never tab out or minimize on my system until the last year or so.

Re: NT Crash Thread

Posted: Wed Jan 14, 2015 5:59 pm
by John Adams
Our code will exceed theirs, guaranteed.

Current SVN is on NT, and the server is now running on a physical platform (non-VM) so we'll see if it has anything to do with instability. I even have a localhost MySQL instance, to be sure to cut out the ESX cluster entirely from the equation. I seriously doubt it's my hardware, though. Within 5 mins of being up on the bare metal, I had the packet spam issue again.

Re: NT Crash Thread

Posted: Fri Jan 23, 2015 3:20 pm
by John Adams
Crash UDPServer deque

Stack

Code: Select all

 	WorldServer.exe!std::_Deque_const_iterator<std::_Deque_val<std::_Deque_simple_types<SocketData *> > >::operator*() Line 329	C++
 	WorldServer.exe!std::_Deque_iterator<std::_Deque_val<std::_Deque_simple_types<SocketData *> > >::operator*() Line 605	C++
 	WorldServer.exe!std::deque<SocketData *,std::allocator<SocketData *> >::front() Line 1426	C++
>	WorldServer.exe!UDPServer::HandleWrite() Line 503	C++
 	WorldServer.exe!WriterThread(void * data) Line 137	C++
 	WorldServer.exe!ThreadRun(void * arg) Line 77	C++
 	WorldServer.exe!_callthreadstart() Line 255	C
 	WorldServer.exe!_threadstart(void * ptd) Line 239	C
 	kernel32.dll!7689338a()	Unknown
 	[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]	
 	ntdll.dll!76f99f72()	Unknown
 	ntdll.dll!76f99f45()	Unknown
UDPServer.cpp, line 503:

Code: Select all

sdata = outgoing_noseq->front();
Vars:

Code: Select all

-		outgoing_noseq	0x19655984 { size=16646382 }	std::deque<SocketData *,std::allocator<SocketData *> > *
		[0]	<Unable to read memory>	
		[1]	<Unable to read memory>	
		[2]	<Unable to read memory>	
		[3]	<Unable to read memory>	
		[4]	<Unable to read memory>	
		[5]	<Unable to read memory>	
		[6]	<Unable to read memory>	
		[7]	<Unable to read memory>	
		[8]	<Unable to read memory>	
		[9]	<Unable to read memory>	
		[10]	<Unable to read memory>	
.
.
.
and on and on

Re: NT Crash Thread

Posted: Sat Jan 24, 2015 5:13 pm
by Lokked
It is possibly due to this:
When a Client object is destroyed:

Code: Select all

m_outgoing_noseq.WriteLock();
for (auto& itr2 : outgoing_noseq)
	delete itr2;
m_outgoing_noseq.WriteUnlock();
The packets in the outgoing and outgoing_noseq deques are deleted, but the containers aren't emptied, nor are the deleted pointers set to NULL (each still points to the same memory block it did before). The error says the iterator was made invalid, which doesn't exactly fit this scenario, but it's possible that it's a side effect.

There is a very slight chance that right after the above block was run, another thread running the UDPServer tried to read the list to send a packet and caused the crash.
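
A minimal sketch of the cleanup described above, not the actual Rev 990 change. It assumes a plain std::mutex in place of the server's m_outgoing_noseq lock, and SocketData, PacketQueue and the method names are hypothetical stand-ins. The idea is to delete each packet, empty the container so no dangling pointers remain, and have the writer check empty() under the same lock before calling front().

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for the server's SocketData packet type.
struct SocketData {
    int id;
};

// Hypothetical wrapper: one mutex guards every access to the deque.
class PacketQueue {
public:
    ~PacketQueue() { Clear(); }

    // Takes ownership of a heap-allocated packet.
    void Push(SocketData* packet) {
        assert(packet != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(packet);
    }

    // Returns the next packet, or nullptr when the queue is empty.
    // front() on an empty deque is undefined behavior, so check first.
    SocketData* Pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return nullptr;
        SocketData* packet = queue_.front();
        queue_.pop_front();
        return packet;  // the caller now owns the packet
    }

    // Deletes every queued packet and empties the container,
    // so no dangling pointers are left behind for another thread.
    void Clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (SocketData* packet : queue_)
            delete packet;
        queue_.clear();
    }

    std::size_t Size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    std::mutex mutex_;
    std::deque<SocketData*> queue_;
};
```

This only helps while the queue object itself outlives every thread that touches it. If the deque is freed while the writer thread still holds a pointer to it, no amount of locking inside the container will save the front() call.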

Committed Rev 990

Re: NT Crash Thread

Posted: Tue Jan 27, 2015 2:17 pm
by John Adams
Sure surprised me when I saw this happen while I was staring at the screen.

World Crash in ChunkServer::HandleClientTrainingBegin

Code: Select all

 	ntdll.dll!7702e725()	Unknown
 	[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]	
 	ntdll.dll!7702f659()	Unknown
 	ntdll.dll!76f93cfe()	Unknown
 	ntdll.dll!76f93cfe()	Unknown
 	WorldServer.exe!_heap_alloc_base(unsigned int size) Line 57	C
 	WorldServer.exe!_heap_alloc_dbg_impl(unsigned int nSize, int nBlockUse, const char * szFileName, int nLine, int * errno_tmp) Line 431	C++
 	WorldServer.exe!_nh_malloc_dbg_impl(unsigned int nSize, int nhFlag, int nBlockUse, const char * szFileName, int nLine, int * errno_tmp) Line 239	C++
 	WorldServer.exe!_nh_malloc_dbg(unsigned int nSize, int nhFlag, int nBlockUse, const char * szFileName, int nLine) Line 302	C++
 	WorldServer.exe!malloc(unsigned int nSize) Line 56	C++
 	WorldServer.exe!SOEProtocolData::SetData(PacketStruct * packet_struct) Line 100	C++
 	WorldServer.exe!PacketStruct::Serialize() Line 724	C++
>	WorldServer.exe!ChunkServer::HandleClientTrainingBegin(const std::shared_ptr<Client> & client) Line 4885	C++
 	WorldServer.exe!ChunkServer::ProcessPackets() Line 403	C++
 	WorldServer.exe!ChunkPacketThread(void * data) Line 134	C++
 	WorldServer.exe!ThreadRun(void * arg) Line 77	C++
 	WorldServer.exe!_callthreadstart() Line 255	C
 	WorldServer.exe!_threadstart(void * ptd) Line 239	C
 	kernel32.dll!7689338a()	Unknown
 	ntdll.dll!76f99f72()	Unknown
 	ntdll.dll!76f99f45()	Unknown
VS2012 says:
Critical error detected c0000374
WorldServer.exe has triggered a breakpoint.

Didn't see anything wrong with the vars in the call, but further down, in Serialize and then SetData:

Code: Select all

-		data	0x00000000 <NULL>	char *
			<Unable to read memory>	char
		opcode	1172	unsigned int
+		packet_struct	0x1915b648 {name=0x1915b648 "WS_ServerAbilityTrainerList" opcode=1172 server_id=2 ...}	PacketStruct *
		packet_type	2	unsigned short
		size	700	int
+		this	0x13cf5a00 {protocol_opcode=9 sequence=0 packet_type=2 ...}	SOEProtocolData *

Re: NT Crash Thread

Posted: Tue Jan 27, 2015 4:27 pm
by Lokked
c0000374 is an indicator of Heap Corruption (same thing as receiving _CtrDebugHeapCorruption, or whatever).

This likely has NOTHING to do with ChunkServer::HandleClientTrainingBegin. This function just happened to be the one that caused something to be read from or written to a corrupted portion of the Heap, which triggered the error.

When you allocate something to the heap, there is bookkeeping information also written to memory. For example, when you allocate an array with new[] and delete it with delete[], you don't specify how large the array was or how much memory was consumed, so how does the program know what to delete? It's this bookkeeping information. If something overflows or writes somewhere it shouldn't, and it corrupts this bookkeeping information, when the bookkeeping information is read, the OS realizes it's been corrupted and tells you so. What's potentially worse is that non-bookkeeping information is overwritten, which results in NO error, just wrong data (which may cause a crash down the road anyways).
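
To make the bookkeeping idea concrete, here is a toy sketch, assuming a made-up allocator layered on malloc. BlockHeader, kCanary and the Toy* functions are all hypothetical and are not how the real CRT or Windows heap lays out its data. Each block gets a small header in front of the payload, and a check on free notices when something has scribbled over it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical bookkeeping record stored immediately before each payload.
struct alignas(std::max_align_t) BlockHeader {
    std::size_t size;    // bytes the caller asked for
    std::size_t canary;  // known value; changes if the header is overwritten
};

const std::size_t kCanary = 0xC0FFEE;

// Allocates room for the header plus the payload and returns the payload.
void* ToyAlloc(std::size_t size) {
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr)
        return nullptr;
    new (raw) BlockHeader{size, kCanary};
    void* payload = static_cast<unsigned char*>(raw) + sizeof(BlockHeader);
    std::memset(payload, 0, size);  // hand back zeroed memory
    return payload;
}

// Recovers the header from a payload pointer, as an allocator does on free.
BlockHeader* HeaderOf(void* payload) {
    assert(payload != nullptr);
    unsigned char* raw =
        static_cast<unsigned char*>(payload) - sizeof(BlockHeader);
    return reinterpret_cast<BlockHeader*>(raw);
}

// The caller never passes a size to free; it is read back from the header.
std::size_t ToySize(void* payload) { return HeaderOf(payload)->size; }

// True while the bookkeeping record is intact.
bool ToyIsIntact(void* payload) {
    return HeaderOf(payload)->canary == kCanary;
}

// Frees the block, or refuses and returns false if the header was damaged.
// This is the point where a real heap would report corruption.
bool ToyFree(void* payload) {
    if (!ToyIsIntact(payload))
        return false;
    std::free(HeaderOf(payload));
    return true;
}
```

Note that the damage is only noticed when the header is next inspected, which can be long after the bad write and in completely unrelated code. That is why the function at the top of the stack is usually not the culprit.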

There is no single definite way in which your program will fail if memory isn't managed properly, but we've seen enough examples to indicate we have a problem.

Heap Corruption is caused by:
- Allocating an Array with int *myIntArray = new int[some_array_length] and freeing with delete instead of delete[]
- Opposite of above: Allocating a single object with new Object and deleting with delete[] instead of delete
- Writing past the end of an array
- Writing to Free'd / delete'd memory
- Writing to unallocated memory
- Free'ing / delete'ing unallocated memory
- Using the wrong form of * vs & (which can be typos or misunderstanding when using combinations of * and &)
- Bad Cast / Casting to wrong type
- Copying an object with inappropriate Copy Constructor
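
Several of the causes above go away when ownership is handed to the standard library instead of raw new/delete. A short sketch under that assumption follows; Widget and the helper names are hypothetical and not taken from the server code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical single object type, for illustration only.
struct Widget {
    int value;
};

// A vector releases its own storage, so there is no new[]/delete[]
// pair to mismatch.
std::vector<int> MakeBuffer(std::size_t length) {
    return std::vector<int>(length, 0);
}

// Refuses to write past the end of the buffer instead of corrupting
// whatever happens to live after it.
bool SafeWrite(std::vector<int>& buffer, std::size_t index, int value) {
    if (index >= buffer.size())
        return false;
    buffer[index] = value;
    return true;
}

// unique_ptr<int[]> knows it owns an array and calls delete[] for us.
std::unique_ptr<int[]> MakeOwnedArray(std::size_t length) {
    return std::unique_ptr<int[]>(new int[length]());
}

// unique_ptr<Widget> owns a single object and calls plain delete.
std::unique_ptr<Widget> MakeWidget(int value) {
    return std::unique_ptr<Widget>(new Widget{value});
}

// After reset() the pointer is null, so releasing a second time is a
// harmless no-op rather than a double delete.
void Release(std::unique_ptr<Widget>& widget) {
    widget.reset();
    assert(widget == nullptr);
}
```

This does nothing for bad casts or broken copy constructors, but it removes the delete vs delete[] and double-free cases from the list by construction.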

Re: NT Crash Thread

Posted: Tue Jan 27, 2015 8:10 pm
by Faux
HandleClientTrainingBegin is next on my chopping block to rewrite to increase its performance. I'll take a look at it tomorrow night to see if it's maybe doing anything wrong.

Re: NT Crash Thread

Posted: Tue Jan 27, 2015 10:09 pm
by Lokked
Everyone who's using Raw Pointers needs to review their code. Ignore the fact that this crash occurred in something to do with Abilities.

Re: NT Crash Thread

Posted: Wed Jan 28, 2015 12:45 am
by Kandra
We are using Coverity to analyse code quality where I'm working and it's really good at finding heap corruptions. I have not tested any free alternative tool for static code analysis but it could be worth a try. We are also using Valgrind but that requires you to actually run the program to find memory corruptions.

Some time ago I was fixing things in the server code which caused it to not compile using Clang, but those fixes are obsolete by now, and Clang is much better at finding errors than gcc or Visual Studio.

Re: NT Crash Thread

Posted: Wed Jan 28, 2015 1:00 am
by Lokked
Thanks for the info. I'm not a professional programmer and have very little knowledge of these tools. I appreciate your insight. I have used Valgrind and I think that'll be a last resort for me, as 1) I'd have to set up Linux and 2) I'd have to use Valgrind.

I will look into Clang and Coverity.