Archive for October, 2007

more bug fixes

Tuesday, October 30th, 2007

Lance Haig has done some great testing and pointed out a couple of bugs.  the first was due to his using a tool to import his imap stuff off netmail to bongo.  when the imap daemon got *ported* from the hula trunk to the hula branch (which is now bongo) a lot of things had changed.  the most important of those changes was obviously the store.  in trunk an mbox format was used that had problems with the concept of sharing a mailbox.  functions existed to do crazy things in the case of one person expunging a message and letting someone else know (at least that is what it seemed like in the beginning).  the namespace imap command (which his tool needed) had been commented out and replaced with an unknown command error message.  i tried to enable the code but found those “crazy things” functions missing from the code.  in fact i could not find them anywhere in our codebase.  on a whim i decided to check the hula source and bingo, there they were.  i didn’t spend too much time trying to figure out what they were doing; i just removed the code and re-enabled the namespace command.  with the new backend, i’m not sure that we need that information any longer as the store takes care of all of that.

after that he decided to play with his new system and needed to know how global domains worked.  i, thankfully, did some testing on that before sending instructions over to him and found that some work needed to be done to get it working.  i decided that i might as well get it going with global domains, a concept that we’d discussed and decided to continue.

global domains you say?  what are they?  well, i think we should come up with a better name for them.  really what they are is:  “my usernames are not fully distinguished.  if mail comes in for a user at any global domain, strip off the domain portion and use what is left as the username.”  for example, on my feltonline server here, my username is not fully qualified.  so which of the domains that i service does that email address belong to?  it belongs to ALL of the domains in my global domain list, at the same time.  i figured hey!  the aliasing system should be able to handle that if it is possible to alias one domain to the empty string.

i had to code a little bit for it to work, but now the system automagically adds a domain level alias for any domain in the global domains list to the empty string.  pretty much changing “pfelt@swedepop.com” into “pfelt”  (where swedepop.com is a global domain).  i had to add another property to the queue agent’s configuration document “domains”  which i think should be changed once we get a decent name for the concept.
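
a minimal sketch of the idea in python, not the actual bongo code.  the function names and the dict-based alias table are made up for illustration; the only thing taken from above is that a global domain is aliased to the empty string, which collapses the address to the bare username.

```python
def build_global_aliases(global_domains):
    """Map every global domain to the empty string (hypothetical alias table)."""
    return {domain.lower(): "" for domain in global_domains}


def apply_domain_alias(address, domain_aliases):
    """Rewrite the domain part of an address using a domain-level alias table."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no domain part, already a bare username
    target = domain_aliases.get(domain.lower())
    if target is None:
        return address  # domain has no alias, leave the address alone
    if target == "":
        return local  # global domain: strip the domain entirely
    return local + "@" + target  # ordinary domain-to-domain alias


aliases = build_global_aliases(["swedepop.com"])
print(apply_domain_alias("pfelt@swedepop.com", aliases))  # pfelt
```

addresses at domains that are not in the global list pass through untouched, so hosted (fully qualified) domains would keep working next to global ones.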

the aliasing system is still somewhat limited though so don’t forget!!  you still can’t do somegroupname@somedomain.com => user1, user2, user3 type stuff, that’s coming.  i’m gonna be doing some more testing on this, but the code has been committed and i’d appreciate any testing.  after all of this coding to get it working it feels a little hackish how i did it, and i’ll probably end up changing it when i get around to fixing the one to many type situation anyhow.

i hope to be able to get to some python stuff tomorrow so that we can square away the installation stuff (actually setting hosted and global domains on configuration).

sheesh! what a patch

Wednesday, October 24th, 2007

so, i’ve just committed another couple of revisions to the branch.  i figure i’ll go through them and give another state of the state as i’ll be out of town till monday.

back when i was debugging the store/queue system due to my mail importing stuff i noticed that there were messages that weren’t showing up properly in imap.  one of them being a 10 meg video that my brother sent me.  for some reason i couldn’t open it at all.  (which incidentally brings up another bug that i can’t forget to send to halex in DF regarding that same email).  i barely scratched the surface before moving to something else, filing the issue away for a later time.  i figured, at the time, that imap was something i didn’t really want to dive into as it is a pretty complex agent.  well, i came back to it tonight and found where the error was.  a simple bug where the store sends back the structure.  the GetMimeInfo() function would allocate a character string and store all the information in a BongoArray.  we would allocate say 50 bytes (by using a strlen) and then we’d say memcpy(dest, src, sizeof(char *)), which copies only as many bytes as a pointer takes up rather than the whole string.  oops.  we’d only really get the 2002 response code from nmap and not the structure.  basically any multipart messages would not be readable properly by imap.
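
a small python/ctypes sketch of that class of bug, not the bongo source.  the response text is a made-up placeholder; the point is only that a copy sized by a pointer moves 4 or 8 bytes no matter how long the string is (and with 4-byte pointers that would be exactly the four characters of “2002”).

```python
import ctypes

# placeholder response text, invented for illustration
response = b"2002 made-up structure line for illustration"
src = ctypes.create_string_buffer(response)      # NUL-terminated copy of the text
pointer_size = ctypes.sizeof(ctypes.c_char_p)    # 4 or 8 depending on the platform

# buggy copy: the length is the size of a pointer, not the length of the string
buggy = ctypes.create_string_buffer(len(response) + 1)
ctypes.memmove(buggy, src, pointer_size)

# fixed copy: use the string length plus the terminating NUL
fixed = ctypes.create_string_buffer(len(response) + 1)
ctypes.memmove(fixed, src, len(response) + 1)

print(buggy.value)  # only the first pointer_size bytes of the response
print(fixed.value)  # the whole response
```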

then, i committed a ~1300 line patch to queue/imap/pop/smtp where i stripped out settings from their configuration documents and put them in the global document.  because of this patch all existing branch installations will be broken till they get the new global config document.  there is a sample one included with the bongo-config application and if you don’t mind losing current settings you can just re-run the bongo-config install process.  that system does not yet modify the documents before writing them out, so you’ll have to change the document to show correct values.

if you want to run it by hand:

  1. telnet localhost 689
  2. auth user admin bongo
  3. store _system
  4. write /config 7 NUMBEROFCHARSINDOC Fglobal
  5. paste in new document
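
the only fiddly part of the steps above is filling in NUMBEROFCHARSINDOC.  here is a rough python sketch that just builds the command lines for a given document; it does not talk to the store.  the helper name is made up, the document text is a placeholder, and counting the length as utf-8 bytes is an assumption (the steps only say “number of chars”).

```python
def build_config_commands(document, user="admin", password="bongo"):
    """Build the command lines from the manual steps for a given document."""
    # assumption: the count the store expects is the utf-8 byte length
    size = len(document.encode("utf-8"))
    return [
        "auth user %s %s" % (user, password),
        "store _system",
        "write /config 7 %d Fglobal" % size,
    ]


placeholder_doc = "placeholder document text"  # not a real bongo config document
for line in build_config_commands(placeholder_doc):
    print(line)
```

after the write line is sent, the document itself gets pasted in, same as step 5.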

after that you should be running again.  i’ve had the patch running on swedepop for the last day and not seen any issues with it, but please test it out.

other things this patch did include some basic housecleaning in smtp.  there was an unused variable taking up stack space that i cleared out.  smtp also got a new queue registration loop.  occasionally on startup here, smtp would not correctly register with the queue because the queue hadn’t come up yet (which i just realized in my commit message i said store instead of queue, oops!).  this should work now regardless as there is a loop with a sleep in it.  i also changed a ton of XplConsolePrintf() statements into Log() statements giving us a little better logging.  i’m sure i misclassified some of them, but those are pretty easy changes.
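
the “loop with a sleep in it” pattern, sketched in python rather than the agent’s own C.  register is a hypothetical callable standing in for whatever smtp actually calls to register with the queue; the attempt count and delay are made-up defaults.

```python
import time


def register_with_retry(register, attempts=10, delay=1.0, sleep=time.sleep):
    """Call register() until it succeeds, sleeping between failed attempts.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        if register():
            return attempt
        if attempt < attempts:
            sleep(delay)  # give the queue time to come up
    raise RuntimeError("queue never became available")


# usage with a stand-in that fails twice before the "queue" is up
answers = iter([False, False, True])
print(register_with_retry(lambda: next(answers), delay=0.01))  # 3
```

capping the number of attempts keeps a misconfigured agent from spinning forever; whether the real loop has a cap is not something stated above.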

in case some of you missed it in the logs, i subscribed to the lkml just for traffic (i’ve gotten 176 mails in a little over 24 hours.  if you have any ideas for other high traffic mailing lists let me know, i’d love to subscribe).  bongo is handling it ok and the convo stuff in dragonfly works pretty well, though i wish there was a way to have the “inbox” view formatted more like the summary view.  perhaps just a different button or something.

state of the state:  i’m all committed except for the experimental aliasing code that doesn’t work.  branch feels very stable and useable at the moment except for that oddness with the antispam stuff marking messages as spam due to an odd header thrown in somewhere in the system (that’s next on my list unless something else comes up).  after that aliasing, then perhaps a major overhaul of smtp or the requested changes to the imap mail importer to run once for all mailboxes.  that is gonna be tricky i think especially since my python isn’t all that good.  if anyone wants to volunteer for that i’d be happy to send it over :)

all that being said, i’ll be back sunday night some time.  i hope to check my email some though i don’t exactly know what the status will be on online time.  now’s the time i really miss having a laptop.

bugs, bugs, bugs

Wednesday, October 17th, 2007

it’s been about a month since my last blog post. guess it’s time for another :)

lots has happened in the time since the last post. i didn’t realize after i’d checked in the aliasing code that i’d left a huge hanging section that hadn’t been completed and that pretty much stopped the remove branch from working. so i dug and found new reserves and have been a busy guy since then trying to get stuff going for the upcoming m3. i figure i’ll just go down the commit log and where i’m at now to explain what is going on in my portion of the m3 bongoverse.

added a DOMAIN LOCATION command to the queue agent. this allows protocol consumers to pass in an email address and get a result saying whether the domain is local or remote. eventually we’ll add back in the relay domain stuff (smarthosts).
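
the decision behind that command can be sketched in a few lines of python.  this is not the queue agent’s code or its wire format; the function name, the set of local domains, and the "local"/"remote" strings are all made up for illustration.

```python
def domain_location(address, local_domains):
    """Classify an address as "local" or "remote" by its domain part."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not domain:
        raise ValueError("address has no domain part: %r" % address)
    return "local" if domain.lower() in local_domains else "remote"


hosted = {"swedepop.com"}  # hypothetical list of domains this server handles
print(domain_location("pfelt@swedepop.com", hosted))    # local
print(domain_location("someone@example.org", hosted))   # remote
```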

one of the problems with mail not going out was that the default configuration was incorrectly set to smarthost outgoing mail through another server which had never been configured.

i found when playing with gass’ odbc work that return codes could be integers when we were comparing them as booleans. depending on how the comparison was done, it could produce runtime errors because of conversions. i posted an example application to the -devel list which should compile anywhere regardless of a bongo repo. this caused a couple of issues in the code that determines if users exist and if their passwords are correct. most of the agents do something with those functions, so for example smtp auth and imap login were both suspect.
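
the original problem was in C, but the same trap is easy to show in python as an analogy.  the function and its return values are hypothetical; the point is that an integer success code and a strict comparison against a boolean do not agree.

```python
def verify_password_rc(ok):
    """Hypothetical check that reports success as a nonzero integer code."""
    return 2 if ok else 0


rc = verify_password_rc(True)
strict = (rc == True)   # 2 is not equal to True (which is 1)
truthy = bool(rc)       # any nonzero value counts as success

print(strict, truthy)   # False True
```

so a caller that compares against the boolean constant rejects a login that a caller testing for nonzero would accept.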

as stated above, i got the aliasing code actually running in smtp and got it to accept mail. this led to bugs in the queue along the same types of lines.

by the time i had all this working i had a fully working bongo server and i set about getting things set up to run it. i can’t run it on my normal server as i have too many users that *need* email to expose them to the alpha code. i consulted with my amazing isp and now have two new shiny ip addresses that i can bind. i ping’d my brother who hosts some domains on my server regarding availability of stealing the MX for one of them. he probably won’t ever use the MX for swedepop.com so i set up bongo on one of the ips and subscribed to all the bongo lists. this has provided amazing opportunity to test and debug the system as you’ll see below.  (mail me if you’d like an account.  it’s open to anyone who won’t spam ;) )

we found that bounces weren’t functioning properly. this is kinda important in the email world so i moved on to figuring this out. it took a very long time as the queue code is a little messy. it feels like it might be a lot of original code, though i can’t be entirely sure. i didn’t dive into the queue code too much back in the day at Novell. i committed the fix in two steps. the first being mostly correct, but mainly it was so that i could put the new code out on swedepop.com to test it accurately with some live servers. around 1am on the 15th that problem was tracked down to some broken code and a configuration setting not set properly.

the next step was importing. alex and i had been chatting and he asked if i’d tried importing. i know others had and it didn’t work too well or died in the process. he was busy this week so i headed in that direction. i spent a ton of time trying to import my 14600 email hula trash folder and would quickly run out of memory on my 128m servers. i tracked this down to bad python code, however i couldn’t find a memory efficient way to do it. python just doesn’t do a good job of freeing memory and returning it to the system. as i found this to be a lost cause, i decided to write my own importer to not use the filesystem mailbox stuff and just use imap. i figured this would be good since then the implementation of the underlying file structure wouldn’t matter any more. the server would hide that from me. i extended -storetool to allow for passing in imap information. this was run on the trash bin and worked.
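
a rough sketch of the imap-side approach, assuming python’s standard imaplib; this is not the storetool code.  the generator name is made up.  it pulls one message at a time so that only a single message has to sit in memory, whatever the source server uses on disk.

```python
def iter_raw_messages(conn, mailbox):
    """Yield raw messages from an imaplib-style connection, one at a time."""
    typ, _ = conn.select(mailbox, readonly=True)  # readonly: leave the source untouched
    if typ != "OK":
        raise RuntimeError("cannot select %r" % mailbox)
    typ, data = conn.search(None, "ALL")
    if typ != "OK":
        raise RuntimeError("search failed")
    for num in data[0].split():  # message sequence numbers as bytes
        typ, parts = conn.fetch(num, "(RFC822)")
        if typ == "OK" and isinstance(parts[0], tuple):
            yield parts[0][1]  # the raw message bytes
```

with a real server, conn would be something like imaplib.IMAP4(host) after a login(user, password) call, and each yielded message would be handed straight to whatever writes it into the new store.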

the live server showed its true colors yet again when it started throwing odd errors that i tracked down to calling fclose() then trying to fseek() the same handle. got that fixed so mail delivery could continue.
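
the ordering problem is easy to reproduce in python, where it fails loudly instead of throwing odd errors like it does in C.  a tiny sketch (the file contents are a placeholder): do any seeking while the handle is still open, and treat the handle as dead once it is closed.

```python
import tempfile

error = None
with tempfile.TemporaryFile() as handle:
    handle.write(b"queued message")
    handle.seek(0)          # rewind while the handle is still open
    data = handle.read()

try:
    handle.seek(0)          # same handle, but after it has been closed
except ValueError as err:
    error = str(err)        # python refuses to seek on a closed file

print(data)
print(error is not None)    # True
```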

in the meantime gass had responded to my email to the list as to how to import lots of mailboxes. my response email crashed the server guaranteed every time it ran. this was a puzzler. it took a long time of crazy debugging in both queue and store, but i finally tracked it down to a bug in connio’s ConnWriteFile(). this function shouldn’t really have been called, but was, because of another bug that i have yet to tackle. in the queue, if the recipient’s store is local to the queue, we run a queue command that says “deliver the file on the filesystem at <insertpathhere> into this user’s mailbox”. that determination is done by calling a library function that should return the ip address bongo is bound to. that function was failing. because of that, queue assumed that the store was on a remote system and used connio to connect to it and then passed the file’s contents over the connection (which calls ConnWriteFile() ). this function wasn’t properly delivering the file’s full contents, which caused an abort since the two ends of the connection were no longer in protocol sync. (one thought it was done sending info and could run the next command and the other was waiting for more information). what a mess!
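
what exactly was wrong inside ConnWriteFile() isn’t spelled out above, so this python sketch only shows one common way to end up out of sync like that: treating a single write as if it sent everything.  the helper name is made up, and write stands for any callable that returns how many bytes it actually accepted (like socket.send).

```python
def write_all(write, data):
    """Keep calling write() until every byte of data has been accepted."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        n = write(view[sent:])  # may accept fewer bytes than offered
        if n is None or n <= 0:
            raise OSError("connection stopped accepting data")
        sent += n
    return sent


# usage with a stand-in writer that only takes 3 bytes per call
received = bytearray()

def short_write(chunk):
    part = bytes(chunk[:3])
    received.extend(part)
    return len(part)

print(write_all(short_write, b"hello queue"))  # 11
```

if the sender stops after the first short write, the receiver keeps waiting for the rest while the sender moves on to the next command, which is the kind of mismatch described above.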

this whole error condition pointed out a missing unlink() command of a temporary file used when receiving the data from the file on the store end.

“dogs and cats living together, mass hysteria!!!” <bill murray in ghostbusters>

anyhow, most of the situation is under control and now (once i fix the ip address problem) i can move on to other stuff. i’ve got aliasing work to do (fixing it to work properly and setting it up so that you can say somegroup@domain.com => ‘user1@domain.com’, ‘user2@domain.com’,…). along with that alex and i have chatted about an idea i had thought about a long time ago but not mentioned to anyone until rprice told me MA had done this, and then alex came up with it too on his own: splitting the smtpd daemon into two pieces, one for incoming mail and one for outgoing mail.

oh one other thing on aliasing is that once we get aliasing figured out fully and implemented somewhat in the queue, we can remove the requirement that smtp listen on both queue 6 and 7. it does this because mail dropped off “must be dropped off as a remote domain if location is unknown” <paraphrased from the nmap docs> so the agent on queue6 just makes sure that the “remote domain” isn’t really a local domain and creates a whole new queue entry for it! what an amazing waste!!!! the mail just went through everything (spam, virus, any other queue agents) and now we re-create it just so it can get delivered locally. queue really could do with some optimization and by doing it we’d increase overall performance a ton!

speaking of performance, using imap sucking off my hula box i was able to insert all 14627 mails in about 43 minutes. not a bad clip considering all the processing overhead we do. i think we could improve that (along with the odd memory leak in there i mentioned in my -devel email).

more to come in a bit.