
.NET C# and Com Interoperability

posted @ Wednesday, September 14, 2005 1:19 PM

I've recently been using DotLucene at work to implement a search engine for a client's web site. It was recommended by Steve Eichert, and I've really enjoyed working with it, though I've only scratched the surface of what it can do. One of the things I need to do for this search engine is index some binary files such as PDF and DOC. Most examples use the IFilter objects used by the Windows Indexing Service to parse these files (to parse PDFs you have to download the free Adobe IFilter). You then need to use .NET Interop services to access the COM library that returns the correct IFilter for whatever file type you are parsing. I found some great code for using IFilters in a Desktop Search example on CodeProject. This works great if your indexing routine has access to the files on the file system.

The problem I've run into is that the files I need to parse are stored in a SQL Server database. The same library, query.dll, exposes a function, BindIFilterFromStream, that is supposed to take a stream and retrieve the appropriate IFilter for you, in the same way that LoadIFilter does for a file on the file system. The problem is, I am no Interop expert, and I can't get this function to run properly. Mattias Sjogren, a .NET Interop guru, was kind enough to give me some pointers on how to pass parameters to it properly. It required copying a byte[] to the native heap and wrapping that memory in a UCOMIStream object. It was actually pretty simple (assuming I'm right - this part seems to work):

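// copy stream to byte array (Read may return fewer bytes than requested, so loop)
byte[] b = new byte[stream.Length];
int offset = 0;
while (offset < b.Length)
{
    int read = stream.Read(b, offset, b.Length - offset);
    if (read == 0) break; // unexpected end of stream
    offset += read;
}
// allocate space on the native heap
IntPtr nativePtr = Marshal.AllocHGlobal(b.Length);
// copy byte array to native heap
Marshal.Copy(b, 0, nativePtr, b.Length);
// create a UCOMIStream over the allocated memory; true means the stream frees it on release
UCOMIStream comStream;
CreateStreamOnHGlobal(nativePtr, true, out comStream);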
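
A variant of the snippet above, as a minimal sketch only. The documentation for CreateStreamOnHGlobal asks for a movable GlobalAlloc handle, while Marshal.AllocHGlobal returns fixed memory, so this version passes IntPtr.Zero and lets COM allocate the memory itself, then writes the bytes through the stream. The names ComStreamHelper, ReadFully and ToComStream are hypothetical, the DllImport declaration is an assumption about how the function is declared, and it uses the .NET 2.0 ComTypes.IStream interface rather than UCOMIStream. Whether this changes what BindIFilterFromStream returns is not established here.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

// Hypothetical helper, not part of the original post.
public static class ComStreamHelper
{
    // Assumed declaration of the ole32 export. Passing IntPtr.Zero as hGlobal
    // asks COM to allocate (and, with fDeleteOnRelease = true, free) the memory.
    [DllImport("ole32.dll")]
    private static extern int CreateStreamOnHGlobal(
        IntPtr hGlobal, bool fDeleteOnRelease, out IStream ppstm);

    // Reads everything that is left in the source stream.
    // Stream.Read may return fewer bytes than requested, hence the loop.
    public static byte[] ReadFully(Stream source)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = source.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, n);
            }
            return buffer.ToArray();
        }
    }

    // Windows only: copies the managed stream into a COM memory stream.
    public static IStream ToComStream(Stream source)
    {
        byte[] data = ReadFully(source);

        IStream comStream;
        int hr = CreateStreamOnHGlobal(IntPtr.Zero, true, out comStream);
        Marshal.ThrowExceptionForHR(hr); // surface a failed HRESULT as an exception

        comStream.Write(data, data.Length, IntPtr.Zero);
        comStream.Seek(0, 0 /* STREAM_SEEK_SET */, IntPtr.Zero); // rewind for the reader
        return comStream;
    }
}
```

Usage would be along the lines of `IStream s = ComStreamHelper.ToComStream(blobStream);`, where blobStream stands for whatever stream holds the document read from the database.
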

The problem is, when I finally call the BindIFilterFromStream method, I get a result code back of -2147467259 and a Win32 error 127 ("The specified procedure could not be found."). I don't know what procedure can't be found, but since I'm getting a result code (and have gotten other error messages from BindIFilterFromStream previously), I don't think the problem is that BindIFilterFromStream itself can't be found. Again, not being an Interop expert (or even having USED interop previously), I'm stumped. I emailed Mattias again, at the risk of being a nuisance, and am hoping he can help. If I get this working, I'll definitely post about it, because I can't find ANY examples of someone calling BindIFilterFromStream from managed C# code. I've been googling for days. I feel like it is something really simple that is just eluding me due to my lack of experience with this stuff. If you have any suggestions, please let me know!
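
The decimal result code is easier to recognize in hexadecimal. As a small worked example (HResultInfo is a hypothetical helper name, not something from query.dll or the .NET Framework), the standard fields of an HRESULT can be pulled apart like this:

```csharp
using System;

// Hypothetical helper for inspecting an HRESULT returned through P/Invoke.
public static class HResultInfo
{
    // Reinterprets the signed value as unsigned and formats it as 8 hex digits.
    public static string ToHex(int hr)
    {
        return "0x" + unchecked((uint)hr).ToString("X8");
    }

    // The top (severity) bit is set for failures, so failed codes are negative.
    public static bool Failed(int hr)
    {
        return hr < 0;
    }

    // Bits 16-28: which subsystem reported the code (0 is FACILITY_NULL).
    public static int Facility(int hr)
    {
        return (hr >> 16) & 0x1FFF;
    }

    // Bits 0-15: the error code within that facility.
    public static int Code(int hr)
    {
        return hr & 0xFFFF;
    }
}
```

For -2147467259 this gives 0x80004005: a failure with facility 0 and code 0x4005, which is the generic E_FAIL ("Unspecified error"). If the Win32 error 127 was read with Marshal.GetLastWin32Error, it may not describe this call at all, since that value is only meaningful after a function that is documented to set the last error, and a function that reports through its HRESULT may leave behind a value from an earlier, unrelated call.
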

Comments

  1. Anonymous

    Posted on: 5/8/2006 12:19 PM

    # .NET C# and Com Interoperability

    Brian,

    I would also like to get to the bottom of using BindIFilterFromStream.

    What bothers me most is the security implications of having to save the files to disk in order to index them. Performance gains would be a nice added bonus.

    Since this is a rather old post, I must first ask whether you have managed to find a solution yet. If not, we could perhaps cooperate a little, as there do not seem to be many people out there interested in this.

    I disagree with the conclusion that this has anything to do with lacking support in the IFilters (Sjögren), and I also think that "The specified procedure could not be found." is relevant. The problem must be that the method is not getting the parameters it is expecting, and I suspect that the way the stream object is passed is the real culprit.

    I found a Gmail address for you in one of the posts. Do you still use it? If you are interested in cooperating, just respond to this, and I will try to get in touch.

    I am also rather new to interop, but after 4 days of trying to solve this, I am slowly beginning to gain some insight. In the end, I think it is up to the guys at MS to uncover the problem, but I guess that the more people show an interest, the higher the chances that we can get them to help us.

    Regards,

    Jo

  2. Brian

    Posted on: 5/8/2006 1:06 PM

    # .NET C# and Com Interoperability

    Hi Jo,

    This was a while back, so my memory is a little cloudy, but I spent a few hours with a Microsoft tech support person (one knowledgeable about interop), and we did some tests that led us to conclude that Adobe's IFilter did not support being loaded through BindIFilterFromStream. I can't remember the exact steps we took to rule it out, but I remember it being pretty obvious that it worked for other IFilters, such as the one for Word documents, but failed for PDFs.

    I don't have much time to work on it, as that project is over now, but I'd be happy to bounce ideas back and forth with you if you like.

    You can email me at [my first name] at pigeonmoon dot com

  3. Anonymous

    Posted on: 5/10/2006 10:00 AM

    # .NET C# and Com Interoperability

    Thanks! I will drop you a mail if/when I get a little further with my investigation.

    For the record (in case someone else stumbles upon the same problem): the Adobe PDF IFilter does not seem to support IPersistStream and IPersistStorage, but only IPersistFile, at least according to Citeknet IFilter Explorer.

    The Office filter (offilt.dll) does support these interfaces, but if Citeknet's IFilter Explorer is right, the biggest problem would be that the null filter (in query.dll) does not even support these interfaces, which would obviously make our attempts pointless.

    That would be highly surprising, given that Microsoft's own guidelines for IFilter development clearly state that these interfaces should be included. It would also follow that SQL Server Full-Text indexing has to write ntext and blob content to a file before it can index it. Since this is the most inefficient approach they could possibly choose, I refuse to believe that to be the case before I hear it from someone on the SQL Server team.

    The error returned in the current implementation is E_FAIL Unspecified error (0x80004005), which does not tell me very much about what the actual problem is. As you pointed out, the same error is returned regardless of how you pass in the parameters (by ref or value).

    Hence, I still suspect that the real problem is the conversion from stream to IStream, and I am trying to find out if an IStream wrapper class could be used instead. I have found several examples of how an IStream is converted to a memory stream (i.e. getting a stream from unmanaged code), but I can't find any on how it should be done the other way around - other than the method added by Sjögren, which I suspect is only half the story.

    This became a little longer than intended, but there you have it.

  4. Levi

    Posted on: 7/6/2006 2:58 PM

    # .NET C# and Com Interoperability

    Brian,

    Believe it or not, you may be the de facto Internet expert on this subject; your project is the only one I've seen that tries to work with IFilters in this manner. I've been having similar problems with BindIFilterFromStream, except that I haven't been able to get it to work with any IFilter. I have a feeling I'm not passing parameters to BindIFilterFromStream correctly. Like Jo, I find that no matter what combination of ref/out/value parameters I try, BindIFilterFromStream always returns E_FAIL.

    For me, it's not a big deal that Adobe's IFilter doesn't support IPersistStream; I'm just trying to get it to work with Microsoft's IFilters. Further, I plan on making a web page with what works, because it seems no one else has tried.

    Here's what I've tried:
    [DllImport("query.dll", CharSet = CharSet.Unicode, SetLastError=true)]
    private extern static int BindIFilterFromStream(IStream istream, [MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter, out IFilter ppIUnk);

    Did you ever get BindIFilterFromStream to work with any IFilter? Do you mind if I contact you through email on this subject?

  5. Frank

    Posted on: 2/13/2008 11:40 AM

    # re: .NET C# and Com Interoperability

    Hi Brian,

    Can you post the full code somewhere or to my email address?
    I can use IPersistFile to read text from files, but I cannot get IPersistStream to work, even for .doc/.html documents.

    Thanks

    Frank

  6. Frank

    Posted on: 2/13/2008 12:38 PM

    # re: .NET C# and Com Interoperability

    Hmm, I got it to work for some extensions, such as CHM, by using Citeknet's IFilter.

    But I cannot get it to work for others, such as .doc, .xls, .html. Strange.
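
Comment 3 above wonders whether an IStream wrapper class could be used instead of copying the data to the native heap. A minimal sketch of that idea follows, assuming .NET 2.0 or later, where System.Runtime.InteropServices.ComTypes.IStream replaces UCOMIStream. ManagedComStream is a hypothetical name, only the members a reader is likely to call are implemented, and nothing here shows that query.dll or any particular IFilter accepts such a stream.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;

// Hypothetical wrapper (not from the post): exposes a managed Stream as a COM IStream.
public sealed class ManagedComStream : IStream
{
    private readonly Stream inner;

    public ManagedComStream(Stream inner)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        this.inner = inner;
    }

    public void Read(byte[] pv, int cb, IntPtr pcbRead)
    {
        // A short read is reported through pcbRead; callers typically treat it as end of stream.
        int n = inner.Read(pv, 0, cb);
        if (pcbRead != IntPtr.Zero) Marshal.WriteInt32(pcbRead, n);
    }

    public void Write(byte[] pv, int cb, IntPtr pcbWritten)
    {
        inner.Write(pv, 0, cb);
        if (pcbWritten != IntPtr.Zero) Marshal.WriteInt32(pcbWritten, cb);
    }

    public void Seek(long dlibMove, int dwOrigin, IntPtr plibNewPosition)
    {
        // STREAM_SEEK_SET/CUR/END (0/1/2) have the same values as SeekOrigin.Begin/Current/End.
        long position = inner.Seek(dlibMove, (SeekOrigin)dwOrigin);
        if (plibNewPosition != IntPtr.Zero) Marshal.WriteInt64(plibNewPosition, position);
    }

    public void SetSize(long libNewSize)
    {
        inner.SetLength(libNewSize);
    }

    public void Stat(out STATSTG pstatstg, int grfStatFlag)
    {
        pstatstg = new STATSTG();
        pstatstg.type = 2; // STGTY_STREAM
        pstatstg.cbSize = inner.Length;
    }

    public void Commit(int grfCommitFlags)
    {
        inner.Flush();
    }

    // Not needed for a simple sequential reader; left unimplemented in this sketch.
    public void Clone(out IStream ppstm) { throw new NotSupportedException(); }
    public void CopyTo(IStream pstm, long cb, IntPtr pcbRead, IntPtr pcbWritten) { throw new NotSupportedException(); }
    public void LockRegion(long libOffset, long cb, int dwLockType) { throw new NotSupportedException(); }
    public void UnlockRegion(long libOffset, long cb, int dwLockType) { throw new NotSupportedException(); }
    public void Revert() { throw new NotSupportedException(); }
}
```

An instance could then be passed wherever an IStream parameter is declared, for example `new ManagedComStream(new MemoryStream(bytes))`, so the bytes stay in managed memory and are read on demand.
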
