Official eMule-Board: Language Option In Search - Official eMule-Board


Language Option In Search Rate Topic: -----

#1 User is offline   jax4_20 

  • Newbie
  • Pip
  • Group: Members
  • Posts: 1
  • Joined: 27-January 05

Posted 27 January 2005 - 12:12 AM

I think that you guys should put a language option in the search menu.
0

#2 User is offline   CiccioBastardo 

  • Doomsday Executor
  • PipPipPipPipPipPipPip
  • Group: Italian Moderators
  • Posts: 5529
  • Joined: 22-November 03

Posted 27 January 2005 - 12:52 PM

To match against...???
The problem is not the client, it's the user
0

#3 User is offline   Xhaos 

  • I Have A Splendid Member
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 945
  • Joined: 31-January 04

Posted 27 January 2005 - 01:46 PM

There is no provision to store the language of the file at the moment, so the search thingy can't distinguish between languages.
Xhaos: Burninating The Countryside
0

#4 User is offline   Akenathon 

  • EmuSnail
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 534
  • Joined: 19-September 04

Posted 27 January 2005 - 05:57 PM

How is eMule going to know whether a certain file is in English, Romanian, Dutch or Basque if it is not written in the name?
For that reason some uploaders put [eng], [esp], etc. in their file names.
---
Abadía del Crimen - Recuerda cuando los juegos iban a a 8 bits >>
0

#5 User is offline   Devil Doll 

  • feature request writer
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 2570
  • Joined: 19-February 04

Posted 08 March 2005 - 03:41 PM

Ah, there's already a thread about this feature. So I'm going to start here instead of opening another thread...




This is going to be a feasibility study for the time being. If all goes well, technically as well as content-wise, it will become a feature request in the end.

There are many videos in the ed2k network that are available in a lot of different languages. So eMule users are facing the challenge of how to find a file in the most appropriate language without having to download each and every file of related content, thus slowing down the network with unnecessary transfers.
The only information about languages in videos (be that audio streams or subtitles) currently available is the file name. Now we all know that file names can be changed arbitrarily; what's more, many fansubbers don't care to specify the language in their file names, they rather name their own fansubber group codes there, convinced that the users of their language will know these codes from some web sites/forums of the corresponding content. But what about users of other languages? For these a fansubber group code is almost worthless, and if there is no translation of the video which is obviously in the user's primary language the best he/she can do is try to download all insufficiently labelled variants and check for themselves. Obviously this is a waste of bandwidth - I had to do this myself a lot of times, unfortunately.

eMule's internal search provides a number of parameters, such as the video codec. But it does not allow restricting a search to one (or a list of) language(s), which would be the solution to the problem above.
This posting is an attempt to find out what would have to be done to provide such an additional search parameter.

I don't know how the codec information (which is already available as a search parameter) is derived from the video containers. But I'm positive that it must be some ed2k client analyzing the file and extracting this kind of information, as the lugdunum servers don't have access to the file content - they only receive the information supplied by the client when publishing a share. (We'll need this fact later, see below.)
So there must already be some code to parse video containers in eMule or related client software. Let's start from here, because...

...last week I happened to stumble over the project coordinator of the Matroska project (Christian) in some German forum, and we discussed this issue there. What he told me was this:
1. The Matroska video container provides language tags for all kinds of streams, be that video, audio or subtitle streams.
2. The Matroska project planned to provide an eMule "plugin" to do exactly what I want but officially it must be done by the eMule coders, not by the Matroska project.
3. The Matroska source code is Open Source (some GNU type license), written in C++ and compiles in MSVC, GCC and mingw. So it might be rather compatible for use in eMule, aMule etc.
4. There is some tool named mkvinfo for extracting information of this kind from a Matroska container. This is a commandline tool but it might serve as an example program how to access said language tags in a Matroska container.
5. Currently there is no API specifically for the requirements of eMule, most likely because there is no specification for any such requirement. One result of this thread might be writing down an API specification and then asking the Matroska project to provide some DLL for this API, which would limit the amount of Matroska specific code in eMule to the absolute minimum. This way the Matroska code could be embedded into eMule in a similar way as the gzip compression code is currently embedded by importing the "zlib" library.

I don't know how important the Matroska format will be in the future, but what Christian told me during this discussion sounded encouraging. That's why I would like to start with Matroska as a "proof of concept", knowing this has to be extended to as many other container formats as possible.
I have little experience with video file containers, thus I don't know to what extent language tags are available in AVI, OGM and WMV containers, let alone all the MPEG1/2 variants (VCD/SVCD/MVCD/XVCD). If anyone has specific knowledge about this then please post it here - I will try to ask experts about these formats in case my idea would be accepted in general.

Now let's assume some Matroska language tags parser were already embedded into eMule. What else would we need?

If the information derived from this parser were to be used as a parameter in a lugdunum server search then this information would have to be sent to the eserver while publishing a share.
Thus the protocol between eMule and eserver would have to be extended, and the feature would only be available for combinations of a future version of eMule connected to a future version of eserver. As both products appear to have a relatively short product cycle this doesn't look to be a real problem, it is just that we would have a migration phase until the full benefit of the feature can be derived.
But what would happen in case an older version of eserver received a "new style" query including a language description? We would not want this parameter to be ignored by the eserver and the client be flooded by invalid results from eservers with an older version, in case of a "global search". I guess we need Lugdunummaster to make a statement about this problem before we can go any more into details - or maybe eMule could restrict the feature to a server search for the time being? (Or to a required minimum version number of eserver?)

There's one more protocol change necessary - the format in which eMule is sending a search request to a lugdunum server would now have to allow for some language parameter. My impression is that restricting the language-related search to exactly one language might be too restrictive, so I'd suggest using some list of values. The exact format of transferring this list would probably have to be specified by Lugdunummaster.
Because of these protocol changes there should be a thorough discussion and an exact specification of the feature; no one would want to have this feature changing throughout a number of versions, and have to support different formats. This might be done within this thread, or directly between the eMule coders and the eserver coders if they prefer it this way.

Most of the above would apply for the Kademlia search as well. Kademlia's protocol would have to be extended for both the share attributes as well as the query parameters. But this could be done any time later, and independently from what I describe in this posting. Then again, if there were any problems with implementing the protocol changes between eMule and eserver, the eMule project could do it on their own, thus providing an additional feature in the Kademlia search. (In fact I don't expect any such problems, this is only for the record - it might just be nice to know that there would be a bypass around some external dependency, and the feature itself won't die in case something bad happens to the servers.)

One more missing part of the implementation: The eMule search GUI would then have to provide an additional field to enter the language(s) within the search form, and an additional column within the result window to display the languages received within (at least) one search result (I might be searching for German and English versions of a video but prefer to download only the German version if there is any). This part seems to be relatively trivial.
I would prefer to get separate result columns for video, audio and subtitle language tags. Christian told me that language tags of video streams would probably not make that much sense but they're technically possible in Matroska - and in case of a hard-subbed video (being the only stream within a video container) this would be all we can get, as there would not be any separate subtitle stream in this case (which is what AVI containers would be like, for example, if they happen to contain language tags as well).
On the other hand I'd probably not want separate input fields for video, audio and subtitle languages. Of course this would allow for a more specific query, and it might be a good idea to specify the eMule <-> eserver protocol in a way that allows for these queries. But I don't think many users would be able to make use of such a feature, and many would rather be confused instead. As the results would display the kind of tag where the required language was detected, and the user can see this before starting a download, I guess this would suffice and be easier to use.
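To make the "one input field, separate result columns" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration only: the `SearchResult` type, its field names and the `match_languages` helper are hypothetical and not part of eMule or the eserver protocol. It takes a single list of wanted languages and reports in which kind of stream each one was found.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    """Hypothetical search result carrying per-stream language tags."""
    name: str
    video: set = field(default_factory=set)
    audio: set = field(default_factory=set)
    subtitles: set = field(default_factory=set)


def match_languages(result, wanted):
    """Return {stream kind: matching codes} for the wanted language codes."""
    wanted = {code.lower() for code in wanted}
    matches = {}
    for kind in ("video", "audio", "subtitles"):
        hit = {code.lower() for code in getattr(result, kind)} & wanted
        if hit:
            matches[kind] = hit
    return matches
```

A result with Japanese audio and German/English subtitles, queried for `["ger", "eng"]`, would match only under `"subtitles"`, which is the piece of information the user would want to see before starting the download.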

Ah, I almost forgot an important issue. What about the encoding to be used for languages? There is more than one way to represent a code.
IIRC the language files of eMule are using a code where German (from Germany) is "de_DE". This appears to be what is used for languages by operating systems.
Unfortunately Matroska is using a different standard to encode languages, named ISO 639-2. Which would mean that eMule either needs to use this code to represent content languages (depending on the codes of other video containers this might be more or less reasonable, which is another reason why we should examine those container formats before going too much into details), or convert between the probably more popular two-letter language codes (which might well be ISO 639-1 from the same table but I'm not sure about this) and whatever code a specific video container might happen to be using.
The eMule coders would probably know which type of input widget for language codes they'd consider most user friendly - maybe some dropdown menu to append one language code of whatever standard to the list in the search form field? I'm aware this would require some language specific translation table for the dropdown menu entries but this would not be that much different from other language specific parts of the eMule GUI.
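The conversion between code styles could look roughly like the sketch below. It is an assumption-laden illustration, not eMule code: the helper name `to_two_letter` is made up, and the table holds only a handful of ISO 639-2 entries (both the bibliographic and terminologic variants where they differ); a real table would have to cover the full standard.

```python
# Small excerpt of ISO 639-2 three-letter codes mapped to ISO 639-1.
# Illustrative only; the full standard has several hundred entries.
ISO639_2_TO_1 = {
    "ger": "de", "deu": "de",
    "eng": "en",
    "fre": "fr", "fra": "fr",
    "spa": "es",
    "ita": "it",
    "jpn": "ja",
    "dut": "nl", "nld": "nl",
}


def to_two_letter(code):
    """Normalize 'de_DE', 'de-DE', 'de', 'ger' or 'deu' to 'de'.

    Returns None for three-letter codes missing from the excerpt above.
    """
    primary = code.strip().lower().replace("-", "_").split("_")[0]
    if len(primary) == 2:
        return primary
    return ISO639_2_TO_1.get(primary)
```

With this, a locale-style GUI setting and a three-letter container tag can be compared on the same two-letter key.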

Of course the whole feature would only make sense if the creators of the video containers tagged their videos appropriately.
But I am convinced we would be able to spread the news of the availability of a language-based search feature in one of the most popular file sharing tools rather quickly amongst them, and tagging their containers appropriately might even become a cachet for video containers then.

I probably forgot a number of aspects of the project (if it were ever to become one, that is). Any comments, additions etc. are welcome, including those who explain to me why the whole idea might not work.

This post has been edited by Devil Doll: 09 March 2005 - 11:10 AM

0

#6 User is offline   f5inet 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 62
  • Joined: 14-April 05

Posted 28 April 2005 - 02:10 PM

You are proposing that eMule search/download with 'content sense', while eMule only knows about file hashes...

I agree with your feature request, and I would like to see that feature implemented in eMule, but I think it is very difficult, almost impossible, to achieve.

1) you are charging the releasers with specifying the language of the file they are inserting
2) you push MATROSKA (which I think is a marvelous container) as the default video format for releasers
3) AVI is the most widespread video format... what can we do with those files?
4) you are requesting a change in the lugdunum server and eMule-Kad protocol...

I think that by extending a system like 'comments' with details about the file, we would have what is necessary... Obviously, people can lie... but can you be sure that the releaser doesn't lie about his file with your system?

I support Devil Doll's idea of a language search, but not in the way he/she proposes. I support the idea of 'extending' comments to set the language/quality/codec used/subtitles of a file.
0

#7 User is offline   Devil Doll 

  • feature request writer
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 2570
  • Joined: 19-February 04

Posted 28 April 2005 - 02:40 PM

f5inet, on Apr 28 2005, 03:10 PM, said:

You are proposing that eMule search/download with 'content sense', while eMule only knows about file hashes...
Search results already contain codec names, bit rates and such. Modern eMule clients can extract this information if you enable "Extended" / "Extract meta data" / "DirectShow", and they forward this information to the lugdunum servers already (or else you couldn't use these as query conditions and receive these as search results, both of which you can). It's only a small step to extract language tags from video containers.

f5inet, on Apr 28 2005, 03:10 PM, said:

1) you are charging the releasers with specifying the language of the file they are inserting
And they would happily do, I believe. After all the encoders of the videos know what they're doing if they use modern containers. If eMule supported such a feature it would be easy to make the encoder communities aware of it.

f5inet, on Apr 28 2005, 03:10 PM, said:

2) you push MATROSKA (which I think is a marvelous container) as the default video format for releasers
Not necessarily. I just don't have enough information about AVI and OGM containers, for example; the Matroska developer I mentioned just happened to give me the information I was asking for.
Anyone more into container formats, please post information about language tags in other containers to this thread!

f5inet, on Apr 28 2005, 03:10 PM, said:

3) AVI is the most widespread video format... what can we do with those files?
Glad you asked. Five minutes of googling led me to the specification of the Microsoft DirectShow 9.0 AVI File Format which apparently contains a "wLanguage" component. Can anyone more familiar with this format please check whether that's the one we would need here? (Maybe the developer who implemented the DirectShow parser extracting codec names and stuff?)

f5inet, on Apr 28 2005, 03:10 PM, said:

4) you are requesting a change in the lugdunum server and eMule-Kad protocol...
I'm aware of this (although it might be less of a change than you think). That's why I want to be sure that not only the Matroska format would be supported but as many formats as possible - at this stage it is too early for a feature request.
We need video container file format experts to participate in this discussion - I'm not one of them, unfortunately.
0

#8 User is offline   f5inet 

  • Advanced Member
  • PipPipPip
  • Group: Members
  • Posts: 62
  • Joined: 14-April 05

Posted 28 April 2005 - 02:50 PM

Devil Doll, on Apr 28 2005, 02:40 PM, said:

f5inet, on Apr 28 2005, 03:10 PM, said:

You are proposing that eMule search/download with 'content sense', while eMule only knows about file hashes...
Search results already contain codec names, bit rates and such. Modern eMule clients can extract this information if you enable "Extended" / "Extract meta data" / "DirectShow", and they forward this information to the lugdunum servers already (or else you couldn't use these as query conditions and receive these as search results, both of which you can). It's only a small step to extract language tags from video containers.

I didn't know that, I apologize...

Perhaps directly extracting the DirectShow language data is all that is necessary, don't you think?

Also, the releasers must be urged to specify the correct language in their AVI files...
0

#9 User is offline   CiccioBastardo 

  • Doomsday Executor
  • PipPipPipPipPipPipPip
  • Group: Italian Moderators
  • Posts: 5529
  • Joined: 22-November 03

Posted 28 April 2005 - 07:43 PM

Old files may stay as they are, with a not specified language, while the new ones can benefit from the new piece of info.
That's called evolution with backward compatibility ;)
0

#10 User is offline   Devil Doll 

  • feature request writer
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 2570
  • Joined: 19-February 04

Posted 19 June 2005 - 01:38 AM

In eMule 0.46a the DirectShow feature has been disabled "until a more reliable way is implemented" (ChangeLog, 2005-03-23).
Let's hope that any later implementation will cover language tags as well.

If the eMule coders would want to implement anything themselves they might consider some open source implementation like MediaInfo.
0

#11 User is offline   PacoBell 

  • Professional Lurker ¬_¬ (so kyoot!)
  • PipPipPipPipPipPipPip
  • Group: Moderator
  • Posts: 7292
  • Joined: 04-February 03

Posted 19 July 2005 - 01:44 AM

Devil Doll, on Apr 28 2005, 06:40 AM, said:

Glad you asked. Five minutes of googling led me to the specification of the Microsoft DirectShow 9.0 AVI File Format which apparently contains a "wLanguage" component.
Nice find! I've come across a better general link to roughly the same information. That "wLanguage" sure is tempting, as are a number of other pieces of metadata tucked away in the AVI stream header :)
Sed quis custodiet ipsos custodes
Math is delicious!
MmMm! Mauna Loa Milk Chocolate Toffee Macadamias are little drops of Heaven ^_^
Si vis pacem, para bellum DIE SPAMMERS DIE!

#12 User is offline   Devil Doll 

  • feature request writer
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 2570
  • Joined: 19-February 04

Posted 08 August 2005 - 02:43 PM

release notes v0.46c said:

- July, 24. 2005    -
----------------------
.: Added support for MediaInfo DLL versions 0.6.1 and 0.7.x [Thx to Zenitram]
Since the detection of codec information is now back again in eMule 0.46c (after removing the old implementation in 0.46a), there's a little more hope for extracting language information as well.
0

#13 User is offline   SlugFiller 

  • The one and only master slug
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 6988
  • Joined: 15-September 02

Posted 27 November 2005 - 05:23 PM

You're suggesting using audio stream names as language identifiers. The problem is, audio stream names are often "Audio 1", "Audio 2"... In fact, a file with just one audio stream (not multi-lingual) is not likely to have a name to match the language. It might be named simply "Audio" or even some coder default like "English" regardless of the real language.

That said, if you do plan on trusting the stream names, it's easy to extract them using direct show. Just feed the file into the wrapper parser, and read the pin names. It's easy to tell apart audio from video using stream flags, but you could probably identify that one just by name. You could also support sub-title stream names that way.

That said, with the protocol it's pretty trivial. You just add another meta-tag. I'm not sure, but I think the protocol already allows search filtering by meta-tags.

So adding a "limit search by pin name" is possible, but don't expect it to really find what you're after.
Why haven't you clicked yet?

SlugFiller rule #1: Unsolicited PMs is the second most efficient method to piss me off.
SlugFiller rule #2: The first most efficient method is unsolicited eMails.
SlugFiller rule #3: If it started in a thread, it should end in the same thread.
SlugFiller rule #4: There is absolutely no reason to perform the same discussion twice in parallel, especially if one side is done via PM.
SlugFiller rule #5: Does it say "Group: Moderators" under my name? No? Then stop telling me about who you want to ban! I really don't care! Go bother a moderator.
SlugFiller rule #6: I can understand English, Hebrew, and a bit of Japanese(standard) and Chinese(mandarin), but if you speak to me in anything but English, do expect to be utterly ignored, at best.
0

#14 User is offline   PacoBell 

  • Professional Lurker ¬_¬ (so kyoot!)
  • PipPipPipPipPipPipPip
  • Group: Moderator
  • Posts: 7292
  • Joined: 04-February 03

Posted 28 November 2005 - 02:26 PM

Devil Doll, on Apr 28 2005, 06:40 AM, said:

Five minutes of googling led me to the specification of the Microsoft DirectShow 9.0 AVI File Format which apparently contains a "wLanguage" component. Can anyone more familiar with this format please check whether that's the one we would need here? (Maybe the developer who implemented the DirectShow parser extracting codec names and stuff?)

SlugFiller, on Nov 27 2005, 09:23 AM, said:

You're suggesting using audio stream names as language identifiers.
That doesn't sound like what he's suggesting at all. I thought he was referring to the wLanguage field in the AVISTREAMHEADER struct. Still, that's probably going to suffer from the same oblivion that the stream names currently suffer from. I doubt many encoders make use of that field for its intended purpose. AVI is such a fractured format anyway. It's amazing that it was supported this long.

#15 User is offline   SlugFiller 

  • The one and only master slug
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 6988
  • Joined: 15-September 02

Posted 28 November 2005 - 07:17 PM

PacoBell said:

That doesn't sound like what he's suggesting at all.

Devil Doll said:

The Matroska video container provides language tags for all kinds of streams, be that video, audio or subtitle streams.

Devil Doll said:

I don't know to what extent language tags are available in AVI, OGM and WMV containers


Anyway, as anyone who has ever used DX's GraphEdit would know, every wrapper codec provides stream names, and most wrappers allow them to be custom (I believe that includes AVI). However, most movies do not have intelligible stream names. These are often restricted to multi-lingual movies, and even then they're not always present.

Yes, it's possible to add search by stream name. Shouldn't be too difficult.
However, I wouldn't bank on it having the result you'd expect.

Quote

Can anyone more familiar with this format please check whether that's the one we would need here? (Maybe the developer who implemented the DirectShow parser extracting codec names and stuff?)

If you mean a parser using DirectShow, s/he wouldn't know, since DirectShow is there to avoid having to know the file formats. It abstracts anything.
If you're talking about the parser used by DirectShow, as in the codec, that would be Microsoft. Their file format, their DirectShow, their parser.

Fortunately, I can give you the answer, since DX's GraphEdit allows getting DirectShow to be a bit more verbose. While I couldn't find anything about a "wLanguage", the streams did prove to have custom names. Here are a few:
-Stream 01
-01) Audio Track
-01) ???????(1
-Raw Audio

This post has been edited by SlugFiller: 28 November 2005 - 07:18 PM

0

#16 User is offline   PacoBell 

  • Professional Lurker ¬_¬ (so kyoot!)
  • PipPipPipPipPipPipPip
  • Group: Moderator
  • Posts: 7292
  • Joined: 04-February 03

Posted 28 November 2005 - 11:57 PM

SlugFiller, on Nov 28 2005, 11:17 AM, said:

PacoBell said:

That doesn't sound like what he's suggesting at all.

Devil Doll said:

The Matroska video container provides language tags for all kinds of streams, be that video, audio or subtitle streams.

Devil Doll said:

I don't know to what extent language tags are available in AVI, OGM and WMV containers
Even Matroska's "language tags" are separate from the stream names, similar to AVI's wLanguage tag. Relying on stream names alone is a crude method at best.

Quote

If you mean a parser using DirectShow, s/he wouldn't know, since DirectShow is there to avoid having to know the file formats. It abstracts anything.
If you're talking about the parser used by DirectShow, as in the codec, that would be Microsoft. Their file format, their DirectShow, their parser.
Well, it's not like Microsoft hasn't given us the exact structure of AVI format. It's a simple matter of parsing the RIFF tree for the magic word 'strh', then the 'auds' fourcc, and seeking a few bytes until you hit wLanguage. It's right after wPriority and is defined as a WORD, so I assume it'll be in ISO 639-1:2002 language code format. And it would seem that the VirtualDubMod folks agree with my assessment:

VirtualDubMod 1.5.1.1a Release Notes said:

- Updated the available Language list (Stream comments) to follow the ISO-639-1/2 standards. By default languages defined in ISO-639-1 are listed; other languages (ISO-639-2) can be accessed thanks to a checkbox. If a known language is selected the output format (e.g. OGM) will take into account the standard (English Name / 3 letters code / 2 letters code if available). (information on ISO-639 can be found at http://lcweb.loc.gov.../englangn.html)
You can still use user-defined languages (and overcome the standard) ...
When your abstractions (i.e. graphedit) leave you high and dry, you've gotta dig deeper ;) May I suggest these great tools: AVIMaster & VidTrace (the latter seems to emit a more human-readable trace)? They're a lot more "verbose" than graphedit will ever hope to be and, as a bonus, AVIMaster has this feature:

AVIMaster said:

Special feature

If you've got an eDonkey/eMule *.part and *.part.met file, avimaster file.part will extract the original file name from the accompanying file.part.met and copy all valid frames into the destination file. The first part (header) must have been downloaded already. Note: If only small parts of a large file are downloaded, it may take quite a while to scan the whole file.


One thing I learned from using VidTrace is that the wLanguage tag seems to be optional in the spec, so that's probably why it didn't show up in GraphEdit for your particular sample. HTH.
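The RIFF walk described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not eMule's or DirectShow's parser: the function names are made up, the whole file is assumed to be in memory as a byte buffer, and the field offsets follow the published AVISTREAMHEADER layout (fccType, fccHandler, dwFlags, wPriority, wLanguage). How the 16-bit value should be interpreted is left open, since the thread has not settled that.

```python
import struct


def stream_languages(data):
    """Return [(fccType, wLanguage), ...] for every 'strh' chunk in an AVI buffer."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"AVI ":
        raise ValueError("not a RIFF/AVI buffer")
    return list(_walk(data, 12, len(data)))


def _walk(data, pos, end):
    while pos + 8 <= end:
        fourcc = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = pos + 8
        if fourcc == b"LIST":
            # A LIST payload starts with a 4-byte list type, then sub-chunks.
            # The 'movi' list holds the frame data and is skipped here.
            if data[body:body + 4] != b"movi":
                yield from _walk(data, body + 4, min(body + size, end))
        elif fourcc == b"strh" and size >= 16 and body + 16 <= end:
            fcc_type = data[body:body + 4].decode("ascii", "replace")
            # wPriority and wLanguage follow fccType, fccHandler and dwFlags.
            _priority, language = struct.unpack_from("<HH", data, body + 12)
            yield fcc_type, language
        pos = body + size + (size & 1)  # chunks are padded to even length
```

Running this over a sample of real files would show how often the field is non-zero for `auds` streams, which is the open question in this thread.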

[EDIT]
Another source of language information can be found in the CSET chunk under the wLanguage & wDialect tags, although the VirtualDub guy had this to say about it:

VirtualDub Guy said:

I was disappointed to find when implementing the AVI tags that the AVI format stores text as 8-bit and doesn't have a functional way to set the character set; the CSET chunk is supposed to do this but Windows Media Player doesn't recognize it, so foreign characters don't work reliably.
Bummer.

Whoops! In my haste, I overlooked something important. Apparently, the CSET chunk applies only to the original RIFF standard and not to AVI, which although it was derived from RIFF, seems to have deprecated this part. Phooey, indeed.
[/EDIT]

This post has been edited by PacoBell: 29 November 2005 - 12:57 AM


#17 User is offline   asturcon3 

  • Miembro con bola
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 7459
  • Joined: 26-July 04

Posted 29 November 2005 - 12:22 AM

I see this issue again and again over the years (well, not so many years), with no progress...

A crazy, but simple idea. What about every mule publishing their own GUI language setting? Ok, I know, maybe you like films in a foreign language, and English films are being shared all over the world, but I think that statistically the most 'voted' language for a hash should be significant. Most people like their mule in their own language, and most users share content in their own language.

Crazy? :unsure: Easy, it seems to be.
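As a rough sketch of the voting idea: collect the GUI language reported by each source of a hash and accept the most common one only if it has a clear majority. The function name and the `min_share` threshold are assumptions made up for this example; nothing like this exists in the protocol.

```python
from collections import Counter


def guess_language(gui_languages, min_share=0.5):
    """Return the most common GUI language among sources of one hash,
    or None when no language exceeds min_share of the votes."""
    counts = Counter(gui_languages)
    if not counts:
        return None
    language, votes = counts.most_common(1)[0]
    return language if votes / sum(counts.values()) > min_share else None
```

The threshold is what keeps an evenly split file (say, anime shared by Japanese and English clients alike) from being labelled with an arbitrary winner.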

This post has been edited by asturcon3: 29 November 2005 - 12:24 AM

0

#18 User is offline   PacoBell 

  • Professional Lurker ¬_¬ (so kyoot!)
  • PipPipPipPipPipPipPip
  • Group: Moderator
  • Posts: 7292
  • Joined: 04-February 03

Posted 29 November 2005 - 12:39 AM

asturcon3, on Nov 28 2005, 04:22 PM, said:

Most people like their mule in their own language, and most users share content in their own language.
...unless you're into anime where the language can be Japanese, Korean, or even English! That's why we have subtitles. Now, parsing the RIFF structure won't pick up on hardcoded subs, but if the proper subtitle chunk format is adhered to, it will be detected as such. Unfortunately, the vast majority of encoders (and I'm referring to the programs, not the people) don't strictly follow the set standards, so they end up breaking many rigidly compliant parsers out there. Even VirtualDub had to accommodate a few "quirks" in order to get it to work with most AVI files in the wild.

#19 User is offline   SlugFiller 

  • The one and only master slug
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 6988
  • Joined: 15-September 02

Posted 29 November 2005 - 09:35 PM

Quote

One thing I learned from using VidTrace is that the wLanguage tag seems to be optional in the spec

I myself suspected it was reserved, going by MS's specifications (it was always "0" in their examples). Let me know if you find an AVI with one set.

Quote

for your particular sample

Samples. I looked at around 20 files. Most seem to call the streams "Stream 00" for video and "Stream 01" for audio. Must be encoder default.
Let me know how the language tag fares.

Quote

I think that statistically the most 'voted' language for a hash should be significant.

As you've mentioned, you could get pretty random results with movies, not to mention anime.
If you have country flags, you can check out your sources, though I've found that they rarely match the language.
0

#20 User is offline   xylo9 

  • Golden eMule
  • PipPipPipPipPipPipPip
  • Group: Members
  • Posts: 681
  • Joined: 17-October 02

Posted 30 November 2005 - 03:51 AM

I'm sorry I didn't read all the posts, but here's an idea:

eMule can just add a bunch of footer bytes at the end of the file saying what the language is (and other data). It's quite possible this will not break the file in any way, and will simply be ignored by the application which reads the file. But eMule will be able to read this info. (I.e. add bytes to the end, and change the file's size, without worrying about the file format specs at all.)

While hashing the file, emule could exclude those bytes from the calculation, such that files with the footer and without will be properly recognized as the same file. (also chopped off from the file size info which helps to identify the file, for backward compatibility, but with a new info added to the protocol saying if there's an emule extension and how many bytes).

It's a hack. But there's a chance it will work for many file types, and will keep backward compatibility and support all sloppy releasers who will never add the language info in the proper DirectShow var (they would need a special application to put that info into their file; current apps don't do that, I think).

How I see it:
I download a video file that has no language info. I add that info from within emule. The info is sent with other metadata of the file to the server, so other users will see it when they search. If the releaser is not an idiot, he would do the same in emule when releasing his file.

There's no fear that this data will be lost or cut off when the file is read, and no one will write/re-write this file except emule and the original releaser. The only question is whether this will break the file playback. I even thought how emule can quickly know if the file has a special footer or not: place a special code at the very end of the file, preceded by the number of bytes of the footer, then read and interpret the footer the way you would a header. So it can be a variable-sized footer for future expansion. :)

I might be totally off track. Maybe it's worth investigating.
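For what it's worth, the footer layout described above (info bytes, then their length, then a marker at the very end) can be sketched like this. All names and the marker value are hypothetical, and SHA-256 is only a stand-in so the example runs anywhere; it is not the hash eMule uses.

```python
import hashlib
import struct

MAGIC = b"EMULEFTR"  # hypothetical end-of-file marker, made up for this sketch


def add_footer(payload, info):
    """Append info, its 4-byte little-endian length, and the marker."""
    return payload + info + struct.pack("<I", len(info)) + MAGIC


def split_footer(blob):
    """Return (payload, info); info is None when no footer is present."""
    tail = len(MAGIC) + 4
    if len(blob) >= tail and blob.endswith(MAGIC):
        (length,) = struct.unpack_from("<I", blob, len(blob) - tail)
        start = len(blob) - tail - length
        if start >= 0:
            return blob[:start], blob[start:len(blob) - tail]
    return blob, None


def content_hash(blob):
    """Hash only the payload so tagged and untagged copies compare equal."""
    payload, _info = split_footer(blob)
    return hashlib.sha256(payload).hexdigest()  # stand-in hash function
```

One weakness is visible right away: a file whose real content happens to end with the marker bytes would be misread as carrying a footer, so the marker would need to be long or otherwise hard to collide with.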

This post has been edited by xylo9: 30 November 2005 - 04:15 AM

0
