High Load Errors are annoying...


Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »


Ok, then I will take a look :)
Murtak
Zherog
Knight-Baron
Posts: 910
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Zherog »

Murtak wrote (unixtime 1200557659):
But since text is yucky you do not store the user as "Maj" but instead store Maj's unique number, say 104.


You're assuming two things:

First, that they use a database rather than text files. I think that's a safe assumption, but it's still an assumption.

Second, you're assuming they have a normalized database. I think that's a less safe assumption.
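
To make the normalization point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, posts, user_id) are invented for illustration and say nothing about how the real board stores its data; the number 104 is just the one from the quoted example.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical normalized layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            body TEXT NOT NULL
        );
    """)
    # The post row stores only the number 104, never the text 'Maj'.
    conn.execute("INSERT INTO users (id, name) VALUES (104, 'Maj')")
    conn.execute("INSERT INTO posts (user_id, body) VALUES (104, 'What a pain.')")
    return conn

def posts_with_names(conn):
    """Join posts back to user names for display."""
    return conn.execute(
        "SELECT users.name, posts.body FROM posts "
        "JOIN users ON users.id = posts.user_id ORDER BY posts.id"
    ).fetchall()
```

A non-normalized board would instead repeat the name as text in every post row, which is the case a converter would have to detect and handle separately.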
You can't fix stupid.

"A life is not important except in the impact it has on other lives." ~ Jackie Robinson
Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »


True. But whatever form the data is in, it's still just text. The only trouble I am likely to run into is undocumented fields with cryptic values (say, a field named 'xplus' which denotes user status with 'b' being admin and '8' being a normal user). I am much more worried about encodings.
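
For the encoding worry, a best-effort decoder is one possible approach. This is only a sketch: the candidate list (UTF-8, then Windows-1252) is a guess, since nobody in this thread knows what the boards actually use, and Latin-1 is the last resort only because it can decode any byte sequence.

```python
def decode_best_effort(raw, candidates=("utf-8", "cp1252")):
    """Return (text, encoding_used) for a bytes object.

    Tries each candidate encoding in order and falls back to latin-1,
    which never fails but may produce the wrong characters.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1"), "latin-1"
```

Returning the encoding alongside the text makes it possible to log which posts needed a fallback and check those by hand.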

Are there any other boards you are interested in?
Murtak
Zherog
Knight-Baron
Posts: 910
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Zherog »

phpBB is, in my opinion, the best looking free software out there.
Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »

I downloaded and installed the phpBB stack and am trying to decipher its table structure. So far it does not look too bad. Does anyone have a bbb backup for me to chew on?
Murtak
fbmf
The Great Fence Builder
Posts: 2590
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by fbmf »

Wish I did. My computer guru could not make it to Ft. Worth this weekend. Next weekend perhaps. Shit.

Game On,
fbmf
Surgo
Duke
Posts: 1924
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Surgo »

Murtak: go to the bbboy sample board; you can download a backup from it.
Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »


Just downloaded the sample backup. I'm going to try again later, but either my backup is corrupted or they are using encryption or some weird compression format.
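
One quick way to narrow down "corrupted, encrypted or compressed" is to look at the first few bytes of the file. The sketch below checks a handful of well-known compression signatures and estimates how much of the data is printable ASCII. It cannot tell encryption from corruption, and a proprietary format would simply come back as "unknown".

```python
# Well-known file signatures (magic bytes) for common compression formats.
MAGIC = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),
    (b"\xfd7zXZ\x00", "xz"),
)

def sniff_format(head):
    """Guess the container format from the first bytes of a file."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return "unknown"

def looks_like_text(head, threshold=0.9):
    """True if at least `threshold` of the bytes are printable ASCII or whitespace."""
    if not head:
        return False
    printable = sum(1 for b in head if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(head) >= threshold
```

Usage would be something like reading the first few hundred bytes of the backup in binary mode and passing them to both functions.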
Murtak
Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »

So I had another go over the weekend and I still can't get anything even loosely resembling readable text from the sample board backup.

Does anyone have a readable file for me to work on? I don't care whether it's bogus data, but what I get from the sample board won't work.
Murtak
Aycarus
Journeyman
Posts: 110
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Aycarus »

Is it possible to parse the forums / messages directly from the HTML (that is, create a spider that reads every thread/page and saves the data into some sort of flatfile)? This would result in whispers and PMs being lost, but it is an option.
Maj
Prince
Posts: 4705
Joined: Fri Mar 07, 2008 7:54 pm
Location: Shelton, Washington, USA

Re: High Load Errors are annoying...

Post by Maj »

I've had a programmer friend also take a look at the file, and he said it's undecipherable.

I think what we're going to have to do is upgrade to the version 2 of the boards (in theory, you get a month free), and then take a look at the databases.

What a pain in the ass.
My son makes me laugh. Maybe he'll make you laugh, too.
Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »

Aycarus wrote (unixtime 1202014450):
Is it possible to parse the forums / messages directly from the HTML (that is, create a spider that reads every thread/page and saves the data into some sort of flatfile)? This would result in whispers and PMs being lost, but it is an option.

Possible, yes.

However, with the unbelievably bad markup on these boards (HTML Tidy gives me 190 warnings on this single page), parsing it for contents is going to be a nightmare.
Murtak
Aycarus
Journeyman
Posts: 110
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Aycarus »

Murtak wrote (unixtime 1202027774):
Aycarus wrote (unixtime 1202014450):
Is it possible to parse the forums / messages directly from the HTML (that is, create a spider that reads every thread/page and saves the data into some sort of flatfile)? This would result in whispers and PMs being lost, but it is an option.

Possible, yes.

However with the unbelievably bad markup on these boards (HTML Tidy gives me 190 warnings on this single page) parsing it for contents is going to be a nightmare.


It's doable. If you look at "view source" it's actually quite structured. I'm nearly certain I can write a parser that will take the boards and convert them to a flatfile DB - if you can take the flatfile DB and convert it to whatever phpBB (or whatever) needs. Let me know.
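
As a rough illustration of the idea (not the parser discussed in this thread), Python's standard html.parser copes with sloppy markup because it does not require tags to be balanced. The sketch assumes each post body sits in an element with a recognizable class attribute; the class name "postbody" is hypothetical and would have to be replaced with whatever "view source" actually shows.

```python
from html.parser import HTMLParser

class PostExtractor(HTMLParser):
    """Collect the text inside every element whose class equals post_class."""

    def __init__(self, post_class="postbody"):
        super().__init__(convert_charrefs=True)
        self.post_class = post_class  # hypothetical marker for a post body
        self.posts = []
        self._depth = 0      # > 0 while inside a post element
        self._tag = None     # tag name of the element that opened the post
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self._tag:
                self._depth += 1           # nested element of the same type
            elif tag == "br":
                self._chunks.append("\n")  # treat line breaks as whitespace
        elif dict(attrs).get("class") == self.post_class:
            self._tag = tag
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.posts.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

def extract_posts(html, post_class="postbody"):
    """Return the text of every post found in one HTML page."""
    parser = PostExtractor(post_class)
    parser.feed(html)
    parser.close()
    return parser.posts
```

Unclosed inline tags such as a stray bold tag are simply ignored, which is the main reason to prefer an event-based parser over regular expressions for markup this messy.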
Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »


Well, if you can give me some sort of structured data I can give it a try. However I haven't written any spiders yet. Have you?

Oh, and if possible it would be great if you could put your text files in YAML format, like so:

Code:

post1:
  username: "Murtak"
  text: "blabla"

Murtak
Aycarus
Journeyman
Posts: 110
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Aycarus »

I'd prefer XML myself since then you don't have collisions with quotation marks... or some sort of hybrid. Is the following format okay?

Code:

post1:
  username: "Murtak"
  date: "10:36:11 Sun Feb 3 2008"
  subject: "Re: High Load Errors are annoying..."
  post: <POSTTEXT>[post text]</POSTTEXT>
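
A writer for this hybrid format could look roughly like the sketch below. The function names are made up, the date is rendered in UTC because the board's time zone is not known, and backslash-escaping quotes inside the quoted fields is one possible answer to the collision problem rather than something agreed on in this thread.

```python
from datetime import datetime, timezone

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def format_date(unixtime):
    """Render a Unix timestamp like '10:36:11 Sun Feb 3 2008' (UTC assumed)."""
    dt = datetime.fromtimestamp(unixtime, tz=timezone.utc)
    return (f"{dt:%H:%M:%S} {DAYS[dt.weekday()]} "
            f"{MONTHS[dt.month - 1]} {dt.day} {dt.year}")

def format_post(index, username, unixtime, subject, text):
    """Emit one record: quoted scalar fields plus a POSTTEXT-wrapped body."""
    def quoted(value):
        # Escape backslashes first, then double quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return "\n".join([
        f"post{index}:",
        f"  username: {quoted(username)}",
        f"  date: {quoted(format_date(unixtime))}",
        f"  subject: {quoted(subject)}",
        f"  post: <POSTTEXT>{text}</POSTTEXT>",
    ]) + "\n"
```

Note that a post whose text itself contains the closing POSTTEXT tag would still break a naive reader, so the body would need escaping of its own in a real run.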
Murtak
Duke
Posts: 1577
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Murtak »

That should be fine.
Murtak
Aycarus
Journeyman
Posts: 110
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Aycarus »

Proof of concept:
BBBoyParser.cpp

Compile this program with g++ from the command line:

g++ -o BBBoyParser BBBoyParser.cpp

The parser takes a BBBoy .html file as input and outputs a "parsed" file in the aforementioned format. Not thoroughly tested, but it worked on at least one test page.
Aycarus
Journeyman
Posts: 110
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Aycarus »

Does anybody know if one can configure their user_cp to display all pages of a thread or all threads of a forum on a single page? i.e. without having to click through multiple pages of the thread or forum?
Jacob_Orlove
Knight
Posts: 456
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Jacob_Orlove »

I couldn't find anything to allow that, but it should be possible for an admin to set the # posts/page to a much higher number, which would do the trick for all but a few threads.
Maj
Prince
Posts: 4705
Joined: Fri Mar 07, 2008 7:54 pm
Location: Shelton, Washington, USA

Re: High Load Errors are annoying...

Post by Maj »

There's a limit to the number that an admin can set (it's 50, I think, but I'm not positive of that). The more posts/page, though, the more likely you are to encounter the high load errors, according to support.
Crissa
King
Posts: 6720
Joined: Fri Mar 07, 2008 7:54 pm
Location: Santa Cruz

Re: High Load Errors are annoying...

Post by Crissa »

I've learned three things from this thread...

...We still don't know why we use 'more' cpu time...
...tzor's mother did unspeakable things to him as a child...
...And bbboy must've lost their coders in the web crunch.

-Crissa
Aycarus
Journeyman
Posts: 110
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Aycarus »

Seems I had to do the extra work and write the spider to take into account multiple pages on threads. So... this is what we can do:

- Spider all the HTML on nifty [done-ish]
- Run HTML => Flatfile parser [done - need a script to automate this]
- Run Flatfile DB => phpBB DB [in progress]
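
The multi-page part of the spider step can be sketched as a simple loop. This is not the spider mentioned above, just an outline under assumptions: pages are numbered from 1, `fetch_page` and `parse_page` are hypothetical callables supplied by the caller, and a missing or empty page marks the end of the thread.

```python
def crawl_thread(fetch_page, parse_page, max_pages=1000):
    """Collect the posts from every page of one thread.

    fetch_page(n) returns the HTML of page n, or None if it does not exist.
    parse_page(html) returns the list of posts found on that page.
    max_pages is a safety cap so a misbehaving server cannot loop forever.
    """
    posts = []
    for page in range(1, max_pages + 1):
        html = fetch_page(page)
        if html is None:
            break
        found = parse_page(html)
        if not found:
            break
        posts.extend(found)
    return posts
```

Passing the fetcher in as a parameter keeps the loop testable without touching the network, and makes it easy to add a delay between requests so the spider does not trigger more high load errors.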

I think it's totally manageable, and best of all, free. Tho... anyone else feel kinda treasonous for discussing the idea here?
tzor
Prince
Posts: 4266
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by tzor »

Crissa wrote (unixtime 1202131029):
...We still don't know why we use 'more' cpu time...
...tzor's mother did unspeakable things to him as a child...
...And bbboy must've lost their coders in the web crunch.


...We still don't know if the cpu thing is true or just a vanilla lie
...well let's not speak about them, OK?
...they were outsourced to India during the outsourcing rush
Zherog
Knight-Baron
Posts: 910
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Zherog »

Aycarus wrote (unixtime 1202132157):
Tho... anyone else feel kinda treasonous for discussing the idea here?


No, not in the least. Their software sucks ass; their support sucks more ass; I don't mind telling them to their face (I did, but they opted to delete it and give me a warning), so I sure as hell don't mind saying it here.

As for spiders and such... I'll fully admit I know jack shit about html, xml, and so on. Wanna know about Oracle databases or Oracle Applications? Good chance I can help you. Wanna know about alphabet soup mark-up languages? No clue here.

Our current working theory over on Nifty is to convert to BbSuckass v2; that version uses an actual MySQL database, unlike the current BbSuckass version. Once we have the forums in a MySQL database, in theory it shouldn't be difficult to extract the data and insert it into phpBB (or another free forum package). The downside, as Maj said, is we'd have to write, test, and implement the conversion scripts in a month.

Maj's programmer friend she mentioned is helping out with the conversion. I'll make sure he gets a look at your crawler.
Aycarus
Journeyman
Posts: 110
Joined: Fri Mar 07, 2008 7:54 pm

Re: High Load Errors are annoying...

Post by Aycarus »

Zherog wrote (unixtime 1202139684):
Our current working theory over on Nifty is to convert to BbSuckass v2; that version uses an actual MySQL database, unlike the current BbSuckass version. Once we have the forums in a MySQL database, in theory it shouldn't be difficult to extract the data and insert it into phpBB (or another free forum package). The downside, as Maj said, is we'd have to write, test, and implement the conversion scripts in a month.


Inevitably their database formats will be different, which will probably be a pain in the ass when it comes to converting between the two. You'll also still have to go through the trouble of modifying the BBcode itself due to inconsistencies between the formatting. As a whole, it should be fun! :biggrin:
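
To give an idea of what the BBcode rewrite might involve, here is a small sketch. The tag names in the mapping are purely hypothetical, since the real differences between the two dialects would have to be collected from actual posts; treating [br] as a line break is likewise an assumption.

```python
import re

# Hypothetical source-tag -> target-tag mapping; the real list is unknown.
TAG_RENAMES = {"bold": "b", "italic": "i"}

# Matches [tag], [/tag] and [tag=argument].
_TAG = re.compile(r"\[(/?)([a-zA-Z]+)((?:=[^\]]*)?)\]")

def convert_bbcode(text, renames=TAG_RENAMES):
    """Rename known tags and turn [br] into newlines; leave the rest alone."""
    text = re.sub(r"\[br\]", "\n", text, flags=re.IGNORECASE)

    def rename(match):
        slash, name, arg = match.groups()
        return f"[{slash}{renames.get(name.lower(), name)}{arg}]"

    return _TAG.sub(rename, text)
```

Tags that are not in the mapping pass through unchanged, so unknown markup can be found afterwards by searching the converted posts for leftover brackets.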

How much are you hoping to salvage, anyway? Converting the messages themselves is not too big of a problem... whispers will be essentially impossible... PMs are doable, but will require some thought.