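C regex library, and the second is PCRE, the Perl-style regex library. Of the two, PCRE is the more powerful, but the POSIX C regex library is enough for this job. The links below contain some examples of web page analysis. Not all of them are written in C, but the approach is the same in every case.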
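As a small illustration of the POSIX C regex library mentioned above, here is a minimal sketch that pulls the text of the first <title> element out of a page. The helper name extract_title and the pattern are assumptions made for this example, not something taken from the linked articles, and a regex like this only copes with simple, well-behaved markup.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of any library): copy the text of the
 * first <title> element in html into out, truncating to fit outsz.
 * Returns 0 on success, -1 if nothing matched or the pattern failed
 * to compile. */
int extract_title(const char *html, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = captured title text */
    int rc = -1;

    if (outsz == 0)
        return -1;

    /* Extended syntax, case-insensitive so <TITLE> matches as well. */
    if (regcomp(&re, "<title[^>]*>([^<]*)</title>",
                REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    if (regexec(&re, html, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;               /* leave room for the NUL */
        memcpy(out, html + m[1].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }

    regfree(&re);
    return rc;
}
```

Calling extract_title("<title>Foo</title><p>Foo!", buf, sizeof buf) with a char buf[64] should leave "Foo" in buf. The same idea ports to PCRE; only the compile and match calls differ.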
So in the end, regular expressions, the regex function libraries, and the string handling functions are the real fundamentals. I'll stop here for now and hope this helps. If you run into other questions while learning, feel free to get in touch at any time :)

C# version: /blog/static/1059718452009127112226478/

Here is also a piece of C code that processes downloaded page source with the HTML Tidy library (libtidy). Strictly speaking, it converts the markup to well-formed XHTML and prints any diagnostics. It does not remove the tags itself, so getting tag-free plain text is a further step on top of this.

#include <tidy.h>
#include <buffio.h>   /* named tidybuffio.h in newer HTML Tidy (5.x) releases */
#include <stdio.h>
#include <errno.h>

int main(int argc, char **argv)
{
    const char *input = "<title>Foo</title><p>Foo!";
    TidyBuffer output = {0};
    TidyBuffer errbuf = {0};
    int rc = -1;
    Bool ok;

    TidyDoc tdoc = tidyCreate();                     // Initialize "document"
    printf("Tidying:\t%s\n", input);

    ok = tidyOptSetBool(tdoc, TidyXhtmlOut, yes);    // Convert to XHTML
    if (ok)
        rc = tidySetErrorBuffer(tdoc, &errbuf);      // Capture diagnostics
    if (rc >= 0)
        rc = tidyParseString(tdoc, input);           // Parse the input
    if (rc >= 0)
        rc = tidyCleanAndRepair(tdoc);               // Tidy it up!
    if (rc >= 0)
        rc = tidyRunDiagnostics(tdoc);               // Kvetch
    if (rc > 1)                                      // If error, force output.
        rc = (tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1);
    if (rc >= 0)
        rc = tidySaveBuffer(tdoc, &output);          // Pretty Print

    if (rc >= 0)
    {
        if (rc > 0)
            printf("\nDiagnostics:\n\n%s", errbuf.bp);
        printf("\nAnd here is the result:\n\n%s", output.bp);
    }
    else
        printf("A severe error (%d) occurred.\n", rc);

    // Release the buffers and the document before exiting.
    tidyBufFree(&output);
    tidyBufFree(&errbuf);
    tidyRelease(tdoc);
    return rc;
}
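
Since the Tidy program above still outputs markup, one more pass is needed to reach tag-free text. The following is a deliberately naive sketch using nothing but character-by-character string handling. The helper name strip_tags is made up for this example, and it assumes simple input: it does not decode entities such as &amp;, does not skip the contents of <script> or <style>, and is fooled by a '>' inside an attribute value or comment.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libtidy): copy src into dst, dropping
 * everything from a '<' up to and including the next '>'.
 * Output is truncated to fit dstsz and always NUL-terminated when
 * dstsz > 0. Returns the number of characters written, excluding the NUL. */
size_t strip_tags(const char *src, char *dst, size_t dstsz)
{
    size_t n = 0;
    int in_tag = 0;    /* 1 while we are between '<' and '>' */

    if (dstsz == 0)
        return 0;

    for (; *src != '\0'; ++src) {
        if (*src == '<')
            in_tag = 1;                    /* a tag starts: stop copying */
        else if (*src == '>' && in_tag)
            in_tag = 0;                    /* the tag ends: resume copying */
        else if (!in_tag && n + 1 < dstsz)
            dst[n++] = *src;               /* ordinary text character */
    }
    dst[n] = '\0';
    return n;
}
```

Run on the sample input "<title>Foo</title><p>Foo!", this yields "FooFoo!". The title and the paragraph run together, which shows why a real extractor would also insert whitespace at block-level tags or walk the parsed document tree rather than scan raw characters.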