
Version History
Version 3.2 (22nd September, 2002)
* BUG FIXES:
- Fixed a couple of bugs which affected the correct reloading of pattern matching settings when DOS style
patterns were being used.
- Fixed a bug which could cause a corrupt prefs file to be written when running on MacOS X 10.2 (Jaguar).
- In previous versions PageSucker removed multiple consecutive occurrences of slashes at the end of URLs.
This could create problems for certain Web servers, and thus multiple trailing slashes will now no longer
be removed.
- Worked around a problem with Apple's Java 1.3.1 update 1 on MacOS X, which made it impossible
or very difficult to enter proxy authentication information.
- Worked around another problem with Apple's Java 1.3.1 update 1 on MacOS X, which could cause the
control window's menus to get deactivated when the "end of download" dialog was dismissed by hitting the
"Return" key while the log window was active.
- Changed the log window to show only the last 200 lines of log output. Previously the number of
lines shown was unlimited, which could lead to memory shortages during long downloads.
- Made sure that the log window shows newly added text on Windows machines. Previously, the log window
would remain scrolled all the way up on Windows.
- A relative URL in a redirected page will now be correctly interpreted relative to the redirected URL.
Previously it would be interpreted relative to the original URL, which could lead to incomplete downloads,
error messages or even infinite loops.
- Fixed a couple of bugs which could cause mysterious "File Not Found" messages to appear when the
"Complement Existing File" mode was used and PageSucker encountered name clashes on the site being
downloaded. Due to this bug, multiple threads could attempt to write to a given local file at the same time,
which would cause a "FileNotFoundException" message to be displayed.
- Previously, PageSucker's JavaScript interpretation would handle a single dot as a potential URL, which
would corrupt a downloaded page when a single dot was used with some other meaning inside that page.
This has been changed, so that single dots inside blocks of JavaScript are now always left alone.
- Corrected a bug which could cause problems when downloading a JavaScript include file the URL of
which does not have an extension. Such a file would incorrectly be treated like an HTML file.
- Fixed a bug which would cause certain %xx encoded characters (such as ampersands) in a URL to be
decoded, which could make it impossible to download certain pages correctly.
- In previous releases, initial whitespace inside a quoted string would be moved out of the string, to
show up before the quote character. Now, initial and trailing whitespace inside quoted strings will simply
be removed.
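The redirect fix in the list above (relative URLs are now resolved against the redirected URL) can be illustrated with a minimal sketch. It relies only on java.net.URI; the class name RedirectResolver and the method resolveLink are hypothetical and are not taken from PageSucker's source.

```java
import java.net.URI;

class RedirectResolver {
    // Resolves a link found in a page against the URL the page was actually
    // served from (the redirect target), falling back to the requested URL
    // when no redirect took place. Names here are illustrative assumptions.
    static URI resolveLink(URI requestedUrl, URI redirectedUrl, String link) {
        URI base = (redirectedUrl != null) ? redirectedUrl : requestedUrl;
        return base.resolve(link);
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://www.example.com/old/page.html");
        URI redirected = URI.create("http://www.example.com/new/section/page.html");
        System.out.println(resolveLink(requested, redirected, "images/logo.gif"));
    }
}
```

Resolving against the original URL instead would yield a path under /old/, which is the kind of mismatch that could produce the incomplete downloads described above.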
* FEATURE ENHANCEMENTS:
- Added support for the most common cases of cascading stylesheets (CSS).
- Added two options to delete unused empty files, and/or incompletely downloaded files when a download
process is interrupted by the user.
- Enabled the "Recognize unterminated quoted strings" option in the Miscellaneous settings window by
default, as this option makes PageSucker more tolerant toward an HTML error which is very common
in today's Web pages.
- The "Parse HTML pages not in hierarchy" option in the HTML Files settings window is now also available
in the demo version of PageSucker. In previous versions, this option was reserved for registered users,
but this restriction turned out to be too severe considering the structure of many modern Web sites.
Version 3.1.2 (MacOS X only) (10th March, 2002)
* BUG FIXES:
- Worked around some problems with Apple's Java 1.3.1 update 1 on MacOS X,
in particular refresh problems of the main window background, and trouble
with the registration dialog (the registration number could not be entered,
and layout problems occurred occasionally).
Version 3.1.1 (17th February, 2002)
* BUG FIXES:
- Corrected a bug which would cause PageSucker to get stuck when
parsing certain pages which included JavaScript code.
- HTML tag names' case will no longer be changed. Previous versions
would convert all tag names to uppercase.
- The special tag <?xml?> which marks the start of an XHTML file
will no longer be turned into an incorrect <XML?> tag.
- Password authentication will now work even if the link to the
protected document is redirected to a new location by the server,
provided that the authentication was entered as pertaining to the domain
after redirection.
- Fixed a bug in the log window when running under JDK 1.4 on Windows;
line breaks were not always recognized correctly.
- Worked around several window refresh problems when running under JDK
1.4.
Version 3.1 (20th January, 2002)
* BUG FIXES:
- Fixed a bug in the handling of RAM and M3U files: those special files were
not scanned for included URLs unless the ".ram" or ".m3u" file types were
checked to be downloaded in the file types settings dialog. This has been
corrected.
- Added support for ISO-Latin-1 entities (&quot;, &amp; etc.) in URLs. Such
encoded characters are now transformed into the standard URL encoded form
(%xx).
- Illegal characters (such as spaces) detected in URLs are now correctly
encoded as %xx when accessing remote files. Not encoding such characters could
lead to "Page Not Found" errors in past versions.
- Slightly improved the JavaScript parser to better handle comments inside
JavaScript blocks.
- Corrected a bug which caused an error message to be displayed when the
File Types window or the Authentication window were closed by pressing the
Escape key.
- In previous versions PageSucker removed multiple consecutive occurrences
of slashes in URLs which were detected. While this was not really a bug, it
might still have led to problems in certain cases, and thus multiple slashes
will now be left alone.
- Corrected a bug which could cause numerous "File Not Found" error messages
to be displayed when using the "Complement incomplete files" mode.
- A top level "index.html" referring to the start page will now only be
created if the start page is actually of HTML type.
- Removed the message "Assuming xxx file to be complete" which was logged
when using "complement" mode when an already complete file was found.
- Removed a useless message about a downloaded file possibly being
incomplete if the connection was interrupted on the user's request and if the
concerned file is a temporary file which will be removed at the end of the
download anyway.
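As an illustration of the %xx encoding fix in the list above, the following sketch escapes characters that are not allowed in a URL while leaving existing %xx sequences alone. The set of "safe" characters and the use of UTF-8 are assumptions made for this example; the class name UrlEscaper is hypothetical and the actual rules used by PageSucker may differ.

```java
import java.nio.charset.StandardCharsets;

class UrlEscaper {
    // Characters passed through unchanged. This set is an assumption for the
    // sketch; '%' is included so that existing %xx escapes are preserved.
    private static final String SAFE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        + "-._~:/?#[]@!$&'()*+,;=%";

    // Replaces every byte outside the safe set with its %xx form.
    static String escapeIllegal(String url) {
        StringBuilder out = new StringBuilder();
        for (byte b : url.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (SAFE.indexOf(c) >= 0) {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeIllegal("http://www.example.com/my file.html"));
    }
}
```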
* FEATURE ENHANCEMENTS:
- Added support for MacOS X (10.1 or higher).
- Double-clicking a settings file on Macintosh will now cause PageSucker to
load it. (On MacOS X this currently only works if the application has already
been launched).
- Added an improved, application controlled log window for all platforms.
Previously, a dedicated log window was available only when running on MacOS
Classic. On other platforms a standard shell window was used instead.
- When the application starts up, the base URL field now has its text
preselected, so that a new URL can be typed without the need to first select
the sample URL text.
- In the file type settings window, when a type is added, it is now
automatically highlighted in the table.
- Added support for a new style of registration keys ("PSxxxx-xxxxx") which
are generated when purchasing PageSucker via the eSellerate online store.
- Message dialogs and the registration dialog can now be dismissed by
pressing the Return key (for Ok) and the Escape key (for Cancel).
Version 3.0.1 (5th August, 2001)
* BUG FIXES:
- Fixed an obscure bug which could cause certain tags to go unnoticed if the preceding tag ended
with a % symbol, e.g. "<TD WIDTH=50%>". This could cause some links to get skipped.
- Corrected a problem with blocks of JavaScript code that have not been commented out with standard
HTML comment tags: such blocks would be parsed as HTML, which could lead to unexpected results.
- When skipping JavaScript code that includes strings which contain escaped quote characters
(quote marks preceded by a backslash), PageSucker will no longer get confused.
- Fixed a bug related to URLs which feature a filepath in the query string (after the question mark
symbol), e.g. "http://www.example.com/show.cgi?pictures/new/house.jpg". This kind of URL would
not download correctly and could produce "File Not Found" error messages, and in some cases
could even lead to endless loops.
- Fixed a problem when the "modify filename extensions" option was used in combination with URLs that
have no filename extensions by default, such as "http://www.example.com/forum/thread/1234".
Attempting to download such a URL would fail and only result in a cryptic error message being
written to the log window ("java.lang.NullPointerException" etc.)
- Fixed a bug which prevented the "Check For New Release" feature from working correctly. A new
release would be detected, but it could not be downloaded automatically with PageSucker.
- Corrected a minor cosmetic bug in the dialog shown on startup when PageSucker requires a
newer Java engine than the one currently installed.
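The escaped-quote fix in the list above comes down to skipping the character after a backslash when scanning a JavaScript string literal. The sketch below shows one way to do that; JsStringSkipper and skipString are hypothetical names, and this is not PageSucker's actual parser.

```java
class JsStringSkipper {
    // Returns the index just past the closing quote of the string literal
    // that starts at 'start' (which must point at the opening quote),
    // or -1 if the literal is unterminated.
    static int skipString(String src, int start) {
        char quote = src.charAt(start);
        for (int i = start + 1; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. \" or \\
            } else if (c == quote) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String js = "\"a\\\"b\" + x"; // the JavaScript text: "a\"b" + x
        System.out.println(skipString(js, 0));
    }
}
```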
* FEATURE ENHANCEMENTS:
- Added the "Allocated Memory" status display to the main window.
Version 3.0 (13th May, 2001)
* BUG FIXES:
- Domain names in URLs are now always treated in a case insensitive way. Before, PageSucker would consider "www.domain.com" to be different from "www.DOMAIN.com", which is not the case.
- If URLs are set to be considered case insensitive (via the option in the Miscellaneous window), all URL paths are now automatically converted to lowercase, so that base hierarchy checks etc. are also affected by that option. Before, only the "known URL table" (used to remember encountered URLs) would be affected.
- JavaScript strings which look like they might be a URL but which turn out not to be one after a failed connection attempt will now be remembered, so that no second connection attempt is made for those strings. This does use up more memory, but it can considerably speed up downloads when JavaScript support is enabled.
- Worked around a cosmetic problem with text fields in certain dialogs when running under MacOS 9.1.
- Fixed a bug which would cause links to an already downloaded file not to be rewritten to point to the local file if these links were at the limit of the recursion depth.
- Corrected a bug which would cause the "Don't allow '>' in HTML strings" option not to be taken into account in a specific case.
- Corrected a minor bug which caused error messages not to be recorded correctly in the log file.
- Fixed a bug which caused open connections not to be closed correctly when there was a problem accessing the remote file (e.g. when the page could not be found). Leaving connections open that way could cause the operating system to freeze or crash during large downloads.
- Fixed a bug which would cause the creation of a directory on the local disk to fail if a file with the same name already existed. The directory will now be renamed in order to be created.
- Corrected a bug which could in some cases lead to an endless loop with pages that contained indirect links to themselves, if the URLs used contained ".." relative links.
- Corrected the handling of URLs containing "./" path segments. Such segments are now quietly ignored as they add no meaning to the URL and could create problems depending on the version of the Java virtual machine being used.
- Proxy settings are now also honored when doing a "New Release Check" - previously this operation failed when attempted from behind a proxy server or firewall.
- When accessing RAM and M3U files in "shortcircuit mode", the proxy settings were not taken into account. This has been fixed.
- Fixed a minor bug in saving settings files, which could lead to problems if characters with a Unicode value > 255 were used in site authentication strings (usernames or passwords).
- Fixed a bug which would cause PageSucker to crash on startup if there was a problem reading the default settings file.
- Fixed a bug which could in certain cases cause a download to start even if the "Cancel" button was pressed in the file dialog asking for the save directory.
- Fixed a bug which caused RAM or M3U file short-circuiting to only work once for a given URL. If the same URL was found a second time, it was ignored.
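Two of the fixes in the list above, case-insensitive domain names and ignoring "./" path segments, amount to normalizing a URL before comparing it with already known URLs. A minimal sketch follows, assuming java.net.URI is used for parsing; the class name UrlNormalizer is hypothetical. Note that rebuilding the URI from decoded components may re-encode some characters, so this is only an approximation.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class UrlNormalizer {
    // Lowercases the host and drops "." path segments; ".." is left alone.
    static URI normalize(URI url) {
        String host = url.getHost();
        if (host != null) {
            host = host.toLowerCase(Locale.ROOT);
        }
        String path = url.getPath();
        if (path != null) {
            String[] segments = path.split("/", -1);
            List<String> kept = new ArrayList<>();
            for (int i = 0; i < segments.length; i++) {
                boolean last = (i == segments.length - 1);
                if (segments[i].equals(".")) {
                    if (last) kept.add(""); // keep the trailing slash of "/a/."
                } else {
                    kept.add(segments[i]);
                }
            }
            path = String.join("/", kept);
        }
        try {
            return new URI(url.getScheme(), url.getUserInfo(), host,
                           url.getPort(), path, url.getQuery(), url.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize(URI.create("http://www.DOMAIN.com/a/./b/./c.html")));
    }
}
```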
* FEATURE ENHANCEMENTS:
- The Windows version is now distributed with an (un)installer.
- Added support for background images in tables (for <TABLE>, <TR>, <TD> and <TH> tags).
- Added support for internal frames (defined via the <IFRAME> tag).
- Added the "Diagnostic Logging" option, which allows the tracking down of filter configuration problems by logging the reasons why certain URLs were not saved or parsed.
- Added a "favorite settings" menu, which lists all the settings files contained in a certain folder, for quick access.
- The HTTP "referer" request header field will now be sent with each page request. This information is needed by some servers in order to send the correct page.
- Added the ability to modify filename extensions on the fly while downloading files.
- Added support for MIME type recognition to identify download files' types more accurately.
- Optimized internal thread handling, which should result in a more efficient usage of system resources.
- Added support for IRIX filesystems (SGI workstations), so that long filenames are used on that platform, as appropriate for a UNIX environment.
- Added timestamps to log file entries.
- Improved the thread status list detail to show which connections are in the process of being opened and closed.
- Added the Pattern Matching Settings dialog, which gives more control over the use of regular expressions to decide if a URL should be parsed, saved, or ignored altogether.
- Added support for standard DOS pattern matching alongside the Perl 5 regular expression support already present in previous releases.
- Added an improved error handling mechanism: when a downloaded file appears to be incomplete, PageSucker can now be set to automatically try to complement it.
- Added the "complement file, if incomplete" option to the Local File Settings window. This allows interrupted downloads to be restarted, thereby completing files which had previously been downloaded incompletely. Also removed the "don't download and keep original URL" option, as it was not of much use.
- Added the "consider linked data files to be part of their host page" option.
- Added support for files embedded via the <EMBED> tag.
- Added a preferences dialog and moved the "Beep When Download Is Finished" and "Remember Window Positions" options from the Options Settings dialog to the preferences dialog. Preferences differ from settings in that they are automatically saved upon program exit.
- Added the option to choose between modal and modeless settings dialogs to the preferences. In modal mode, only one settings dialog can be visible on the screen at any one time, and each settings dialog has a "Cancel" and an "OK" button to dismiss it.
- Created a preferences panel to handle proxy parameters. Previously these parameters were handled via a settings dialog, which caused them not to be automatically saved at the end of a session.
- Added support for proxy servers needing authentication.
- Completely redesigned and reimplemented the user interface.
- Slightly optimized the memory usage of the internal dictionary which records already encountered pages.
- Added the dash to the list of characters considered "alphanumeric" for the purpose of converting local filenames to alphanumeric ones. Also deactivated the "Use alphanumeric local filenames" option in the factory settings.
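The DOS pattern matching mentioned in the list above can be thought of as a simpler front end to regular expressions. The sketch below translates a DOS-style pattern ("*" for any run of characters, "?" for a single character) into a java.util.regex pattern. Treating patterns as case insensitive is an assumption, and the class name DosPattern is hypothetical; PageSucker's own matcher may behave differently.

```java
import java.util.regex.Pattern;

class DosPattern {
    // Translates DOS wildcards into an equivalent regular expression.
    // All other characters are quoted so they match literally.
    static Pattern compile(String dosPattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : dosPattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // True if the whole text matches the DOS-style pattern.
    static boolean matches(String dosPattern, String text) {
        return compile(dosPattern).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*.jpg", "house.JPG"));
    }
}
```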
Version 2.2.2 for Macintosh (2nd April, 2000)
* FEATURE ENHANCEMENTS:
- Added limited support for Macintosh filetypes and creators.
Version 2.2.1 (12th March, 2000)
* BUG FIXES:
- Added compatibility for Internet Explorer to the auto-generated top level
index page. Before, the referenced page would not open automatically when
the index file was opened with IE.
- Worked around a bug in JRE for Solaris/CDE, which causes non resizable modal
dialogs to be displayed offscreen. Now, these dialogs are made resizable to
prevent the bug from being triggered.
- Fixed a bug in the top level "index.html" creation routine which was responsible
for an incorrect relative URL inside that page.
- Corrected a problem with the "Show MRJ Version" Applescript in the Mac version,
which prevented the script from recognizing MRJ on international (non US)
systems.
- Compiled a new version of the 2.2 Manual PDF file, as the one included with
version 2.2 was somewhat broken (some pictures were missing).
Version 2.2 (2nd December, 1999)
* BUG FIXES:
- Fixed a bug which prevented PageSucker from starting up. It happened when the maximum number of threads had been set to a value which was lower than the default value (10 in a registered version, 3 in a demo version). When the default settings were then saved, PageSucker would refuse to start up the next time it was launched.
- When opening a connection to a Web server, there will be no more obscure error messages if the Web server happens to be misconfigured so that it doesn't return a valid status line. Previously such connection attempts failed; now the connection is silently assumed to be valid.
- Corrected a long-standing bug which caused downloaded pages to become corrupted when they contained characters with the code 255. This is the case for certain pages using a non-Roman character set, for example Cyrillic pages.
- Disabled the pulldown menus until the application has fully started up. This prevents the user (on a slow machine especially) from selecting a menu command before the application has been completely initialized.
- Worked around a bug in JRE (Windows) which prevented menu shortcuts from working.
- By default, preferences, settings and registration files are saved in the current user's home directory. In previous releases, an error was displayed if that directory happened not to be accessible for some reason. Now, PageSucker falls back to its own directory in that case.
- Worked around a bug in JRE 1.2 for Windows, which caused the main PageSucker window to grow in height each time the application was restarted. Now the window size should be remembered correctly across restarts.
* FEATURE ENHANCEMENTS:
- Added support for URLs defined via single quoted strings, as in <A HREF='test.html'>.
- Added a small message window while PageSucker is loading to inform the user that something is going on. Also removed the possibility to produce an error message by choosing "About PageSucker..." (on the Mac) before the main window has appeared.
- A top level "index.html" file is automatically created in the local download directory whenever HTML pages are saved. When opened with a browser, the index file redirects the browser to the real index file, which may be hidden deeply within the downloaded file hierarchy.
- Added support for picture buttons in Web forms, i.e. <INPUT TYPE="image" SRC="..."> tags.
- Added support for long filenames when running on a Linux platform.
Version 2.1 (29th March, 1999)
* BUG FIXES:
- Fixed a bug introduced in version 2.0 which prevented hostnames of URLs with a port number from being recognized correctly.
- Fixed a bug that caused non-numerical Java version numbers (e.g. "1.1.7A") to be misinterpreted. The visible effect of this was a warning dialog displayed by PageSucker at startup claiming that a newer Java version was required.
- When a settings file containing a custom log file spec was created on a Macintosh, then transferred to a Windows system, the log file spec couldn't be changed as the file dialog wouldn't show due to characters in the file spec considered illegal by Windows. This has been worked around by reassigning a default log file spec when the one from the settings file contains illegal characters.
- Settings files with a version number higher than the one supported by the current version of PageSucker are now rejected. This is not actually a problem at present, as only version 1 settings files are currently in use.
- Fixed a long-standing bug which prevented the saving of local files into a directory the path of which contained non ASCII characters. Previously, a "File Not Found" error would be displayed. Apparently, only Macintosh users were affected by this bug.
- URLs containing special encoded characters are now decoded correctly when creating local files. In earlier versions, encoded characters considered illegal by the local filesystem would slip through the filename cleaning process.
- Updated internal release check URL, as the homepage has moved.
- Worked around a bug in JRE for Windows, which could cause system crashes in certain circumstances. This apparently only happened with registered versions of PageSucker.
- Updated to be Java 1.2 compatible on Windows.
- Updated to be compatible with MRJ 2.1 on Macintosh.
- Corrected two bugs which prevented local URLs (e.g. "file:///D:/example.html") from being downloaded.
- Corrected a bug related to the string representation of "file:" URLs, which may have caused various problems in specific situations (e.g. when using a RegExp filter in combination with a "file:" URL).
- Corrected registration code algorithm to accept certain codes which were not recognized correctly in the previous release. This problem affected only usernames containing special accented characters.
- Made dialogs non-resizable in the Macintosh version. They had been made resizable in version 2.0 to work around a bug in MRJ 2.0 which has been fixed in MRJ 2.1.
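The Java version fix in the list above concerns version strings such as "1.1.7A" that carry a trailing letter. One tolerant way to compare such strings is to read only the leading numeric components, as in the sketch below. The class name JavaVersion and its methods are hypothetical; this is not the check PageSucker actually performs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavaVersion {
    private static final Pattern NUMERIC_PREFIX =
        Pattern.compile("^(\\d+)(?:\\.(\\d+))?(?:\\.(\\d+))?");

    // Extracts up to three leading numeric components; anything after them
    // (such as the "A" in "1.1.7A") is ignored. Missing components are 0.
    static int[] parse(String version) {
        int[] parts = new int[3];
        Matcher m = NUMERIC_PREFIX.matcher(version);
        if (m.find()) {
            for (int i = 0; i < 3; i++) {
                String g = m.group(i + 1);
                parts[i] = (g == null) ? 0 : Integer.parseInt(g);
            }
        }
        return parts;
    }

    // True if 'installed' is the same as or newer than 'required'.
    static boolean isAtLeast(String installed, String required) {
        int[] a = parse(installed);
        int[] b = parse(required);
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return a[i] > b[i];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAtLeast("1.1.7A", "1.1.6"));
    }
}
```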
* FEATURE ENHANCEMENTS:
- An option was added to modify filenames on the fly to only contain alphanumeric characters. This allows the copying of the downloaded files to a different platform, which might have a different character set. With this option disabled, special (non ASCII) characters may interfere with such a copy operation, resulting in broken links in the copied page hierarchy.
- Added the "Unterminated HTML Comments Are Single Line Comments" option. When enabled, unterminated or incorrectly terminated HTML comments are considered single line comments. This is identical to Netscape 4.5's behavior when faced with this HTML error. A correct comment termination is added at the end of the line in the downloaded page.
- When a URL with a port number (e.g. http://www.example.com:9999) is downloaded, the local directory created will have the port number appended (e.g. www.example.com_9999). In previous versions, only the hostname would be used as the local directory's name.
- Authentication support was added. In the "Authentication Settings" window, a list of domain names can be constructed with the associated login information (username, password).
- Added limited support for the OS/2 platform.
- Added support for FTP proxies.
- Added better window and dialog placement and the option to remember the window placement and size across sessions.
- Added menu shortcuts for some menu commands.
- Added a functional "About" menu item (in the Apple menu on Macintosh, in a new Help menu on Windows).
- Added a "Close" menu command in the Macintosh version, which can be used to close the frontmost window.
- When the user tries to quit while a download is in progress, the application pops up a dialog to make sure the user actually intended to quit.
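The port number entry in the list above describes a simple naming rule for the local directory. A sketch of that rule, using java.net.URI and the hypothetical names LocalDirName and forUrl, could look like this:

```java
import java.net.URI;

class LocalDirName {
    // Builds the local directory name for a URL: the hostname alone, or
    // hostname + "_" + port when the URL carries an explicit port number.
    static String forUrl(URI url) {
        String host = url.getHost();
        int port = url.getPort(); // -1 when no port is given
        return (port == -1) ? host : host + "_" + port;
    }

    public static void main(String[] args) {
        System.out.println(forUrl(URI.create("http://www.example.com:9999")));
    }
}
```

Including the port keeps downloads from two servers on the same host but different ports in separate directories.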
Version 2.0 (10th November, 1998)
* BUG FIXES:
- Corrected minor bug in local file creation routine. Now filenames won't
include query strings that may be present in a URL (when downloading CGI generated
data). This created a problem when viewing the downloaded pages with a Web
browser, as the browser removes the query string when looking for the file.
- Bug Fix: downloaded URLs will now be remembered with their query strings
included, so URLs with different query strings will be considered as different
URLs. They may thus be downloaded more than once. This only makes sense if
the existing file strategy is set to modify the filename, as the query string
is not included in the filename.
- Corrected cosmetic bug in user interface; special characters in the log
file name used to be displayed in encoded form (e.g. %20 for a space). Now
only control characters remain encoded.
- Lots of internal code restructuring, including replacing method calls
deprecated in Java 1.1.
- URLs with unrecognized protocols will now just be ignored silently, i.e.
they won't generate a "Malformed URL" error message as before. Currently,
the following protocols are recognized: http, ftp, file, gopher and javascript.
- Improved detection of nonexistent pages. Some servers respond with a valid
HTML page describing an error condition when asked to provide a page that
does not exist. Previously, PageSucker didn't notice that there was a problem
with the requested page. Now, it should recognize the problem in most
cases (unless the server is misconfigured).
- <BASE TARGET="..."> tags are no longer removed from downloaded
pages as was the case in earlier releases. Now, just the HREF="..."
parameter is removed from the <BASE> tag, as it would prevent the downloaded
page from working locally.
- Due to a bug in the underlying Java routines (as of Java 1.1.6), relative
URLs with a fragment containing a colon (ex: "index.html#frag:ment")
would produce a "Malformed URL" error. This bug has been worked
around by rewriting part of the URL parsing routines.
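The first two entries in the list above describe two different views of the same URL: the local filename ignores the query string, while the record of already downloaded URLs keeps it. The sketch below illustrates the filename side only, assuming java.net.URI for parsing; LocalFileName and fromUrl are hypothetical names.

```java
import java.net.URI;

class LocalFileName {
    // Derives the local filename from the last path segment of the URL.
    // The query string is not part of the path, so it is left out.
    // Assumes a hierarchical URL such as http://host/path?query.
    static String fromUrl(URI url) {
        String path = url.getPath();
        int slash = path.lastIndexOf('/');
        return path.substring(slash + 1);
    }

    public static void main(String[] args) {
        URI first = URI.create("http://www.example.com/cgi/show.cgi?id=1");
        URI second = URI.create("http://www.example.com/cgi/show.cgi?id=2");
        // The two URLs are distinct, yet map to the same local filename.
        System.out.println(first.equals(second) + " " + fromUrl(first) + " " + fromUrl(second));
    }
}
```

Because both URLs map to the same filename, downloading both only makes sense when the existing file strategy renames files, as the entry above points out.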
* FEATURE ENHANCEMENTS:
- Added the untyped objects feature. Names having no extension can now
be considered to denote directories, non HTML files or HTML files and will
be handled accordingly.
- Added the maximum parsing threads feature to limit the number of threads
producing new URLs to look at. Also optimized internally so that any objects
not needing to be parsed are downloaded first. This reduces PageSucker's memory
needs.
- Added the option to consider frames to be on the same recursion level
as their host page.
- Added the filters "Parse HTML Pages Outside Of Hierarchy" and
"Parse HTML Pages On Remote Server Up To Recursion Depth x". This
allows PageSucker to jump from one server to another.
- Added support for indirectly linked RealAudio/RealVideo and MPEG layer
3 audio files. These files are linked via small helper files having the extensions
"RAM" and "M3U", respectively.
- Better progress indicator. A thread status list now shows exactly what
each thread is doing. The progress history list present in the previous release
has been removed.
- Added the option to record the download progress in the log file,
instead of only logging errors or warnings.
- Added support for HTTP proxies.
- Reorganized user interface to allow for more options. There are now separate
windows accessed via pull down menus.
- Added the possibility to save/restore settings to/from a file, plus a
default settings file loaded at startup.
- The user is now asked to set the page save directory each time when starting
a download. This setting is no longer memorized across downloads.
- Added the option for the user to specify which filetypes should be considered
to be HTML. By default, the following types are recognized "html, htm,
shtml, htmlx".
- Added support for guessing URLs contained in JavaScript code. Currently
supported are JavaScript blocks (<SCRIPT> ... </SCRIPT>), JavaScript
include files (<SCRIPT SRC="...">) and JavaScript event handlers
in tags (<A HREF="..." onClick="...">). PageSucker
also recognizes URLs using the "javascript:" protocol and can scan
<OPTION VALUE="..."> tags for potential URLs.
- Added registration routines. By default, PageSucker is now a demo version
which has some features disabled until a valid registration code is entered.
- Added the "Beep & Show Dialog When Download Is Finished"
option.
- Added the "Check For New Release" feature. The user can now
easily check if a new version of PageSucker has been released, then automatically
have the currently running PageSucker download the new release.
- Added an automatic check at startup to make sure that the correct Java
engine is installed.
Version 1.0.4 (11th February, 1998)
- Corrected a bug introduced in version 1.0.2. When "Don't allow '>' Characters
in HTML Strings" was on and a page containing incorrect HTML code was downloaded,
there was a case where the tag following a corrupt tag would be skipped. Thus,
certain links could get lost.
- PageSucker now recognizes dynamically if it runs on a Windows machine and
disables the 15 pixel window inset in that case, as JRE 1.1.5 doesn't seem
to like it. This feature was introduced in version 1.0.3 and was the reason
why 1.0.3 was never released for Windows systems.
Version 1.0.3 (14th January, 1998)
- Autoresizes window to optimal size when launching.
- Works a little better with Apple's MRJ 2.0 (at least cosmetically).
Version 1.0.2 (10th January, 1998)
- Much smaller application filesize. Previous version was unnecessarily bloated
due to a compiler bug.
- Extended the "Don't allow '>' Characters in HTML Strings" option to correct
more cases of broken HTML code.
- Files ending in ".shtml" are now recognized as HTML files.
Version 1.0.1 (30th September, 1997)