KWIC Implemented with Object-Oriented Architectural
Style (Assignment 2)
Table of Contents
1.1 Introduction (1)
1.1.1Definition of the Problem (1)
1.1.2Example (2)
1.1.3Assignment 2 (2)
1.2 The Existing System (3)
1.2.1Architecture (3)
1.2.2Properties of the Existing KWIC System (4)
1.2.3Source Code and Documentation (4)
1.2.4Test Data (4)
1.3 Static Class Diagram (5)
1.3.1What is a Static Class Diagram? (5)
1.3.2Static Class Diagram of the Existing System (5)
1.4 Dynamic Sequence Diagram (8)
1.4.1What is a Dynamic Sequence Diagram? (8)
1.4.2Dynamic Sequence Diagram of the Existing System (8)
1.5 Your Assignment (10)
1.5.1Programming Assignment (10)
1.5.2Understanding Check (10)
1.6 Modification of the Existing Data Representation (11)
1.7 Modification of the Existing Processing Algorithm (13)
1.8 Questions (14)
1.1Introduction
1.1.1Definition of the Problem
The Key Word In Context (KWIC) problem is defined as follows:
The KWIC index system accepts an ordered set of lines; each line is
an ordered set of words, and each word is an ordered set of
characters. Any line may be circularly shifted by repeatedly
removing the first word and appending it at the end of the line. The
KWIC index system outputs a listing of all circular shifts of all lines in
alphabetical order.
1.1.2Example
Consider the following set of lines:
?Star Wars
?The Empire Strikes Back
?The Return of the Jedi
The KWIC index system produces the following output (comparison is
case-sensitive, that is capital letters are greater than small letters): ?Back The Empire Strikes
?Empire Strikes Back The
?Jedi The Return of the
?Return of the Jedi The
?Star Wars
?Strikes Back The Empire
?The Empire Strikes Back
?The Return of the Jedi
?Wars Star
?of the Jedi The Return
?the Jedi The Return of
1.1.3Assignment 2
To accomplish the Assignment 2, you need first to get acquainted with
the existing KWIC system. After that you need to extend the
functionality of the existing system to meet the following
requirements. Finally, you need to answer to these three questions.
1.2The Existing System
1.2.1Architecture
The solution with object-oriented architectural style decomposes the
system into six modules:
?LineStorage module, responsible for holding all characters
from all words and lines.
?Input module, responsible for reading the data from a file and storing it in LineStorage module.
?CircularShifter module, responsible for producing circular
shifts of lines stored in LineStorage module.
?Alphabetizer module, responsible for sorting circular shifts
alphabetically.
?Output module, responsible for prinitng the sorted shifts.
?Master control module, responsible for controling the
sequence of method invokations in other modules.
Contrary to the main/subroutine solution, in the object-oriented solution the data is not shared between the computational components. Instead,
each module provides an interface that permits other components to
access data only by invoking methods in that interface. For example,
LineStorage module provides the public interface that allows other
modules to set character in a particular word in a particular line, read a
specific character, read, set or delete a particular word in a specific line,
read whole line at once, etc.
1.2.2Properties of the Existing KWIC System
Since the existing KWIC system is an object-oriented system, it can be
easily described by:
?Class Diagram, depicting the result of static analysis and
design of the system.
?Sequence Diagram, depicting the dynamics of the system.
1.2.3Source Code and Documentation
Here is the source code in Java for the existing KWIC system, and the
automatically generated javadoc documentation for the system.
1.2.4Test Data
Here is a sample input file for the KWIC system. Please, save the file
on your local drive. Specify the name of the file at the command line as an argumnet to the KWIC program.
1.3Static Class Diagram
1.3.1What is a Static Class Diagram?
Static class diagram is the result of static analysis and design of the
KWIC problem and proposed object-oriented architectural solution of the problem. Such diagram includes the following information:
?It describes how modules from the original architectural
solution maps onto classes from an object-oriented
programming language (Java in our case).
?It depicts the public interface of each identified class. (Since
we have an object-oriented system here, each object of a
class from the system encapsulates its data. That means that
all information contained in an object of a particular class is
hidden from the outside world, i.e. other objects in the
system can not access it directly. To be still able to
manipulate the data of an object, its class provides a public
interface, i.e., a number of public methods, which are called
by other objects.)
?It depicts instance variables (attributes) of classes, i.e., data representation from each class is shown in the diagram.
?Finally, the diagram shows how classes relate to each other.
Class relations are shown by means of so-called associations.
1.3.2Static Class Diagram of the Existing System
Here is the UML (Unified Modeling Language) class diagram of the
current system.
Here is a short description of the diagram:
Each module from the architectural design maps exactly onto one class. Thus, there exist KWIC class (Master Control
module), Input class, Output class, LineStorage class,
CicrularShifter class and Alphabetizer class. Each
class is represented as a rectangle box, having three
horizontally arranged sub-boxes. The name of each class is
written in the top sub-box.
?Public interface of each class is written into the bottom sub-box, as a list of method declarations. The plus (+) sign in
front of a method name means that the method is defined as public and may be invoked by objects of other classes. The argument list is written within parenthesis. Each argument is represented with the name, its type and the direction in
which the argument is passed. If there is no direction
modifier that means that this method takes this argument,
and if there is out modifier that means that the method
returns this argument as a return value to the calling object.
For example, +getChar(out c:char, position:int, word:int, line:int) method from the LineStorage
class takes as arguments three integere values: position of the character, index of the word, and index of the line, and returns the value of that character.
?Instance variables (attributes) of a class are shown in the middle sub-box. The minus (-) sign in front of an instance
variable means that the variable is defined as private and
may not be manipulated by other objects directly. Instead,
the public interface of the class should be applied to do so.
That is completely in accordance with object-oriented
principles of data encapsulation and information hiding,
discussed above. After the minus sign, the name and the type of a particular instance variable are shown. For example, the LineStorage class has one private instance variable:
lines_. This instance variable is of type ArrayList, and holds a list of lines.
?Finally, we have a number of associations between some of the classes. An association means that associated classes
have as instance variables the objects of the associated class.
For example, the Alphabetizer class has an association
with the CircularShift class. That means that an object of the Alphabetizer class has as an instance variable an object of the CircularShift class. This variable is also
depicted in the middle box of the Alphabetizer class. We
can spot there shifter_ variable, which is of type
CircularShifter. The numbering at each association end
means that exactly one object of the Alphabetiter class
holds exactly one object of the CircularShifter class.
1.4Dynamic Sequence Diagram
1.4.1What is a Dynamic Sequence Diagram?
Dynamic (interaction) diagrams describe how group of objects
collaborate to execute a certain task in a running program. In other
words these diagrams show graphically the processing algorithm and its
dynamic behaviour in a running system. Sequence diagram is a special
type of dynamic diagrams that describes the time sequence of method
invokations on the run-time in the following way:
?Objects are shown as boxes at the top of the diagram.
?Time is represented with the so-called object's lifeline -
vertical line beneath the object's box. The time increases
downward along the vertical object's lifeline.
?Method invokation is simbolized through a horizontal line
connecting two vertical object's lifelines. The name of the
method is written above such horizontal line. Obviously,
horizontal lines which are further downwardds on an object's
lifeline are invoked later in time than methods represented as
horizontal lines nearer to the top of the object's lifeline.
?Conditions for method invokation and iterative method
invokations may be simbolized with special signs. For
example, iterative method invokation is depicted with the *
sign.
1.4.2Dynamic Sequence Diagram of the Existing System
Here is the UML (Unified Modeling Language) sequence diagram of the
current system.
Here is a short description of the diagram:
1.An object of the KWIC class controls the whole execution of
the program. Firstly, the KWIC object invokes the parse
method of an object of the Input class. This method takes
as arguments the name of the file containing lines, and an
object of the LineStorage class to store the data.
2.The parse method of the Input object reads the data from
the file. For each line in that file it creates a new line in the
LineStorage object by invoking the addEmptyLine
method on that object. After a new line has been created the
Input object parses the line and adds all words (by means
of addWord), one after another into the new line. The
iterative property of this process is simbolized in the diagram
with * signs in the front of the method names. After all words
and lines has been processed, the parse method returns.
3.The KWIC object invokes the setup method on an object of
the CircularShifter class. This method takes as an
argument the LineStorage object holding the lines and
produces circular shifts of all lines at once. During the
execution of the setup method, the CircularShifter
object iterates through all lines in the LineStorage object,
simbolized in the diagram with * sign in the front of the
getLine method of the LineStorage object.
4.After circular shifts have been created, the KWIC object
invokes the alpha method of an object of the
Alphabetizer class. This method takes as an argument the
CircularShifter object and sorts circular shifts
alphabetically. During this process the Alphabetizer object
invokes getLineAsString method of the
CircularShifter class a number of times. Again, this is
simbolized in the diagram with * sign in the front of the
getLineAsString method of the CircularShifter
object.
5.Finally, the KWIC object invokes the print method on an
object of the Output class. This method takes as an
argument the Alphabetizer object from the previous step.
It iterates (* sign) then through all lines from the
Alphabetizer object by invoking getLineAsString on
that object. Notice that this method just forwards the request
to the CircularShifter object to retrieve the data. Of
course, the forwarded request does not include the original
index, but the sorted one instead. When the print method
returns, the execution of the program is finished.
1.5Your Assignment
1.5.1Programming Assignment
You need to modify the existing system in the following way:
?Modification of data representation
?Modification of processing algorithm
1.5.2Understanding Check
Finally, you need to answer to these three questions.
1.6Modification of the Existing Data Representation
In the current version of the LineStorage class we have the following data structure for storing lines:
?An instance variable with the name lines_ of type
ArrayList holds all lines. The ArrayList class
implements all standard dynamic list operations, i.e., we can
add elements at the end of the list, we can insert elements in
an arbitrary position in the list, we can delete elements, etc.
An element stored in an instance of the ArrayList class can
be any Java object. This can be done because instances of
the ArrayList class hold instances of the Object class,
which is the root of the single rooted Java class hierarchy, i.e.,
any Java class is a subclass of the Object class.
?In the current implementation lines_ variable holds objects of the ArrayList class. That is, each line is again
represented through an object of ArrayList class. Inside
this list we store words of that particular line. Words are
stored as instances of the String class.
You need to modify the data structure for storing lines in LineStorage objects. The new data representation should look as follows:
? A new class should be defined, providing the storage for a
single line and a public interface to manipulate the content of
a line. The name of the new class should be Line.
?The Line class stores a line into an object of the ArrayList class. The name of this instance variable should be words_.
Words should be stored as instances of the String class in
the words_ object.
?The LineStorage class keeps lines in the same lines_
object. However, this list is now filled with instances of the
Line class.
The following class diagram depicts the modifications you need to
implement. Note, that just those two classes that are relevant for the
modification are shown. All other classes remain the same.
1.7Modification of the Existing Processing Algorithm
First, you need to implement an interactive version of the KWIC index
system. That means the system won't read lines from a file but lines are inserted interactively by means of a simple command line user interface.
Here is a transcript of a sample session:
?Add, Print, Quit: a
?> Star Wars
?Add, Print, Quit: a
?> The Empire Strikes Back
?Add, Print, Quit: a
?> The Return of the Jedi
?Add, Print, Quit: p
?Back The Empire Strikes
?Empire Strikes Back The
?Jedi The Return of the
?Return of the Jedi The
?Star Wars
?Strikes Back The Empire
?The Empire Strikes Back
?The Return of the Jedi
?Wars Star
?of the Jedi The Return
?the Jedi The Return of
Further, you need to modify how line shifting is performed. In the
current version all lines are read at once, and line shifting is done on all lines at once after they are read. In the new version you need to perform line shifting on each line as it is read from the command line. The
following sequence diagram depicts the new situation.
Here is a short description of the diagram. The KWIC object controls the command line interface. After a command for adding a line has been
issued, the KWIC object invokes readLine method of the Input object.
The Input reads the line and passes it to the CircularShifter
object by invoking its makeShifts method. After shifts have been
made, makeShifts and readLine methods return and the KWIC
object is again in charge, waiting for a new command from the user.
1.8Questions
Please, answer the following questions:
1.Which modules from the original KWIC system did you need
to modify to implement the new data representation of lines
within the LineStorage class? What conclusion can you
draw here? How does the KWIC system implemented by
means of object-oriented architecture withstand design
changes in data representation within a particular module? Is
this true for any other object-oriented system?
2.Which modules from the original KWIC system did you need
to modify to implement a new processing algorithm in the
system? Can we conclude here that the KWIC system with
object-oriented architecture is robust to desing changes in
processing algorithm of the system?
3.Would it be difficult to reuse modules from the existing KWIC
system in other systems? For example, suppose that in
another software system we need to sort a number of lines.
Suppose that the sorting algorithm expects lines to be
represented as arrays of characters. Can you describe a
possible object-oriented solution to reuse the LineStorage class without need to modify the source code of the class.