FIT3155 DL-Distance Pattern Matching Assignment: Suffix Tree Implementation for Multi-Text Approximate Search

School

Monash University

*We aren't endorsed by this school

Assignment Type

Individual Assignment

Subject

FIT3155: Advanced data structures and algorithms

Uploaded by Malaysia Assignment Help

Date

05/16/2025

DL-distance ≤ 1 multiple pattern matching within a collection of texts

The statement of the task for this assignment is straightforward. Your program will be given a collection of text files and multiple pattern files. You will have to report all occurrences of each pattern that matches, under the DL-distance ≤ 1 threshold, the regions of texts in the collection.1

Note that you are constrained in this assignment to use the suffix tree data structure as your primary/foundational search data structure on which you will be running the DL-distance ≤ 1 searches. Further, to optimize the performance of this task, together with using a suffix tree, you may also employ supplementary data structure(s) or algorithm(s) chosen from this unit or its prerequisite units.

Details about the input text and pattern files:

  1. Each text/pattern file contains a string of characters drawn from the alphabet composed of 7-bit ASCII printable characters (32–126) and ASCII whitespace control codes (8–13, 32). You are free to assume ‘$’ symbol does not appear in text/pattern files.
  2. Each text/pattern file contains at least one character from the above alphabet. It is safe to assume that the longest pattern is at most as long as the shortest text.
  3. Multiple text files can contain identical contents. Further, a larger text file may contain the contents of a smaller text file. The same applies to the pattern files.

Stuck in This Assignment? Deadlines Are Near?

Strictly follow the following specification to address this task:

Program name: a2.py

Argument to your program: Your program/script will accept one argument: filename of a run-configuration file. This run-configuration file will contain all information needed for you to read and run your program: it specifies the list of filenames of the input text files followed by the list of filenames of the patterns. (The filenames themselves can include the full path.) The precise format of the run-configuration file is given below:

<N=number of text files> <M=number of pattern files>
1 <text_filename_1.txt>
2 <text_filename_2.txt>
. ...
. ...
N <text_filename_N.txt>
1 <pattern_filename_1.txt>
2 <pattern_filename_2.txt>
. ...
. ...
M <pattern_filename_M.txt>

Command line usage of your script:
a2.py <run-configuration-fileaname>

Output file name: output a2.txt

Output format:

<pattern number> <text number> <position of occurrence> <DL-distance>
...
...

Further notes about the output:

  1. The pattern number and text number in each line of the output file should follow the numbering given in the run-configuration file (first column).
  2. Pattern numbers and text numbers need not be in a sorted order – you can report them in any order your program identifies them.
  3. The position of occurrence is the 1-based position where the pattern (specified by pattern number) was observed in text (specified by the text number) under a DL-distance of either 0 or 1 (cf. Question 1 of Assignment 1).

Written PDF Report: a2report.pdf

Write a report addressing these questions:

  1. How did you handle multiple texts in your approach?
  2. How did you handle DL-distance = 0 search on multiple patterns in your approach?
  3. How did you handle DL-distance = 1 search on multiple patterns in your approach?
    • Report one subsection for each of the 4 edit operations: (1) substitution, (2)
      transposition, (3) insertion (of a character in a pattern), and (4) deletion (of a
      character in a pattern).
  4. Were changes made to the suffix tree data structure (compared to the version delivered in your lecture)? If, yes, state the changes you had to make and justify them.
  5. Did you use any other supporting data structures or algorithms not already covered in your responses to the above?
  6. What is the worst-case time AND space complexity of your suffix tree construction over multiple files?
  7. What is the worst-case time AND space complexity of your search approach under the DL-distance ≤ 1 threshold?

1Refer to Question 1 of Assignment 1 for the definition of DL-distance.

Get 30% Discount on This Assignment Answer Today!

Command-line Usage Tutorial for Assignments

In one or more of the questions in your assignments, you are required to input arguments to your program from the command line. Python provides the functionality to parse command-line options and arguments in multiple ways. The most common method is using sys.argv.

python programname.py argument1 argument2 argument3
python3 programname.py argument1 argument2 argument3

Example 1:

Consider a python program named program1.py that takes two files myfile1.txt and myfile2.txt as inputs to the program:

import sys

# this function reads a file and return its content
def read_file(file_path: str) -> str:
    f = open(file_path, 'r')
    line = f.readlines()
    f.close()
    return line

if __name__ == '__main__':
    #retrieve the file paths from the commandline arguments
    _, filename1, filename2 = sys.argv
    print("Number of arguments passed : ", len(sys.argv))
    # since we know the program takes two arguments
    print("First argument : ", filename1)
    print("Second argument : ", filename2)
    
    file1content = read_file(filename1)
    print("\nContent of first file : ", file1content)
    
    file2content = read_file(filename2)
    print("\nContent of second file : ", file2content)

Output:

Here we use the terminal to run the python program.

FIT3155 data structure and algorithm

Number of arguments passed : 3
First argument : myfile1.txt
Second argument : myfile2.txt

Content of first file : [‘This is content from file1\n’]

Content of second file : [‘This is content from file2\n’]

Example 2:

Consider a python program named program2.py that takes two files myfile1.txt and myfile2.txt as inputs to the program. Here we input the absolute paths to the two files in the command line.

import sys

# this function reads a file and return its content
def read_file(file_path: str) -> str:
    f = open(file_path, 'r')
    line = f.readlines()
    f.close()
    return line

if __name__ == '__main__':
    print("Number of arguments passed : ", len(sys.argv))
    # this is the program name
    print("Oth argument : ", sys.argv[0])
    # first argument/file path
    print("First argument : ", sys.argv[1])
    # second argument/file path
    print("Second argument : ", sys.argv[2])
    
    print("\nContent of first file : ", read_file(sys.argv[1]))
    print("\nContent of second file : ", read_file(sys.argv[2]))

Output:

Here we use the terminal to run the python program.

FIT3155 data structure and algorithm

Number of arguments passed : 3
0th argument : program2.py
First argument : /path/to/myfile1.txt
Second argument : /path/to/myfile2.txt

Content of first file : [‘This is content from file1\n’]

Content of second file : [‘This is content from file2\n’]

Get Solved Your Assignment( variable) and Earn A+ Grade!

Using PyCharm:

Select Edit configuration from the run tab in the menu list.

FIT3155 data structure and algorithm Pycharm

Type the file paths or arguments to your program in the text field next to parameters. Then press OK button.

FIT3155 data structure and algorithm

Run your program.

FIT3155 data structure and algorithm Pycharm

Output of the program1.py is shown below.FIT3155 data structure and algorithm Pycharm

The same procedure can be carried out to run the program2.py.

Stuck in This Assignment? Deadlines Are Near?

Get Help By Expert

Prepare your DL-Distance Pattern Matching Assignment report with the help of an AI-free assignment helper for Malaysia. You are assured that our report-writing service will make you productive and help you achieve high grades in your academic year. There are specialized professionals for all categories of assignments who offer you plagiarism-free and superior content. So contact us today and get your top-quality report!

Answer

UP TO 15 % DISCOUNT

Instant Paper Writing Services by Native Malaysia Writers

Plagiarism Free Solutions
100% Original Work
24*7 Online Assistance
Native PhD Experts
Hire a Writer Now
Convincing Features
Plagiarism Free Report
On-Time Delivery
Native Writers
A+ Quality
100% Confidential
24*7 Online Assistance

Get these features included in Your Assignment

Get Assistance for Assignments, online Exam, and Projects Writing