CSE5330 Homework Questions and Answers
CSE 5330/7330
Homework #1
1. [Points 5/5] If Amazon.com kept all of its data (i.e. 42 TB) on Hollerith 80 column punched cards, how long would it take to read all of the data if you used an IBM 2540 card reader which could read 1000 cards per minute?
A Hollerith 80-column card can store 12 rows * 80 columns / 8 = 120 bytes
Number of Hollerith 80-column cards required to store 42 TB of data = 42 TB / 120 bytes = 42 * 2^40 / 120 ≈ 384,829,069,722 cards
Number of cards read by an IBM 2540 card reader in 1 minute = 1000 cards/minute
Time required to read 384,829,069,722 cards = 384,829,069,722 / 1000 ≈ 384,829,070 minutes ≈ 6,413,818 hours (about 732 years)
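The arithmetic above can be checked with a short script (this sketch assumes binary units, i.e. 1 TB = 2^40 bytes, which matches the card count above):

```python
# Punched-card arithmetic for question 1 (assumes 1 TB = 2**40 bytes).
CARD_BYTES = 12 * 80 // 8        # 120 bytes per 80-column Hollerith card
DATA_BYTES = 42 * 2**40          # 42 TB
CARDS_PER_MIN = 1000             # IBM 2540 read rate

cards = DATA_BYTES / CARD_BYTES
minutes = cards / CARDS_PER_MIN
hours = minutes / 60
years = hours / (24 * 365)
print(f"{cards:.0f} cards, {hours:,.0f} hours, about {years:.0f} years")
```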
2. [Points 20/15] Consider a disk with a sector size of 512 bytes, 1000 tracks per surface, 400 sectors per track, 5 double-sided platters, average seek time of 6 milliseconds. Suppose that a file containing 100,000 records of 200 bytes each is to be stored on such a disk and that no record is allowed to span two sectors. Assume the disk rotates at 10,000 RPM and one track of data can be transferred per rotation.
a. How many records fit into one sector?
Number of records per sector = floor(sector size / record size) = floor(512/200) = 2 records/sector (records may not span sectors)
b. How many sectors are required to store the entire file? If the file is arranged sequentially on disk, how many surfaces are needed?
Sectors required to store the entire file = total records per file / records per sector = 100,000/2 = 50,000 sectors
Number of surfaces required to store the file = sectors required to store the entire file / sectors per surface
Sectors per surface = 1000 * 400 = 400,000 sectors
Number of surfaces required to store the file = 50,000 / 400,000 = 0.125, rounded up to 1 surface
c. What is the maximum number of 200 byte records that can be stored using this disk?
Max number of records that can be stored on the disk = total sectors on the disk * records per sector
Total sectors on the disk = total surfaces * sectors per surface = (5 * 2) * 1000 * 400 = 4,000,000 sectors
Max number of records = 4,000,000 sectors * 2 records/sector = 8,000,000 records/disk
d. What is the time required to completely read the entire disk? Assume ideal circumstances.
Time to read the entire disk = initial seek time + rotation time for all tracks
Disk speed = 10,000 RPM
Data read per rotation = 1 track
Total tracks on the disk = total surfaces * tracks per surface = 10 * 1000 = 10,000 tracks
Number of disk rotations required to read 10,000 tracks = 10,000 rotations
Time to read the entire disk = 6 ms + 10,000 rotations / disk speed
= 6 ms + 10,000/10,000 min = 6 ms + 60,000 ms = 60,006 ms ≈ 60 seconds
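A quick sketch recomputing parts (a)–(d) from the given disk geometry:

```python
# Disk geometry from question 2.
SECTOR = 512              # bytes per sector
TRACKS = 1000             # tracks per surface
SECTORS_PER_TRACK = 400
SURFACES = 5 * 2          # 5 double-sided platters
RECORD = 200              # bytes, no spanning allowed
RPM = 10_000

recs_per_sector = SECTOR // RECORD                 # (a)
sectors_for_file = 100_000 // recs_per_sector      # (b)
total_sectors = SURFACES * TRACKS * SECTORS_PER_TRACK
max_records = total_sectors * recs_per_sector      # (c)

# (d) one track per rotation: 10,000 rotations at 10,000 RPM = 1 minute
total_tracks = SURFACES * TRACKS
read_ms = 6 + total_tracks / RPM * 60_000          # one seek + rotation time
print(recs_per_sector, sectors_for_file, max_records, read_ms)
```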
3. [Points 10/10] PC Magazine has a list of 10 web hosting companies https://www.pcmag.com/article2/0,2817,2424725,00.asp Consider two companies on the list, and search their detailed information to determine what database systems are available to a web developer. Provide the names of the companies and database(s) they support.
Company          | Database
HostGator        | MySQL
InMotion Hosting | MySQL
4. [Points 20/15] Suppose we have a sequential ordered file of 200,000 records, where each record is 200 bytes. Assume blocksize = 2048 bytes (10 records per block), seek time = 16 ms, average rotational delay = 8.3 ms, and block transfer time = 0.8 ms. Suppose we want to make X independent random record reads from the file. This could be done in two different approaches.
a. read the entire file once and look for the X records of interest
Total access time for one block = seek time + avg rotational latency + block transfer time
= 16 ms + 8.3 ms + 0.8 ms = 25.1 ms
Total number of blocks holding file records = 200,000/10 = 20,000 blocks
Time to read the entire file = total access time per block * number of blocks
= 25.1 ms * 20,000 = 502,000 ms = 502 sec ≈ 8.37 minutes
Total time = time to read the entire file + time to find the X records in the buffer
= 502 sec + X in-memory scans, each worst-case O(200,000); the in-memory work is negligible next to the disk time, so ≈ 502 sec regardless of X
b. use a binary search to find a particular record, and repeat this for all X records of interest.
D->Avg time to read/write a disk page
B -> Number of blocks
C -> Average time to process a record.
Assume Avg time to process a record = 1ms.
Binary search time to find one record = D log2 B + C log2 B
= 25.1 ms * log2 20,000 + 1 ms * log2 20,000 = 25.1 ms * 14.2877 + 1 ms * 14.2877
= 0.3586 sec + 0.0143 sec ≈ 0.3729 sec
Binary search time to find X records = X * 0.3729 sec
[Figure: linear search vs. binary search comparison graph]
Binary search (option 2) performs better than reading the entire file and scanning it (option 1) up to roughly 1,346 records (502 sec / 0.3729 sec), after which option 1 is better.
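The break-even point follows from the two cost formulas above; this sketch recomputes it (assuming, as above, C = 1 ms to process a record in memory):

```python
import math

# Costs from question 4: seek, rotational delay, transfer (all ms).
SEEK, ROT, XFER = 16.0, 8.3, 0.8
BLOCK_MS = SEEK + ROT + XFER            # 25.1 ms per random block access
BLOCKS = 200_000 // 10                  # 20,000 blocks of 10 records

scan_ms = BLOCK_MS * BLOCKS             # option (a): full scan, independent of X
search_ms = (BLOCK_MS + 1.0) * math.log2(BLOCKS)  # option (b): one binary search

break_even = scan_ms / search_ms        # X at which both options cost the same
print(f"one binary search ≈ {search_ms:.1f} ms, break-even ≈ {break_even:.0f} records")
```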
5. [Points 20/15] A file with Part# as the hash key includes records with the following Part# values: 2368, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981 and 9208. The file uses 4 buckets, numbered 0 to 3. Each bucket is one disk sector and holds four records and appropriate pointer values. Load these records into the file in the given order, using the hash function h(k) = k mod 4.
a. Draw the diagram showing the final loading of the data into buckets.
b. Calculate the average number of sector accesses for a random retrieval on Part#.
Bucket | Key 1 | Key 2 | Key 3 | Key 4 | Overflow
0      | 2368  | 3760  | 4692  | 1620  | 2428, 9208
1      | 1821  | 4981  |       |       |
2      | 1074  | 4750  |       |       |
3      | 4871  | 5659  | 7115  | 3943  | 6975
3 of the 15 records are in overflow. The 12 records in primary buckets require 1 sector access each, whereas the 3 overflowed records require 2 sector accesses each.
Average = 1 * (12/15) + 2 * (3/15) = 0.8 + 0.4 = 1.2 sector accesses
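The bucket loading and the average-access figure can be reproduced with a short simulation of h(k) = k mod 4 with 4-record buckets:

```python
# Load the Part# values in the given order; overflow when a bucket holds 4.
parts = [2368, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
         1620, 2428, 3943, 4750, 6975, 4981, 9208]

buckets = {b: [] for b in range(4)}
overflow = {b: [] for b in range(4)}
for k in parts:
    b = k % 4
    (buckets[b] if len(buckets[b]) < 4 else overflow[b]).append(k)

primary = sum(len(v) for v in buckets.values())   # 1 access each
spilled = sum(len(v) for v in overflow.values())  # 2 accesses each
avg = (1 * primary + 2 * spilled) / len(parts)
print(overflow, avg)
```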
6. [Points 15/15] Assume any record can be read <= 10 milliseconds. Consider a database of 14,000,000 records containing information about Texas drivers http://www.statemaster.com/graph/trn_lic_dri_tot_num-transportation-licensed-drivers-total-number , with one record per disk sector. How long would it take to find a record using:
a. binary-search technique
Time required to find a record = D log2 B
D -> avg time to read/write a disk sector
B -> number of records
= 10 ms * log2(14,000,000) = 10 ms * 23.74 ≈ 237.4 ms
b. sequential search
Time required to find a record = ½ * B(D + RC)
B -> number of records
D -> avg time to read/write to disk
R -> records per page (here 1)
C -> avg time to process a record (assumed 0)
= ½ * 14,000,000 * (10 ms + 1 * 0) = 70,000,000 ms = 70,000 sec ≈ 19.4 hours
c. B+ tree index assuming 100 keys per node
B+ tree index with 100 keys per node: the depth is ceil(log_100 14,000,000) = 4 levels, since 100^3 = 1,000,000 < 14,000,000 < 100^4; hence the time required ≈ 4 * 10 ms = 40 ms.
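The three estimates can be recomputed together (D = 10 ms per access; the B+ tree depth here uses ceil(log_100 B), a simplification that ignores the exact node fan-out):

```python
import math

D = 10.0            # ms per disk access
B = 14_000_000      # records, one per sector

binary_ms = D * math.log2(B)          # (a) binary search
sequential_ms = 0.5 * B * D           # (b) expected sequential scan
levels = math.ceil(math.log(B, 100))  # (c) B+ tree depth, 100 keys/node
btree_ms = levels * D
print(f"{binary_ms:.1f} ms, {sequential_ms/1000:.0f} s, {levels} levels -> {btree_ms:.0f} ms")
```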
7. [Points 10/10] Assume you have sufficient space and resources, how long would it take to copy a 1 TB file from an external drive connected to a USB port to your computer? State your assumptions.
This depends on both the disk speed and the USB speed. Assuming the average disk access overhead is X ms and the USB link runs at 15 Mbit/s:
Time required = disk access overhead X + 1 TB @ 15 Mbit/s = X + (8 * 10^12 bits / 15 * 10^6 bit/s) ≈ X + 148 hours
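A quick check of the transfer-time figure, assuming the USB link is the bottleneck and decimal units (1 TB = 10^12 bytes):

```python
# Transfer time for 1 TB over a 15 Mbit/s link (question 7's assumption).
TB_BITS = 10**12 * 8      # 1 TB in bits, decimal units
USB_BPS = 15 * 10**6      # 15 Mbit/s

hours = TB_BITS / USB_BPS / 3600
print(f"about {hours:.0f} hours")
```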
Graduate Task:
All of the above and:
8. [Points 0/15] Consider a disk with block size = 512 bytes. A block pointer is 6 bytes and a record pointer is 7 bytes long. A file has 30,000 records of fixed length. Each record has the following fields and byte size: Name (30), Ssn (9), Dept (9), Address (40), Phone (10), Bdate (8), Sex (1), Job (4) and Salary (4). An additional byte is used as a deletion marker.
a. Calculate the record size R in bytes
b. Calculate the blocking factor and the number of file blocks needed to store the data, assuming an unspanned organization.
Assume the file is ordered by the key field Ssn. You need to create a B+ tree primary index.
c. Calculate the index blocking factor
d. Calculate the number of levels needed
e. Calculate the total number of blocks needed to store the index.
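No solution is given for this part, so here is a sketch of the arithmetic using the standard unspanned-file and multi-level primary-index formulas; a B+ tree with the same fan-out behaves similarly, so treat the printed values as a check rather than the official answer:

```python
import math

BLOCK = 512
FIELDS = [30, 9, 9, 40, 10, 8, 1, 4, 4]    # Name .. Salary
R = sum(FIELDS) + 1                        # (a) +1 byte deletion marker

bfr = BLOCK // R                           # (b) blocking factor, unspanned
file_blocks = math.ceil(30_000 / bfr)

entry = 9 + 6                              # Ssn key + 6-byte block pointer
index_bfr = BLOCK // entry                 # (c) index entries per block

# (d)/(e): build index levels bottom-up until a single root block remains
blocks, levels, total = file_blocks, 0, 0
while blocks > 1:
    blocks = math.ceil(blocks / index_bfr)
    levels += 1
    total += blocks
print(R, bfr, file_blocks, index_bfr, levels, total)
```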