
I need to find a file somewhere in the file system. I'll know the base folder to start searching from and the file name, but I'm not sure how to find it.

This would be dynamically done as part of the code, just like when using FileUtils or Dir.glob.
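One possible approach, sketched here with Ruby's standard `find` library, is to walk the tree below the base folder and collect every regular file whose name matches. The helper name `find_files` is made up for this illustration and is not part of any library.

```ruby
require 'find'

# Hypothetical helper: returns every regular file below base_dir
# whose basename is exactly file_name.
def find_files(base_dir, file_name)
  matches = []
  Find.find(base_dir) do |path|
    # Find.find yields directories too, so keep regular files only.
    matches << path if File.file?(path) && File.basename(path) == file_name
  end
  matches
end
```

A `Dir.glob(File.join(base_dir, '**', file_name))` call is a shorter alternative, with the caveat that glob metacharacters in either argument are interpreted as patterns and that hidden directories are skipped by default.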

In our app we have a table called support_files which stores documents that have been uploaded, most of which are PDFs.

I'd like to get a unique list of these files, because the same file is often uploaded more than once. I thought that one way to do this would be to add a column called "checksum" to the table and then, for each file, calculate the checksum somehow and store it in that column. (This is obviously the slow part.)

Once this is done then I can easily filter out duplicates from my table by examining the checksum column.

Can anyone recommend a method to generate this checksum/hash/whatever? Ideally I'd like to generate a hash/checksum that's large enough to guarantee uniqueness, but small enough to fit into a string field in my database.

My server runs Ubuntu Server, and the total number of files I need to checksum is currently around 12,000. For the sake of argument, assume it won't grow beyond 100,000.

A bit of Googling reveals sha1sum, but this may be more suited to telling if a file has been accidentally changed rather than if two files are different?
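As a rough sketch of the checksum idea, Ruby's standard `digest` library can hash a file's contents directly. SHA-256 is used here as an assumption rather than a recommendation from the question; its hex digest is 64 characters, which fits a typical string column. No hash strictly guarantees uniqueness, but an accidental SHA-256 collision among 100,000 files is astronomically unlikely. The helper names `file_checksum` and `duplicate_groups` are made up for this illustration.

```ruby
require 'digest'

# Hypothetical helper: hex SHA-256 digest of a file's contents (64 characters).
def file_checksum(path)
  Digest::SHA256.file(path).hexdigest
end

# Hypothetical helper: groups paths by checksum and keeps only
# the groups that contain more than one file (the duplicates).
def duplicate_groups(paths)
  paths.group_by { |path| file_checksum(path) }
       .select { |_checksum, group| group.size > 1 }
end
```

Storing the result of `file_checksum` in the checksum column would then let the database find duplicates by grouping on that column.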

I have a use case where I need to know the type of a file so that I can identify and blacklist executables (exe, installers, etc.) and archive files (zip, rar, etc.). Relying on the extension is not enough for me, because the extension of a file can be changed while its contents stay the same. I tried using the Linux command:

file -b filename

This works perfectly for all file types except .xlsx and .docx files, for which the command reports the following:

Zip archive data, at least v2.0 to extract

Because of this, I end up blacklisting .xlsx and .docx files as well.

Can anybody suggest a way to get the file type without using the extension that also works for .xlsx and .docx files?
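The `file` output is technically accurate, since .docx and .xlsx files are ZIP containers. One heuristic, sketched below, relies on the fact that ZIP stores entry names uncompressed, so the raw bytes can be searched for the part names that Office Open XML files conventionally contain (`[Content_Types].xml` plus `word/document.xml` or `xl/workbook.xml`). The helper names are made up for this illustration, and this is a byte-level guess rather than a real ZIP parse: an ordinary archive that happens to contain those names would be misclassified, and the whole file is read into memory.

```ruby
# Hypothetical helper: classifies raw file contents as
# :docx, :xlsx, :zip (any other ZIP archive), or :other.
def classify_bytes(bytes)
  data = bytes.b # compare as binary, regardless of the input encoding
  # Non-empty ZIP archives start with a local file header signature.
  return :other unless data.start_with?("PK\x03\x04".b)
  # Office Open XML packages carry a [Content_Types].xml part.
  return :zip unless data.include?('[Content_Types].xml'.b)
  return :docx if data.include?('word/document.xml'.b)
  return :xlsx if data.include?('xl/workbook.xml'.b)
  :zip
end

# Hypothetical helper: same classification, read from a path on disk.
def classify_file(path)
  classify_bytes(File.binread(path))
end
```

With this in place, only files classified as `:zip` would be treated as archives, while `:docx` and `:xlsx` could be allowed through.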

Say you have a Rails 4 app with the following directory structure:

my_app
my_app/spec
my_app/spec/models/foo_spec.rb
my_app/spec/support/utilities.rb
my_app/spec/test_files/bar.txt

I have a function in the my_app/spec/support/utilities.rb file that reads data from my_app/spec/test_files/bar.txt to populate the test database with some records before testing the model.

my_app/spec/support/utilities.rb contains:

this_path = File.expand_path(File.dirname(__FILE__))
fname = File.join(this_path, '../test_files/bar.txt')

File.open(fname, 'r').each_line do |line|
  # create entries in Foo from tab delimited data
end

This works for opening my_app/spec/test_files/bar.txt, but I was wondering if there was a better way to specify where the file I want to open is located.
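One possible tidier form, sketched in plain Ruby, passes the base directory as the second argument to `File.expand_path` and reads the file with `File.foreach`, which closes the file when iteration ends. The helper names `test_file_path` and `each_record` are made up for this illustration. Inside a Rails app, `Rails.root.join('spec', 'test_files', 'bar.txt')` is another common way to build the same path without depending on where the calling file lives.

```ruby
# Hypothetical helper: resolves a file in spec/test_files relative to
# the support directory (defaults to the directory of this file).
def test_file_path(name, support_dir = __dir__)
  File.expand_path(File.join('..', 'test_files', name), support_dir)
end

# Hypothetical helper: yields each line of a tab-delimited file
# as an array of fields.
def each_record(path)
  return enum_for(:each_record, path) unless block_given?

  File.foreach(path) do |line|
    yield line.chomp.split("\t")
  end
end
```

Called from my_app/spec/support/utilities.rb, `test_file_path('bar.txt')` would resolve to my_app/spec/test_files/bar.txt, and `each_record` could feed the fields of each line into the code that creates the Foo entries.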

