HOW TO - Cleanup your UPLOAD directory

I've seen this question asked a lot and never seen any answers.  I found a way to reduce the size of my on-site "uploads" folder by almost 30GB.  Thats a lot of recovered file system storage especially if you perform regular backups of your file system.

 

Uploads are mentioned in the developer guide and it was enough to point me in the right direction however there wasn't enough information to complete the task.  

https://support.sugarcrm.com/Documentation/Sugar_Developer/Sugar_Developer_Guide_9.0/Architecture/Uploads/ 

 

Specifically what I was targeting here was "Email Attachments".  Any time an email is sent from Sugar with an attachment, Sugar stores a copy of that file in the uploads directory but renames it to a GUID matching the ID of a note record.  What?!?!  Well if you think about it, it makes sense.  Multiple copies of "MyDoc.pdf" with different content would be impossible for Sugar to identify which is the right "MyDoc.pdf".  They used this mechanism to create a unique ID for all documents in the uploads directory.  They just never gave us an easy way to delete or cull this info.  It just grows and grows.....

 

This process was performed on CentOS 6 on an on-site installation but the process will basically be the same for any other on-site platforms.  There is a query in the developer doc to locate email attachments but it is kind of flawed and just needs a little tweaking.  Here is the query Sugar suggests:

 

SELECT filename, file_mime_type FROM notes
INNER JOIN emails ON notes.parent_type = 'Emails' AND notes.parent_id = emails.id
INNER JOIN emails_text ON emails.id = emails_text.email_id;

That will produce a TON of results.  Some even duplicated.  Slightly change this query's first line
and add another line at the bottom.

SELECT notes.id FROM notes
INNER JOIN emails ON notes.parent_type = 'Emails' AND notes.parent_id = emails.id
INNER JOIN emails_text ON emails.id = emails_text.email_id
WHERE notes.date_entered < '2019-09-01';


What I did here was added the note ID (to find the matching document GUIDs) and set a time frame. In
my case, I wanted to delete all email attachments over 3 months old.  Export this list to CSV which
will give you one column with GUIDS.  Then its a simple formula in Excel.  In Column B:
="/path/to/upload/directory/" & A2  {basically the cell to the left of it}.  Now drag that column down
the whole list, highlight Column B, copy and paste values.  Now delete Column A and you now have a
listing of full path file names to be deleted.  Copy this list to a text file on your Sugar server.

Now for CentOS, RHEL... Lets say I named the file "clearuploads.txt"
sudo xargs rm < clearuploads.txt


Thats it! This will remove the email attachments but leave the note in place. The note record will
still exist in Sugar and it will still have the attachment filename for reference however clicking the
hyperlink just presents a "javascript void(0)" dialog and nothing happens when you click it.

  • I'm not sure why you would want to keep the note and not the attachment, it would seem to me that one would want to keep both or delete both. But perhaps I just don't know your scenario.

    Angel Magana has a nice modification to split the upload directory by date, which makes it a lot cleaner. I have yet to look into this myself but knowing his reputation it should be a really good one! 

    https://github.com/elchele/UploadSplit 

    FrancescaS

  • In our case, 98% of pdf’s come from the quotes module which can be regenerated at any time.  I wanted to leave the note because it’s not really a true “Note” that shows up in a Note subpanel…. It’s just Sugar’s weird mechanism to tie a unique GUID to a locally stored file.  It’s browsable by GUID manually in the browser but not searchable.  If the note is deleted, the file name disappears from the email.  If you leave the note intact, the email will still show that “MyQuote.pdf” was originally attached to the email even though you cant open it.  If the note is deleted, the archived email will make it appear as though there was never any attachment.  This way I get to keep the record of the filename without the added disk space for old files.

    If someone wanted to delete both, they could take the exported note ID’s and delete them that way through a temporary imported SQL table. (quick and dirty, and perhaps lazy) but it’s my preferred method.

    Create a 1 column table myTempTable.  Name the column ‘id’ as a 36 length VARCHAR and import the text file you originally exported.

    UPDATE notes, myTempTable, SET notes.deleted = ‘1’ WHERE notes.id = myTempTable.id

    Then let the prune job yank em out. And delete the temp table.  Lazier than just running DELETE’s but still does the trick.

  • I can see your point, though you would not know if the Quote was modified after it was sent, though perhaps that does not matter.

  • Haha yes. That's where I stick it to my users.  I disable the PDF buttons till a quote it set to "Final"  and then I lock down editing for non admin users!

  • Does anyone know if there is updated guidance on clearing old files from Uploads? (Sugar Sell Advanced v12 (cloud)).

    I think since this post (3yrs ago) SugarCRM now spilt the uploads folder in sub folders.

    I am on 78% File System & 87% of Database storage - so getting a bit anxious.

    Leaving the note record and removing the file attached to it is also preferable as there may be useful text in the note and we can then dump a backup uploads folder in our unlimited google storage, search for the ID in there to find the file... if ever needed...  so leaving some trace that the file has been removed (or relocated) seems like a good idea.

    Are files in Documents still stored in the same manner?

    I wonder if Data Archiver could help - but the uploads folder is not one of the options...  I could delete the document / note / or case attachment (a custom module)...    but leaving some trace for the old stuff is desirable.

    Any pointers would be very welcome...

    Thank you,
    Luke.

  •  this is common request. Several customers face similar issue and ask us to take safe actions which doesn't loose relevant data.

    The most effective action we realized is to identify and remove all emails as well as their attachments which are not associated to the relationship itself, literally (TO and FROM not related to Contact, Account, Lead, Prospect, User).

    The explanation: several customers don't realize that the corporate emails eventually are in charge for personal purpose and/or automatic emails (bounce, third party companies like Google, Apple etc) are stored, eventually with big attachments.

    For one specific customer we were able to reduce around 60% of upload folder and 20% of database size by removing those emails and attachments.

    It becomes a so life savior in such a way that we built a sql and php scripts to identify them all and generate sql scripts to remove the entries and txt files to be loaded with "xargs rm -rf < somefile.txt" in order to remove the files. So you just need to take the backup, identify those data to be removed and file a ticket to Sugar to remove them all accordingly.

    André Lopes
    Lampada Global
    Skype: andre.lampada
  • Hi ,

    Apologies for resurrecting this thread, but if you are still in need of managing your file storage more effectively, I just posted in another conversation about the File Deduplication add-on for our Upsert Deduplicate plug-in. It sounds like it would be a potential fit for what you needed, so please reach out if you'd like to learn more!

    Chris