Good morning. This is Fennec.
In this article, I will describe the specifications of COTONOCA, a crypto news roundup website that I developed myself.
This article is written for those who are interested in how COTONOCA is implemented and developed (and possibly in developing it with me), and for those who are working on Python scraping tools.
If you’ve never programmed before, this will be a difficult read that you won’t enjoy at all. It is not a fun topic, so feel free to run away! Just don’t trip on the way out!
Operating Environment
Rental Server
The site is built on a ConoHa Wing rental server. Because it is not a ConoHa VPS, root (su) privileges are not available. ConoHa Wing runs on CloudLinux and Apache, and can run the Python 2/3 included in its default package.
Software
I use Anaconda for Flask, matplotlib, Requests, Beautiful Soup, and so on. ConoHa’s default package also includes Python, but I use the libraries in the Anaconda package. Anaconda was installed after pyenv was installed on the ConoHa server. The installation was done from GitHub because the installer failed with an error.
(For developers: Once, while I was using numpy, the program suddenly stopped running with a “numpy C-extensions failed” error even though I had not changed the code at all. Reinstalling numpy resolved the error, so if you run into the same situation, try that.)
WSGI Server
The WSGI server built into Flask is for development use and is not safe to use in a production environment, so I used an officially recommended WSGI server called Waitress. After installing Waitress with the conda command, I only had to change a few lines around the “if __name__ == '__main__':” idiom at the end of the Flask program.
When the web page is first accessed, “.htaccess” routes the request to the CGI program, and the CGI passes a GET request to the Flask program to display the screen. When the input form is submitted, the CGI passes a POST request, which is received by the Flask program. For this reason, the CGI program contains a conditional such as “if os.environ['REQUEST_METHOD'] == 'GET':” to separate GET and POST handling.
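A minimal sketch of that GET/POST branch is shown here. The handler names (`handle_get`, `handle_post`) and their return values are hypothetical, and the environment is passed in as a dictionary so the sketch can run outside a real CGI context; the actual COTONOCA CGI program may be structured differently.

```python
import os


def handle_get():
    # Hypothetical: ask the Flask program for the initial screen.
    return "initial page"


def handle_post(form_body):
    # Hypothetical: forward the submitted form to the Flask program.
    return f"form submitted: {form_body}"


def dispatch(environ=None, form_body=""):
    """Route a CGI request by its REQUEST_METHOD variable."""
    if environ is None:
        # Under CGI, the web server fills os.environ before the script runs.
        environ = os.environ
    # .get() avoids a KeyError when the script is run outside a CGI context.
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "GET":
        return handle_get()
    if method == "POST":
        return handle_post(form_body)
    raise ValueError(f"unsupported method: {method}")


print(dispatch({"REQUEST_METHOD": "POST"}, "asset=btc"))
```

Reading the method with `.get()` instead of indexing is a small design choice that keeps the script testable from a normal shell.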
Periodic Auto-run
In the server management screen of ConoHa Wing, there is a menu called “Job Scheduler” that lets you easily set up cron jobs. You use it to specify the execution schedule, the interpreter (Anaconda’s Python), and the file to execute (the main program).
(For developers: If you are used to cron, you may find it easier to configure it directly without ConoHa’s GUI. When I changed the settings from the command line over an SSH connection, the changes were immediately reflected in the GUI.)
Operation Indication Function
Operand
Site Select
Check box. Multiple selections are possible.
When the “ALL” checkbox is turned on or off, the other items are immediately turned on or off.
Detect crypto
Radio buttons. Select “Include”, “Irrelevant”, or “Both”.
Detect range
Radio buttons. You can choose between “Direct only” and “Include indirect”.
Radio buttons are not strictly required here, but I used them because they are easier to understand at a glance.
Category
Select box (pull-down). Five choices: “ALL,” “DeFi,” “NFT,” “Web3,” and “CBDC.”
Asset
Text box. You can enter up to 30 characters in both Japanese and English.
Lowercase letters in English will be converted to uppercase.
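The Asset rules above can be sketched as a small normalization function. The function name, the whitespace trimming, and the choice to raise an error on over-long input are assumptions for illustration; the site may instead limit the length in the HTML form itself.

```python
MAX_ASSET_LENGTH = 30  # limit stated in the specification above


def normalize_asset(raw):
    """Return the Asset text as it would be used for searching."""
    text = raw.strip()  # assumption: surrounding whitespace is not meaningful
    if len(text) > MAX_ASSET_LENGTH:
        raise ValueError(f"Asset must be at most {MAX_ASSET_LENGTH} characters")
    # upper() converts lowercase English letters and leaves
    # Japanese characters unchanged.
    return text.upper()


print(normalize_asset("btc"))
```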
Apply
Button. All settings of the input form are confirmed and reflected in the screen display.
Restore All
Button. Restores all settings of the input form to their initial state and reflects them in the screen display.
[TBD] Remove Individual Assets
Button. Multiple buttons can be displayed.
A button labeled with the Asset text is displayed only when an Asset (ticker) is set.
When the button is clicked, the corresponding Asset setting is canceled and the button disappears.
Note: To be supported in the future. In V1.0, the buttons are displayed but cannot be pressed.
Display Decision
AND conditions
All input form settings are combined as AND conditions when deciding what to display.
The only exception is when no Asset (ticker) is set, in which case the Asset setting is ignored.
OR conditions
If “Detect range” is set to “Direct only”, the assets detected from the direct terms are displayed.
If “Include indirect” is selected, assets detected from indirect terms are displayed in addition to those detected from direct terms.
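The display decision can be sketched as a single predicate. The `Article` fields and the function signature are hypothetical, and the “Detect crypto” setting is left out for brevity; this is only one way the rules could be expressed, not the site's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record; the real article database has its own columns.
    site: str
    category: str
    direct_assets: set = field(default_factory=set)
    indirect_assets: set = field(default_factory=set)


def matches(article, sites, category, asset, include_indirect):
    """Return True when the article passes every form setting (AND)."""
    if article.site not in sites:
        return False
    if category != "ALL" and article.category != category:
        return False
    if not asset:
        # No Asset (ticker) set: the Asset condition is ignored.
        return True
    detected = set(article.direct_assets)
    if include_indirect:
        # "Include indirect": direct OR indirect detection counts.
        detected |= article.indirect_assets
    return asset in detected


sample = Article("SiteA", "DeFi", {"BTC"}, {"ETH"})
print(matches(sample, {"SiteA"}, "ALL", "ETH", include_indirect=True))
```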
Input/Output function
Input/Output
Type
The types of databases and log files are as follows.
Article Database: This is a database of information about news articles that are displayed on the site.
Search Database: A database that contains search terms used to search for assets and categories.
Search Log: A log that stores the content entered in the form. It is used to improve the quality and performance of the site.
Article Log: In addition to the contents of the “Article Database,” this log stores the full text of news articles, which will be used for text mining by AI in the future.
I/O
The input and output timing of each database and log file is as follows.
Article Database: (Output) When the main program is executed / (Input) When the page is accessed for the first time or the browser is refreshed
Search Database: (Output) None / (Input) When the main program is executed
Search Log: (Output) When searching or refreshing the browser / (Input) None
Article Log: (Output) When the main program is executed / (Input) None
Format
CSV
Article Database
It consists of a parent file “_all_news.csv” and child files for each site name.
The parent file contains the data of all the child files.
The figure above shows the data format for one article (one line).
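As a rough sketch of the parent/child relationship, the parent file could be rebuilt by concatenating the child files. This assumes the child files sit in one directory, have no header row, and use UTF-8; the function name is hypothetical and the real main program may update the parent differently.

```python
import csv
from pathlib import Path

PARENT_NAME = "_all_news.csv"


def rebuild_parent(directory):
    """Concatenate every child CSV in `directory` into the parent file."""
    directory = Path(directory)
    rows = []
    for child in sorted(directory.glob("*.csv")):
        if child.name == PARENT_NAME:
            continue  # do not read the parent back into itself
        with child.open(newline="", encoding="utf-8") as f:
            rows.extend(csv.reader(f))
    with (directory / PARENT_NAME).open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)  # number of article rows written
```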
Search Database
It is divided into three files: “Direct Terms”, “Indirect Terms”, and “Categories”.
Only pre-edited terms are used for searching, so dynamic acquisition of terms is not performed.
The figure above shows the data format for one asset or one category.
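To illustrate how pre-edited terms could be used for detection, here is a small sketch. The dictionary layout (ticker mapped to a list of terms) and the case-insensitive substring matching are assumptions; the real search database columns are defined by the CSV format described above.

```python
def detect_assets(text, terms_by_ticker):
    """Return the tickers whose terms appear in the article text."""
    lowered = text.lower()
    return {
        ticker
        for ticker, terms in terms_by_ticker.items()
        if any(term.lower() in lowered for term in terms)
    }


# Hypothetical sample terms, not taken from the real search database.
direct_terms = {"BTC": ["Bitcoin"], "ETH": ["Ethereum"]}
indirect_terms = {"ETH": ["Vitalik"]}

print(detect_assets("Bitcoin hits a new high", direct_terms))
```

Plain substring matching is simple but can produce false positives for short terms (for example, a ticker that also appears inside an ordinary word), so word-boundary matching may be preferable in practice.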
LOG
Search Log
The above figure shows the format for a single search.
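Appending one search to the log could look like the sketch below. The column order (timestamp, sites, category, asset), the `|` separator for multiple sites, and the function name are all assumptions made for illustration, since the actual log format is the one shown in the figure.

```python
import csv
from datetime import datetime, timezone


def append_search_log(path, sites, category, asset):
    """Append one search as a single CSV row and return that row."""
    row = [
        datetime.now(timezone.utc).isoformat(),  # when the search happened
        "|".join(sites),                         # selected sites
        category,
        asset,
    ]
    # Mode "a" keeps earlier searches and adds the new one at the end.
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)
    return row
```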
Article Log
The above figure shows the format for one article.
Graph Drawing Function
Type
Shows how many articles mention each asset in the target period.
Pie chart: Displays the top 10 assets with the most articles over 1 month/6 months/12 months.
[TBD] Line graph: To be supported in the future.
[TBD] The percentage of stocks in each site’s operating base.
It will be supported in the future.
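The ranking behind the pie chart could be computed roughly as follows. The input shape (a list of `(published_date, ticker)` pairs), the function name, and the approximation of one month as 30 days are assumptions for this sketch.

```python
from collections import Counter
from datetime import date, timedelta


def top_assets(articles, today, months, limit=10):
    """Count articles per ticker in the period and return the top entries."""
    cutoff = today - timedelta(days=30 * months)  # assumption: 1 month = 30 days
    counts = Counter(
        ticker
        for published, ticker in articles
        if cutoff <= published <= today
    )
    return counts.most_common(limit)


# Hypothetical sample data.
sample_articles = [
    (date(2024, 5, 20), "BTC"),
    (date(2024, 5, 15), "BTC"),
    (date(2024, 5, 10), "ETH"),
    (date(2023, 1, 1), "XRP"),
]
print(top_assets(sample_articles, date(2024, 6, 1), months=1))
```

The returned `(ticker, count)` pairs can be passed to a plotting library as the labels and values of the pie chart.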
Operation
Generation Timing
Graph images are generated when the scheduled scraping runs, and whenever the site is accessed or the display settings are changed.
Image Output
Image files are written only when the scheduled scraping runs. These files are currently unused; they may be used in the future to make the site more lightweight.
During user operation, no image files are written; only Base64 data is sent and received.
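Sending an image without writing a file can be sketched with the standard library alone. The use of a `data:` URI and the function name are assumptions; the source only states that Base64 data is exchanged. With matplotlib, the PNG bytes would typically come from saving the figure into an in-memory `io.BytesIO` buffer instead of a file.

```python
import base64


def to_data_uri(png_bytes):
    """Encode PNG bytes so they can be embedded directly in an <img> tag."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# Placeholder bytes standing in for a rendered chart image.
placeholder = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(placeholder))
```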
Auxiliary Function
Both of the following features are developer aids for building the site, for use in the local environment only.
Neither is integrated into the news roundup/analysis site; both are managed in separate projects.
Asset Image Acquisition
This is a tool that automatically downloads images of crypto assets from CoinMarketCap.
It is implemented with Selenium and uses ChromeDriver to retrieve images in batches.
I have confirmed that, at the time of use, the tool could retrieve the images of all assets (about 8,000).
Get Search Words
This is a tool that retrieves the names and tickers of crypto assets from CoinMarketCap.
Like the image acquisition tool, it works with Selenium and ChromeDriver.
I have confirmed that all names and tickers can be retrieved here as well.
At the end
If you have any questions about the COTONOCA website, please contact me on Twitter.
You can also contact me through the form on this blog, but please use Twitter as I can reply to you faster there.
Thank you so much for reading all the way through!