Ans: Teradata is an RDBMS (Relational Database Management System) with parallel processing. This parallel processing is what lets Teradata lead the data warehousing industry. In Teradata, records are distributed among the AMPs based on an index rather than a key. Teradata provides utilities such as BTEQ, FastLoad, FastExport, MultiLoad, TPump, and TPT, which are used to load data into the database from files and export it back out. Because of its parallel architecture, data distribution and retrieval are faster than in most other database management systems.
Ans: Aster Analytics was a product from Aster Data Systems, a company founded in 2005 and acquired by Teradata in 2011. Aster Analytics has three engines - SQL, Graph, and MapReduce - and focuses on analytics as opposed to OLTP. Teradata renamed the offering Teradata Aster after the acquisition. Teradata acquired Aster Data Systems to buy its way into the NoSQL side of the Big Data world; Teradata has always been doing big data, but traditionally with SQL RDBMS-based OLTP and OLAP systems.
Ans:
Ans: For decision support systems. Let me give some examples from experience. I once had the chance to rewrite two jobs that used a lot of joins, with a final table close to 130 million rows; in Oracle and SAS code these jobs used to run for around 18 hours and 12 hours respectively. Rewriting them in Teradata brought the run times down to less than 30 minutes and 15 minutes respectively. This shows the power of Teradata when you have large volumes of data. I was also part of a team that migrated from Oracle to Teradata, and everyone in the company has been very happy with Teradata's processing power. Teradata is very good for OLAP but may not be as beneficial for OLTP, especially because of its cost and architecture.
Ans: Teradata is made up of the following components – the Parsing Engine (PE), the BYNET (the message-passing layer), the Access Module Processors (AMPs), and the virtual disks (Vdisks) on which each AMP stores its portion of the data.
Ans: In Teradata, we generate a sequence by making use of an Identity column.
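As a minimal sketch (the database, table, and column names here are illustrative assumptions), an identity column can be declared like this:

CREATE TABLE mydb.customer
( cust_id   INTEGER GENERATED ALWAYS AS IDENTITY
            (START WITH 1 INCREMENT BY 1 MINVALUE 1 NO CYCLE),
  cust_name VARCHAR(50) )
PRIMARY INDEX (cust_id);

Each row inserted into the table automatically receives the next cust_id value, so no separate sequence object is needed.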
Ans: All you have to do is use CSUM.
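A hedged sketch of the CSUM approach (the table and column names are assumed, not from the original answer):

SELECT cust_name,
       CSUM(1, cust_name) AS seq_no   /* running total of 1 per row */
FROM mydb.customer;

CSUM(1, cust_name) adds 1 for every row in cust_name order, so seq_no comes out as 1, 2, 3, and so on.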
Ans: The most suitable utility here has to be TPump. By increasing or decreasing the packet size, the traffic can be handled easily.
Ans: There are basically two ways of restarting in this case: either resubmit the old script so that it resumes from the last checkpoint, or clean up and run a new file from the beginning.
Ans: Some of the ETL tools which are commonly used in Teradata are DataStage, Informatica, SSIS, etc.
Ans: Some of the advantages that ETL tools have over TD are that they can read from and write to multiple heterogeneous sources and destinations, debugging is much easier thanks to their full-fledged GUIs, and components built in them can be reused across projects.
Ans: Caching is considered an added advantage of using Teradata, as the cache primarily works with a source that stays in the same order, i.e. does not change frequently. At times, the cache is shared among applications.
Ans: Just give the command SHOW VERSION.
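If only a SQL session is available, the version and release can also be read from the data dictionary; a simple sketch:

-- InfoKey/InfoData include the VERSION and RELEASE rows
SELECT InfoKey, InfoData
FROM DBC.DBCInfo;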
Ans: With a NUSI, the index sub-table row is stored on the same AMP as the data row it points to. Thus, each AMP operates separately and in a parallel manner.
Ans: The script has to be submitted manually so that it can load the data from the last checkpoint.
Ans: The process basically resumes from the last known checkpoint, and once the data has been loaded after re-executing the MLOAD script, the server is restarted.
Ans: A node basically is termed as an assortment of components of hardware and software. Usually, a server is referred to as a node.
Ans: We need to use the BTEQ utility in order to do this task. SKIP 20, as well as REPEAT 60, will be used in the script, as sketched below.
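A minimal BTEQ sketch along those lines (the logon string, file name, table, and column layout are assumptions for illustration only):

.LOGON tdpid/username,password
.IMPORT DATA FILE = emp_data.txt, SKIP = 20
.QUIET ON
.REPEAT 60
USING (emp_id INTEGER, emp_name CHAR(30))
INSERT INTO mydb.employee (emp_id, emp_name)
VALUES (:emp_id, :emp_name);
.LOGOFF
.QUIT

SKIP = 20 ignores the first 20 records of the input file, and .REPEAT 60 executes the following INSERT for the next 60 records.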
Ans: PDE basically stands for Parallel Data Extension. PDE is an interface layer of software that sits above the operating system and enables the database to operate in a parallel environment.
Ans: TPD basically stands for Trusted Parallel Database, and it basically works under PDE. Teradata happens to be a database that primarily works under PDE. This is the reason why Teradata is usually referred to as Trusted Parallel or Pure Parallel database.
Ans: A channel driver is a software that acts as a medium of communication between PEs and all the applications that are running on channels that are attached to the clients.
Ans: Just like channel driver, Teradata Gateway acts as a medium of communication between the Parse Engine and applications that are attached to network clients. Only one Gateway is assigned per node.
Ans: A Virtual Disk is basically a collection of a whole array of cylinders on physical disks. It is sometimes referred to as a disk array.
Ans: Amp basically stands for Access Module Processor and happens to be a processor working virtually and is basically used for managing a single portion of the database. This particular portion of the database cannot be shared by any other Amp. Thus, this form of architecture is commonly referred to as shared-nothing architecture.
Ans: An AMP basically consists of a Database Manager Subsystem and is capable of performing operations such as executing DDL and DML requests, implementing joins and aggregations, applying and releasing locks, and returning results over the BYNET.
Ans: PE happens to be a kind of Vproc. Its primary function is to take SQL requests and deliver responses in SQL. It consists of a wide array of software components that are used to break SQL into various steps and then send those steps to the AMPs.
Ans: Parsing is a process concerned with the analysis of symbols of string that are either in computer language or in natural language.
Ans: A Parser checks the SQL request for syntax errors, verifies that the objects referenced in the request exist, and confirms that the user has the required access rights before the request is optimised and dispatched.
Ans: The dispatcher takes a whole collection of requests and then keeps them stored in a queue. The same queue is being kept throughout the process in order to deliver multiple sets of responses.
Ans: PE can handle a total of 120 sessions at a particular point in time.
Ans: BYNET basically serves as a medium of communication between the components. It is primarily responsible for sending messages and also responsible for performing merging, as well as sorting operations.
Ans: A Clique is basically known to be a set of nodes that share a common set of disk drives. The presence of a Clique is immensely important since it helps the system survive node failures.
Ans: Whenever a node fails or its performance degrades, all the corresponding Vprocs immediately migrate from the failed node to a new node in order to get all the data back from the common drives.
Ans: There are basically four types of LOCKS in Teradata: Exclusive, Write, Read, and Access.
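As a hedged illustration of the least restrictive of these (the database and table names are made up), an Access lock can be requested explicitly with the LOCKING modifier:

LOCKING TABLE mydb.orders FOR ACCESS
SELECT order_id, order_amount
FROM mydb.orders;

The FOR ACCESS modifier lets the read proceed even while the table is being updated, at the cost of possibly reading uncommitted data.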
Ans:
Ans: Only one AMP is actively involved in a Primary Index access; it is a one-AMP operation.
Ans: UPSERT basically stands for Update Else Insert. This atomic UPDATE ... ELSE INSERT syntax is available only in Teradata.
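A minimal sketch of the atomic UPSERT syntax (the table, column names, and values are assumptions for illustration):

UPDATE mydb.employee
SET salary = 60000
WHERE emp_id = 1001
ELSE INSERT INTO mydb.employee (emp_id, emp_name, salary)
VALUES (1001, 'New Hire', 60000);

If a row with emp_id 1001 already exists it is updated; otherwise the INSERT branch fires, all within a single request.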
Ans: PPI (Partitioned Primary Index) is basically used for range-based or category-based data storage purposes. When it comes to range queries, there is no need for a full table scan, as the optimizer moves straight to the relevant partition, skipping all the other partitions.
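A small sketch of a range-partitioned table (the names, date range, and monthly interval are assumptions for illustration):

CREATE TABLE mydb.sales
( store_id  INTEGER,
  sale_date DATE,
  amount    DECIMAL(10,2) )
PRIMARY INDEX (store_id)
PARTITION BY RANGE_N (
  sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-12-31'
            EACH INTERVAL '1' MONTH );

A query such as SELECT SUM(amount) FROM mydb.sales WHERE sale_date BETWEEN DATE '2024-03-01' AND DATE '2024-03-31' then reads only the March partition instead of scanning the whole table.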
Ans:
A: Table1 has an index on NUMBER(10) and Table2 on NUMBER(22).
B: Table1 has an index on NUMBER(8) and Table2 on INTEGER.
C: Table1 has index on NUMBER and Table2 on NUMBER(15,2).
Hash join (no data redistribution) will occur only for cases A and C.
Ans: Collect stats is an important concept in Teradata. Collected statistics allow the PE to come up with the least-cost plan for a requested query. Collect stats determines the confidence level of the PE in estimating how many rows it is going to access, how many unique values a table has, how many nulls, and so on; all this information is stored in the data dictionary. Once you submit a query in Teradata, the parsing engine checks whether statistics are available for the requested table; if stats were collected earlier, the PE generates a plan with "high confidence". In the absence of collected stats, the plan will be with "low confidence". However, Teradata's optimizer is very robust and intelligent: even if you do not collect stats on a table, column, or index, the PE does a "dynamic AMP sampling", which means it selects a random AMP, and this AMP comes up with information about the table data it holds; based on this, the PE (which knows the data demographics and the available system components) estimates the workload and generates a plan.
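As a hedged illustration (the database, table, and column names are assumptions), statistics are typically collected and then inspected like this:

COLLECT STATISTICS ON mydb.orders COLUMN (cust_id);
COLLECT STATISTICS ON mydb.orders COLUMN (order_date);
COLLECT STATISTICS ON mydb.orders INDEX (order_id);

HELP STATISTICS mydb.orders;

HELP STATISTICS shows the collection date, the number of unique values, and the columns covered, which is the information the PE uses to build a high-confidence plan.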
Ans:
Ans: If you are absolutely looking for the differences then below are a few -
PRIMARY KEY | PRIMARY INDEX
Purely a logical concept from the relational model | A physical mechanism used by Teradata
Uniquely identifies a row; allows no duplicates or NULLs | Can be unique (UPI) or non-unique (NUPI); a NUPI allows duplicates and NULLs
Says nothing about where a row is stored | Determines how rows are hashed and distributed across the AMPs
Optional in Teradata | Every table (other than a NoPI table) has one, defined explicitly or assigned by default
Ans: We have a rule that if a query takes more than one terabyte of spool we are supposed to abort it. My question is: if the total spool is used up by a query, what is the expected behaviour of the system? Will the system restart, or what else can happen? The next question is related to the first line: if we have around 10 terabytes of spool, is it logical to abort a query that has just crossed 1 TB of spool? I think we should allow it more spool, up to 9 TB or so, if there are no other sessions. Please provide your analysis of the above cases.
Ans: Performance tuning in Teradata is fundamentally done to identify all the bottlenecks and then resolve them.
Ans: Actually, a bottleneck is not a type of error, but it certainly causes a certain amount of delay in the system.
Ans: Teradata skewing can be considered one of the worst problems on any Teradata system. A high skew factor means, in effect, that the parallelism of the system is degraded, leading to: poor CPU parallel efficiency on full table scans and bulk inserts, because the AMP holding the most rows for the skewed values becomes the bottleneck and forces all other AMPs to wait; and increased I/O for updates and inserts of skewed values, given the extra workload on the AMP that holds a high number of rows for the same NUPI value. The causes of Teradata skewing can hide in many places, and there are several ways to discover skewing problems on your system.
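One common way to measure skew, sketched here against the standard dictionary view (only the database name 'mydb' is an assumption):

SELECT TableName,
       SUM(CurrentPerm) AS total_perm,
       MAX(CurrentPerm) AS max_perm_per_amp,
       AVG(CurrentPerm) AS avg_perm_per_amp,
       100 * (1 - AVG(CurrentPerm) / NULLIFZERO(MAX(CurrentPerm))) AS skew_factor_pct
FROM DBC.TableSizeV
WHERE DatabaseName = 'mydb'
GROUP BY TableName
ORDER BY skew_factor_pct DESC;

A skew factor near zero means the table is spread evenly across the AMPs; a high value flags tables whose primary index choice should be revisited.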